Reference Citation Analysis

Searched Name: Donald Geman

Total Articles: 62 (from Reference Citation Analysis); Article PDFs: 13; Cited by > 0: 50

Chattopadhyay A, Slocum S, Haeffele BD, Vidal R, Geman D. Interpretable by Design: Learning Predictors by Composing Interpretable Queries. IEEE Trans Pattern Anal Mach Intell 2023;45:7430-7443. [PMID: 36441893 DOI: 10.1109/tpami.2022.3225162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/06/2023]

Wang M, Barker PB, Cascella NG, Coughlin JM, Nestadt G, Nucifora FC, Sedlak TW, Kelly A, Younes L, Geman D, Palaniyappan L, Sawa A, Yang K. Longitudinal changes in brain metabolites in healthy controls and patients with first episode psychosis: a 7-Tesla MRS study. Mol Psychiatry 2023;28:2018-2029. [PMID: 36732587 PMCID: PMC10394114 DOI: 10.1038/s41380-023-01969-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/14/2021] [Revised: 01/13/2023] [Accepted: 01/17/2023] [Indexed: 02/04/2023]

Affiliation(s)

Min Wang: Russell H Morgan Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, Baltimore, MD, USA; College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
Peter B Barker: Russell H Morgan Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, Baltimore, MD, USA; F. M. Kirby Research Center for Functional Brain Imaging, Kennedy Krieger Institute, Baltimore, MD, USA
Nicola G Cascella: Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
Jennifer M Coughlin: Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
Gerald Nestadt: Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
Frederick C Nucifora: Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
Thomas W Sedlak: Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
Alexandra Kelly: Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
Laurent Younes: Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA
Donald Geman: Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA
Lena Palaniyappan: Robarts Research Institution, University of Western Ontario, London, ON, Canada; Department of Psychiatry, University of Western Ontario, London, ON, Canada; Douglas Mental Health University Institute, Department of Psychiatry, McGill University, Montreal, QC, Canada
Akira Sawa: Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA; Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Department of Pharmacology and Molecular Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD, USA
Kun Yang: Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA


Ji L, Wang A, Sonthalia S, Naiman DQ, Younes L, Colantuoni C, Geman D. CellCover Defines Conserved Cell Types and Temporal Progression in scRNA-seq Data across Mammalian Neocortical Development. bioRxiv 2023:2023.04.06.535943. [PMID: 37383947 PMCID: PMC10299349 DOI: 10.1101/2023.04.06.535943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/30/2023]

Abstract

Accurate identification of cell classes across the tissues of living organisms is central in the analysis of growing atlases of single-cell RNA sequencing (scRNA-seq) data across biomedicine. Such analyses are often based on the existence of highly discriminating "marker genes" for specific cell classes, which enables a deeper functional understanding of these classes as well as their identification in new, related datasets. Currently, marker genes are defined by methods that serially assess the level of differential expression (DE) of individual genes across landscapes of diverse cells. This serial approach has been extremely useful, but is limited because it ignores possible redundancy or complementarity across genes, which can only be captured by analyzing several genes at the same time. We wish to identify discriminating panels of genes. To efficiently explore the vast space of possible marker panels, leverage the large number of cells often sequenced, and overcome zero-inflation in scRNA-seq data, we propose viewing panel selection as a variation of the "minimal set-covering problem" in combinatorial optimization, which can be solved with integer programming. In this formulation, the covering elements are genes, and the objects to be covered are cells of a particular class, where a cell is covered by a gene if that gene is expressed in that cell. Our method, CellCover, identifies a panel of marker genes in scRNA-seq data that covers one class of cells within a population. We apply this method to generate covering marker gene panels which characterize cells of the developing mouse neocortex as postmitotic neurons are generated from neural progenitor cells (NPCs). We show that CellCover captures cell class-specific signals distinct from those defined by DE methods and that CellCover's compact gene panels can be expanded to explore cell type specific function. Transfer learning experiments exploring these covering panels across in vivo mouse, primate, and human scRNA-seq datasets demonstrate that CellCover identifies markers of conserved cell classes in neurogenesis, as well as markers of temporal progression in the molecular identity of these cell types across development of the mammalian neocortex. The gene covering panels we identify across cell types and developmental time can be freely explored in visualizations across all the public data we use in this report with NeMo Analytics [1] through https://nemoanalytics.org/p?l=CellCover . The code for CellCover is written in R and the Gurobi R interface and is available at [2].
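
The set-covering formulation described in this abstract can be illustrated with a small sketch. This is not the CellCover implementation (which, per the abstract, is written in R with the Gurobi interface); it swaps integer programming for an exhaustive search over panels of increasing size, which is only feasible at toy scale. The function name, gene names, and cell labels are hypothetical.

```python
from itertools import combinations

def minimal_cover(expressed_in, cells):
    """Return a smallest panel of genes such that every cell expresses at least one.

    expressed_in maps a gene name to the set of cells in which it is expressed.
    Exhaustive search over panels of size 1, 2, ... (toy-scale stand-in for
    integer programming).
    """
    genes = sorted(expressed_in)
    target = set(cells)
    for size in range(1, len(genes) + 1):
        for panel in combinations(genes, size):
            covered = set().union(*(expressed_in[g] for g in panel))
            if target <= covered:
                return list(panel)
    return None  # no panel covers every cell

# Hypothetical toy data: four cells of one class, four candidate genes.
expressed_in = {
    "geneA": {"c1", "c2"},
    "geneB": {"c2", "c3"},
    "geneC": {"c3", "c4"},
    "geneD": {"c1"},
}
panel = minimal_cover(expressed_in, ["c1", "c2", "c3", "c4"])
print(panel)
```

In this toy case no single gene is expressed in all four cells, so the search moves on to pairs and stops at the first pair that covers every cell.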


Omar M, Dinalankara W, Mulder L, Coady T, Zanettini C, Imada EL, Younes L, Geman D, Marchionni L. Using biological constraints to improve prediction in precision oncology. iScience 2023;26:106108. [PMID: 36852282 PMCID: PMC9958363 DOI: 10.1016/j.isci.2023.106108] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/04/2022] [Revised: 12/20/2022] [Accepted: 01/28/2023] [Indexed: 02/05/2023] Open

Ke Q, Dinalankara W, Younes L, Geman D, Marchionni L. Abstract 173: Efficient representations of tumor diversity with paired DNA-RNA aberrations. Cancer Res 2021. [DOI: 10.1158/1538-7445.am2021-173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

Abstract

In this work we develop a framework which allows for a systematic analysis of joint DNA and putative downstream RNA effects in cancer data cohorts. Using the Reactome database, we extract gene pairs that are linked by known mechanistic connections. Such pairs, which we refer to as 'Source Target Pairs' or STPs, consist of a source gene for which we examine aberrant activity in the DNA profile, and a target gene that is affected by said source gene, for which we examine aberrant activity in the RNA profile. Using TCGA data for six different cancer types (breast, colon, kidney, liver, lung and prostate), we use mutation and copy number variation information to compile DNA aberrant activity data. For the same cancer cohorts, we use RNASeq gene expression data to quantify RNA aberrant activity via the 'divergence' method we have previously developed. In the divergence framework, normal samples from the same cancer are used to estimate a normal range of expression for target genes of interest, and deviation from the normal range is assumed to indicate aberrant activity which may result from upstream DNA aberrations. Then for a given sample, an STP can be represented as a binary variable, indicating presence or absence of joint DNA-RNA aberrant activity. We utilize integer programming to discover a small set of such STPs for each cancer type such that every sample displays aberrant activity in at least one STP. We refer to these reduced STP configurations as 'minimal coverings' of that cancer. These configurations then allow for the quantification of heterogeneity for that cancer type, as well as for phenotypical groups of interest. This is possible because sample to sample variability can be compared via the entropy of the distribution of the minimal covering, where the small number of STPs in such a configuration makes the computation more tractable. Our results reveal many known putative drivers of cancer, as well as identify some novel genes of interest for further consideration. Comparison of heterogeneity across phenotypes of interest shows higher entropy in more pathological phenotypes, indicating increasing heterogeneity with severity of disease.

Citation Format: Qian Ke, Wikum Dinalankara, Laurent Younes, Donald Geman, Luigi Marchionni. Efficient representations of tumor diversity with paired DNA-RNA aberrations [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 173.
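
The entropy comparison mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming each sample is summarized as a tuple of 0/1 flags (one per STP in the covering) and that heterogeneity is measured as the Shannon entropy of the empirical distribution of those tuples; the function name and toy cohorts are hypothetical, and the authors' actual estimator may differ.

```python
from collections import Counter
from math import log2

def configuration_entropy(samples):
    """Shannon entropy (in bits) of the empirical distribution of binary patterns.

    Each sample is a sequence of 0/1 flags, one per STP in the covering.
    """
    counts = Counter(tuple(s) for s in samples)
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Hypothetical cohorts over a covering of three STPs.
homogeneous = [(1, 0, 0)] * 4                                   # every sample identical
heterogeneous = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]    # every sample distinct

print(configuration_entropy(homogeneous))
print(configuration_entropy(heterogeneous))
```

With only a handful of STPs in a covering, the number of possible patterns stays small, which is why the abstract describes the computation as tractable.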

Baloni P, Dinalankara W, Earls JC, Knijnenburg TA, Geman D, Marchionni L, Price ND. Identifying Personalized Metabolic Signatures in Breast Cancer. Metabolites 2020;11:20. [PMID: 33396819 PMCID: PMC7823382 DOI: 10.3390/metabo11010020] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/16/2020] [Revised: 12/23/2020] [Accepted: 12/28/2020] [Indexed: 01/04/2023] Open

Afsari B, Cope L, Gaykalova DA, Geman D, Puram S, Goff LA, Favorov A, Fertig EJ. Abstract 3399: Uncovering hidden sources of transcriptional dysregulation arising from inter- and intra-tumor heterogeneity. Cancer Res 2019. [DOI: 10.1158/1538-7445.am2019-3399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

Abstract

Introduction: This study develops an innovative computational framework, Expression Variation Analysis (EVA), to model transcriptional dysregulation in cancer. Heterogeneity poses a major challenge in translational research. For example, inter-tumor heterogeneity limits biomarker discovery and intra-tumor heterogeneity enables therapeutic resistance. Moreover, in some cancers driver mutations are insufficient to account for the widespread transcriptional variation responsible for these outcomes. Thus, new computational tools to model transcriptional variation are essential.

Methods: EVA is a unified computational framework to model transcriptional variation in cancer. Briefly, EVA quantifies transcriptional heterogeneity for one set of samples or cells from one phenotype using the expected dissimilarity between pairs of expression profiles. U-statistics theory can then quantify the statistical significance of the difference in transcriptional heterogeneity between phenotypes.

Results: We apply EVA to perform a comprehensive characterization of transcriptional variation in head and neck squamous cell carcinoma (HNSCC). At a pathway level, transcriptional variation in HNSCC tumors is higher than in normal controls. Applying EVA to integrate ChIP-seq data with RNA-seq reveals that these pervasive transcriptional differences occur in enhancers. Similarly, applying EVA at a gene level to model splicing reveals more heterogeneity in transcript usage in tumor samples than in normals. HPV- HNSCC tumors are unique in having mutations in genes that regulate the splicing machinery, and the HPV- tumors with these alterations have a greater number of dysregulated splice variants than those without. Nonetheless, the EVA analysis identifies a similar number of alternative splice variants in HPV+ as in HPV- tumors, suggesting an alternative mechanism of transcriptional heterogeneity in HPV+ disease. Adapting EVA to single cell data demonstrates that increased fibroblast composition is associated with greater variation in immune pathway activity in HNSCC. Moreover, we observe greater transcriptional heterogeneity in HNSCC primary tumors than in lymph node metastasis, consistent with a clonal outgrowth.

Conclusions: We demonstrate that the statistical framework from EVA enables differential heterogeneity analysis in HNSCC spanning pathway dysregulation, splice variation, epigenetic regulation, and single cell analysis. This algorithm provides a critical framework to model the hidden multi-molecular mechanisms underlying the complex patient outcomes that are pervasive in cancer.

Citation Format: Bahman Afsari, Leslie Cope, Daria A. Gaykalova, Donald Geman, Sidharth Puram, Loyal A. Goff, Alexander Favorov, Elana Judith Fertig. Uncovering hidden sources of transcriptional dysregulation arising from inter- and intra-tumor heterogeneity [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 3399.
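
The heterogeneity measure described under Methods, the expected dissimilarity between pairs of expression profiles, can be sketched as an average over all unordered pairs of samples (a U-statistic of order 2). The abstract does not name the dissimilarity; the rank-based Kendall-type distance below is an assumption chosen for illustration, and the function names and toy profiles are hypothetical. The significance test is not shown.

```python
from itertools import combinations

def kendall_distance(x, y):
    """Fraction of feature pairs whose ordering differs between profiles x and y."""
    pairs = list(combinations(range(len(x)), 2))
    discordant = sum((x[i] < x[j]) != (y[i] < y[j]) for i, j in pairs)
    return discordant / len(pairs)

def expected_dissimilarity(profiles, dist=kendall_distance):
    """Average dissimilarity over all unordered pairs of profiles."""
    pairs = list(combinations(profiles, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

# Hypothetical profiles over three features.
normal = [[1.0, 2.0, 3.0], [1.1, 2.2, 2.9], [0.9, 2.1, 3.3]]   # same ordering throughout
tumor = [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0], [2.0, 3.0, 1.0]]    # orderings disagree

print(expected_dissimilarity(normal))
print(expected_dissimilarity(tumor))
```

A larger value for one phenotype than for another is the kind of difference the abstract describes testing with U-statistics theory.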

Afsari B, Guo T, Considine M, Florea L, Kagohara LT, Stein-O'Brien GL, Kelley D, Flam E, Zambo KD, Ha PK, Geman D, Ochs MF, Califano JA, Gaykalova DA, Favorov AV, Fertig EJ. Splice Expression Variation Analysis (SEVA) for inter-tumor heterogeneity of gene isoform usage in cancer. Bioinformatics 2019;34:1859-1867. [PMID: 29342249 DOI: 10.1093/bioinformatics/bty004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/09/2017] [Accepted: 01/10/2018] [Indexed: 12/22/2022] Open

Abstract

Motivation

Current bioinformatics methods to detect changes in gene isoform usage in distinct phenotypes compare the relative expected isoform usage in phenotypes. These statistics model differences in isoform usage in normal tissues, which have stable regulation of gene splicing. Pathological conditions, such as cancer, can have broken regulation of splicing that increases the heterogeneity of the expression of splice variants. Inferring events with such differential heterogeneity in gene isoform usage requires new statistical approaches.

Results

We introduce Splice Expression Variability Analysis (SEVA) to model increased heterogeneity of splice variant usage between conditions (e.g. tumor and normal samples). SEVA uses a rank-based multivariate statistic that compares the variability of junction expression profiles within one condition to the variability within another. Simulated data show that SEVA is unique in modeling heterogeneity of gene isoform usage, and we benchmark SEVA's performance against EBSeq, DiffSplice and rMATS, which model differential isoform usage instead of heterogeneity. We confirm the accuracy of SEVA in identifying known splice variants in head and neck cancer and perform cross-study validation of novel splice variants. A novel comparison of splice variant heterogeneity between subtypes of head and neck cancer demonstrated unanticipated similarity between the heterogeneity of gene isoform usage in HPV-positive and HPV-negative subtypes, and anticipated increased heterogeneity among HPV-negative samples with mutations in genes that regulate the splice variant machinery. These results show that SEVA accurately models differential heterogeneity of gene isoform usage from RNA-seq data.

Availability and implementation

SEVA is implemented in the R/Bioconductor package GSReg.

Contact

bahman@jhu.edu or favorov@sensi.org or ejfertig@jhmi.edu.

Supplementary information

Supplementary data are available at Bioinformatics online.


Lahouel K, Geman D, Younes L. Coarse-to-fine multiple testing strategies. Electron J Stat 2019. [DOI: 10.1214/19-ejs1536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]

Slama P, Hoopmann MR, Moritz RL, Geman D. Robust determination of differential abundance in shotgun proteomics using nonparametric statistics. Mol Omics 2018;14:424-436. [PMID: 30259924 PMCID: PMC6490964 DOI: 10.1039/c8mo00077h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]

Afsari B, Guo T, Considine M, Kelley D, Flam E, Florea L, Ha P, Geman D, Ochs MF, Califano JA, Gaykalova DA, Favorov AV, Fertig EJ. Abstract 3577: Splice expression variation analysis (SEVA) for differential gene isoform usage in cancer. Cancer Res 2017. [DOI: 10.1158/1538-7445.am2017-3577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

Abstract

Alternative splicing events (ASE) are a significant component of expression alterations in cancer, and have been demonstrated to be critically important in the development of malignant phenotypes in a variety of tumors. These alternative gene isoforms alter cell-signaling networks and serve as a hidden source of tumor-driving alterations not identified in multi-omics analyses. Recent studies have demonstrated that reads from RNA-seq data can infer gene isoforms expressed in a single sample. Therefore, RNA-seq data of tumors offers the opportunity to systematically evaluate expressed gene isoforms and identify splicing events in cancer samples. To characterize a cancer-specific ASE landscape, it is essential to perform differential splice variant expression analysis to identify isoform variants that are unique to tumor samples compared to normal tissue. In spite of the breadth of ASE algorithms, few have been validated in primary tumor samples. Current methods for differential splice variant analysis compare mean expression of gene isoforms in sample groups. Because these variants are tumor-specific, ASEs are expected to have more variable exon junction expression than normal samples. Therefore, current differential ASE analysis algorithms from RNA-seq may not account for heterogeneous gene isoform usage in tumors. To address this, we introduce Splice Expression Variability Analysis (SEVA), which detects differential splice variation usage in tumor and normal samples and accounts for tumor heterogeneity. This algorithm compares the degree of variability of junction expression profiles within a population of normal samples relative to that in tumor samples. The performance of SEVA was compared with two existing algorithms, EBSeq and DiffSplice, in simulated and real RNA-seq data. Simulated data suggest that SEVA is robust and computationally efficient relative to EBSeq and DiffSplice. In contrast to EBSeq and DiffSplice, SEVA was able to identify alternative splicing events independent of overall gene expression differences. Finally, additional validation was performed using RNA-seq data for primary tumor data from HPV-positive oropharynx squamous cell carcinoma (OPSCC) tumors and normal samples from both TCGA and an independent tumor cohort of 46 OPSCC tumors and 25 normal samples. In these tumor samples, SEVA finds cancer-specific ASEs in genes that are independent of their differential expression status. Moreover, SEVA finds on the order of hundreds of splice variant candidates, a number manageable for experimental validation, in contrast to the thousands of candidates found with EBSeq or DiffSplice. These candidates include experimentally validated splice variants in HNSCC from a previous microarray study. Based on performance in both simulated and real data, SEVA represents a robust algorithm that is well suited for differential ASE analysis, particularly in RNA-sequencing data from heterogeneous primary tumor samples.

Citation Format: Bahman Afsari, Theresa Guo, Michael Considine, Dylan Kelley, Emily Flam, Liliana Florea, Patrick Ha, Donald Geman, Michael F. Ochs, Joseph A. Califano, Daria A. Gaykalova, Alexander V. Favorov, Elana J. Fertig. Splice expression variation analysis (SEVA) for differential gene isoform usage in cancer [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 3577. doi:10.1158/1538-7445.AM2017-3577

Dinalankara W, Qe Q, Ji L, Xu Y, Pagane N, Lobo F, Younes L, Geman D, Marchionni L. Abstract 4551: Divergence analysis with coarse coding of omics data across cancer phenotypes. Cancer Res 2017. [DOI: 10.1158/1538-7445.am2017-4551] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

Abstract

Motivation: Complex cancer omics data can be difficult to interpret and analyze with standard statistical methods. We therefore propose an innovative data representation that drastically reduces complexity while improving usability and interpretability for complex cancer phenotype analysis.

Method: Despite recent advances in omics technologies, the robustness of predictive biomarkers in cancer remains severely limited. We hypothesize that this is primarily due to an overemphasis on applying statistical learning methods without taking into consideration the underlying biological processes driving cancer. We therefore propose a new approach that represents data by comparison to a baseline group. This results in a data format that encodes biologically meaningful information and can be easily analyzed. We apply this transformation to publicly available datasets obtained across multiple tumor types using different omics technologies. For each cancer phenotype considered, we cross-validate the learned decision rules using SVMs and random forests and demonstrate that there is no drop in performance despite the use of a simplified data representation. We also apply the Chi-squared test to our simplified data to select genomic features differentially associated with relevant cancer phenotypes. To this end we compare our method to traditional class comparison approaches. Overall, this analysis shows that omics features selected by our method provide equal or better classification performance than standard methods. Further, we show that our simplified data representation filters out much of the biologically irrelevant variation and that the resulting data can be successfully applied to gene set analysis applications, ultimately improving inference on disease phenotypes. For instance, by applying our method to signaling pathways and cancer hallmarks gene sets, we show that our approach can be used to detect dysregulated pathways more efficiently than with traditional methods.

Conclusion: By comparing cancer omics data to a baseline status, we obtain a much simpler data representation that preserves biologically relevant information while eliminating much of the unwanted variance that is often confounding in the analysis of high-dimensional data. Furthermore, data represented using our approach can be easily stored and analyzed, and it is equivalent or superior to traditional data representation methods for predicting clinically relevant cancer phenotypes and detecting biologically relevant cancer pathways.

Citation Format: Wikum Dinalankara, Qian Qe, Lanlan Ji, Yiran Xu, Nicole Pagane, Francisco Lobo, Laurent Younes, Donald Geman, Luigi Marchionni. Divergence analysis with coarse coding of omics data across cancer phenotypes [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 4551. doi:10.1158/1538-7445.AM2017-4551
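
The idea of representing each sample by comparison to a baseline group can be sketched as a coarse coding step. This is a minimal illustration under stated assumptions: the baseline range for each feature is taken to be the minimum and maximum over baseline samples (the abstract does not say how the range is estimated), and each feature is coded as below, within, or above that range. The function names and values are hypothetical.

```python
def baseline_range(baseline):
    """Per-feature (low, high) range estimated from baseline samples (rows = samples)."""
    return [(min(col), max(col)) for col in zip(*baseline)]

def coarse_code(sample, ranges):
    """Code each feature as -1 (below range), 0 (within range), or +1 (above range)."""
    return [
        -1 if v < lo else (1 if v > hi else 0)
        for v, (lo, hi) in zip(sample, ranges)
    ]

# Hypothetical baseline group: three samples, three features.
baseline = [[5.0, 1.0, 3.0], [6.0, 1.5, 2.5], [5.5, 0.8, 3.2]]
ranges = baseline_range(baseline)

# Hypothetical tumor sample coded against the baseline.
coded = coarse_code([7.2, 1.1, 2.0], ranges)
print(coded)
```

The coded vector discards magnitudes and keeps only the direction of departure from baseline, which is the simplification the abstract argues preserves the biologically relevant signal.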

Ament SA, Pearl JR, Grindeland A, St. Claire J, Earls JC, Kovalenko M, Gillis T, Mysore J, Gusella JF, Lee JM, Kwak S, Howland D, Lee MY, Baxter D, Scherler K, Wang K, Geman D, Carroll JB, MacDonald ME, Carlson G, Wheeler VC, Price ND, Hood LE. High resolution time-course mapping of early transcriptomic, molecular and cellular phenotypes in Huntington's disease CAG knock-in mice across multiple genetic backgrounds. Hum Mol Genet 2017;26:913-922. [PMID: 28334820 PMCID: PMC6075528 DOI: 10.1093/hmg/ddx006] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 11/04/2016] [Revised: 12/09/2016] [Accepted: 01/03/2017] [Indexed: 01/11/2023] Open

Affiliation(s)

Seth A. Ament: Institute for Systems Biology, Seattle, WA, USA; Institute for Genome Sciences and Department of Psychiatry, University of Maryland School of Medicine, Baltimore, MD, USA
Jocelynn R. Pearl: Institute for Systems Biology, Seattle, WA, USA; Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA, USA
Andrea Grindeland: McLaughlin Research Institute, Great Falls, MT, USA
Jason St. Claire: Center for Human Genetic Research, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA, USA
John C. Earls: Institute for Systems Biology, Seattle, WA, USA; Department of Computer Science, University of Washington, Seattle, WA, USA
Marina Kovalenko: Center for Human Genetic Research, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA, USA
Tammy Gillis: Center for Human Genetic Research, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA, USA
Jayalakshmi Mysore: Center for Human Genetic Research, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA, USA
James F. Gusella: Center for Human Genetic Research, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA, USA
Jong-Min Lee: Center for Human Genetic Research, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA, USA
Seung Kwak: CHDI Management/CHDI Foundation, Princeton, NJ, USA
David Howland: CHDI Management/CHDI Foundation, Princeton, NJ, USA
Min Young Lee: Institute for Systems Biology, Seattle, WA, USA
David Baxter: Institute for Systems Biology, Seattle, WA, USA
Kelsey Scherler: Institute for Systems Biology, Seattle, WA, USA
Kai Wang: Institute for Systems Biology, Seattle, WA, USA
Donald Geman: Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA
Jeffrey B. Carroll: Behavioral Neuroscience Program, Department of Psychology, Western Washington University, Bellingham, WA, USA
Marcy E. MacDonald: Center for Human Genetic Research, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA, USA
George Carlson: McLaughlin Research Institute, Great Falls, MT, USA
Vanessa C. Wheeler: Center for Human Genetic Research, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA, USA
Nathan D. Price: Institute for Systems Biology, Seattle, WA, USA
Leroy E. Hood: Institute for Systems Biology, Seattle, WA, USA


Geman D. Confluent Brownian motions. ADV APPL PROBAB 2016. [DOI: 10.2307/1426583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]

Chang LB, Geman D. Tracking Cross-Validated Estimates of Prediction Error as Studies Accumulate. J Am Stat Assoc 2015. [DOI: 10.1080/01621459.2014.1002926] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/24/2022]

Marchionni L, Geman D. Abstract 3754: Predicting cancer phenotypes with mechanism-driven multi-omics data integration. Cancer Res 2015. [DOI: 10.1158/1538-7445.am2015-3754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

Abstract Abstract Over the past decade technological advances have enabled molecular profiling of human cancers across distinct genomic domains and other “omes”. The availability of such multi-omics datasets has in turn enabled the discovery of cancer subtypes characterized by distinct molecular patterns within and across different data modalities. Despite promising beginnings and the wealth of data, most efforts so far have focused on the discovery of new molecular taxonomies, enumerating novel cancer subtypes, and only subsequently projecting them into a biological context by leveraging knowledge on genetic and epigenetic variations, genomic alterations, gene expression patterns, and, in general, cell pathophysiology. A paradigmatic approach to omics-based cancer classification usually entails the i) discovery of novel molecular subtypes; (ii) the biological contextualization of such subtypes and their correlation with clinical phenotypes; and (iii) the development of predictors to detect these subtypes. Nevertheless, the direct clinical utility of such taxonomies is less evident. Some of the molecular subtypes, for instance, might not portend any different clinical behavior, or the underlying molecular pathways might not be actionable. Ultimately, existing biological knowledge enters the analysis only a posteriori to characterize and “label” the novel subtypes, rather than being leveraged a priori to guide the discovery process itself. To overcome such nearly universal absence of mechanistic underpinnings for the omics-derived signatures and develop clinically useful biomarkers, we have proposed to develop mechanistic predictive models by incorporating gene network and signaling pathway information directly into the statistical learning process used to detect the cancer phenotypes. Unlike the paradigm described above, we used omics data and prior biological information to directly detect and predict the phenotypes. 
We now further extend this concept and leverage biological knowledge also to constrain multi-omics data integration, by implementing predictive rules that mechanistically aggregate measurements across distinct genomic modalities, reproducing the natural flow of biological information in the cell: from genome to phenotype, through epigenome, transcriptome and proteome. To illustrate our approach and its impact on computational learning and cancer classification, we analyze clinically relevant cancer phenotypes using independent training and testing data. To this end we build our novel predictors using the Top Scoring Pair (TSP) algorithm, a two-gene parameter-free classifier, and its multi-pair extension kTSP. We then compare the classification performance of predictors derived from a single omics modality to those constructed by integrating multi-omics data according to mechanistic and biologically meaningful rules, revealing increased accuracy with the integrated classifiers. Citation Format: Luigi Marchionni, Donald Geman. Predicting cancer phenotypes with mechanism-driven multi-omics data integration. [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr 3754. doi:10.1158/1538-7445.AM2015-3754

Geman D, Geman S, Hallonquist N, Younes L. Visual Turing test for computer vision systems. Proc Natl Acad Sci U S A 2015;112:3618-23. [PMID: 25755262 PMCID: PMC4378453 DOI: 10.1073/pnas.1422953112] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Indexed: 11/18/2022] Open

Geman D, Ochs M, Price ND, Tomasetti C, Younes L. An argument for mechanism-based statistical inference in cancer. Hum Genet 2014;134:479-95. [PMID: 25381197 DOI: 10.1007/s00439-014-1501-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 04/21/2014] [Accepted: 10/14/2014] [Indexed: 01/07/2023]

Afsari B, Geman D, Fertig EJ. Learning dysregulated pathways in cancers from differential variability analysis. Cancer Inform 2014;13:61-7. [PMID: 25392694 PMCID: PMC4218688 DOI: 10.4137/cin.s14066] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 06/19/2014] [Revised: 08/13/2014] [Accepted: 08/14/2014] [Indexed: 12/16/2022] Open

Ma S, Sung J, Magis AT, Wang Y, Geman D, Price ND. Measuring the effect of inter-study variability on estimating prediction error. PLoS One 2014;9:e110840. [PMID: 25330348 PMCID: PMC4201588 DOI: 10.1371/journal.pone.0110840] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 06/03/2014] [Accepted: 09/18/2014] [Indexed: 11/19/2022] Open

Abstract

Background

The biomarker discovery field is replete with molecular signatures that have not translated into the clinic despite ostensibly promising performance in predicting disease phenotypes. One widely cited reason is lack of classification consistency, largely due to failure to maintain performance from study to study. This failure is widely attributed to variability in data collected for the same phenotype among disparate studies, due to technical factors unrelated to phenotypes (e.g., laboratory settings resulting in “batch-effects”) and non-phenotype-associated biological variation in the underlying populations. These sources of variability persist in new data collection technologies.

Methods

Here we quantify the impact of these combined “study-effects” on a disease signature’s predictive performance by comparing two types of validation methods: ordinary randomized cross-validation (RCV), which extracts random subsets of samples for testing, and inter-study validation (ISV), which excludes an entire study for testing. Whereas RCV hardwires an assumption of training and testing on identically distributed data, this key property is lost in ISV, yielding systematic decreases in performance estimates relative to RCV. Measuring the RCV-ISV difference as a function of the number of studies quantifies the influence of study-effects on performance.

Results

As a case study, we gathered publicly available gene expression data from 1,470 microarray samples of 6 lung phenotypes from 26 independent experimental studies and 769 RNA-seq samples of 2 lung phenotypes from 4 independent studies. We find that the RCV-ISV performance discrepancy is greater in phenotypes with few studies, and that the ISV performance converges toward RCV performance as data from additional studies are incorporated into classification.

Conclusions

We show that by examining how fast ISV performance approaches RCV as the number of studies is increased, one can estimate when “sufficient” diversity has been achieved for learning a molecular signature likely to translate without significant loss of accuracy to new clinical settings.
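The difference between the two validation schemes comes down to how samples are assigned to test folds. The following is a minimal sketch, not the authors' code: the function names, the toy study labels, and the generic `fit`/`predict` callables are all assumptions introduced for illustration.

```python
import random
from collections import defaultdict

def random_cv_folds(n_samples, n_folds, seed=0):
    """Randomized cross-validation (RCV): shuffle indices, deal them into folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]
    # Each split is a (train indices, test indices) pair.
    return [(sorted(set(indices) - set(test)), sorted(test)) for test in folds]

def inter_study_folds(study_labels):
    """Inter-study validation (ISV): hold out every sample of one study at a time."""
    by_study = defaultdict(list)
    for idx, study in enumerate(study_labels):
        by_study[study].append(idx)
    splits = []
    for study in sorted(by_study):
        test = by_study[study]
        train = [i for i, s in enumerate(study_labels) if s != study]
        splits.append((train, test))
    return splits

def mean_accuracy(splits, X, y, fit, predict):
    """Average test accuracy over splits for any fit/predict pair."""
    scores = []
    for train, test in splits:
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)

# Hypothetical layout: 9 samples drawn from 3 studies.
studies = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
rcv = random_cv_folds(len(studies), n_folds=3)
isv = inter_study_folds(studies)
```

With a real classifier plugged in, the RCV-ISV gap would be `mean_accuracy(rcv, ...) - mean_accuracy(isv, ...)`, recomputed as studies are added.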

Afsari B, Fertig EJ, Younes L, Geman D, Marchionni L. Abstract 5342: Hardwiring mechanism into predicting cancer phenotypes by computational learning. Cancer Res 2014. [DOI: 10.1158/1538-7445.am2014-5342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

Abstract

Rationale. Despite promising beginnings, molecular classifiers derived from statistical learning do not yet appear to be sufficiently mature for clinical use. Besides known limitations, the nearly universal absence of mechanistic underpinnings for such signatures represents a major barrier toward successful implementation of clinically useful biomarkers. To overcome this limitation we constrained the search for predictive models to those with mechanistic justification, by incorporating microRNA (miR) and transcription factor (TF) gene regulatory networks directly into the learning process of cancer phenotypes.

Methods. To illustrate the impact of embedding such regulatory motifs into computational learning, we analyzed the ability to predict estrogen receptor (ER) status from transcriptional data. We applied this approach to two independent breast cancer studies used as training and validation sets respectively. This analysis provided a test case with well-characterized clinical attributes, in which the ER itself is a TF engaged in regulatory miR/TF motifs. We built our predictors using Top Scoring Pair (TSP), a two-gene parameter-free classifier returning one class (ER positive) or the other (ER negative) based on the relative ordering of the two genes. We compared classification performance between TSPs chosen from all possible gene pairs and TSPs constructed under network-based constraints (hereafter “random” and “mechanistic” TSPs, respectively). Each “mechanistic” TSP consists of a gene pair: the first gene regulates a miR or a TF “hub”, which in turn regulates the second gene. We started from a network of 200 TFs, 373 miRs, and 2772 target genes based on regulatory information from the miRgen v2.0 and TarBase v5.0 databases.

Results. We assessed the classification accuracy of the TSP classifiers derived from the training dataset in the validation set, and nearly all top-performing predictors were based on regulatory motifs. A Wilcoxon rank-sum test comparing the “random” classifiers with either TF- or miR-based TSPs had P-values of 10^-14 and 10^-26, respectively. Most of these top “mechanistic” predictors involved the ER gene (ESR1), consistent with the underlying biology. The mechanistic predictors also paired ESR1 expression with genes relevant to the biology. For instance, TSP selected POU2F1, a TF member of the POU family also known as OCT1, which physically interacts with the ER itself and BRCA1, recruiting BRCA1 to the ESR1 promoter, modulating ER expression. Consistent with the classifier, BRCA1-mutant breast tumors are typically ER negative.

Conclusions. We have implemented a novel class of mechanistic predictors by “hardwiring” gene regulatory network information into statistical learning of cancer phenotypes. This approach has intrinsic added value for knowledge discovery and disease treatment design, and will ultimately move the field towards a successful transition to personalized health care.

Citation Format: Bahman Afsari, Elana Judith Fertig, Laurent Younes, Donald Geman, Luigi Marchionni. Hardwiring mechanism into predicting cancer phenotypes by computational learning. [abstract]. In: Proceedings of the 105th Annual Meeting of the American Association for Cancer Research; 2014 Apr 5-9; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2014;74(19 Suppl):Abstract nr 5342. doi:10.1158/1538-7445.AM2014-5342
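The network constraint described in this abstract (first gene regulates a miR or TF hub, which in turn regulates the second gene) amounts to enumerating two-step paths through hub nodes. The sketch below is illustrative only: the function name, the edge-list representation, and the node names are hypothetical and are not taken from miRgen or TarBase.

```python
from collections import defaultdict

def mechanistic_pairs(edges, hubs):
    """Gene pairs (first, second) joined by a path first -> hub -> second.

    edges: iterable of (regulator, target) tuples.
    hubs:  set of node names treated as miR or TF hubs.
    """
    regulators_of = defaultdict(set)  # hub -> genes that regulate it
    targets_of = defaultdict(set)     # hub -> genes it regulates
    for source, target in edges:
        if target in hubs:
            regulators_of[target].add(source)
        if source in hubs:
            targets_of[source].add(target)
    pairs = set()
    for hub in hubs:
        for first in regulators_of[hub]:
            for second in targets_of[hub]:
                # Skip degenerate pairs that reuse the same node.
                if first != second and hub not in (first, second):
                    pairs.add((first, second))
    return sorted(pairs)

# Hypothetical toy network; the G5 -> G6 edge involves no hub and yields no pair.
toy_edges = [
    ("G1", "miR-x"), ("miR-x", "G2"),
    ("G3", "TF-y"), ("TF-y", "G4"), ("TF-y", "G2"),
    ("G5", "G6"),
]
candidates = mechanistic_pairs(toy_edges, {"miR-x", "TF-y"})
```

A TSP search restricted to `candidates` rather than all gene pairs would correspond to the “mechanistic” setting, under these assumptions.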

Afsari B, Fertig EJ, Geman D, Marchionni L. switchBox: an R package for k-Top Scoring Pairs classifier development. Bioinformatics 2014;31:273-4. [PMID: 25262153 DOI: 10.1093/bioinformatics/btu622] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Indexed: 11/13/2022]

Afsari B, Braga-Neto UM, Geman D. Rank discriminants for predicting phenotypes from RNA expression. Ann Appl Stat 2014. [DOI: 10.1214/14-aoas738] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 11/19/2022]

Simcha DM, Younes L, Aryee MJ, Geman D. Identification of direction in gene networks from expression and methylation. BMC Syst Biol 2013;7:118. [PMID: 24182195 PMCID: PMC4228359 DOI: 10.1186/1752-0509-7-118] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 09/17/2012] [Accepted: 10/17/2013] [Indexed: 01/27/2023]

Sung J, Kim PJ, Ma S, Funk CC, Magis AT, Wang Y, Hood L, Geman D, Price ND. Multi-study integration of brain cancer transcriptomes reveals organ-level molecular signatures. PLoS Comput Biol 2013;9:e1003148. [PMID: 23935471 PMCID: PMC3723500 DOI: 10.1371/journal.pcbi.1003148] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 11/29/2012] [Accepted: 06/05/2013] [Indexed: 12/23/2022] Open

Abstract

We utilized abundant transcriptomic data for the primary classes of brain cancers to study the feasibility of separating all of these diseases simultaneously based on molecular data alone. These signatures were based on a new method reported herein – Identification of Structured Signatures and Classifiers (ISSAC) – that resulted in a brain cancer marker panel of 44 unique genes. Many of these genes have established relevance to the brain cancers examined herein, with others having known roles in cancer biology. Analyses on large-scale data from multiple sources must deal with significant challenges associated with heterogeneity between different published studies, for it was observed that the variation among individual studies often had a larger effect on the transcriptome than did phenotype differences, as is typical. For this reason, we restricted ourselves to studying only cases where we had at least two independent studies performed for each phenotype, and also reprocessed all the raw data from the studies using a unified pre-processing pipeline. We found that learning signatures across multiple datasets greatly enhanced reproducibility and accuracy in predictive performance on truly independent validation sets, even when keeping the size of the training set the same. This was most likely due to the meta-signature encompassing more of the heterogeneity across different sources and conditions, while amplifying signal from the repeated global characteristics of the phenotype. When molecular signatures of brain cancers were constructed from all currently available microarray data, 90% phenotype prediction accuracy, or the accuracy of identifying a particular brain cancer from the background of all phenotypes, was found. Looking forward, we discuss our approach in the context of the eventual development of organ-specific molecular signatures from peripheral fluids such as the blood.

From a multi-study, integrated transcriptomic dataset, we identified a marker panel for differentiating major human brain cancers at the gene-expression level. The ISSAC molecular signatures for brain cancers, composed of 44 unique genes, are based on comparing expression levels of pairs of genes, and phenotype prediction follows a diagnostic hierarchy. We found that sufficient dataset integration across multiple studies greatly enhanced diagnostic performance on truly independent validation sets, whereas signatures learned from only one dataset typically led to high error rate. Molecular signatures of brain cancers, when obtained using all currently available gene-expression data, achieved 90% phenotype prediction accuracy. Thus, our integrative approach holds significant promise for developing organ-level, comprehensive, molecular signatures of disease.

Marchionni L, Afsari B, Geman D, Leek JT. A simple and reproducible breast cancer prognostic test. BMC Genomics 2013;14:336. [PMID: 23682826 PMCID: PMC3662649 DOI: 10.1186/1471-2164-14-336] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 12/18/2012] [Accepted: 05/04/2013] [Indexed: 11/10/2022] Open

Winslow RL, Trayanova N, Geman D, Miller MI. Computational medicine: translating models to clinical care. Sci Transl Med 2013;4:158rv11. [PMID: 23115356 DOI: 10.1126/scitranslmed.3003528] [Citation(s) in RCA: 139] [Impact Index Per Article: 12.6] [Indexed: 12/18/2022]

Sánchez-Vega F, Younes L, Geman D. Learning multivariate distributions by competitive assembly of marginals. IEEE Trans Pattern Anal Mach Intell 2013;35:398-410. [PMID: 22529323 DOI: 10.1109/tpami.2012.96] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023]

Simcha D, Price ND, Geman D. The limits of de novo DNA motif discovery. PLoS One 2012;7:e47836. [PMID: 23144830 PMCID: PMC3492406 DOI: 10.1371/journal.pone.0047836] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 08/09/2012] [Accepted: 09/21/2012] [Indexed: 12/02/2022] Open

Abstract

A major challenge in molecular biology is reverse-engineering the cis-regulatory logic that plays a major role in the control of gene expression. This program includes searching through DNA sequences to identify “motifs” that serve as the binding sites for transcription factors or, more generally, are predictive of gene expression across cellular conditions. Several approaches have been proposed for de novo motif discovery, that is, searching sequences without prior knowledge of binding sites or nucleotide patterns. However, unbiased validation is not straightforward. We consider two approaches to unbiased validation of discovered motifs: testing the statistical significance of a motif using a DNA “background” sequence model to represent the null hypothesis and measuring performance in predicting membership in gene clusters. We demonstrate that the background models typically used are “too null,” resulting in overly optimistic assessments of significance, and argue that performance in predicting TF binding or expression patterns from DNA motifs should be assessed by held-out data, as in predictive learning. Applying this criterion to common motif discovery methods resulted in universally poor performance, although there is a marked improvement when motifs are statistically significant against real background sequences. Moreover, on synthetic data where “ground truth” is known, discriminative performance of all algorithms is far below the theoretical upper bound, with pronounced “over-fitting” in training. A key conclusion from this work is that the failure of de novo discovery approaches to accurately identify motifs is basically due to statistical intractability resulting from the fixed size of co-regulated gene clusters, and thus such failures do not necessarily provide evidence that unfound motifs are not active biologically. Consequently, the use of prior knowledge to enhance motif discovery is not just advantageous but necessary. 
An implementation of the LR and ALR algorithms is available at http://code.google.com/p/likelihood-ratio-motifs/.
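To make the role of the background model concrete, here is a generic log-odds sketch. It is not the LR or ALR algorithm referenced above; the position weight matrix, the two background distributions, and the function names are assumptions chosen only to show how the same motif match scores lower against a background that resembles the motif's base composition.

```python
import math

def log_odds_score(window, pwm, background):
    """Sum over positions of log2(P_motif(base) / P_background(base))."""
    return sum(math.log2(pwm[pos][base] / background[base])
               for pos, base in enumerate(window))

def best_window(sequence, pwm, background):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max((log_odds_score(sequence[s:s + width], pwm, background), s)
               for s in range(len(sequence) - width + 1))

def column(favoured):
    """Hypothetical PWM column putting probability 0.7 on one base."""
    return {b: (0.7 if b == favoured else 0.1) for b in "ACGT"}

pwm = [column("T"), column("A"), column("T")]           # toy motif "TAT"
uniform_bg = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich_bg = {"A": 0.40, "C": 0.10, "G": 0.10, "T": 0.40}

score_uniform, start_uniform = best_window("GGTATCC", pwm, uniform_bg)
score_at_rich, start_at_rich = best_window("GGTATCC", pwm, at_rich_bg)
```

Both backgrounds locate the same window, but the evidence for the motif shrinks under the AT-rich background, which is one simple way an overly uniform null can overstate significance.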

Yörük E, Ochs MF, Geman D, Younes L. A comprehensive statistical model for cell signaling. IEEE/ACM Trans Comput Biol Bioinform 2011;8:592-606. [PMID: 20855924 PMCID: PMC3081531 DOI: 10.1109/tcbb.2010.87] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 05/29/2023]

Slama P, Geman D. Identification of family-determining residues in PHD fingers. Nucleic Acids Res 2010;39:1666-79. [PMID: 21059680 PMCID: PMC3061080 DOI: 10.1093/nar/gkq947] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 12/13/2022] Open

Leek JT, Scharpf RB, Bravo HC, Simcha D, Langmead B, Johnson WE, Geman D, Baggerly K, Irizarry RA. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet 2010;11:733-9. [PMID: 20838408 DOI: 10.1038/nrg2825] [Citation(s) in RCA: 1253] [Impact Index Per Article: 89.5] [Indexed: 11/09/2022]

Eddy JA, Sung J, Geman D, Price ND. Relative expression analysis for molecular cancer diagnosis and prognosis. Technol Cancer Res Treat 2010;9:149-59. [PMID: 20218737 DOI: 10.1177/153303461000900204] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.2] [Indexed: 12/27/2022] Open

Abstract

The enormous amount of biomolecule measurement data generated from high-throughput technologies has brought an increased need for computational tools in biological analyses. Such tools can enhance our understanding of human health and genetic diseases, such as cancer, by accurately classifying phenotypes, detecting the presence of disease, discriminating among cancer sub-types, predicting clinical outcomes, and characterizing disease progression. In the case of gene expression microarray data, standard statistical learning methods have been used to identify classifiers that can accurately distinguish disease phenotypes. However, these mathematical prediction rules are often highly complex, and they lack the convenience and simplicity desired for extracting underlying biological meaning or transitioning into the clinic. In this review, we survey a powerful collection of computational methods for analyzing transcriptomic microarray data that address these limitations. Relative Expression Analysis (RXA) is based only on the relative orderings among the expressions of a small number of genes. Specifically, we provide a description of the first and simplest example of RXA, the K-TSP classifier, which is based on K pairs of genes; the case K = 1 is the TSP classifier. Given their simplicity and ease of biological interpretation, as well as their invariance to data normalization and parameter-fitting, these classifiers have been widely applied in aiding molecular diagnostics in a broad range of human cancers. We review several studies which demonstrate accurate classification of disease phenotypes (e.g., cancer vs. normal), cancer subclasses (e.g., AML vs. ALL, GIST vs. LMS), disease outcomes (e.g., metastasis, survival), and diverse human pathologies assayed through blood-borne leukocytes. The studies presented demonstrate that RXA, specifically the TSP and K-TSP classifiers, is a promising new class of computational methods for analyzing high-throughput data, and has the potential to significantly contribute to molecular cancer diagnosis and prognosis.
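A pair-based rule of this kind can be sketched in a few lines. The code below is a simplified illustration, not the published implementation: the scoring rule (absolute difference between classes in how often gene i is expressed below gene j), the choice of disjoint pairs, the unweighted majority vote, and the toy data are all stated assumptions.

```python
from itertools import combinations

def order_frequency(samples, i, j):
    """Fraction of samples in which gene i is expressed below gene j."""
    return sum(s[i] < s[j] for s in samples) / len(samples)

def fit_ktsp(class0, class1, k):
    """Pick up to k disjoint gene pairs whose ordering best separates the classes."""
    n_genes = len(class0[0])
    scored = []
    for i, j in combinations(range(n_genes), 2):
        p0 = order_frequency(class0, i, j)
        p1 = order_frequency(class1, i, j)
        # The label stored is the class voted for when x_i < x_j.
        scored.append((abs(p0 - p1), i, j, 1 if p1 > p0 else 0))
    scored.sort(key=lambda t: -t[0])  # stable sort, highest score first
    chosen, used = [], set()
    for _, i, j, label in scored:
        if len(chosen) == k:
            break
        if i not in used and j not in used:
            chosen.append((i, j, label))
            used.update((i, j))
    return chosen

def predict_ktsp(pairs, sample):
    """Majority vote over pairs; k = 1 is the single-pair (TSP) rule."""
    votes = sum(label if sample[i] < sample[j] else 1 - label
                for i, j, label in pairs)
    return 1 if 2 * votes > len(pairs) else 0

# Hypothetical expression profiles (rows = samples, columns = 3 genes).
class0 = [[1, 5, 3], [2, 6, 7], [1, 4, 2]]
class1 = [[7, 2, 3], [8, 3, 9], [6, 1, 2]]
model = fit_ktsp(class0, class1, k=1)
```

Because the prediction depends only on orderings, applying any strictly increasing transformation to a sample leaves the output unchanged. An odd k avoids tied votes.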

Eddy JA, Hood L, Price ND, Geman D. Identifying tightly regulated and variably expressed networks by Differential Rank Conservation (DIRAC). PLoS Comput Biol 2010;6:e1000792. [PMID: 20523739 PMCID: PMC2877722 DOI: 10.1371/journal.pcbi.1000792] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Received: 10/30/2009] [Accepted: 04/22/2010] [Indexed: 12/18/2022] Open

Abstract

A powerful way to separate signal from noise in biology is to convert the molecular data from individual genes or proteins into an analysis of comparative biological network behaviors. One of the limitations of previous network analyses is that they do not take into account the combinatorial nature of gene interactions within the network. We report here a new technique, Differential Rank Conservation (DIRAC), which permits one to assess these combinatorial interactions to quantify various biological pathways or networks in a comparative sense, and to determine how they change in different individuals experiencing the same disease process. This approach is based on the relative expression values of participating genes—i.e., the ordering of expression within network profiles. DIRAC provides quantitative measures of how network rankings differ either among networks for a selected phenotype or among phenotypes for a selected network. We examined disease phenotypes including cancer subtypes and neurological disorders and identified networks that are tightly regulated, as defined by high conservation of transcript ordering. Interestingly, we observed a strong trend to looser network regulation in more malignant phenotypes and later stages of disease. At a sample level, DIRAC can detect a change in ranking between phenotypes for any selected network. Variably expressed networks represent statistically robust differences between disease states and serve as signatures for accurate molecular classification, validating the information about expression patterns captured by DIRAC. Importantly, DIRAC can be applied not only to transcriptomic data, but to any ordinal data type.

The systems approach to medicine derives from the idea that diseased cells arise from one or more perturbed biological networks due to the net effect of interactions among multiple molecular agents; by measuring differences in the abundance of biomolecules (e.g., mRNA, proteins, metabolites) we can identify reporters of network states and uncover molecular signatures of disease. However, a major limitation of previously published network analyses is the focus on small numbers of individual, differentially-expressed genes, hence the failure to take into account combinatorial interactions. We report a new technique, Differential Rank Conservation, for identifying and measuring network-level perturbations. Our rank conservation index is based entirely on the relative levels of expression for participating genes and allows us to detect differences in network orderings between networks for a given phenotype and between phenotypes for a given network. In examining cancer subtypes and neurological disorders, we identified networks that are tightly and loosely regulated, as defined by the level of conservation of transcript ordering, and observed a strong trend to looser network regulation in more malignant phenotypes and later stages of disease. We also demonstrate that variably expressed networks represent robust differences between disease states.
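One way to turn "conservation of transcript ordering" into a number is sketched below. This is an assumed formulation for illustration, not a verified reproduction of the DIRAC definitions: each sample is reduced to a binary vector of pairwise orderings within the network, a per-phenotype template takes the majority ordering of each pair, and the index is the mean fraction of pairs on which samples agree with that template. The gene names and values are hypothetical.

```python
from itertools import combinations

def ordering_vector(sample, genes):
    """1 where gene a is expressed below gene b, for every gene pair in the network."""
    return [int(sample[a] < sample[b]) for a, b in combinations(genes, 2)]

def rank_template(samples, genes):
    """Majority ordering of each gene pair across the samples of one phenotype."""
    vectors = [ordering_vector(s, genes) for s in samples]
    return [int(2 * sum(col) > len(vectors)) for col in zip(*vectors)]

def matching_score(sample, genes, template):
    """Fraction of gene pairs whose ordering in this sample matches the template."""
    vector = ordering_vector(sample, genes)
    return sum(v == t for v, t in zip(vector, template)) / len(template)

def conservation_index(samples, genes):
    """Mean matching score of a phenotype's samples against their own template."""
    template = rank_template(samples, genes)
    return sum(matching_score(s, genes, template) for s in samples) / len(samples)

network = ["g1", "g2", "g3"]
# Hypothetical phenotype with identical gene ordering in every sample.
tight = [{"g1": 1, "g2": 2, "g3": 3},
         {"g1": 2, "g2": 5, "g3": 9},
         {"g1": 0.5, "g2": 0.7, "g3": 4}]
# Hypothetical phenotype whose samples disagree on the ordering.
loose = [{"g1": 1, "g2": 2, "g3": 3},
         {"g1": 3, "g2": 2, "g3": 1},
         {"g1": 2, "g2": 3, "g3": 1}]
tight_index = conservation_index(tight, network)
loose_index = conservation_index(loose, network)
```

Under this sketch a tightly regulated network scores near 1, and looser regulation lowers the index; only orderings are used, so any ordinal data type would work.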

Geman S, Geman D. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images*. J Appl Stat 2010. [DOI: 10.1080/02664769300000058] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Indexed: 10/23/2022]

Eddy JA, Geman D, Price ND. Relative expression analysis for identifying perturbed pathways. Annu Int Conf IEEE Eng Med Biol Soc 2010;2009:5456-9. [PMID: 19964680 DOI: 10.1109/iembs.2009.5334063] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]

Edelman LB, Toia G, Geman D, Zhang W, Price ND. Two-transcript gene expression classifiers in the diagnosis and prognosis of human diseases. BMC Genomics 2009;10:583. [PMID: 19961616 PMCID: PMC2797819 DOI: 10.1186/1471-2164-10-583] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 05/13/2009] [Accepted: 12/05/2009] [Indexed: 11/15/2022] Open

Lin X, Afsari B, Marchionni L, Cope L, Parmigiani G, Naiman D, Geman D. The ordering of expression among a few genes can provide simple cancer biomarkers and signal BRCA1 mutations. BMC Bioinformatics 2009;10:256. [PMID: 19695104 PMCID: PMC2745389 DOI: 10.1186/1471-2105-10-256] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 05/12/2009] [Accepted: 08/20/2009] [Indexed: 11/11/2022] Open

Ferecatu M, Geman D. A statistical framework for image category search from a mental picture. IEEE Trans Pattern Anal Mach Intell 2009;31:1087-1101. [PMID: 19372612 DOI: 10.1109/tpami.2008.259] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 05/27/2023]

Wang JZ, Geman D, Luo J, Gray RM. Real-world image annotation and retrieval: an introduction to the special section. IEEE Trans Pattern Anal Mach Intell 2008;30:1873-1876. [PMID: 19791313 DOI: 10.1109/tpami.2008.231] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/28/2023]

Xu L, Geman D, Winslow RL. Large-scale integration of cancer microarray data identifies a robust common cancer signature. BMC Bioinformatics 2007;8:275. [PMID: 17663766 PMCID: PMC1950528 DOI: 10.1186/1471-2105-8-275] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.4] [Received: 02/15/2007] [Accepted: 07/30/2007] [Indexed: 11/15/2022] Open

Anderson TJ, Tchernyshyov I, Diez R, Cole RN, Geman D, Dang CV, Winslow RL. Discovering robust protein biomarkers for disease from relative expression reversals in 2-D DIGE data. Proteomics 2007;7:1197-207. [PMID: 17366473 DOI: 10.1002/pmic.200600374] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022]

Xu L, Tan AC, Naiman DQ, Geman D, Winslow RL. Robust prostate cancer marker genes emerge from direct integration of inter-study microarray data. Bioinformatics 2005;21:3905-11. [PMID: 16131522 DOI: 10.1093/bioinformatics/bti647] [Citation(s) in RCA: 98] [Impact Index Per Article: 5.2] [Indexed: 11/15/2022] Open

Tan AC, Naiman DQ, Xu L, Winslow RL, Geman D. Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics 2005;21:3896-904. [PMID: 16105897 PMCID: PMC1987374 DOI: 10.1093/bioinformatics/bti631] [Citation(s) in RCA: 246] [Impact Index Per Article: 12.9] [Indexed: 11/15/2022] Open

Blanchard G, Geman D. Hierarchical testing designs for pattern recognition. Ann Stat 2005. [DOI: 10.1214/009053605000000174] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.1] [Indexed: 11/19/2022]

Fang Y, Geman D. Experiments in Mental Face Retrieval. Lecture Notes in Computer Science 2005. [DOI: 10.1007/11527923_66] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 12/12/2022]

Amit Y, Geman D, Fan X. A coarse-to-fine strategy for multiclass shape detection. IEEE Trans Pattern Anal Mach Intell 2004;26:1606-1621. [PMID: 15573821 DOI: 10.1109/tpami.2004.111] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 05/24/2023]

Geman D, d'Avignon C, Naiman DQ, Winslow RL. Classifying gene expression profiles from pairwise mRNA comparisons. Stat Appl Genet Mol Biol 2004;3:Article19. [PMID: 16646797 PMCID: PMC1989150 DOI: 10.2202/1544-6115.1071] [Citation(s) in RCA: 226] [Impact Index Per Article: 11.3] [Indexed: 01/11/2023]

Fleuret F, Geman D. Int J Comput Vis 2001;41:85-107. [DOI: 10.1023/a:1011113216584] [Citation(s) in RCA: 162] [Impact Index Per Article: 7.0] [Indexed: 11/12/2022]

Amit Y, Geman D. A computational model for visual selection. Neural Comput 1999;11:1691-715. [PMID: 10490943 DOI: 10.1162/089976699300016197] [Citation(s) in RCA: 96] [Impact Index Per Article: 3.8] [Indexed: 11/04/2022]