1
Kwan B, Fuhrer T, Montemayor D, Fink JC, He J, Hsu CY, Messer K, Nelson RG, Pu M, Ricardo AC, Rincon-Choles H, Shah VO, Ye H, Zhang J, Sharma K, Natarajan L. A generalized covariate-adjusted top-scoring pair algorithm with applications to diabetic kidney disease stage classification in the Chronic Renal Insufficiency Cohort (CRIC) Study. BMC Bioinformatics 2023; 24:57. [PMID: 36803209 PMCID: PMC9942303 DOI: 10.1186/s12859-023-05171-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Accepted: 02/02/2023] [Indexed: 02/22/2023] Open
Abstract
BACKGROUND The growing amount of high dimensional biomolecular data has spawned new statistical and computational models for risk prediction and disease classification. Yet, many of these methods do not yield biologically interpretable models, despite offering high classification accuracy. An exception, the top-scoring pair (TSP) algorithm derives parameter-free, biologically interpretable single pair decision rules that are accurate and robust in disease classification. However, standard TSP methods do not accommodate covariates that could heavily influence feature selection for the top-scoring pair. Herein, we propose a covariate-adjusted TSP method, which uses residuals from a regression of features on the covariates for identifying top scoring pairs. We conduct simulations and a data application to investigate our method, and compare it to existing classifiers, LASSO and random forests. RESULTS Our simulations found that features that were highly correlated with clinical variables had high likelihood of being selected as top scoring pairs in the standard TSP setting. However, through residualization, our covariate-adjusted TSP was able to identify new top scoring pairs, that were largely uncorrelated with clinical variables. In the data application, using patients with diabetes (n = 977) selected for metabolomic profiling in the Chronic Renal Insufficiency Cohort (CRIC) study, the standard TSP algorithm identified (valine-betaine, dimethyl-arg) as the top-scoring metabolite pair for classifying diabetic kidney disease (DKD) severity, whereas the covariate-adjusted TSP method identified the pair (pipazethate, octaethylene glycol) as top-scoring. Valine-betaine and dimethyl-arg had, respectively, ≥ 0.4 absolute correlation with urine albumin and serum creatinine, known prognosticators of DKD. 
Thus without covariate-adjustment the top-scoring pair largely reflected known markers of disease severity, whereas covariate-adjusted TSP uncovered features liberated from confounding, and identified independent prognostic markers of DKD severity. Furthermore, TSP-based methods achieved competitive classification accuracy in DKD to LASSO and random forests, while providing more parsimonious models. CONCLUSIONS We extended TSP-based methods to account for covariates, via a simple, easy to implement residualizing process. Our covariate-adjusted TSP method identified metabolite features, uncorrelated from clinical covariates, that discriminate DKD severity stage based on the relative ordering between two features, and thus provide insights into future studies on the order reversals in early vs advanced disease states.
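The residualize-then-rank idea described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `residualize` and `top_scoring_pair`, the choice of ordinary least squares for the regression step, and the binary 0/1 class labels are all hypothetical choices made for the example.

```python
import numpy as np

def residualize(X, Z):
    """Residuals from an OLS regression of each feature (column of X)
    on the covariates Z, with an intercept (assumed regression model)."""
    X = np.asarray(X, dtype=float)
    Z1 = np.column_stack([np.ones(len(X)), np.asarray(Z, dtype=float)])
    beta, *_ = np.linalg.lstsq(Z1, X, rcond=None)
    return X - Z1 @ beta

def top_scoring_pair(X, y):
    """Return (i, j, score) maximizing
    |P(X_i < X_j | y = 1) - P(X_i < X_j | y = 0)| over feature pairs.
    Builds an (n, p, p) array, so it is only suitable for small p."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    less = X[:, :, None] < X[:, None, :]      # less[n, i, j] = X[n, i] < X[n, j]
    score = np.abs(less[y == 1].mean(axis=0) - less[y == 0].mean(axis=0))
    i, j = np.unravel_index(np.argmax(score), score.shape)
    return int(i), int(j), float(score[i, j])

def covariate_adjusted_tsp(X, Z, y):
    """Apply the pair search to covariate-adjusted residuals."""
    return top_scoring_pair(residualize(X, Z), y)
```

Calling `top_scoring_pair(X, y)` corresponds to the unadjusted search, while `covariate_adjusted_tsp(X, Z, y)` runs the same search on residuals, so a feature whose ordering is driven mainly by the covariates is less likely to be selected.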
Grants
- R01 DK110541 NIDDK NIH HHS
- U24 DK060990 NIDDK NIH HHS
- R01DK118736, 1R01DK110541-01A1, U01DK060990, U01DK060984, U01DK061022, U01DK061021, U01DK061028, U01DK060980, U01DK060963, U01DK060902, U24DK060990 NIDDK NIH HHS
- National Science Foundation Graduate Research Fellowship Program
- Intramural Research Program of the National Institute of Diabetes and Digestive and Kidney Diseases
- National Institute of Diabetes and Digestive and Kidney Diseases
Affiliation(s)
- Brian Kwan
  - Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health, University of California, San Diego, La Jolla, CA, USA
  - Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Tobias Fuhrer
  - Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Daniel Montemayor
  - Division of Nephrology, Department of Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
  - Center for Renal Precision Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
- Jeffery C Fink
  - Department of Medicine, University of Maryland, Baltimore School of Medicine, Baltimore, MD, USA
- Jiang He
  - Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine and Tulane University Translational Science Institute, New Orleans, LA, USA
- Chi-Yuan Hsu
  - Division of Nephrology, University of California, San Francisco School of Medicine, San Francisco, CA, USA
- Karen Messer
  - Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health, University of California, San Diego, La Jolla, CA, USA
  - Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Robert G Nelson
  - Chronic Kidney Disease Section, National Institute of Diabetes and Digestive and Kidney Diseases, Phoenix, AZ, USA
- Minya Pu
  - Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Ana C Ricardo
  - Department of Medicine, University of Illinois, Chicago, IL, USA
- Hernan Rincon-Choles
  - Department of Nephrology, Glickman Urological and Kidney Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
- Vallabh O Shah
  - University of New Mexico Health Sciences Center, Albuquerque, NM, USA
- Hongping Ye
  - Division of Nephrology, Department of Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
  - Center for Renal Precision Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
- Jing Zhang
  - Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Kumar Sharma
  - Division of Nephrology, Department of Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
  - Center for Renal Precision Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
- Loki Natarajan
  - Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health, University of California, San Diego, La Jolla, CA, USA
  - Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
2
Zhou S, Ren X, Yang J, Jin Q. Evaluating the Value of Defensins for Diagnosing Secondary Bacterial Infections in Influenza-Infected Patients. Front Microbiol 2018; 9:2762. [PMID: 30524393 PMCID: PMC6256186 DOI: 10.3389/fmicb.2018.02762] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 08/30/2018] [Accepted: 10/29/2018] [Indexed: 11/13/2022] Open
Abstract
Acute respiratory infections by influenza viruses are common causes of severe pneumonia, which can further deteriorate if secondary bacterial infections occur. Although the viral and bacterial agents are quite diverse, defensins, a set of antimicrobial peptides expressed by the host, may provide promising biomarkers that would greatly improve diagnosis and treatment. We examined the correlations between the gene expression levels of defensins and the viral and bacterial loads in the blood in a longitudinal, precision-medicine study of a severe pneumonia patient infected by influenza A H7N9 virus. We found that DEFA5 is positively correlated with the blood load of influenza A H7N9 virus (r = 0.735, p < 0.05, Spearman correlation). DEFB116 and DEFB127 are positively correlated, and DEFB108B and DEFB114 negatively correlated, with the bacterial load. The diagnostic potential of defensins to discriminate bacterial from viral infections was then evaluated on an independent dataset with 61 bacterial pneumonia patients and 39 viral pneumonia patients infected by influenza A viruses, reaching 93% accuracy. Expression levels of defensins in the blood may be of important diagnostic value in the clinic for indicating viral and bacterial infections.
Affiliation(s)
- Siyu Zhou
  - MOH Key Laboratory of Systems Biology of Pathogens, Peking Union Medical College, Institute of Pathogen Biology, Chinese Academy of Medical Sciences, Beijing, China
- Xianwen Ren
  - BIOPIC, School of Life Sciences, Peking University, Beijing, China
- Jian Yang
  - MOH Key Laboratory of Systems Biology of Pathogens, Peking Union Medical College, Institute of Pathogen Biology, Chinese Academy of Medical Sciences, Beijing, China
- Qi Jin
  - MOH Key Laboratory of Systems Biology of Pathogens, Peking Union Medical College, Institute of Pathogen Biology, Chinese Academy of Medical Sciences, Beijing, China
3
Makarov V, Gorlin A. Computational method for discovery of biomarker signatures from large, complex data sets. Comput Biol Chem 2018; 76:161-168. [DOI: 10.1016/j.compbiolchem.2018.07.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 06/23/2017] [Revised: 07/02/2018] [Accepted: 07/04/2018] [Indexed: 11/30/2022]
4
Langgartner D, Füchsl AM, Kaiser LM, Meier T, Foertsch S, Buske C, Reber SO, Mulaw MA. Biomarkers for classification and class prediction of stress in a murine model of chronic subordination stress. PLoS One 2018; 13:e0202471. [PMID: 30183738 PMCID: PMC6124755 DOI: 10.1371/journal.pone.0202471] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 06/22/2017] [Accepted: 08/03/2018] [Indexed: 12/22/2022] Open
Abstract
Selye defined stress as the nonspecific response of the body to any demand and thus an inherent element of all diseases. He reported that rats show adrenal hypertrophy, thymicolymphatic atrophy, and gastrointestinal ulceration, referred to as the stress triad, upon repeated exposure to nocuous agents. However, Selye's stress triad as well as its extended version including reduced body weight gain, increased plasma glucocorticoid (GC) concentrations, and GC resistance of target cells do not represent reliable discriminatory biomarkers for chronic stress. To address this, we collected multivariate biological data from male mice exposed either to the preclinically validated chronic subordinate colony housing (CSC) paradigm or to single-housed control (SHC) condition. We then used principal component analysis (PCA), top scoring pairs (tsp) and support vector machines (SVM) analyses to identify markers that discriminate between chronically stressed and non-stressed mice. PCA segregated stressed and non-stressed mice, with high loading for some of Selye's stress triad parameters. The tsp analysis, a simple and highly interpretable statistical approach, identified left adrenal weight and relative thymus weight as the pair with the highest discrimination score and prediction accuracy validated by a blinded dataset (92% p-value < 0.0001; SVM model = 83% accuracy and p-value < 0.0001). This finding clearly shows that simultaneous consideration of these two parameters can be used as a reliable biomarker of chronic stress status. Furthermore, our analysis highlights that the tsp approach is a very powerful method whose application extends beyond what has previously been reported.
Affiliation(s)
- Dominik Langgartner
  - Laboratory for Molecular Psychosomatics, Clinic for Psychosomatic Medicine and Psychotherapy, Ulm University, Ulm, Germany
- Andrea M. Füchsl
  - Laboratory for Molecular Psychosomatics, Clinic for Psychosomatic Medicine and Psychotherapy, Ulm University, Ulm, Germany
- Lisa M. Kaiser
  - Institute for Experimental Cancer Research, Comprehensive Cancer Center Ulm, Ulm University, Ulm, Germany
- Tatjana Meier
  - Clinic for Psychosomatic Medicine and Psychotherapy, Ulm University, Ulm, Germany
- Sandra Foertsch
  - Laboratory for Molecular Psychosomatics, Clinic for Psychosomatic Medicine and Psychotherapy, Ulm University, Ulm, Germany
- Christian Buske
  - Institute for Experimental Cancer Research, Comprehensive Cancer Center Ulm, Ulm University, Ulm, Germany
- Stefan O. Reber
  - Laboratory for Molecular Psychosomatics, Clinic for Psychosomatic Medicine and Psychotherapy, Ulm University, Ulm, Germany
- Medhanie A. Mulaw
  - Institute for Experimental Cancer Research, Comprehensive Cancer Center Ulm, Ulm University, Ulm, Germany
5
Stansfield JC, Rusay M, Shan R, Kelton C, Gaykalova DA, Fertig EJ, Califano JA, Ochs MF. Toward Signaling-Driven Biomarkers Immune to Normal Tissue Contamination. Cancer Inform 2016; 15:15-21. [PMID: 26884679 PMCID: PMC4750896 DOI: 10.4137/cin.s32468] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/01/2015] [Revised: 12/08/2015] [Accepted: 12/10/2015] [Indexed: 01/17/2023] Open
Abstract
The goal of this study was to discover a minimally invasive pathway-specific biomarker that is immune to normal cell mRNA contamination for diagnosing head and neck squamous cell carcinoma (HNSCC). Using Elsevier's MedScan natural language processing component of the Pathway Studio software and the TRANSFAC database, we produced a curated set of genes regulated by the signaling networks driving the development of HNSCC. The network and its gene targets provided prior probabilities for gene expression, which guided our CoGAPS matrix factorization algorithm to isolate patterns related to HNSCC signaling activity from a microarray-based study. Using patterns that distinguished normal from tumor samples, we identified a reduced set of genes to analyze with Top Scoring Pair in order to produce a potential biomarker for HNSCC. Our proposed biomarker comprises targets of the transcription factor (TF) HIF1A and the FOXO family of TFs coupled with genes that show remarkable stability across all normal tissues. Based on validation with novel data from The Cancer Genome Atlas (TCGA), measured by RNAseq, and bootstrap sampling, the biomarker for normal vs. tumor has an accuracy of 0.77, a Matthews correlation coefficient of 0.54, and an area under the curve (AUC) of 0.82.
Affiliation(s)
- John C Stansfield
  - Department of Mathematics and Statistics, The College of New Jersey, Ewing, NJ, USA
- Matthew Rusay
  - Department of Mathematics and Statistics, The College of New Jersey, Ewing, NJ, USA
- Roger Shan
  - Department of Mathematics and Statistics, The College of New Jersey, Ewing, NJ, USA
- Conor Kelton
  - Department of Mathematics and Statistics, The College of New Jersey, Ewing, NJ, USA
- Daria A Gaykalova
  - Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins Medical Institutions, Baltimore, MD, USA
- Elana J Fertig
  - Division of Oncology Biostatistics and Bioinformatics, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Joseph A Califano
  - Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins Medical Institutions, Baltimore, MD, USA
  - Milton J. Dance Jr. Head and Neck Center, Greater Baltimore Medical Center, Baltimore, MD, USA
- Michael F Ochs
  - Department of Mathematics and Statistics, The College of New Jersey, Ewing, NJ, USA
6
Yang S, Naiman DQ. Multiclass cancer classification based on gene expression comparison. Stat Appl Genet Mol Biol 2015; 13:477-96. [PMID: 24918456 DOI: 10.1515/sagmb-2013-0053] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
Abstract
As the complexity and heterogeneity of cancer is being increasingly appreciated through genomic analyses, microarray-based cancer classification comprising multiple discriminatory molecular markers is an emerging trend. Such multiclass classification problems pose new methodological and computational challenges for developing novel and effective statistical approaches. In this paper, we introduce a new approach for classifying multiple disease states associated with cancer based on gene expression profiles. Our method focuses on detecting small sets of genes in which the relative comparison of their expression values leads to class discrimination. For an m-class problem, the classification rule typically depends on a small number of m-gene sets, which provide transparent decision boundaries and allow for potential biological interpretations. We first test our approach on seven common gene expression datasets and compare it with popular classification methods including support vector machines and random forests. We then consider an extremely large cohort of leukemia cancer patients to further assess its effectiveness. In both experiments, our method yields comparable or even better results to benchmark classifiers. In addition, we demonstrate that our approach can integrate pathway analysis of gene expression to provide accurate and biological meaningful classification.
7
Geman D, Ochs M, Price ND, Tomasetti C, Younes L. An argument for mechanism-based statistical inference in cancer. Hum Genet 2015; 134:479-95. [PMID: 25381197 PMCID: PMC4612627 DOI: 10.1007/s00439-014-1501-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 04/21/2014] [Accepted: 10/14/2014] [Indexed: 01/07/2023]
Abstract
Cancer is perhaps the prototypical systems disease, and as such has been the focus of extensive study in quantitative systems biology. However, translating these programs into personalized clinical care remains elusive and incomplete. In this perspective, we argue that realizing this agenda—in particular, predicting disease phenotypes, progression and treatment response for individuals—requires going well beyond standard computational and bioinformatics tools and algorithms. It entails designing global mathematical models over network-scale configurations of genomic states and molecular concentrations, and learning the model parameters from limited available samples of high-dimensional and integrative omics data. As such, any plausible design should accommodate: biological mechanism, necessary for both feasible learning and interpretable decision making; stochasticity, to deal with uncertainty and observed variation at many scales; and a capacity for statistical inference at the patient level. This program, which requires a close, sustained collaboration between mathematicians and biologists, is illustrated in several contexts, including learning biomarkers, metabolism, cell signaling, network inference and tumorigenesis.
Affiliation(s)
- Donald Geman
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, 21210, USA
8
Afsari B, Braga-Neto UM, Geman D. Rank discriminants for predicting phenotypes from RNA expression. Ann Appl Stat 2014. [DOI: 10.1214/14-aoas738] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 11/19/2022]
9
Wang X. Identification of Marker Genes for Cancer Based on Microarrays Using a Computational Biology Approach. Curr Bioinform 2014; 9:140-146. [PMID: 24683388 DOI: 10.2174/1574893608999140109115649] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Rapid advances in gene expression microarray technology have enabled the discovery of molecular markers used for cancer diagnosis, prognosis, and prediction. One computational challenge in using microarray data to create cancer classifiers is how to deal effectively with data that have a large number of attributes (p) and a small number of instances (n). Gene selection and classifier construction are the two key issues in this area. In this article, we reviewed the major methods for computational identification of cancer marker genes. We concluded that simple methods should be preferred to complicated ones for their interpretability and applicability.
Affiliation(s)
- Xiaosheng Wang
  - Biometric Research Branch, National Cancer Institute, National Institutes of Health, Rockville, MD 20852, U.S.A
10
Hoppe A. What mRNA Abundances Can Tell us about Metabolism. Metabolites 2012; 2:614-31. [PMID: 24957650 PMCID: PMC3901220 DOI: 10.3390/metabo2030614] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 08/01/2012] [Revised: 08/24/2012] [Accepted: 09/04/2012] [Indexed: 01/23/2023] Open
Abstract
Inferring decreased or increased metabolic functions from transcript profiles is at first sight a bold and speculative attempt because of the functional layers in between: proteins, enzymatic activities, and reaction fluxes. However, the growing interest in this field can easily be explained by two facts: the high quality of genome-scale metabolic network reconstructions and the highly developed technology to obtain genome-covering RNA profiles. Here, an overview of important algorithmic approaches is given by means of criteria by which published procedures can be classified. The frontiers of the methods are sketched and critical voices are being heard. Finally, an outlook for the prospects of the field is given.
Affiliation(s)
- Andreas Hoppe
  - Institute for Biochemistry, Charité University Medicine Berlin, Charitéplatz 1, Berlin 10117, Germany
11
Kaur P, Schlatzer D, Cooke K, Chance MR. Pairwise protein expression classifier for candidate biomarker discovery for early detection of human disease prognosis. BMC Bioinformatics 2012; 13:191. [PMID: 22870920 PMCID: PMC3468399 DOI: 10.1186/1471-2105-13-191] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/16/2012] [Accepted: 07/30/2012] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND An approach to molecular classification based on the comparative expression of protein pairs is presented. The method overcomes some of the present limitations in using peptide intensity data for class prediction for problems such as the detection of a disease, disease prognosis, or for predicting treatment response. Data analysis is particularly challenging in these situations due to sample size (typically tens) being much smaller than the large number of peptides (typically thousands). Methods based upon high dimensional statistical models, machine learning or other complex classifiers generate decisions which may be very accurate but can be complex and difficult to interpret in simple or biologically meaningful terms. A classification scheme, called ProtPair, is presented that generates simple decision rules leading to accurate classification which is based on measurement of very few proteins and requires only relative expression values, providing specific targeted hypotheses suitable for straightforward validation. RESULTS ProtPair has been tested against clinical data from 21 patients following a bone marrow transplant, 13 of which progress to idiopathic pneumonia syndrome (IPS). The approach combines multiple peptide pairs originating from the same set of proteins, with each unique peptide pair providing an independent measure of discriminatory power. The prediction rate of the ProtPair for IPS study as measured by leave-one-out CV is 69.1%, which can be very beneficial for clinical diagnosis as it may flag patients in need of closer monitoring. The "top ranked" proteins provided by ProtPair are known to be associated with the biological processes and pathways intimately associated with known IPS biology based on mouse models. CONCLUSIONS An approach to biomarker discovery, called ProtPair, is presented. ProtPair is based on the differential expression of pairs of peptides and the associated proteins. 
Using mass spectrometry data from "bottom up" proteomics methods, functionally related protein/peptide pairs exhibiting co-ordinated changes in expression profile are discovered, which represent a signature for patients progressing to various disease conditions. The method has been tested against clinical data from patients progressing to idiopathic pneumonia syndrome (IPS) following a bone marrow transplant. The data indicate that patients with improper regulation in the concentration of specific acute phase response proteins at the time of bone marrow transplant are highly likely to develop IPS within a few weeks. The results lead to a specific set of protein pairs that can be efficiently verified by investigating the pairwise abundance change in independent cohorts using ELISA or targeted mass spectrometry techniques. This generalized classifier can be extended to other clinical problems in a variety of contexts.
Affiliation(s)
- Parminder Kaur
  - Case Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, USA
- Daniela Schlatzer
  - Case Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, USA
- Kenneth Cooke
  - Pediatric Hematology and Oncology, University Hospitals, Cleveland, OH 44106, USA
- Mark R Chance
  - Case Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, USA
12
Drew JE. Cellular defense system gene expression profiling of human whole blood: opportunities to predict health benefits in response to diet. Adv Nutr 2012; 3:499-505. [PMID: 22797985 PMCID: PMC3649718 DOI: 10.3945/an.112.002121] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 12/22/2022] Open
Abstract
Diet is a critical factor in the maintenance of human cellular defense systems, immunity, inflammation, redox regulation, metabolism, and DNA repair that ensure optimal health and reduce disease risk. Assessment of dietary modulation of cellular defense systems in humans has been limited due to difficulties in accessing target tissues. Notably, peripheral blood gene expression profiles associated with nonhematologic disease are detectable. Coupled with recent innovations in gene expression technologies, gene expression profiling of human blood to determine predictive markers associated with health status and dietary modulation is now a feasible prospect for nutrition scientists. This review focuses on cellular defense system gene expression profiling of human whole blood and the opportunities this presents, using recent technological advances, to predict health status and benefits conferred by diet.
13
Robust two-gene classifiers for cancer prediction. Genomics 2011; 99:90-5. [PMID: 22138042 DOI: 10.1016/j.ygeno.2011.11.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 09/06/2011] [Revised: 11/04/2011] [Accepted: 11/09/2011] [Indexed: 11/23/2022]
Abstract
Two-gene classifiers have attracted broad interest for their simplicity and practicality. Most existing two-gene classification algorithms rely on exhaustive search, which makes them slow. In this study, we proposed two new two-gene classification algorithms that use a simple univariate gene selection strategy and construct simple classification rules based on optimal cut-points for the two selected genes. We detected the optimal cut-point with the information entropy principle. We applied the two-gene classification models to eleven cancer gene expression datasets and compared their classification performance to that of established two-gene classification models such as the top-scoring pairs model and the greedy pairs model, as well as standard methods including Diagonal Linear Discriminant Analysis, k-Nearest Neighbor, Support Vector Machine and Random Forest. These comparisons indicated that the performance of our two-gene classifiers was comparable to or better than that of the compared models.
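The abstract does not spell out how the entropy-based cut-point is found, so the following is only one plausible reading: for a single gene, choose the threshold that minimizes the size-weighted class entropy of the two resulting groups. The function names `class_entropy` and `best_cut_point` are hypothetical, and candidate thresholds are assumed to be midpoints between consecutive distinct expression values.

```python
import numpy as np

def class_entropy(labels):
    """Shannon entropy (in bits) of the class labels in one group."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_cut_point(x, y):
    """Return (cut, weighted_entropy) for the threshold on expression
    values x that best separates the class labels y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(xs)
    best_cut, best_h = None, np.inf
    for k in range(1, n):
        if xs[k] == xs[k - 1]:
            continue  # no threshold fits between equal values
        # size-weighted entropy of the groups below and above the cut
        h = (k * class_entropy(ys[:k]) + (n - k) * class_entropy(ys[k:])) / n
        if h < best_h:
            best_cut, best_h = (xs[k - 1] + xs[k]) / 2, h
    return best_cut, best_h
```

A two-gene rule in this spirit would apply `best_cut_point` to each of the two selected genes and combine the two thresholded indicators; how the combination is done is not stated in the abstract and is left open here.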
14
Jin K, Zheng X, Xia Y. Gene Expression Profiling via Multigene Concatemers. PLoS One 2011; 6:e15711. [PMID: 21267445 PMCID: PMC3022625 DOI: 10.1371/journal.pone.0015711] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/28/2010] [Accepted: 11/23/2010] [Indexed: 12/26/2022] Open
Abstract
We established a novel method, Gene Expression Profiling via Multigene Concatemers (MgC-GEP), to study multigene expression patterns simultaneously. This method consists of the following steps: (1) cDNA was obtained using specific reverse primers containing an adaptor. (2) During the initial 1-3 cycles of polymerase chain reaction (PCR), the products containing universal adaptors with digestion sites at both termini were amplified using specific forward and reverse primers containing the adaptors. (3) In the subsequent 4-28 cycles, the universal adaptors were used as primers to yield products. (4) The products were digested and ligated to produce concatemers. (5) The concatemers were cloned into the vector and sequenced. Then, the occurrence of each gene tag was determined. To validate MgC-GEP, we analyzed 20 genes in Saccharomyces cerevisiae induced by weak acid using MgC-GEP combined with real-time reverse transcription (RT)-PCR. Compared with the results of real-time RT-PCR and the previous reports of microarray analysis, MgC-GEP can precisely determine the transcript levels of multigenes simultaneously. Importantly, MgC-GEP is a cost effective strategy that can be widely used in most laboratories without specific equipment. MgC-GEP is a potentially powerful tool for multigene expression profiling, particularly for moderate-throughput analysis.
Affiliation(s)
- Kai Jin
  - Genetic Engineering Research Center, School of Bioengineering, Chongqing University, Chongqing, People's Republic of China
  - Chongqing Engineering Research Center for Fungal Insecticide, Chongqing, People's Republic of China
  - Key Laboratory of Gene Function and Regulation Technologies under Chongqing Municipal Education Commission, Chongqing, People's Republic of China
- Xiaoli Zheng
  - Genetic Engineering Research Center, School of Bioengineering, Chongqing University, Chongqing, People's Republic of China
  - Chongqing Engineering Research Center for Fungal Insecticide, Chongqing, People's Republic of China
  - Key Laboratory of Gene Function and Regulation Technologies under Chongqing Municipal Education Commission, Chongqing, People's Republic of China
- Yuxian Xia
  - Genetic Engineering Research Center, School of Bioengineering, Chongqing University, Chongqing, People's Republic of China
  - Chongqing Engineering Research Center for Fungal Insecticide, Chongqing, People's Republic of China
  - Key Laboratory of Gene Function and Regulation Technologies under Chongqing Municipal Education Commission, Chongqing, People's Republic of China
15
Eddy JA, Sung J, Geman D, Price ND. Relative expression analysis for molecular cancer diagnosis and prognosis. Technol Cancer Res Treat 2010; 9:149-59. [PMID: 20218737 DOI: 10.1177/153303461000900204] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.6] [Indexed: 12/27/2022] Open
Abstract
The enormous amount of biomolecule measurement data generated from high-throughput technologies has brought an increased need for computational tools in biological analyses. Such tools can enhance our understanding of human health and genetic diseases, such as cancer, by accurately classifying phenotypes, detecting the presence of disease, discriminating among cancer sub-types, predicting clinical outcomes, and characterizing disease progression. In the case of gene expression microarray data, standard statistical learning methods have been used to identify classifiers that can accurately distinguish disease phenotypes. However, these mathematical prediction rules are often highly complex, and they lack the convenience and simplicity desired for extracting underlying biological meaning or transitioning into the clinic. In this review, we survey a powerful collection of computational methods for analyzing transcriptomic microarray data that address these limitations. Relative Expression Analysis (RXA) is based only on the relative orderings among the expressions of a small number of genes. Specifically, we provide a description of the first and simplest example of RXA, the K-TSP classifier, which is based on K pairs of genes; the case K = 1 is the TSP classifier. Given their simplicity and ease of biological interpretation, as well as their invariance to data normalization and parameter-fitting, these classifiers have been widely applied in aiding molecular diagnostics in a broad range of human cancers. We review several studies which demonstrate accurate classification of disease phenotypes (e.g., cancer vs. normal), cancer subclasses (e.g., AML vs. ALL, GIST vs. LMS), disease outcomes (e.g., metastasis, survival), and diverse human pathologies assayed through blood-borne leukocytes. 
The studies presented demonstrate that RXA, specifically the TSP and K-TSP classifiers, is a promising new class of computational methods for analyzing high-throughput data, and has the potential to contribute significantly to molecular cancer diagnosis and prognosis.
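A K-TSP classifier of the kind this review describes can be sketched in a few lines. The sketch below is illustrative only and rests on assumptions not stated in the abstract: classes are coded 0/1, pairs are ranked by the absolute difference in the frequency of "gene i below gene j" between classes, selected pairs are required to be gene-disjoint, and prediction is an unweighted majority vote. The names `fit_ktsp` and `predict_ktsp` are hypothetical.

```python
import numpy as np

def fit_ktsp(X, y, k=1):
    """Pick k gene-disjoint pairs (i, j), each oriented so that
    X[:, i] < X[:, j] votes for class 1. Suitable for small gene counts."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    less = X[:, :, None] < X[:, None, :]          # less[n, i, j]
    delta = less[y == 1].mean(axis=0) - less[y == 0].mean(axis=0)
    iu, ju = np.triu_indices(X.shape[1], k=1)     # each unordered pair once
    order = np.argsort(-np.abs(delta[iu, ju]), kind="stable")
    pairs, used = [], set()
    for idx in order:
        i, j = int(iu[idx]), int(ju[idx])
        if i in used or j in used:
            continue
        pairs.append((i, j) if delta[i, j] > 0 else (j, i))
        used.update((i, j))
        if len(pairs) == k:
            break
    return pairs

def predict_ktsp(X, pairs):
    """Majority vote over the pairs; a tie (possible for even k) goes to class 0."""
    X = np.asarray(X, dtype=float)
    votes = np.stack([X[:, i] < X[:, j] for i, j in pairs], axis=1)
    return (votes.mean(axis=1) > 0.5).astype(int)
```

With `k=1` this reduces to a single-pair TSP rule. Because only the within-sample ordering of two expression values is compared, the rule is unchanged by any monotone transformation applied to a whole sample, which is the normalization invariance the abstract refers to.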
Affiliation(s)
- James A Eddy
  - Institute for Genomic Biology, University of Illinois, Urbana, IL 61801, USA