1
Zhang X, Lee J, Goh WWB. An Investigation of How Normalisation and Local Modelling Techniques Confound Machine Learning Performance In a Mental Health Study. Heliyon 2022; 8:e09502. [PMID: 35663731 PMCID: PMC9156999 DOI: 10.1016/j.heliyon.2022.e09502] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/23/2021] [Revised: 03/12/2022] [Accepted: 05/16/2022] [Indexed: 01/12/2023] Open
Abstract
Machine learning (ML) is increasingly deployed in biomedical studies for biomarker development (feature selection) and diagnostic/prognostic technologies (classification). While different ML techniques produce different feature sets and classification performances, less understood is how upstream data processing methods (e.g., normalisation) impact downstream analyses. Using a clinical mental health dataset, we investigated the impact of different normalisation techniques on classification model performance. Gene Fuzzy Scoring (GFS), an in-house developed normalisation technique, is compared against widely used normalisation methods such as global quantile normalisation, class-specific quantile normalisation and surrogate variable analysis. We report that choice of normalisation technique has a strong influence on feature selection, with GFS outperforming other techniques. Although GFS parameters are tuneable, good classification model performance (ROC-AUC > 0.90) is observed regardless of the GFS parameter settings. We also contrasted our results against local modelling, which is meant to improve the resolution and meaningfulness of classification models built on heterogeneous data. Local models, when derived from non-biologically meaningful subpopulations, perform worse than global models. A deep dive, however, revealed that the factors driving cluster formation have little to do with the phenotype-of-interest. This finding is critical, as local models are often seen as a superior means of clinical data modelling. We advise against such naivete. Additionally, we have developed a combinatorial reasoning approach using both global and local paradigms. This helped reveal potential data quality issues or underlying factors causing data heterogeneity that are often overlooked. It also helps explain the model and provides directions for further improvement.
Affiliation(s)
- Xinxin Zhang
- School of Biological Sciences, Nanyang Technological University, 637551, Singapore
- Jimmy Lee
- North Region & Department of Psychosis, Institute of Mental Health, 539747, Singapore
- Corresponding author.
- Wilson Wen Bin Goh
- School of Biological Sciences, Nanyang Technological University, 637551, Singapore
- Lee Kong Chian School of Medicine, Nanyang Technological University, 636921, Singapore
- Centre for Biomedical Informatics, Nanyang Technological University, 636921, Singapore
- Corresponding author.
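The abstract above contrasts global and class-specific quantile normalisation without describing either. A minimal sketch of the two ideas follows, assuming a features x samples NumPy matrix; the function names are hypothetical, ties are broken by position as a simplification, and this is not the authors' pipeline and does not implement GFS or surrogate variable analysis.

```python
import numpy as np


def quantile_normalise(x):
    """Global quantile normalisation of a features x samples matrix.

    Every column (sample) is mapped onto one shared reference distribution:
    the mean of the column-wise sorted values. Ties are broken by position,
    which is a simplification (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0, kind="stable")      # row indices that sort each column
    ranks = np.argsort(order, axis=0, kind="stable")  # rank of each value within its column
    reference = np.sort(x, axis=0).mean(axis=1)       # mean of the k-th smallest values
    return reference[ranks]


def class_specific_quantile_normalise(x, labels):
    """Quantile-normalise the samples of each class separately."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    out = np.empty_like(x)
    for cls in np.unique(labels):
        cols = labels == cls                          # boolean mask of samples in this class
        out[:, cols] = quantile_normalise(x[:, cols])
    return out
```

After the global variant every sample shares the same set of values; after the class-specific variant this holds only within each class. The class-specific variant needs the class labels, so it cannot be applied as written to unlabelled samples.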
2
Hou J, Archer KJ. Regularization method for predicting an ordinal response using longitudinal high-dimensional genomic data. Stat Appl Genet Mol Biol 2015; 14:93-111. [PMID: 25720102 PMCID: PMC4454613 DOI: 10.1515/sagmb-2014-0004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
Abstract
An ordinal scale is commonly used to measure health status and disease-related outcomes in hospital settings as well as in translational medical research. In addition, repeated measurements are common in clinical practice for tracking and monitoring the progression of complex diseases. Classical methodology based on statistical inference, in particular ordinal modeling, has contributed to the analysis of data in which the response categories are ordered and the number of covariates (p) remains smaller than the sample size (n). With genomic technologies increasingly applied for more accurate diagnosis and prognosis, high-dimensional data are generated in which the number of covariates (p) is much larger than the number of samples (n). To meet this need, we propose a two-stage algorithm: first, extend the generalized monotone incremental forward stagewise (GMIFS) method to the cumulative logit ordinal model; second, combine the GMIFS procedure with the classical mixed-effects model to classify disease status as the disease progresses over time. We demonstrate the efficiency and accuracy of the proposed models in classification using a time-course microarray dataset collected from the Inflammation and the Host Response to Injury study.
Affiliation(s)
- Jiayi Hou
- Department of Biostatistics, Virginia Commonwealth University.
- Kellie J. Archer
- Department of Biostatistics, Virginia Commonwealth University, 830 East Main St., Room 718, Richmond, VA 23298-0032, United States.
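For readers unfamiliar with the cumulative logit ordinal model named in the abstract above, its standard textbook form is given here for an ordinal response Y with K ordered categories and a covariate vector x. The symbols (the thresholds alpha_j and the coefficient vector beta) and the sign convention are assumptions of this note and are not taken from the paper.

```latex
\operatorname{logit} P(Y \le j \mid \mathbf{x})
  = \log \frac{P(Y \le j \mid \mathbf{x})}{P(Y > j \mid \mathbf{x})}
  = \alpha_j + \mathbf{x}^{\top} \boldsymbol{\beta},
  \qquad j = 1, \dots, K - 1,
  \qquad \alpha_1 \le \alpha_2 \le \dots \le \alpha_{K-1}.
```

The coefficient vector is shared across all K - 1 cumulative splits and only the thresholds differ. Some texts write the linear term with a minus sign, which flips the interpretation of the coefficients but not the model family.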
3
Abstract
Alterations in neurodevelopment are thought to modify risk of numerous psychiatric disorders, including schizophrenia, autism, ADHD, mood and anxiety disorders, and substance abuse. However, little is known about the cellular and molecular changes that guide these neurodevelopmental changes and how they contribute to mental illness. In this review, we suggest that elucidating this process in humans requires the use of model organisms. Furthermore, we advocate that such translational work should focus on the role that genes and/or environmental factors play in the development of circuits that regulate specific physiological and behavioral outcomes in adulthood. This emphasis on circuit development, as a fundamental unit for understanding behavior, is distinct from current approaches of modeling psychiatric illnesses in animals in two important ways. First, it proposes to replace the Diagnostic and Statistical Manual of Mental Disorders (DSM) diagnostic system with measurable endophenotypes as the basis for modeling human psychopathology in animals. We argue that a major difficulty in establishing valid animal models lies in their reliance on the DSM/International Classification of Diseases conceptual framework, and suggest that the Research Domain Criteria project, recently proposed by the NIMH, provides a more suitable system to model human psychopathology in animals. Second, this proposal emphasizes the developmental origin of many (though clearly not all) psychiatric illnesses, an issue that is often glossed over in current animal models of mental illness. We suggest that animal models are essential to elucidate the mechanisms by which neurodevelopmental changes program complex behavior in adulthood. A better understanding of this issue in animals is key to defining human psychopathology and to developing earlier and more effective interventions for mental illness.
4
Stress coping stimulates hippocampal neurogenesis in adult monkeys. Proc Natl Acad Sci U S A 2010; 107:14823-7. [PMID: 20675584 DOI: 10.1073/pnas.0914568107] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.7] [Indexed: 11/18/2022] Open
Abstract
Coping with intermittent social stress is an essential aspect of living in complex social environments. Coping tends to counteract the deleterious effects of stress and is thought to induce neuroadaptations in corticolimbic brain systems. Here we test this hypothesis in adult squirrel monkey males exposed to intermittent social separations and new pair formations. These manipulations simulate conditions that typically occur in male social associations because of competition for limited access to residency in mixed-sex groups. As evidence of coping, we previously confirmed that cortisol levels initially increase and then are restored to prestress levels within several days of each separation and new pair formation. Follow-up studies with exogenous cortisol further established that feedback regulation of the hypothalamic-pituitary-adrenal axis is not impaired. Now we report that exposure to intermittent social separations and new pair formations increased hippocampal neurogenesis in squirrel monkey males. Hippocampal neurogenesis in rodents contributes to spatial learning performance, and in monkeys we found that spatial learning was enhanced in conditions that increased hippocampal neurogenesis. Corresponding changes were discerned in the expression of genes involved in survival and integration of adult-born granule cells into hippocampal neural circuits. These findings support recent indications that stress coping stimulates hippocampal neurogenesis in adult rodents. Psychotherapies designed to promote stress coping potentially have similar effects in humans with major depression.
5
Fukuoka T, Sumida K, Yamada T, Higuchi C, Nakagaki K, Nakamura K, Kohsaka S, Saito K, Oeda K. Gene expression profiles in the common marmoset brain determined using a newly developed common marmoset-specific DNA microarray. Neurosci Res 2010; 66:62-85. [DOI: 10.1016/j.neures.2009.09.1709] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/03/2009] [Revised: 08/28/2009] [Accepted: 09/28/2009] [Indexed: 10/20/2022]
6
Her S, Lee MS, Morita K. Trichostatin A Stimulates Steroid 5α-Reductase Gene Expression in Rat C6 Glioma Cells via a Mechanism Involving Sp1 and Sp3 Transcription Factors. J Mol Neurosci 2009; 41:252-62. [DOI: 10.1007/s12031-009-9284-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 06/01/2009] [Accepted: 08/02/2009] [Indexed: 10/20/2022]
7
Urbanski HF, Noriega NC, Lemos DR, Kohama SG. Gene expression profiling in the rhesus macaque: experimental design considerations. Methods 2009; 49:26-31. [PMID: 19467336 PMCID: PMC2734384 DOI: 10.1016/j.ymeth.2009.05.009] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 02/24/2009] [Revised: 05/06/2009] [Accepted: 05/18/2009] [Indexed: 12/31/2022] Open
Abstract
The development of species-specific gene microarrays has greatly facilitated gene expression profiling in nonhuman primates. However, to obtain accurate and physiologically meaningful data from these microarrays, one needs to consider several factors when designing the studies. This article focuses on effective experimental design while the companion article focuses on methodology and data analysis. Biological cycles have a major influence on gene expression, and at least 10% of the expressed genes are likely to show a 24-h expression pattern. Consequently, the time of day when RNA samples are collected can influence detection of significant changes in gene expression levels. Similarly, when photoperiodic species such as the rhesus macaque are housed outdoors, some of their genes show differential expression according to the time of year. In addition, the sex-steroid environment of humans and many nonhuman primates changes markedly across the menstrual cycle, and so phase of the cycle needs to be considered when studying gene expression in adult females.
Affiliation(s)
- Henryk F Urbanski
- Division of Neuroscience, Oregon National Primate Research Center, Beaverton, OR 97006, USA.
8
Antidepressant actions of the exercise-regulated gene VGF. Nat Med 2007; 13:1476-82. [PMID: 18059283 DOI: 10.1038/nm1669] [Citation(s) in RCA: 199] [Impact Index Per Article: 11.7] [Received: 05/30/2007] [Accepted: 09/20/2007] [Indexed: 01/30/2023]
Abstract
Exercise has many health benefits, including antidepressant actions in depressed human subjects, but the mechanisms underlying these effects have not been elucidated. We used a custom microarray to identify a previously undescribed profile of exercise-regulated genes in the mouse hippocampus, a brain region implicated in mood and antidepressant response. Pathway analysis of the regulated genes shows that exercise upregulates a neurotrophic factor signaling cascade that has been implicated in the actions of antidepressants. One of the most highly regulated target genes of exercise and of the growth factor pathway is the gene encoding the VGF nerve growth factor, a peptide precursor previously shown to influence synaptic plasticity and metabolism. We show that administration of a synthetic VGF-derived peptide produces a robust antidepressant response in mice and, conversely, that mutation of VGF in mice produces the opposite effects. The results suggest a new role for VGF and identify VGF signaling as a potential therapeutic target for antidepressant drug development.
9
Karssen AM, Her S, Li JZ, Patel PD, Meng F, Bunney WE, Jones EG, Watson SJ, Akil H, Myers RM, Schatzberg AF, Lyons DM. Stress-induced changes in primate prefrontal profiles of gene expression. Mol Psychiatry 2007; 12:1089-102. [PMID: 17893703 DOI: 10.1038/sj.mp.4002095] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.2] [Indexed: 11/09/2022]
Abstract
Stressful experiences that consistently increase cortisol levels appear to alter the expression of hundreds of genes in prefrontal limbic brain regions. Here, we investigate this hypothesis in monkeys exposed to intermittent social stress-induced episodes of hypercortisolism or a no-stress control condition. Prefrontal profiles of gene expression compiled from Affymetrix microarray data for monkeys randomized to the no-stress condition were consistent with microarray results published for healthy humans. In monkeys exposed to intermittent social stress, more genes than expected by chance appeared to be differentially expressed in ventromedial prefrontal cortex compared to monkeys not exposed to adult social stress. Most of these stress-responsive candidate genes were modestly downregulated, including ubiquitin conjugation enzymes and ligases involved in synaptic plasticity, cell cycle progression and nuclear receptor signaling. Social stress did not affect gene expression beyond that expected by chance in dorsolateral prefrontal cortex or prefrontal white matter. Thirty-four of 48 comparisons chosen for verification by quantitative real-time polymerase chain reaction (qPCR) were consistent with the microarray-predicted result. Furthermore, qPCR and microarray data were highly correlated. These results provide new insights into the regulation of gene expression in a prefrontal corticolimbic region involved in the pathophysiology of stress and major depression. Comparisons between these data from monkeys and those for ventromedial prefrontal cortex in humans with a history of major depression may help to distinguish the molecular signature of stress from other confounding factors in human postmortem brain research.
Affiliation(s)
- A M Karssen
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305-5485, USA
10
Lachance PED, Chaudhuri A. Gene profiling of pooled single neuronal cell bodies from laser capture microdissected vervet monkey lateral geniculate nucleus hybridized to the Rhesus Macaque Genome Array. Brain Res 2007; 1185:33-44. [PMID: 17996221 DOI: 10.1016/j.brainres.2007.09.080] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 03/22/2007] [Revised: 08/27/2007] [Accepted: 09/23/2007] [Indexed: 12/12/2022]
Abstract
This report is based on an ongoing study to examine gene expression differences in monkey lateral geniculate nucleus (LGN). Here, samples from an Old World species, the vervet monkey (Cercopithecus aethiops), were cross-hybridized to the Rhesus Macaque Genome Array (Affymetrix). Microarray analysis was performed using laser capture microdissected populations of individual neuronal cell bodies isolated from the LGN compared to heterogeneous samples from whole lamina. Our results indicated that cross-species hybridization of microdissected brain tissue samples from vervet monkeys to the Rhesus array produced reliable and biologically relevant data sets. We present the first list of genes enriched in the large neuronal cell bodies of the LGN. We found that these cell bodies are enriched for genes involved in metabolic processes and protein synthesis, whereas signaling molecules including chemokines and integrins were expressed at higher levels within heterogeneous samples. Our data set also provides support for a contribution of Wnt signaling in adult monkey LGN.
Affiliation(s)
- Pascal E D Lachance
- Department of Psychology, McGill University, 1205 Ave. Dr. Penfield, Montreal, QC, Canada H3A1B1.
11
Datson NA, Morsink MC, Atanasova S, Armstrong VW, Zischler H, Schlumbohm C, Dutilh BE, Huynen MA, Waegele B, Ruepp A, de Kloet ER, Fuchs E. Development of the first marmoset-specific DNA microarray (EUMAMA): a new genetic tool for large-scale expression profiling in a non-human primate. BMC Genomics 2007; 8:190. [PMID: 17592630 PMCID: PMC1929077 DOI: 10.1186/1471-2164-8-190] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 03/22/2007] [Accepted: 06/25/2007] [Indexed: 01/01/2023] Open
Abstract
Background: The common marmoset monkey (Callithrix jacchus), a small non-endangered New World primate native to eastern Brazil, is becoming increasingly used as a non-human primate model in biomedical research, drug development and safety assessment. In contrast to the growing interest for the marmoset as an animal model, the molecular tools for genetic analysis are extremely limited.
Results: Here we report the development of the first marmoset-specific oligonucleotide microarray (EUMAMA) containing probe sets targeting 1541 different marmoset transcripts expressed in hippocampus. These 1541 transcripts represent a wide variety of different functional gene classes. Hybridisation of the marmoset microarray with labelled RNA from hippocampus, cortex and a panel of 7 different peripheral tissues resulted in high detection rates of 85% in the neuronal tissues and on average 70% in the non-neuronal tissues. The expression profiles of the 2 neuronal tissues, hippocampus and cortex, were highly similar, as indicated by a correlation coefficient of 0.96. Several transcripts with a tissue-specific pattern of expression were identified. Besides the marmoset microarray we have generated 3215 ESTs derived from marmoset hippocampus, which have been annotated and submitted to GenBank [GenBank: EF214838 – EF215447, EH380242 – EH382846].
Conclusion: We have generated the first marmoset-specific DNA microarray and demonstrated its use to characterise large-scale gene expression profiles of hippocampus but also of other neuronal and non-neuronal tissues. In addition, we have generated a large collection of ESTs of marmoset origin, which are now available in the public domain. These new tools will facilitate molecular genetic research into this non-human primate animal model.
Affiliation(s)
- Nicole A Datson
- Division of Medical Pharmacology, Leiden/Amsterdam Center for Drug Research and Leiden University Medical Center, The Netherlands
- Maarten C Morsink
- Division of Medical Pharmacology, Leiden/Amsterdam Center for Drug Research and Leiden University Medical Center, The Netherlands
- Srebrena Atanasova
- Department of Clinical Chemistry, Georg-August University, Goettingen, Germany
- Victor W Armstrong
- Department of Clinical Chemistry, Georg-August University, Goettingen, Germany
- Hans Zischler
- Institute of Anthropology, University of Mainz, Mainz, Germany
- Bas E Dutilh
- Center for Molecular and Biomolecular Informatics/Nijmegen Center for Molecular Life Sciences, Radboud University Nijmegen, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands
- Martijn A Huynen
- Center for Molecular and Biomolecular Informatics/Nijmegen Center for Molecular Life Sciences, Radboud University Nijmegen, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands
- Brigitte Waegele
- Institute for Bioinformatics, GSF – National Research Center for Environment and Health, Ingolstaedter Landstrasse 1, Germany
- Andreas Ruepp
- Institute for Bioinformatics, GSF – National Research Center for Environment and Health, Ingolstaedter Landstrasse 1, Germany
- E Ronald de Kloet
- Division of Medical Pharmacology, Leiden/Amsterdam Center for Drug Research and Leiden University Medical Center, The Netherlands
- Eberhard Fuchs
- Clinical Neurobiology Laboratory, German Primate Center, Göttingen, Germany