1. Woodward AA, Urbanowicz RJ, Naj AC, Moore JH. Genetic heterogeneity: Challenges, impacts, and methods through an associative lens. Genet Epidemiol 2022; 46:555-571. PMID: 35924480; PMCID: PMC9669229; DOI: 10.1002/gepi.22497. Received: 05/18/2022; Revised: 07/06/2022; Accepted: 07/19/2022.
Abstract
Genetic heterogeneity describes the occurrence of the same or similar phenotypes through different genetic mechanisms in different individuals. Robustly characterizing and accounting for genetic heterogeneity is crucial to pursuing the goals of precision medicine, discovering novel disease biomarkers, and identifying targets for treatment. Failure to account for genetic heterogeneity may lead to missed associations and incorrect inferences. It is therefore critical to review the impact of genetic heterogeneity on the design and analysis of population-level genetic studies, aspects that are often overlooked in the literature. In this review, we first contextualize our approach to genetic heterogeneity by proposing a high-level categorization of heterogeneity into "feature," "outcome," and "associative" heterogeneity, drawing on perspectives from epidemiology and machine learning to illustrate the distinctions between them. We highlight the unique nature of genetic heterogeneity as a heterogeneous pattern of association that warrants specific methodological considerations. We then focus on the challenges that preclude effective detection and characterization of genetic heterogeneity across a variety of epidemiological contexts. Finally, we discuss systems heterogeneity as an integrated approach to using genetic and other high-dimensional multi-omic data in complex disease research.
Affiliation(s)
- Alexa A. Woodward
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Ryan J. Urbanowicz
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Adam C. Naj
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jason H. Moore
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA

2. Agelink van Rentergem JA, Deserno MK, Geurts HM. Validation strategies for subtypes in psychiatry: A systematic review of research on autism spectrum disorder. Clin Psychol Rev 2021; 87:102033. PMID: 33962352; DOI: 10.1016/j.cpr.2021.102033. Received: 06/19/2020; Revised: 02/14/2021; Accepted: 04/14/2021.
Abstract
Heterogeneity within autism spectrum disorder (ASD) is recognized as a challenge to biological and psychological research, as well as to clinical practice. To reduce unexplained heterogeneity, subtyping techniques are often used to establish more homogeneous subtypes based on metrics of similarity and dissimilarity between people. We review the ASD literature to create a systematic overview of the subtyping procedures and subtype validation techniques used in this field. We conducted a systematic review of 156 articles (2001 to June 2020) that subtyped participants (study sample sizes ranging from 17 to 20,658), some or all of whom had an ASD diagnosis. We found a large diversity in the methods (parametric and non-parametric) and variables (biological, psychological, demographic) used to establish subtypes. The majority of studies validated their subtype results using variables that were measured concurrently but were not included in the subtyping procedure; other investigations into subtype validity were rarer. To advance clinical research and the theoretical and clinical usefulness of identified subtypes, we propose a structured approach and present the SUbtyping VAlidation Checklist (SUVAC), a checklist for validating subtyping results.
Affiliation(s)
- Joost A Agelink van Rentergem
- Department of Psychology, University of Amsterdam, Amsterdam, the Netherlands; Dutch Autism & ADHD Research Center, the Netherlands.
- Marie K Deserno
- Department of Psychology, University of Amsterdam, Amsterdam, the Netherlands; Dutch Autism & ADHD Research Center, the Netherlands
- Hilde M Geurts
- Department of Psychology, University of Amsterdam, Amsterdam, the Netherlands; Dutch Autism & ADHD Research Center, the Netherlands; Dr. Leo Kannerhuis, the Netherlands

3. Reilly J, Gallagher L, Chen JL, Leader G, Shen S. Bio-collections in autism research. Mol Autism 2017; 8:34. PMID: 28702161; PMCID: PMC5504648; DOI: 10.1186/s13229-017-0154-8. Received: 03/16/2017; Accepted: 06/23/2017.
Abstract
Autism spectrum disorder (ASD) is a group of complex neurodevelopmental disorders with diverse clinical manifestations and symptoms. In the last 10 years, there have been significant advances in understanding the genetic basis of ASD, critically supported by the establishment of ASD bio-collections and their application in research. Here, we summarise a selection of major ASD bio-collections and their associated findings. Collectively, these include mapping ASD candidate genes, assessing the nature and frequency of gene mutations and their association with ASD clinical subgroups, and insights into related molecular pathways, such as synaptic function, chromatin remodelling and transcription, as well as into ASD-related brain regions. We also briefly review emerging studies on the use of induced pluripotent stem cells (iPSCs) to potentially model ASD in culture. These studies may provide deeper insight into ASD progression during development and could generate human cell models for drug screening. Finally, we provide perspectives on the utility and limitations of ASD bio-collections, and highlight considerations for setting up a new bio-collection for ASD research.
Affiliation(s)
- Jamie Reilly
- Regenerative Medicine Institute, School of Medicine, BioMedical Sciences Building, National University of Ireland (NUI), Galway, Ireland
- Louise Gallagher
- Trinity Translational Medicine Institute and Department of Psychiatry, Trinity Centre for Health Sciences, St. James Hospital Street, Dublin 8, Ireland
- June L Chen
- Department of Special Education, Faculty of Education, East China Normal University, Shanghai, 200062 China
- Geraldine Leader
- Irish Centre for Autism and Neurodevelopmental Research (ICAN), Department of Psychology, National University of Ireland Galway, University Road, Galway, Ireland
- Sanbing Shen
- Regenerative Medicine Institute, School of Medicine, BioMedical Sciences Building, National University of Ireland (NUI), Galway, Ireland

4. Beaulieu-Jones BK, Greene CS. Semi-supervised learning of the electronic health record for phenotype stratification. J Biomed Inform 2016; 64:168-178. PMID: 27744022; DOI: 10.1016/j.jbi.2016.10.007. Received: 02/18/2016; Revised: 10/05/2016; Accepted: 10/08/2016.
Abstract
Patient interactions with health care providers result in entries to electronic health records (EHRs). EHRs were built for clinical and billing purposes but contain many data points about an individual. Mining these records provides opportunities to extract electronic phenotypes, which can be paired with genetic data to identify genes underlying common human diseases. This task remains challenging: high-quality phenotyping is costly and requires physician review, many fields in the records are sparsely filled, and disease definitions continue to be refined over time. Here we develop and evaluate a semi-supervised learning method for EHR phenotype extraction that uses denoising autoencoders for phenotype stratification. By combining denoising autoencoders with random forests, we find classification improvements across multiple simulation models and improved survival prediction in ALS clinical trial data. This is particularly evident in cases where only a small number of patients have high-quality phenotypes, a common scenario in EHR-based research. Denoising autoencoders perform dimensionality reduction, enabling visualization and clustering for the discovery of new disease subtypes. This method represents a promising approach to clarifying disease subtypes and improving genotype-phenotype association studies that leverage EHRs.
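The semi-supervised idea above, learning a representation from all records and then training a classifier on the few well-phenotyped patients, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the one-hidden-layer architecture, masking-noise level, learning rate and simulated data are hypothetical choices, and a nearest-centroid classifier is swapped in for the random forest so that the example needs nothing beyond NumPy.

```python
import numpy as np


def sigmoid(z):
    # Clip pre-activations so np.exp cannot overflow
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50.0, 50.0)))


def train_dae(X, n_hidden=8, corruption=0.2, lr=0.02, epochs=300, seed=0):
    """Fit a one-hidden-layer denoising autoencoder by full-batch gradient descent.

    Each epoch, a random `corruption` fraction of entries is zeroed (masking
    noise) and the network is trained to reconstruct the *clean* input.
    Returns the encoder parameters (W1, b1) and the loss at every epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        X_noisy = X * (rng.random(X.shape) >= corruption)  # masking noise
        H = sigmoid(X_noisy @ W1 + b1)                       # encoder
        err = (H @ W2 + b2) - X                              # linear decoder vs. clean X
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        # Backpropagate the mean (over records) sum-of-squares loss
        g_out = 2.0 * err / n
        g_hid = (g_out @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X_noisy.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)
    return (W1, b1), losses


def encode(X, encoder):
    W1, b1 = encoder
    return sigmoid(X @ W1 + b1)


def nearest_centroid_predict(Z_lab, y_lab, Z_all):
    # Stand-in for the random forest: assign each record to the closest class mean
    classes = np.unique(y_lab)
    centroids = np.stack([Z_lab[y_lab == c].mean(axis=0) for c in classes])
    dists = ((Z_all[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]


if __name__ == "__main__":
    # Hypothetical data: two latent subtypes with different feature profiles
    rng = np.random.default_rng(1)
    n, d = 200, 20
    y = rng.integers(0, 2, size=n)
    mu = np.zeros((2, d))
    mu[0, :10] = 1.5
    mu[1, 10:] = 1.5
    X = mu[y] + rng.normal(size=(n, d))
    encoder, losses = train_dae(X)          # unsupervised: uses all 200 records
    Z = encode(X, encoder)
    # Only 5 "high-quality phenotypes" per class are treated as labeled
    labeled = np.concatenate([np.flatnonzero(y == c)[:5] for c in (0, 1)])
    pred = nearest_centroid_predict(Z[labeled], y[labeled], Z)
    print(f"loss {losses[0]:.1f} -> {losses[-1]:.1f}, accuracy {np.mean(pred == y):.2f}")
```

Because the autoencoder never sees the labels, it can be fitted on every record, while only the handful of labeled rows inform the classifier; that division of labor is what makes the approach semi-supervised.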
Affiliation(s)
- Brett K Beaulieu-Jones
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, United States; Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, United States.
- Casey S Greene
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, United States; Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, United States; Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Perelman School of Medicine, University of Pennsylvania, United States.

5. Swanson SA, Lindenberg K, Bauer S, Crosby RD. A Monte Carlo investigation of factors influencing latent class analysis: an application to eating disorder research. Int J Eat Disord 2012; 45:677-84. PMID: 21882219; DOI: 10.1002/eat.20958. Accepted: 07/09/2011.
Abstract
OBJECTIVE: Latent class analysis (LCA) has frequently been used to identify qualitatively distinct phenotypes of disordered eating. However, little consideration has been given to methodological factors that may influence the accuracy of these results. METHOD: Monte Carlo simulations were used to evaluate methodological factors that may influence the accuracy of LCA under scenarios similar to those seen in previous eating disorder research. RESULTS: Under these scenarios, the sample-size-adjusted Bayesian information criterion (aBIC) provided the best overall performance as an information criterion, requiring sample sizes of 300 in both balanced and unbalanced structures to achieve accuracy proportions of at least 80%. The BIC and the consistent Akaike information criterion (cAIC) required larger samples to achieve comparable performance, while the AIC performed comparatively poorly throughout. Accuracy was generally lower with unbalanced classes, fewer indicators, greater or nonrandom missing data, violations of the conditional independence assumption, and lower base rates of indicator endorsement. DISCUSSION: These results provide critical information for interpreting previous LCA research and designing future classification studies.
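For reference, the four information criteria compared above can be computed from a fitted model's maximized log-likelihood, its number of free parameters and the sample size. The abstract does not state the formulas; the sketch below assumes the standard definitions (cAIC as Bozdogan's consistent AIC, and aBIC as Sclove's sample-size adjustment, which replaces n with (n + 2)/24 in the BIC penalty) and counts parameters for an LCA with binary indicators. The helper names are hypothetical.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return AIC, BIC, CAIC and sample-size-adjusted BIC (aBIC); lower is better."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1.0),
        "aBIC": deviance + n_params * math.log((n_obs + 2.0) / 24.0),
    }


def lca_n_params(n_classes, n_indicators):
    # Free parameters of an LCA with binary indicators (assumption):
    # (K - 1) class proportions + K * J item-endorsement probabilities
    return (n_classes - 1) + n_classes * n_indicators


# Penalty terms only (log-likelihood set to 0) for a 3-class, 10-indicator model at n = 300
penalties = information_criteria(0.0, lca_n_params(3, 10), 300)
for name, value in penalties.items():
    print(f"{name}: {value:.1f}")
```

With the log-likelihood set to zero the returned values are just the penalty terms; for this 32-parameter model at n = 300 they order AIC < aBIC < BIC < CAIC, which shows how differently each criterion penalizes additional classes at that sample size.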
Affiliation(s)
- Sonja A Swanson
- Harvard School of Public Health, Department of Epidemiology, Boston, MA 02115, USA.

6. Symptom dimensions as alternative phenotypes to address genetic heterogeneity in schizophrenia and bipolar disorder. Eur J Hum Genet 2012; 20:1182-8. PMID: 22535187; DOI: 10.1038/ejhg.2012.67.
Abstract
This study introduces a novel way to use lifetime ratings of symptoms of psychosis, mania and depression in genetic linkage analysis of schizophrenia (SZ) and bipolar disorder (BP). It proposes using a latent class model developed for family data to define more homogeneous symptom subtypes that are influenced by a smaller number of genes, which will thus be more easily detectable. In a two-step approach, we proposed (i) to form homogeneous clusters of subjects based on the symptom dimensions and (ii) to use the information from these homogeneous clusters in linkage analysis. This framework was applied to a unique SZ and BP sample composed of 1278 subjects from 48 large kindreds from the Eastern Quebec population. The results suggest that our strategy can strengthen linkage signals previously obtained using the diagnosis as the phenotype and allows a better characterization of these signals. This is the case for a linkage signal on chromosome 13q that we had previously obtained and that was enhanced using the mania dimension. The analysis also suggests that the method may detect new linkage signals not previously uncovered by using diagnosis alone, as on chromosomes 2q (delusion), 15q (bizarre behavior), 7p (anhedonia) and 9q (delusion). For the 15q and 2q regions, the results coincide with linkage signals detected in other studies. Our results support the view that dissecting phenotypic heterogeneity by modeling symptom dimensions may provide new insights into the genetics of SZ and BP.
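The first step of the two-step approach, forming homogeneous clusters of subjects from symptom data, can be illustrated with a basic latent class model fitted by expectation-maximization (EM). This is a simplified sketch, not the model used in the study: it assumes binary (present/absent) symptom indicators and independent subjects, whereas the study's model was developed for family data and allows dependence between relatives. The function names, defaults and simulated data are hypothetical.

```python
import numpy as np


def _posterior(Y, pi, theta):
    # log P(class k) + sum_j log P(y_ij | class k), normalised per subject
    log_joint = Y @ np.log(theta).T + (1 - Y) @ np.log(1 - theta).T + np.log(pi)
    log_joint -= log_joint.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def lca_em(Y, n_classes=2, n_iter=200, seed=0):
    """EM for a latent class model with binary indicators (local independence).

    Y: (n_subjects, n_items) array of 0/1 symptom indicators.
    Returns class proportions pi (K,), item probabilities theta (K, J) and
    posterior class-membership probabilities (n, K). Class labels are arbitrary.
    """
    rng = np.random.default_rng(seed)
    n, n_items = Y.shape
    pi = np.full(n_classes, 1.0 / n_classes)
    theta = rng.uniform(0.25, 0.75, size=(n_classes, n_items))
    for _ in range(n_iter):
        post = _posterior(Y, pi, theta)                   # E-step
        nk = post.sum(axis=0)                             # M-step: expected class sizes
        pi = nk / n
        theta = np.clip((post.T @ Y) / nk[:, None], 1e-6, 1 - 1e-6)
    return pi, theta, _posterior(Y, pi, theta)


if __name__ == "__main__":
    # Hypothetical data: two symptom profiles over 8 binary items
    rng = np.random.default_rng(3)
    truth = rng.integers(0, 2, size=400)
    p_true = np.array([[0.8] * 4 + [0.2] * 4, [0.2] * 4 + [0.8] * 4])
    Y = (rng.random((400, 8)) < p_true[truth]).astype(float)
    pi, theta, post = lca_em(Y)
    print("class proportions:", np.round(pi, 2))
    print("item probabilities:\n", np.round(theta, 2))
```

The returned posterior matrix holds each subject's probability of membership in each class. A second step could either assign each subject to the most probable class or, as in the FBAT weighting proposed by Bureau et al. in this list, use the probabilities directly as weights.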

7. Common variants in the TPH2 promoter confer susceptibility to paranoid schizophrenia. J Mol Neurosci 2012; 47:465-9. PMID: 22392150; DOI: 10.1007/s12031-012-9725-5. Received: 01/18/2012; Accepted: 02/10/2012.
Abstract
Serotonergic system-related genes may be good candidates for investigating the genetic basis of schizophrenia. Our previous study suggested that the promoter region of the tryptophan hydroxylase 2 gene (TPH2) may confer susceptibility to paranoid schizophrenia. In this study, we investigated whether common variants within the TPH2 promoter predispose to paranoid schizophrenia in Han Chinese. A total of 509 patients who met DSM-IV criteria for paranoid schizophrenia and 510 matched healthy controls were recruited. Five polymorphisms within the TPH2 promoter region were tested. No statistically significant differences were found in allele or genotype frequencies between patients with schizophrenia and healthy controls. However, the frequency of the rs4448731T-rs6582071A-rs7963803A-rs4570625T-rs11178997A haplotype was significantly higher in cases than in controls (P = 0.003; OR = 1.49; 95% CI, 1.15-1.95). Our results suggest that common variants within the TPH2 promoter are associated with paranoid schizophrenia in Han Chinese. Further studies in larger samples are warranted to elucidate the role of TPH2 in the etiology of paranoid schizophrenia.
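The reported haplotype statistics can be cross-checked with a little arithmetic. Assuming the 95% CI was computed on the log-odds scale as log(OR) ± 1.96 × SE (an assumption; the abstract does not say how it was obtained), the standard error and a Wald p-value can be recovered from the three reported numbers. The helper name below is hypothetical.

```python
import math


def se_and_p_from_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Recover SE(log OR), the Wald z and a two-sided p-value from an OR and its 95% CI.

    Assumes the interval was built on the log scale as log(OR) +/- z_crit * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard-normal tail probability
    return se, z, p


se, z, p = se_and_p_from_ci(1.49, 1.15, 1.95)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

For OR = 1.49 with 95% CI 1.15-1.95 this gives SE(log OR) of about 0.135 and a two-sided p-value of about 0.003, consistent with the reported P = 0.003. Agreement only shows that the reported figures are mutually consistent; the authors' P may have come from a different test.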

8. Bureau A, Croteau J, Tayeb A, Mérette C, Labbe A. Latent class model with familial dependence to address heterogeneity in complex diseases: adapting the approach to family-based association studies. Genet Epidemiol 2011; 35:182-9. PMID: 21308764; DOI: 10.1002/gepi.20566. Received: 08/31/2010; Revised: 11/29/2010; Accepted: 01/04/2011.
Abstract
Clinical diagnoses of complex diseases may often encompass multiple genetically heterogeneous disorders. One way of dissecting this heterogeneity is to apply latent class (LC) analysis to measurements related to the diagnosis, such as detailed symptoms, to define more homogeneous disease sub-types influenced by a smaller number of genes, which will thus be more easily detectable. We have previously developed an LC model allowing dependence between the latent disease class status of relatives within families. We have also proposed a strategy to incorporate the posterior probability of class membership of each subject in parametric linkage analysis, which is not directly transferable to genetic association methods. Under the framework of family-based association tests (FBAT), we now propose to make the contribution of an affected subject to the FBAT statistic proportional to his or her posterior class membership probability. Simulations showed a modest but robust power advantage compared with simply assigning each subject to his or her most probable class, and substantial power gains over the analysis of the disease diagnosis without LC modeling under certain scenarios. The use of LC analysis with FBAT is illustrated using autism spectrum disorder (ASD) symptoms in families from the Autism Genetics Research Exchange, where we examined eight regions previously associated with autism in this sample. The analysis using the posterior probability of membership in an LC detected an association in the JARID2 gene as significant as that for ASD (P = 3 × 10^-5) but with a larger effect size (odds ratio = 2.17 vs. 1.55).
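The weighting idea can be sketched for the simplest family structure. In an FBAT-type statistic, each affected child contributes T × (X − E[X | parents]), where X is the child's count of the tested allele and T is a trait coding; the variant described above sets T to the child's posterior probability of latent-class membership instead of 1. The sketch assumes parent-offspring trios with one affected child each, fully observed genotypes and additive coding; it ignores the multiple-sibling and missing-parent cases that the full FBAT framework handles, and the function names and trio data are hypothetical.

```python
import math


def expected_and_var(father, mother):
    """Mean and variance of the child's allele count given parental genotypes (0/1/2).

    Under Mendelian transmission each parent passes the allele with probability
    g/2, so E[X] = (gf + gm)/2 and each heterozygous parent adds 1/4 to Var[X].
    """
    mean = (father + mother) / 2.0
    var = 0.25 * ((father == 1) + (mother == 1))
    return mean, var


def weighted_fbat_z(trios, weights):
    """FBAT-style Z statistic for trios with one affected child each.

    trios: list of (father, mother, child) allele counts.
    weights: trait coding T for each child, e.g. a posterior class-membership
    probability (hypothetical input) instead of the usual constant 1.
    """
    s = 0.0
    v = 0.0
    for (gf, gm, gc), w in zip(trios, weights):
        mean, var = expected_and_var(gf, gm)
        s += w * (gc - mean)      # weighted deviation from Mendelian expectation
        v += w * w * var          # independent trios: variances add
    if v == 0.0:
        raise ValueError("no informative trios (no heterozygous parents)")
    return s / math.sqrt(v)


# Hypothetical trios and posterior probabilities
trios = [(1, 1, 2), (1, 0, 1), (1, 2, 2), (1, 1, 1), (0, 0, 0)]
posteriors = [0.9, 0.8, 0.3, 0.5, 1.0]
print(f"weighted Z = {weighted_fbat_z(trios, posteriors):.3f}")
print(f"unweighted Z = {weighted_fbat_z(trios, [1.0] * len(trios)):.3f}")
```

With all weights equal to 1 this reduces to the usual unweighted trio statistic. Lowering a child's weight shrinks that child's contribution to the numerator linearly and to the variance quadratically, so children unlikely to belong to the latent class have less influence on the test.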
Affiliation(s)
- Alexandre Bureau
- Centre de recherche Université Laval Robert-Giffard, Quebec City, Quebec, Canada.

9. Novel method for combined linkage and genome-wide association analysis finds evidence of distinct genetic architecture for two subtypes of autism. J Neurodev Disord 2011; 3:113-23. PMID: 21484201; PMCID: PMC3105232; DOI: 10.1007/s11689-011-9072-9. Received: 10/20/2010; Accepted: 01/04/2011.
Abstract
The Autism Genome Project has assembled two large datasets originally designed for linkage analysis and genome-wide association analysis, respectively: 1,069 multiplex families genotyped on the Affymetrix 10K platform and 1,129 autism trios genotyped on the Illumina 1M platform. We set out to exploit this unique pair of resources by analyzing the combined data with a novel statistical method, based on the posterior probability of linkage (PPL) framework, that simultaneously searches for linkage and association to loci involved in autism spectrum disorders (ASD). Our analysis also allowed for potential differences in genetic architecture for ASD in the presence or absence of lower IQ, an important clinical indicator of ASD subtypes. We found strong evidence of multiple linked loci; however, association evidence implicating specific genes was low, even under the linkage peaks. Distinct loci were found in the lower IQ (LIQ) families, and these families showed stronger and more numerous linkage peaks, while the normal IQ group yielded the strongest association evidence. It appears that the presence or absence of LIQ demarcates more genetically homogeneous subgroups of ASD patients, with not just different sets of loci acting in the two groups but possibly distinct genetic architectures, such that the LIQ group involves more major gene effects (amenable to linkage mapping), while the normal IQ group potentially involves more common alleles with lower penetrance. The possibility of distinct genetic architectures across ASD subtypes has implications for further research and perhaps for research approaches to other complex disorders as well.

10. Tayeb A, Labbe A, Bureau A, Mérette C. Solving genetic heterogeneity in extended families by identifying sub-types of complex diseases. Comput Stat 2011. DOI: 10.1007/s00180-010-0224-2.

11. Piggot J, Shirinyan D, Shemmassian S, Vazirian S, Alarcón M. Neural systems approaches to the neurogenetics of autism spectrum disorders. Neuroscience 2009; 164:247-56. PMID: 19482063; DOI: 10.1016/j.neuroscience.2009.05.054. Received: 11/25/2008; Revised: 05/08/2009; Accepted: 05/22/2009.
Abstract
Autism is generally accepted as the most genetic of all the developmental neuropsychiatric syndromes. However, despite several decades of genetic study, the etiology of autism remains unknown, largely because of the genetic and phenotypic diversity, or heterogeneity, of the disorder and the lack of biologically based classification systems. At the same time, the body of neuroimaging research identifying candidate neural systems underlying aspects of autistic impairment has grown considerably, fueled by the advent of technologies such as functional magnetic resonance imaging (fMRI). Yet the findings from these neuroimaging studies have not been used to inform sample collection for genetic studies of autism, which are predominantly based on a diagnosis of the disorder. This article reviews the genetics of autism and describes the genetic approaches that have been applied, including the phenotypic strategies used to address heterogeneity and optimize the power of these genetic studies. With the increasing recognition that there may be different "autisms" (Geschwind and Levitt, 2007) with unique neural mechanisms, we argue that neural systems research, using technologies such as fMRI, currently allows the identification of more biologically informative phenotypes for genetic studies of autism and is positioned to identify informative neuroimaging markers for "neurogenetic" studies of the disorder. To illustrate this, we describe several candidate neural systems for the social communication impairment seen in autism, and the characteristic behavioral and physiological manifestations associated with them that could be incorporated into phenotypic assessments.
Affiliation(s)
- J Piggot
- Division of Child and Adolescent Psychiatry, Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, CA 90095-1769, USA.