1
Boutry S, Helaers R, Lenaerts T, Vikkula M. Rare variant association on unrelated individuals in case-control studies using aggregation tests: existing methods and current limitations. Brief Bioinform 2023; 24:bbad412. [PMID: 37974506 DOI: 10.1093/bib/bbad412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 10/14/2023] [Accepted: 10/28/2023] [Indexed: 11/19/2023] Open
Abstract
Over the past years, progress made in next-generation sequencing technologies and bioinformatics has sparked a surge in association studies. In particular, genome-wide association studies (GWASs) have demonstrated their effectiveness in identifying disease associations with common genetic variants. Yet, rare variants can contribute to additional disease risk or trait heterogeneity. Because GWASs are underpowered for detecting association with such variants, numerous statistical methods have been recently proposed. Aggregation tests collapse multiple rare variants within a genetic region (e.g. gene, gene set, genomic locus) to test for association. An increasing number of studies using such methods have successfully identified trait-associated rare variants and led to a better understanding of the underlying disease mechanism. In this review, we compare existing aggregation tests, their statistical features and scope of application, splitting them into the five classical classes: burden, adaptive burden, variance-component, omnibus and other. Finally, we describe some limitations of current aggregation tests, highlighting potential directions for further investigation.
Affiliation(s)
- Simon Boutry
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Raphaël Helaers
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Artificial Intelligence laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Miikka Vikkula
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- WELBIO department, WEL Research Institute, avenue Pasteur, 6, 1300 Wavre, Belgium
2
Boutry S, Helaers R, Lenaerts T, Vikkula M. Excalibur: A new ensemble method based on an optimal combination of aggregation tests for rare-variant association testing for sequencing data. PLoS Comput Biol 2023; 19:e1011488. [PMID: 37708232 PMCID: PMC10522036 DOI: 10.1371/journal.pcbi.1011488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 09/26/2023] [Accepted: 09/04/2023] [Indexed: 09/16/2023] Open
Abstract
The development of high-throughput next-generation sequencing technologies and large-scale genetic association studies produced numerous advances in the biostatistics field. Various aggregation tests, i.e. statistical methods that analyze associations of a trait with multiple markers within a genomic region, have produced a variety of novel discoveries. Notwithstanding their usefulness, there is no single test that fits all needs, each suffering from specific drawbacks. Selecting the right aggregation test, while considering an unknown underlying genetic model of the disease, remains an important challenge. Here we propose a new ensemble method, called Excalibur, based on an optimal combination of 36 aggregation tests created after an in-depth study of the limitations of each test and their impact on the quality of results. Our findings demonstrate the ability of our method to control type I error and illustrate that it offers the best average power across all scenarios. The proposed method allows for novel advances in whole-exome/genome sequencing association studies, able to handle a wide range of association models, providing researchers with an optimal aggregation analysis for the genetic regions of interest.
Affiliation(s)
- Simon Boutry
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Raphaël Helaers
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Artificial Intelligence laboratory, Vrije Universiteit Brussel, Brussels, Belgium
- Miikka Vikkula
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- WELBIO department, WEL Research Institute, Wavre, Belgium
3
Fabbri C. Genetics in psychiatry: Methods, clinical applications and future perspectives. PCN Reports: Psychiatry and Clinical Neurosciences 2022; 1:e6. [PMID: 38868637 PMCID: PMC11114394 DOI: 10.1002/pcn5.6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2021] [Revised: 01/18/2022] [Accepted: 03/02/2022] [Indexed: 06/14/2024]
Abstract
Psychiatric disorders and related traits have a demonstrated genetic component, with heritability estimated by twin studies generally between 40% and 80%. Their pathogenesis is complex and multi-determined: environmental factors interact with a polygenic architecture, making it difficult to develop models able to stratify patients or predict mental health outcomes. Despite this challenge, relevant progress has been made in the field of psychiatric genetics in recent years. This review aims to present the main current methods in psychiatric genetics, their output, limitations, clinical applications, and possible future developments. Genome-wide association studies (GWASs) performed in increasingly large samples have led to the identification of replicated genetic loci associated with the risk of major psychiatric disorders, including schizophrenia and mood disorders. Statistical and biological approaches have been developed to improve our understanding of the etiopathogenetic mechanisms behind genome-wide significant associations, as well as for estimating the cumulative effect of risk variants at the individual level and the genetic overlap between different disorders, as pleiotropy is the rule rather than the exception. Clinical applications are available in the pharmacogenetics field. The main issues that remain to be addressed include improving ethnic diversity in genetic studies and the optimization of statistical power through methodological improvements, such as the definition of dimensional phenotypes with specific biological correlates and the integration of different types of omics data.
Affiliation(s)
- Chiara Fabbri
- Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna, Italy
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
4
Bocher O, Génin E. Rare variant association testing in the non-coding genome. Hum Genet 2020; 139:1345-1362. [PMID: 32500240 DOI: 10.1007/s00439-020-02190-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 04/22/2020] [Accepted: 05/29/2020] [Indexed: 12/25/2022]
Abstract
The development of next-generation sequencing technologies has opened up new possibilities to explore the contribution of genetic variants to human diseases, and in particular that of rare variants. Statistical methods have been developed to test for association with rare variants; they require the definition of testing units and, within these testing units, the selection of qualifying variants to include in the test. In the coding regions of the genome, testing units are usually the different genes and qualifying variants are selected based on their functional effects on the encoded proteins. Extending these tests to the non-coding regions of the genome is challenging. Testing units are difficult to define as the organisation of the non-coding genome is still largely unknown. Qualifying variants are difficult to select as the functional impact of non-coding variants on gene expression is hard to predict. These difficulties could explain why very few investigators so far have analysed the non-coding parts of their whole genome sequencing data. Yet these non-coding parts represent the vast majority of the genome, and some studies suggest that they could play a major role in disease susceptibility. In this review, we discuss recent experimental and statistical developments to gain knowledge on the non-coding genome and how this knowledge could be used to include rare non-coding variants in association tests. We describe the few studies that have considered variants from the non-coding genome in association tests and how they managed to define testing units and select qualifying variants.
Affiliation(s)
- Ozvan Bocher
- Génétique, Génomique Fonctionnelle Et Biotechnologies, Faculté de Médecine, Univ Brest, Inserm, Inserm UMR1078, Bâtiment E-IBRBS 2ieme étage, 22 avenue Camille Desmoulins, 29238, Brest Cedex 3, France.
- Emmanuelle Génin
- Génétique, Génomique Fonctionnelle Et Biotechnologies, Faculté de Médecine, Univ Brest, Inserm, Inserm UMR1078, Bâtiment E-IBRBS 2ieme étage, 22 avenue Camille Desmoulins, 29238, Brest Cedex 3, France.
- CHU Brest, Brest, France.
5
Dapas M, Dunaif A. The contribution of rare genetic variants to the pathogenesis of polycystic ovary syndrome. Curr Opin Endocr Metab Res 2020; 12:26-32. [PMID: 32440573 DOI: 10.1016/j.coemr.2020.02.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/15/2022]
Abstract
Polycystic ovary syndrome (PCOS) is a highly heritable disorder, but only a small proportion of the heritability can be accounted for by common genetic risk variants identified to date. It is possible that variants with lower allele frequencies that cannot be detected using genome-wide association study arrays contribute to PCOS. Here, we discuss the challenges inherent to studying rare genetic variants in complex disease and review several recent studies that have used DNA sequencing techniques to investigate whether rare variants play a role in PCOS pathogenesis. We evaluate these findings in the context of the latest literature in PCOS and complex disease genetics.
6
Kottyan LC, Parameswaran S, Weirauch MT, Rothenberg ME, Martin LJ. The genetic etiology of eosinophilic esophagitis. J Allergy Clin Immunol 2020; 145:9-15. [PMID: 31910986 PMCID: PMC6984394 DOI: 10.1016/j.jaci.2019.11.013] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 09/27/2019] [Revised: 11/15/2019] [Accepted: 11/15/2019] [Indexed: 12/13/2022]
Abstract
Eosinophilic esophagitis (EoE) is a chronic allergic disease associated with marked mucosal eosinophil accumulation. Multiple studies have reported a strong familial component to EoE, with the presence of EoE increasing the risk for other family members with EoE. Epidemiologic studies support an important role for environmental risk factors as modulators of genetic risk. In a small percentage of cases, including patients who have Mendelian diseases with co-occurrent EoE, rare genetic variation with large effect sizes could mediate EoE and explain multigenerational incidence in families. Common genetic risk variants mediate genetic risk for the majority of patients with EoE. Across the 31 reported independent EoE risk loci (P < 10⁻⁵), most of the EoE risk variants are located in between genes (36.7%) or within the introns of genes (42.4%). Although some variants do change the amino acid sequence of genes (2.2%), only 3 of the 31 EoE risk loci harbor an amino acid-changing variant. Thus most EoE risk loci are outside of the coding regions of genes, suggesting a key role for gene regulation in patients with EoE, which is consistent with most other complex diseases.
Affiliation(s)
- Leah C Kottyan
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Division of Allergy and Immunology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio.
- Sreeja Parameswaran
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Matthew T Weirauch
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Divisions of Biomedical Informatics and Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Marc E Rothenberg
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Division of Allergy and Immunology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Lisa J Martin
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
7
Broekema RV, Bakker OB, Jonkers IH. A practical view of fine-mapping and gene prioritization in the post-genome-wide association era. Open Biol 2020; 10:190221. [PMID: 31937202 PMCID: PMC7014684 DOI: 10.1098/rsob.190221] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 09/13/2019] [Accepted: 12/05/2019] [Indexed: 12/17/2022] Open
Abstract
Over the past 15 years, genome-wide association studies (GWASs) have enabled the systematic identification of genetic loci associated with traits and diseases. However, due to resolution issues and methodological limitations, the true causal variants and genes associated with traits remain difficult to identify. In this post-GWAS era, many biological and computational fine-mapping approaches now aim to solve these issues. Here, we review fine-mapping and gene prioritization approaches that, when combined, will improve the understanding of the underlying mechanisms of complex traits and diseases. Fine-mapping of genetic variants has become increasingly sophisticated: initially, variants were simply overlapped with functional elements, but now the impact of variants on regulatory activity and direct variant-gene 3D interactions can be identified. Moreover, gene manipulation by CRISPR/Cas9, the identification of expression quantitative trait loci and the use of co-expression networks have all increased our understanding of the genes and pathways affected by GWAS loci. However, despite this progress, limitations including the lack of cell-type- and disease-specific data and the ever-increasing complexity of polygenic models of traits pose serious challenges. Indeed, the combination of fine-mapping and gene prioritization by statistical, functional and population-based strategies will be necessary to truly understand how GWAS loci contribute to complex traits and diseases.
Affiliation(s)
- I. H. Jonkers
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
8
Flück CE, Audí L, Fernández-Cancio M, Sauter KS, Martinez de LaPiscina I, Castaño L, Esteva I, Camats N. Broad Phenotypes of Disorders/Differences of Sex Development in MAMLD1 Patients Through Oligogenic Disease. Front Genet 2019; 10:746. [PMID: 31555317 PMCID: PMC6726737 DOI: 10.3389/fgene.2019.00746] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 03/21/2019] [Accepted: 07/16/2019] [Indexed: 02/06/2023] Open
Abstract
Disorders/differences of sex development (DSD) are the result of a discordance between chromosomal, gonadal, and genital sex. DSD may be due to mutations in any of the genes involved in sex determination and development in general, as well as gonadal and/or genital development specifically. MAMLD1 is one of the recognized DSD genes. However, its role is controversial as some MAMLD1 variants are present in normal individuals, several MAMLD1 mutations have wild-type activity in functional studies, and the Mamld1-knockout male mouse presents with normal genitalia and reproduction. We previously tested nine MAMLD1 variants detected in nine 46,XY DSD patients with broad phenotypes for their functional activity, but none of the mutants, except truncated L210X, had diminished transcriptional activity on known target promoters CYP17A1 and HES3. In addition, protein expression of MAMLD1 variants was similar to wild-type, except for the truncated L210X. We hypothesized that MAMLD1 variants may not be sufficient to explain the phenotype in 46,XY DSD individuals, and that further genetic studies should be performed to search for additional hits explaining the broad phenotypes. We therefore performed whole exome sequencing (WES) in seven of these 46,XY patients with DSD and in one 46,XX patient with ovarian insufficiency, who all carried MAMLD1 variants. WES data were filtered by an algorithm including disease-tailored lists of MAMLD1-related and DSD-related genes. Fifty-five potentially deleterious variants in 41 genes were identified; 16/55 variants were reported in genes in association with hypospadias, 8/55 with cryptorchidism, 5/55 with micropenis, and 13/55 were described in relation with female sex development. Patients carried 1-16 variants in 1-16 genes together with their MAMLD1 variation. Network analysis of the identified genes revealed that 23 genes presented gene/protein interactions with MAMLD1. 
Thus, our study shows that the broad phenotypes of individual DSD might involve multiple genetic variations contributing towards the complex network of sexual development.
Affiliation(s)
- Christa E Flück
- Pediatric Endocrinology, Diabetology and Metabolism, Department of Pediatrics and Department of BioMedical Research, Bern University Hospital and University of Bern, Bern, Switzerland
- Laura Audí
- Growth and Development Research Unit, Vall d'Hebron Research Institute (VHIR), Center for Biomedical Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, Barcelona, Spain
- Mónica Fernández-Cancio
- Growth and Development Research Unit, Vall d'Hebron Research Institute (VHIR), Center for Biomedical Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, Barcelona, Spain
- Kay-Sara Sauter
- Pediatric Endocrinology, Diabetology and Metabolism, Department of Pediatrics and Department of BioMedical Research, Bern University Hospital and University of Bern, Bern, Switzerland
- Idoia Martinez de LaPiscina
- Endocrinology and Diabetes Research Group, BioCruces Bizkaia Health Research Institute, Cruces University Hospital, CIBERDEM, CIBERER, University of the Basque Country (UPV-EHU), Barakaldo, Spain
- Luis Castaño
- Pediatric Endocrinology Section, Cruces University Hospital, Endocrinology and Diabetes Research Group, BioCruces Bizkaia Health Research Institute, CIBERDEM, CIBERER, University of the Basque Country (UPV-EHU), Barakaldo, Spain
- Isabel Esteva
- Endocrinology Section, Gender Identity Unit, Regional University Hospital of Malaga, Málaga, Spain
- Núria Camats
- Growth and Development Research Unit, Vall d'Hebron Research Institute (VHIR), Center for Biomedical Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, Barcelona, Spain