1
Carlin DE, Larsen SJ, Sirupurapu V, Cho MH, Silverman EK, Baumbach J, Ideker T. Hierarchical association of COPD to principal genetic components of biological systems. PLoS One 2023; 18:e0286064. [PMID: 37228113 DOI: 10.1371/journal.pone.0286064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2021] [Accepted: 05/08/2023] [Indexed: 05/27/2023] Open
Abstract
Many disease-causing genetic variants converge on common biological functions and pathways. Precisely how to incorporate pathway knowledge into genetic association studies is not yet clear, however. Previous methods take a two-step approach: a regular association test is first performed to identify variants associated with the disease phenotype, followed by a test for functional enrichment within the genes implicated by those variants. Here we introduce a concise one-step approach, Hierarchical Genetic Analysis (Higana), which directly computes phenotype associations against each function in the large hierarchy of biological functions documented by the Gene Ontology. Using this approach, we identify risk genes and functions for Chronic Obstructive Pulmonary Disease (COPD), highlighting microtubule transport, muscle adaptation, and nicotine receptor signaling pathways. Microtubule transport has not been previously linked to COPD, as it integrates genetic variants spread over numerous genes. All associations validate strongly in a second COPD cohort.
Affiliation(s)
- Daniel E Carlin
  - Department of Medicine, Division of Genetics, University of California San Diego, La Jolla, CA, United States of America
- Vikram Sirupurapu
  - Department of Medicine, Division of Genetics, University of California San Diego, La Jolla, CA, United States of America
- Michael H Cho
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, United States of America
- Edwin K Silverman
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, United States of America
- Jan Baumbach
  - Department of Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Trey Ideker
  - Department of Medicine, Division of Genetics, University of California San Diego, La Jolla, CA, United States of America
2
Hamdan S, Love BC, von Polier GG, Weis S, Schwender H, Eickhoff SB, Patil KR. Confound-leakage: confound removal in machine learning leads to leakage. Gigascience 2023; 12:giad071. [PMID: 37776368 PMCID: PMC10541796 DOI: 10.1093/gigascience/giad071] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/06/2023] [Revised: 06/01/2023] [Accepted: 08/17/2023] [Indexed: 10/02/2023] Open
Abstract
BACKGROUND Machine learning (ML) approaches are a crucial component of modern data analysis in many fields, including epidemiology and medicine. Nonlinear ML methods often achieve accurate predictions, for instance, in personalized medicine, as they are capable of modeling complex relationships between features and the target. Problematically, ML models and their predictions can be biased by confounding information present in the features. To remove this spurious signal, researchers often employ featurewise linear confound regression (CR). While this is considered a standard approach for dealing with confounding, possible pitfalls of using CR in ML pipelines are not fully understood. RESULTS We provide new evidence that, contrary to general expectations, linear confound regression can increase the risk of confounding when combined with nonlinear ML approaches. Using a simple framework that uses the target as a confound, we show that information leaked via CR can increase null or moderate effects to near-perfect prediction. By shuffling the features, we provide evidence that this increase is indeed due to confound-leakage and not due to the revealing of information. We then demonstrate the danger of confound-leakage in a real-world clinical application where the accuracy of predicting attention-deficit/hyperactivity disorder from speech-derived features is overestimated when depression is used as a confound. CONCLUSIONS Mishandling or even amplifying confounding effects when building ML models due to confound-leakage, as shown, can lead to untrustworthy, biased, and unfair predictions. Our exposé of the confound-leakage pitfall, together with the guidelines we provide for dealing with it, can help create more robust and trustworthy ML models.
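The featurewise linear confound regression (CR) named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it handles a single confound, and the function names `fit_line` and `confound_regress` are hypothetical. Each feature column gets its own ordinary least-squares line fitted on the training data only, and the fitted line is then subtracted from both training and test features.

```python
from statistics import fmean


def fit_line(confound, feature):
    """Ordinary least-squares fit of feature ~ intercept + slope * confound."""
    c_mean, x_mean = fmean(confound), fmean(feature)
    var_c = sum((c - c_mean) ** 2 for c in confound)
    cov = sum((c - c_mean) * (x - x_mean) for c, x in zip(confound, feature))
    slope = cov / var_c if var_c > 0 else 0.0  # constant confound: nothing to remove
    return x_mean - slope * c_mean, slope


def confound_regress(train_X, train_c, test_X, test_c):
    """Featurewise CR: one linear model per feature column, fitted on training rows only."""
    n_features = len(train_X[0])
    models = [fit_line(train_c, [row[j] for row in train_X]) for j in range(n_features)]

    def residualize(X, confound):
        # Subtract each feature's fitted confound contribution, row by row.
        return [
            [x - (a + b * c) for x, (a, b) in zip(row, models)]
            for row, c in zip(X, confound)
        ]

    return residualize(train_X, train_c), residualize(test_X, test_c)
```

The residuals are linearly uncorrelated with the confound on the training data. The sketch shows CR itself, not a remedy: the abstract's point is that such residuals can still be exploited by nonlinear learners.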
Affiliation(s)
- Sami Hamdan
  - Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Forschungszentrum Jülich, 52428 Jülich, Germany
  - Institute of Systems Neuroscience, Medical Faculty, Heinrich-Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Bradley C Love
  - Department of Experimental Psychology, University College London, WC1H 0AP London, UK
  - The Alan Turing Institute, London NW1 2DB, UK
  - European Lab for Learning & Intelligent Systems (ELLIS), WC1E 6BT, London, UK
- Georg G von Polier
  - Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Forschungszentrum Jülich, 52428 Jülich, Germany
  - Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, University Hospital Frankfurt, 60528 Frankfurt, Germany
  - Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, RWTH Aachen University, 52074 Aachen, Germany
- Susanne Weis
  - Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Forschungszentrum Jülich, 52428 Jülich, Germany
  - Institute of Systems Neuroscience, Medical Faculty, Heinrich-Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Holger Schwender
  - Institute of Mathematics, Heinrich-Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Simon B Eickhoff
  - Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Forschungszentrum Jülich, 52428 Jülich, Germany
  - Institute of Systems Neuroscience, Medical Faculty, Heinrich-Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Kaustubh R Patil
  - Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Forschungszentrum Jülich, 52428 Jülich, Germany
  - Institute of Systems Neuroscience, Medical Faculty, Heinrich-Heine University Düsseldorf, 40225 Düsseldorf, Germany
3
Dobriban E. Consistency of invariance-based randomization tests. Ann Stat 2022. [DOI: 10.1214/22-aos2200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Edgar Dobriban
  - Department of Statistics and Data Science, The Wharton School, University of Pennsylvania
4
Gerloff C, Konrad K, Bzdok D, Büsing C, Reindl V. Interacting brains revisited: A cross-brain network neuroscience perspective. Hum Brain Mapp 2022; 43:4458-4474. [PMID: 35661477 PMCID: PMC9435014 DOI: 10.1002/hbm.25966] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/09/2021] [Revised: 03/25/2022] [Accepted: 05/02/2022] [Indexed: 12/14/2022] Open
Abstract
Elucidating the neural basis of social behavior is a long‐standing challenge in neuroscience. Such endeavors are driven by attempts to extend the isolated perspective on the human brain by considering interacting persons' brain activities, but a theoretical and computational framework for this purpose is still in its infancy. Here, we posit a comprehensive framework based on bipartite graphs for interbrain networks and address whether they provide meaningful insights into the neural underpinnings of social interactions. First, we show that the nodal density of such graphs exhibits nonrandom properties. While the current hyperscanning analyses mostly rely on global metrics, we encode the regions' roles via matrix decomposition to obtain an interpretable network representation yielding both global and local insights. With Bayesian modeling, we reveal how synchrony patterns seeded in specific brain regions contribute to global effects. Beyond inferential inquiries, we demonstrate that graph representations can be used to predict individual social characteristics, outperforming functional connectivity estimators for this purpose. In the future, this may provide a means of characterizing individual variations in social behavior or identifying biomarkers for social interaction and disorders.
Affiliation(s)
- Christian Gerloff
  - JARA-Brain Institute II, Molecular Neuroscience and Neuroimaging, RWTH Aachen & Research Centre Juelich, Aachen, Germany
  - Child Neuropsychology Section, Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, Medical Faculty, RWTH Aachen University, Aachen, Germany
  - Chair II of Mathematics, Faculty of Mathematics, Computer Science and Natural Sciences, RWTH Aachen University, Aachen, Germany
- Kerstin Konrad
  - JARA-Brain Institute II, Molecular Neuroscience and Neuroimaging, RWTH Aachen & Research Centre Juelich, Aachen, Germany
  - Child Neuropsychology Section, Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, Medical Faculty, RWTH Aachen University, Aachen, Germany
- Danilo Bzdok
  - Department of Biomedical Engineering, McConnell Brain Imaging Centre, Montreal Neurological Institute, Faculty of Medicine, McGill University, Montreal, Canada
  - Mila - Quebec Artificial Intelligence Institute, Montreal, Canada
- Christina Büsing
  - Chair II of Mathematics, Faculty of Mathematics, Computer Science and Natural Sciences, RWTH Aachen University, Aachen, Germany
- Vanessa Reindl
  - JARA-Brain Institute II, Molecular Neuroscience and Neuroimaging, RWTH Aachen & Research Centre Juelich, Aachen, Germany
  - Child Neuropsychology Section, Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, Medical Faculty, RWTH Aachen University, Aachen, Germany
  - Psychology, School of Social Sciences, Nanyang Technological University, Singapore, Singapore
5
Hébert F, Causeur D, Emily M. Omnibus testing approach for gene-based gene-gene interaction. Stat Med 2022; 41:2854-2878. [PMID: 35338506 DOI: 10.1002/sim.9389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2020] [Revised: 03/03/2022] [Accepted: 03/04/2022] [Indexed: 11/07/2022]
Abstract
Genetic interaction is considered one of the main heritable components of complex traits. With the emergence of genome-wide association studies (GWAS), a collection of statistical methods dedicated to identifying interaction at the SNP level has been proposed. More recently, gene-based gene-gene interaction testing has emerged as an attractive alternative, as it confers advantages in both statistical power and biological interpretation. Most gene-based interaction methods rely on multidimensional modeling of the interaction and thus lack robustness against the huge space of interaction patterns. In this paper, we study global testing approaches to address the issue of gene-based gene-gene interaction. Based on a logistic regression modeling framework, all SNP-SNP interaction tests are combined to produce a gene-level test for interaction. We propose an omnibus test that takes advantage of (1) the heterogeneity between existing global tests and (2) the complementarity between allele-based and genotype-based coding of SNPs. An extensive simulation study demonstrates that the proposed omnibus test detects with high power the most common genetic interaction models with one causal pair, as well as more complex genetic models involving more than one causal pair. In addition, the proposed approach is shown to be flexible and robust, and to improve power compared with single global tests in replication studies. Furthermore, applying our procedure to real datasets confirms the adaptability of our approach in replicating various gene-gene interactions.
Affiliation(s)
- Florian Hébert
  - Department of Statistics and Computer Science, Institut Agro, CNRS, IRMAR, Univ Rennes, F-35000, Rennes, France
- David Causeur
  - Department of Statistics and Computer Science, Institut Agro, CNRS, IRMAR, Univ Rennes, F-35000, Rennes, France
- Mathieu Emily
  - Department of Statistics and Computer Science, Institut Agro, CNRS, IRMAR, Univ Rennes, F-35000, Rennes, France
6
Controlling for human population stratification in rare variant association studies. Sci Rep 2021; 11:19015. [PMID: 34561511 PMCID: PMC8463695 DOI: 10.1038/s41598-021-98370-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/05/2020] [Accepted: 08/25/2021] [Indexed: 12/05/2022] Open
Abstract
Population stratification is a confounder of genetic association studies. In analyses of rare variants, corrections based on principal components (PCs) and linear mixed models (LMMs) yield conflicting conclusions. Studies evaluating these approaches have generally focused on limited types of structure and large sample sizes. We investigated the properties of several correction methods through a large simulation study using real exome data and several within- and between-continent stratification scenarios. We considered different sample sizes, including situations with as few as 50 cases, to account for the analysis of rare disorders. Large samples showed that accounting for stratification was more difficult with a continental than with a worldwide structure. When considering a sample of 50 cases, an inflation of type I error was observed with PCs for small numbers of controls (≤ 100), and with LMMs for large numbers of controls (≥ 1000). We also tested a novel local permutation method (LocPerm), which maintained a correct type I error in all situations. Power was equivalent across all approaches, indicating that the key issue is proper control of type I error. Finally, we found that the power of analyses including small numbers of cases can be increased by adding a large panel of external controls, provided an appropriate stratification correction is used.
7
Mullaert J, Bouaziz M, Seeleuthner Y, Bigio B, Casanova JL, Alcaïs A, Abel L, Cobat A. Taking population stratification into account by local permutations in rare-variant association studies on small samples. Genet Epidemiol 2021; 45:821-829. [PMID: 34402542 DOI: 10.1002/gepi.22426] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/25/2020] [Revised: 06/07/2021] [Accepted: 07/15/2021] [Indexed: 11/08/2022]
Abstract
Many methods for rare variant association studies require permutations to assess the significance of tests. Standard permutations assume that all individuals are exchangeable and do not take population stratification (PS), a known confounding factor in genetic studies, into account. We propose a novel strategy, LocPerm, in which individual phenotypes are permuted only with their closest ancestry-based neighbors. We performed a simulation study, focusing on small samples, to evaluate and compare LocPerm with standard permutations and classical adjustment on first principal components. Under the null hypothesis, LocPerm was the only method providing an acceptable type I error, regardless of sample size and level of stratification. The power of LocPerm was similar to that of standard permutation in the absence of PS, and remained stable in different PS scenarios. We conclude that LocPerm is a method of choice for taking PS and/or small sample size into account in rare variant association studies.
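The idea of permuting phenotypes only among ancestry-based neighbors can be sketched as below. This is an illustrative reading of the abstract, not the published LocPerm procedure: the neighbor definition (k nearest individuals by Euclidean distance on ancestry coordinates such as principal components), the swap-based shuffling scheme, and all function names are assumptions made for the example.

```python
import math
import random


def nearest_neighbors(coords, k):
    """For each individual, return the indices of its k closest individuals."""
    neighbors = []
    for i, ci in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(ci, coords[j]),
        )
        neighbors.append(others[:k])
    return neighbors


def local_permutation(phenotypes, neighbors, rng):
    """Shuffle phenotypes using only swaps between an individual and one of its neighbors."""
    permuted = list(phenotypes)
    order = list(range(len(permuted)))
    rng.shuffle(order)
    for i in order:
        j = rng.choice(neighbors[i])
        permuted[i], permuted[j] = permuted[j], permuted[i]
    return permuted


def local_permutation_pvalue(statistic, genotypes, phenotypes, coords,
                             k=2, n_perm=999, seed=0):
    """Empirical p-value of `statistic` under neighbor-restricted permutations."""
    rng = random.Random(seed)
    neighbors = nearest_neighbors(coords, k)
    observed = statistic(genotypes, phenotypes)
    hits = sum(
        statistic(genotypes, local_permutation(phenotypes, neighbors, rng)) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

A standard permutation test corresponds to letting every individual be a neighbor of every other. Restricting swaps to close neighbors keeps each permuted phenotype within a similar ancestry background, which is the property the abstract credits for the control of type I error under stratification.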
Affiliation(s)
- Jimmy Mullaert
  - Université de Paris, IAME, INSERM, Paris, France
  - AP-HP, Hôpital Bichat, DEBRC, Paris, France
  - Laboratory of Human Genetics of Infectious Diseases, Paris, EU, France
- Matthieu Bouaziz
  - Laboratory of Human Genetics of Infectious Diseases, Paris, EU, France
  - Université de Paris, Imagine Institute, Paris, EU, France
- Yoann Seeleuthner
  - Laboratory of Human Genetics of Infectious Diseases, Paris, EU, France
  - Université de Paris, Imagine Institute, Paris, EU, France
- Benedetta Bigio
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, New York, USA
- Jean-Laurent Casanova
  - Laboratory of Human Genetics of Infectious Diseases, Paris, EU, France
  - Université de Paris, Imagine Institute, Paris, EU, France
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, New York, USA
  - Howard Hughes Medical Institute, New York, New York, USA
- Alexandre Alcaïs
  - Laboratory of Human Genetics of Infectious Diseases, Paris, EU, France
  - Université de Paris, Imagine Institute, Paris, EU, France
- Laurent Abel
  - Laboratory of Human Genetics of Infectious Diseases, Paris, EU, France
  - Université de Paris, Imagine Institute, Paris, EU, France
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, New York, USA
- Aurélie Cobat
  - Laboratory of Human Genetics of Infectious Diseases, Paris, EU, France
  - Université de Paris, Imagine Institute, Paris, EU, France
8
Cleynen I, Engchuan W, Hestand MS, Heung T, Holleman AM, Johnston HR, Monfeuga T, McDonald-McGinn DM, Gur RE, Morrow BE, Swillen A, Vorstman JAS, Bearden CE, Chow EWC, van den Bree M, Emanuel BS, Vermeesch JR, Warren ST, Owen MJ, Chopra P, Cutler DJ, Duncan R, Kotlar AV, Mulle JG, Voss AJ, Zwick ME, Diacou A, Golden A, Guo T, Lin JR, Wang T, Zhang Z, Zhao Y, Marshall C, Merico D, Jin A, Lilley B, Salmons HI, Tran O, Holmans P, Pardinas A, Walters JTR, Demaerel W, Boot E, Butcher NJ, Costain GA, Lowther C, Evers R, van Amelsvoort TAMJ, van Duin E, Vingerhoets C, Breckpot J, Devriendt K, Vergaelen E, Vogels A, Crowley TB, McGinn DE, Moss EM, Sharkus RJ, Unolt M, Zackai EH, Calkins ME, Gallagher RS, Gur RC, Tang SX, Fritsch R, Ornstein C, Repetto GM, Breetvelt E, Duijff SN, Fiksinski A, Moss H, Niarchou M, Murphy KC, Prasad SE, Daly EM, Gudbrandsen M, Murphy CM, Murphy DG, Buzzanca A, Fabio FD, Digilio MC, Pontillo M, Marino B, Vicari S, Coleman K, Cubells JF, Ousley OY, Carmel M, Gothelf D, Mekori-Domachevsky E, Michaelovsky E, Weinberger R, Weizman A, Kushan L, Jalbrzikowski M, Armando M, Eliez S, Sandini C, Schneider M, Béna FS, Antshel KM, Fremont W, Kates WR, Belzeaux R, Busa T, Philip N, Campbell LE, McCabe KL, Hooper SR, Schoch K, Shashi V, Simon TJ, Tassone F, Arango C, Fraguas D, García-Miñaúr S, Morey-Canyelles J, Rosell J, Suñer DH, Raventos-Simic J, Epstein MP, Williams NM, Bassett AS. Genetic contributors to risk of schizophrenia in the presence of a 22q11.2 deletion. Mol Psychiatry 2021; 26:4496-4510. [PMID: 32015465 PMCID: PMC7396297 DOI: 10.1038/s41380-020-0654-3] [Citation(s) in RCA: 72] [Impact Index Per Article: 24.0] [Received: 06/05/2019] [Revised: 11/01/2019] [Accepted: 01/16/2020] [Indexed: 12/17/2022]
Abstract
Schizophrenia occurs in about one in four individuals with 22q11.2 deletion syndrome (22q11.2DS). The aim of this International Brain and Behavior 22q11.2DS Consortium (IBBC) study was to identify genetic factors that contribute to schizophrenia, in addition to the ~20-fold increased risk conveyed by the 22q11.2 deletion. Using whole-genome sequencing data from 519 unrelated individuals with 22q11.2DS, we conducted genome-wide comparisons of common and rare variants between those with schizophrenia and those with no psychotic disorder at age ≥25 years. Available microarray data enabled direct comparison of polygenic risk for schizophrenia between 22q11.2DS and independent population samples with no 22q11.2 deletion, with and without schizophrenia (total n = 35,182). Polygenic risk for schizophrenia within 22q11.2DS was significantly greater for those with schizophrenia (adjusted p = 6.73 × 10⁻⁶). Novel reciprocal case-control comparisons between the 22q11.2DS and population-based cohorts showed that polygenic risk score was significantly greater in individuals with psychotic illness, regardless of the presence of the 22q11.2 deletion. Within the 22q11.2DS cohort, results of gene-set analyses showed some support for rare variants affecting synaptic genes. No common or rare variants within the 22q11.2 deletion region were significantly associated with schizophrenia. These findings suggest that in addition to the deletion conferring a greatly increased risk to schizophrenia, the risk is higher when the 22q11.2 deletion and common polygenic risk factors that contribute to schizophrenia in the general population are both present.
Affiliation(s)
- Worrawat Engchuan
  - The Centre for Applied Genomics (TCAG), The Hospital for Sick Children, Toronto, ON, Canada
- Matthew S Hestand
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
  - Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Tracy Heung
  - Clinical Genetics Research Program, Centre for Addiction and Mental Health, Toronto, ON, Canada
  - Dalglish Family 22q Clinic, Toronto General Hospital, University Health Network, Toronto, ON, Canada
- H Richard Johnston
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- Thomas Monfeuga
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Donna M McDonald-McGinn
  - Department of Pediatrics, Perelman School of Medicine of the University of Pennsylvania, Philadelphia, PA, USA
  - Division of Human Genetics and 22q and You Center, the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Raquel E Gur
  - Department of Psychiatry and Lifespan Brain Institute, Penn Medicine-CHOP, University of Pennsylvania, Philadelphia, PA, USA
- Bernice E Morrow
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Ann Swillen
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Center for Human Genetics, University Hospitals Leuven, Leuven, Belgium
- Jacob A S Vorstman
  - Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Psychiatry, University Medical Center Utrecht, Utrecht, The Netherlands
  - Department of Psychiatry, University of Toronto, Toronto, ON, Canada
- Carrie E Bearden
  - Departments of Psychiatry and Biobehavioral Sciences and Psychology, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, CA, USA
- Eva W C Chow
  - Clinical Genetics Research Program, Centre for Addiction and Mental Health, Toronto, ON, Canada
  - Department of Psychiatry, University of Toronto, Toronto, ON, Canada
- Marianne van den Bree
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Beverly S Emanuel
  - Division of Human Genetics and 22q and You Center, the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Stephen T Warren
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- Michael J Owen
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Pankaj Chopra
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- David J Cutler
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- Richard Duncan
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- Alex V Kotlar
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- Jennifer G Mulle
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- Anna J Voss
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- Michael E Zwick
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- Alexander Diacou
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Aaron Golden
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Tingwei Guo
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Jhih-Rong Lin
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Tao Wang
  - Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Zhengdong Zhang
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Yingjie Zhao
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Christian Marshall
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
  - Division of Genome Diagnostics, Department of Paediatric Laboratory Medicine, The Hospital for Sick Children, Toronto, ON, Canada
- Daniele Merico
  - The Centre for Applied Genomics (TCAG), The Hospital for Sick Children, Toronto, ON, Canada
  - Deep Genomics Inc., Toronto, ON, Canada
- Andrea Jin
  - Division of Human Genetics and 22q and You Center, the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Brenna Lilley
  - Division of Human Genetics and 22q and You Center, the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Harold I Salmons
  - Division of Human Genetics and 22q and You Center, the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Oanh Tran
  - Division of Human Genetics and 22q and You Center, the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Peter Holmans
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Antonio Pardinas
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- James T R Walters
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Erik Boot
  - Dalglish Family 22q Clinic, Toronto General Hospital, University Health Network, Toronto, ON, Canada
- Nancy J Butcher
  - Clinical Genetics Research Program, Centre for Addiction and Mental Health, Toronto, ON, Canada
- Gregory A Costain
  - Clinical Genetics Research Program, Centre for Addiction and Mental Health, Toronto, ON, Canada
  - Hospital for Sick Children, Toronto, ON, Canada
- Chelsea Lowther
  - Clinical Genetics Research Program, Centre for Addiction and Mental Health, Toronto, ON, Canada
- Rens Evers
  - School for Mental Health and Neuroscience, Maastricht University, Maastricht, The Netherlands
- Esther van Duin
  - School for Mental Health and Neuroscience, Maastricht University, Maastricht, The Netherlands
- Claudia Vingerhoets
  - School for Mental Health and Neuroscience, Maastricht University, Maastricht, The Netherlands
- Jeroen Breckpot
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Center for Human Genetics, University Hospitals Leuven, Leuven, Belgium
- Koen Devriendt
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Center for Human Genetics, University Hospitals Leuven, Leuven, Belgium
- Elfi Vergaelen
  - Center for Human Genetics, University Hospitals Leuven, Leuven, Belgium
- Annick Vogels
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Center for Human Genetics, University Hospitals Leuven, Leuven, Belgium
- T Blaine Crowley
  - Division of Human Genetics and 22q and You Center, the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Daniel E McGinn
  - Division of Human Genetics and 22q and You Center, the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Edward M Moss
  - Division of Human Genetics and 22q and You Center, the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Robert J Sharkus
  - Division of Human Genetics and 22q and You Center, the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Marta Unolt
  - Division of Human Genetics and 22q and You Center, the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Elaine H Zackai
  - Department of Pediatrics, Perelman School of Medicine of the University of Pennsylvania, Philadelphia, PA, USA
  - Division of Human Genetics and 22q and You Center, the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Monica E Calkins
  - Department of Psychiatry and Lifespan Brain Institute, Penn Medicine-CHOP, University of Pennsylvania, Philadelphia, PA, USA
- Robert S Gallagher
  - Department of Psychiatry and Lifespan Brain Institute, Penn Medicine-CHOP, University of Pennsylvania, Philadelphia, PA, USA
- Ruben C Gur
  - Department of Psychiatry and Lifespan Brain Institute, Penn Medicine-CHOP, University of Pennsylvania, Philadelphia, PA, USA
- Sunny X Tang
  - Department of Psychiatry and Lifespan Brain Institute, Penn Medicine-CHOP, University of Pennsylvania, Philadelphia, PA, USA
- Elemi Breetvelt
  - Department of Psychiatry, University of Toronto, Toronto, ON, Canada
  - Department of Psychiatry, Hospital for Sick Children, Toronto, ON, Canada
- Sasja N Duijff
  - Department of Pediatrics, University Medical Center Utrecht, Utrecht, Netherlands
- Ania Fiksinski
  - Clinical Genetics Research Program, Centre for Addiction and Mental Health, Toronto, ON, Canada
  - Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, The Netherlands
- Hayley Moss
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Maria Niarchou
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Eileen M Daly
  - Department of Forensic and Neurodevelopmental Sciences, Institute of Psychiatry, Psychology & Neuroscience (IoPPN), King's College London, London, UK
- Maria Gudbrandsen
  - Department of Forensic and Neurodevelopmental Sciences, Institute of Psychiatry, Psychology & Neuroscience (IoPPN), King's College London, London, UK
- Clodagh M Murphy
  - Department of Forensic and Neurodevelopmental Sciences, Institute of Psychiatry, Psychology & Neuroscience (IoPPN), King's College London, London, UK
- Declan G Murphy
  - Department of Forensic and Neurodevelopmental Sciences, Institute of Psychiatry, Psychology & Neuroscience (IoPPN), King's College London, London, UK
- Antonio Buzzanca
  - Department of Human Neurosciences, University Sapienza of Rome, Rome, Italy
- Fabio Di Fabio
  - Department of Human Neurosciences, University Sapienza of Rome, Rome, Italy
- Maria Pontillo
  - Child and Adolescence Neuropsychiatry Unit, Department of Neuroscience, IRCCS Bambino Gesù Children's Hospital of Rome, Rome, Italy
- Stefano Vicari
  - Child and Adolescence Neuropsychiatry Unit, Department of Neuroscience, IRCCS Bambino Gesù Children's Hospital of Rome, Rome, Italy
- Karlene Coleman
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- Joseph F Cubells
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
  - Department of Psychiatry and Behavioral Sciences, Emory University School of Medicine, Atlanta, GA, USA
- Opal Y Ousley
  - Department of Psychiatry and Behavioral Sciences, Emory University School of Medicine, Atlanta, GA, USA
- Miri Carmel
  - Felsenstein Medical Research Center, Petach Tikva, Israel
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Doron Gothelf
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - The Child Psychiatry Division, Edmond and Lily Safra Children's Hospital, Sheba Medical Center, Tel Hashomer, Israel
- Ehud Mekori-Domachevsky
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - The Child Psychiatry Division, Edmond and Lily Safra Children's Hospital, Sheba Medical Center, Tel Hashomer, Israel
- Elena Michaelovsky
  - Felsenstein Medical Research Center, Petach Tikva, Israel
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Ronnie Weinberger
  - The Child Psychiatry Division, Edmond and Lily Safra Children's Hospital, Sheba Medical Center, Tel Hashomer, Israel
- Abraham Weizman
  - Felsenstein Medical Research Center, Petach Tikva, Israel
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Geha Mental Health Center, Petach Tikva, Israel
| | - Leila Kushan
- Departments of Psychiatry and Biobehavioral Sciences and Psychology, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, CA, USA
| | - Maria Jalbrzikowski
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
| | - Marco Armando
- Developmental Imaging and Psychopathology, Department of Psychiatry, University of Geneva, Geneva, Switzerland
| | - Stéphan Eliez
- Developmental Imaging and Psychopathology, Department of Psychiatry, University of Geneva, Geneva, Switzerland
| | - Corrado Sandini
- Developmental Imaging and Psychopathology, Department of Psychiatry, University of Geneva, Geneva, Switzerland
| | - Maude Schneider
- Developmental Imaging and Psychopathology, Department of Psychiatry, University of Geneva, Geneva, Switzerland
| | | | - Kevin M Antshel
- Department of Psychology, Syracuse University, Syracuse, NY, USA
| | - Wanda Fremont
- Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, USA
| | - Wendy R Kates
- Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, USA
| | - Raoul Belzeaux
- Pôle de psychiatrie, Hopital Sainte Marguerite, Batiment Solaris, APHM, Marseille, France
| | - Tiffany Busa
- Departement de Genetique Medicale Hôpital d'Enfants de la Timone, APHM, Marseille, France
| | - Nicole Philip
- Departement de Genetique Medicale Aix Marseille Univ, INSERM, GMGF, APHM, Marseille, France
| | | | - Kathryn L McCabe
- University of Newcastle, Callaghan, Australia
- University of California Davis, Davis, CA, USA
| | - Stephen R Hooper
- Department of Allied Health Sciences, School of Medicine, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
| | - Kelly Schoch
- Department of Pediatrics, Division of Medical Genetics, Duke University School of Medicine, Durham, NC, USA
| | - Vandana Shashi
- Department of Pediatrics, Division of Medical Genetics, Duke University School of Medicine, Durham, NC, USA
| | - Tony J Simon
- MIND Institute and Department of Psychiatry and Behavioral Sciences, University of California Davis, Davis, CA, USA
| | - Flora Tassone
- Department of Microbiology and Molecular Medicine, University of California Davis, Davis, CA, USA
| | - Celso Arango
- Department of Child and Adolescent Psychiatry, Hospital General Universitario Gregorio Marañón, IiSGM, CIBERSAM, School of Medicine, Universidad Complutense, Madrid, Spain
| | - David Fraguas
- Department of Child and Adolescent Psychiatry, Hospital General Universitario Gregorio Marañón, IiSGM, CIBERSAM, School of Medicine, Universidad Complutense, Madrid, Spain
| | - Sixto García-Miñaúr
- Institute of Medical and Molecular Genetics (INGEMM), La Paz University Hospital, Madrid, Spain
| | | | | | - Damià H Suñer
- Laboratorio Unidad de Diagnóstico Molecular y Genética Clínica, Hospital Universitari Son Espases, Palma de Mallorca, Spain
| | | | - Michael P Epstein
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA.
| | - Nigel M Williams
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK.
| | - Anne S Bassett
- Clinical Genetics Research Program, Centre for Addiction and Mental Health, Toronto, ON, Canada.
- Dalglish Family 22q Clinic, Toronto General Hospital, University Health Network, Toronto, ON, Canada.
- Department of Psychiatry, University of Toronto, Toronto, ON, Canada.
|
9
|
Asif H, Alliey-Rodriguez N, Keedy S, Tamminga CA, Sweeney JA, Pearlson G, Clementz BA, Keshavan MS, Buckley P, Liu C, Neale B, Gershon ES. GWAS significance thresholds for deep phenotyping studies can depend upon minor allele frequencies and sample size. Mol Psychiatry 2021; 26:2048-2055. [PMID: 32066829 PMCID: PMC7429341 DOI: 10.1038/s41380-020-0670-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/28/2019] [Revised: 01/28/2020] [Accepted: 01/29/2020] [Indexed: 02/01/2023]
Abstract
An important issue affecting genome-wide association studies with deep phenotyping (multiple correlated phenotypes) is determining the suitable family-wise significance threshold. Straightforward family-wise correction (Bonferroni) of p < 0.05 for 4.3 million genotypes and 335 phenotypes would give a threshold of p < 3.46E-11. This would be too conservative because it assumes all tests are independent. The effective number of tests, both phenotypic and genotypic, must be adjusted for the correlations between them. Spectral decomposition of the phenotype matrix and LD-based correction of the number of tested SNPs are currently used to determine an effective number of tests. In this paper, we compare these calculated estimates with permutation-determined family-wise significance thresholds. Permutations are performed by shuffling individual IDs of the genotype vector for this dataset, to preserve correlation of phenotypes. Our results demonstrate that the permutation threshold is influenced by minor allele frequency (MAF) of the SNPs, and by the number of individuals tested. For the more common SNPs (MAF > 0.1), the permutation family-wise threshold was in close agreement with spectral decomposition methods. However, for less common SNPs (0.05 < MAF ≤ 0.1), the permutation threshold calculated over all SNPs was off by orders of magnitude. This applies to the number of individuals studied (here 777) but not to very much larger numbers. Based on these findings, we propose that the threshold to find a particular level of family-wise significance may need to be established using separate permutations of the actual data for several MAF bins.
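The Bonferroni arithmetic quoted in this abstract can be reproduced in a few lines. The sketch below is illustrative only and is not the authors' code: the function name `bonferroni_threshold` and the "effective" test counts are assumptions made for the example. With the rounded count of 4.3 million genotypes the result is about 3.47E-11, so the reported 3.46E-11 presumably reflects the unrounded SNP count.

```python
def bonferroni_threshold(alpha: float, n_genotypes: float, n_phenotypes: float) -> float:
    """Family-wise threshold when every genotype-phenotype test is treated as independent."""
    return alpha / (n_genotypes * n_phenotypes)


# Rounded counts quoted in the abstract: 4.3 million genotypes, 335 phenotypes.
naive = bonferroni_threshold(0.05, 4.3e6, 335)
print(f"naive threshold: {naive:.2e}")

# Hypothetical effective numbers of tests after accounting for correlation
# (placeholder values for illustration, not figures from the paper).
effective = bonferroni_threshold(0.05, 1.0e6, 50)
print(f"effective-test threshold: {effective:.2e}")
```

Replacing the raw counts with smaller effective counts relaxes the threshold, which is why the abstract calls the naive correction too conservative.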
Affiliation(s)
- Huma Asif
- Department of Psychiatry and Behavioral Neurosciences, University of Chicago, 924 East 57th Street Room. R016, Chicago, IL, 60637, USA.
| | - Ney Alliey-Rodriguez
- Department of Psychiatry and Behavioral Neurosciences, University of Chicago, 924 East 57th Street Room. R016, Chicago, IL, 60637, USA
| | - Sarah Keedy
- Department of Psychiatry and Behavioral Neurosciences, University of Chicago, 924 East 57th Street Room. R016, Chicago, IL, 60637, USA
| | - Carol A Tamminga
- Department of Psychiatry, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - John A Sweeney
- Department of Psychiatry, University of Cincinnati, Cincinnati, OH, USA
| | - Godfrey Pearlson
- Departments of Psychiatry & Neuroscience, Yale University, New Haven, CT, USA
| | - Brett A Clementz
- Department of Psychology, University of Georgia, Athens, GA, USA
| | | | | | - Chunyu Liu
- Department of Psychiatry, SUNY Upstate Medical University, Binghamton, NY, USA
| | | | - Elliot S Gershon
- Department of Psychiatry and Behavioral Neurosciences, University of Chicago, 924 East 57th Street Room. R016, Chicago, IL, 60637, USA
- Department of Human Genetics, University of Chicago, 924 East 57th Street Room. R016, Chicago, IL, 60637, USA
|
10
|
Hébert F, Causeur D, Emily M. An adaptive decorrelation procedure for signal detection. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2020.107082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
11
|
Milanlouei S, Menichetti G, Li Y, Loscalzo J, Willett WC, Barabási AL. A systematic comprehensive longitudinal evaluation of dietary factors associated with acute myocardial infarction and fatal coronary heart disease. Nat Commun 2020; 11:6074. [PMID: 33247093 PMCID: PMC7699643 DOI: 10.1038/s41467-020-19888-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 06/21/2019] [Accepted: 10/29/2020] [Indexed: 12/17/2022] Open
Abstract
Environmental factors, and in particular diet, are known to play a key role in the development of Coronary Heart Disease. Many of these factors were unveiled by detailed nutritional epidemiology studies, focusing on the role of a single nutrient or food at a time. Here, we apply an Environment-Wide Association Study approach to Nurses' Health Study data to explore comprehensively and agnostically the association of 257 nutrients and 117 foods with coronary heart disease risk (acute myocardial infarction and fatal coronary heart disease). After accounting for multiple testing, we identify 16 food items and 37 nutrients that show statistically significant association - while adjusting for potential confounding and control variables such as physical activity, smoking, calorie intake, and medication use - among which 38 associations were validated in Nurses' Health Study II. Our implementation of Environment-Wide Association Study successfully reproduces prior knowledge of diet-coronary heart disease associations in the epidemiological literature, and helps us detect new associations that were only marginally studied, opening potential avenues for further extensive experimental validation. We also show that Environment-Wide Association Study allows us to identify a bipartite food-nutrient network, highlighting which foods drive the associations of specific nutrients with coronary heart disease risk.
Affiliation(s)
- Soodabeh Milanlouei
- Center for Complex Network Research, Northeastern University, Boston, MA, USA
| | - Giulia Menichetti
- Center for Complex Network Research, Northeastern University, Boston, MA, USA
| | - Yanping Li
- Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Joseph Loscalzo
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Walter C Willett
- Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Albert-László Barabási
- Center for Complex Network Research, Northeastern University, Boston, MA, USA.
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA.
- Center for Network Science, Central European University, Budapest, Hungary.
|
12
|
Long noncoding RNA DLEU2 predicts a poor prognosis and enhances malignant properties in laryngeal squamous cell carcinoma through the miR-30c-5p/PIK3CD/Akt axis. Cell Death Dis 2020; 11:472. [PMID: 32555190 PMCID: PMC7303144 DOI: 10.1038/s41419-020-2581-2] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 01/19/2020] [Revised: 04/29/2020] [Accepted: 04/30/2020] [Indexed: 12/11/2022]
Abstract
Long noncoding RNAs (lncRNAs) have been identified as potential prognostic tools and therapeutic biomarkers for a variety of human cancers. However, the functional roles and underlying mechanisms of key lncRNAs affecting laryngeal squamous cell carcinomas (LSCCs) are largely unknown. Here, we adopted a novel subpathway strategy based on the lncRNA-mRNA profiles from the Cancer Genome Atlas (TCGA) database and identified the lncRNA deleted in lymphocytic leukemia 2 (DLEU2) as an oncogene in the pathogenesis of LSCCs. We found that DLEU2 was significantly upregulated and predicted poor clinical outcomes in LSCC patients. In addition, ectopic overexpression of DLEU2 promoted the proliferation and migration of LSCC cells both in vivo and in vitro. Mechanistically, DLEU2 served as a competing endogenous RNA to regulate PIK3CD expression by sponging miR-30c-5p and subsequently activated the Akt signaling pathway. As a target gene of DLEU2, PIK3CD was also upregulated and could predict a poor prognosis in LSCC patients. In conclusion, we found that the novel LSCC-related gene DLEU2 enhances the malignant properties of LSCCs via the miR-30c-5p/PIK3CD/Akt axis. DLEU2 and its targeted miR-30c-5p/PIK3CD/Akt axis may represent valuable prognostic biomarkers and therapeutic targets for LSCCs.
|
13
|
Bocher O, Génin E. Rare variant association testing in the non-coding genome. Hum Genet 2020; 139:1345-1362. [PMID: 32500240 DOI: 10.1007/s00439-020-02190-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 04/22/2020] [Accepted: 05/29/2020] [Indexed: 12/25/2022]
Abstract
The development of next-generation sequencing technologies has opened-up some new possibilities to explore the contribution of genetic variants to human diseases and in particular that of rare variants. Statistical methods have been developed to test for association with rare variants that require the definition of testing units and, in these testing units, the selection of qualifying variants to include in the test. In the coding regions of the genome, testing units are usually the different genes and qualifying variants are selected based on their functional effects on the encoded proteins. Extending these tests to the non-coding regions of the genome is challenging. Testing units are difficult to define as the non-coding genome organisation is still rather unknown. Qualifying variants are difficult to select as the functional impact of non-coding variants on gene expression is hard to predict. These difficulties could explain why very few investigators so far have analysed the non-coding parts of their whole genome sequencing data. These non-coding parts yet represent the vast majority of the genome and some studies suggest that they could play a major role in disease susceptibility. In this review, we discuss recent experimental and statistical developments to gain knowledge on the non-coding genome and how this knowledge could be used to include rare non-coding variants in association tests. We describe the few studies that have considered variants from the non-coding genome in association tests and how they managed to define testing units and select qualifying variants.
Affiliation(s)
- Ozvan Bocher
- Génétique, Génomique Fonctionnelle Et Biotechnologies, Faculté de Médecine, Univ Brest, Inserm, Inserm UMR1078, Bâtiment E-IBRBS 2ieme étage, 22 avenue Camille Desmoulins, 29238, Brest Cedex 3, France.
| | - Emmanuelle Génin
- Génétique, Génomique Fonctionnelle Et Biotechnologies, Faculté de Médecine, Univ Brest, Inserm, Inserm UMR1078, Bâtiment E-IBRBS 2ieme étage, 22 avenue Camille Desmoulins, 29238, Brest Cedex 3, France.
- CHU Brest, Brest, France.
|
14
|
Hu F, Yu Y, Chen JS, Hu H, Scheet P, Huff CD. Integrated case-control and somatic-germline interaction analyses of soft-tissue sarcoma. J Med Genet 2020; 58:145-153. [PMID: 32447321 DOI: 10.1136/jmedgenet-2019-106814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2020] [Revised: 04/02/2020] [Accepted: 04/05/2020] [Indexed: 11/04/2022]
Abstract
PURPOSE The contribution of rare genetic variation in the development of soft-tissue sarcoma (STS) remains underexplored. To address this gap, we conducted a whole-exome case-control and somatic-germline interaction study to identify and characterise STS susceptible genes. METHODS The study involved 219 STS cases from The Cancer Genome Atlas and 3507 controls. All cases and controls were matched genetically on European ancestry based on the 1000 Genomes project. Cross-platform technological stratification was performed with XPAT and gene-based association tests with VAAST 2. RESULTS NF1 exhibited the strongest genome-wide signal across the six subtypes, with p=1×10⁻⁵. We also observed nominally significant association signals for three additional genes of interest, TP53 (p=0.0025), RB1 (p=0.0281), and MSH2 (p=0.0085). BAG1, which has not previously been implicated in STS, exhibited the strongest genome-wide signal after NF1, with p=6×10⁻⁵. The association signals for NF1 and MSH2 were driven primarily by truncating variants, with ORs of 39 (95% CI: 7.1 to 220) for NF1 and 33 (95% CI: 2.4 to 460) for MSH2. In contrast, the association signals for RB1 and BAG1 were driven primarily by predicted damaging missense variants, with estimated ORs of 12 (95% CI: 2.4 to 59) for RB1 and 20 (95% CI: 1.4 to 300) for BAG1. CONCLUSIONS Our results confirm that pathogenic variants in NF1, RB1 and TP53 confer large increases in the risk of developing multiple STS subtypes, provide support for the role of MSH2 in STS susceptibility and identify BAG1 as a novel candidate STS risk gene.
Affiliation(s)
- Fulan Hu
- Department of Biostatistics and Epidemiology, School of Public Health, Shenzhen University Health Science Center, Shenzhen, China
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
| | - Yao Yu
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
| | - Jiun-Sheng Chen
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
| | - Hao Hu
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
| | - Paul Scheet
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
| | - Chad D Huff
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
|
15
|
He J, Ma W, Zhou Y. Gene association detection via local linear regression method. J Hum Genet 2019; 65:115-123. [PMID: 31602004 DOI: 10.1038/s10038-019-0676-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2018] [Revised: 09/10/2019] [Accepted: 09/18/2019] [Indexed: 11/09/2022]
Abstract
The development of next-generation sequencing technology has provided us with great convenience in genetic association studies, and many effective analysis methods have been proposed. However, population stratification is still a major issue in current genetic association studies. Many existing methods have been developed to remove the bias due to population stratification for common variant association studies, but such methods may not be effective for rare variants, which leads to power reduction. Therefore, in this paper, we develop a principal component analysis strategy (called PC-LLR) based on the local linear regression method to eliminate the population stratification effect in both rare variant and common variant association studies. Simulation results indicate that the new PC-LLR method can eliminate the population stratification effect well. It has correct type I error rates in all cases and higher power in most cases, while most existing methods have inflated type I error rates in at least some cases. We also demonstrate that PC-LLR is more effective at eliminating the population stratification effect by applying it to the whole-exome sequencing data set from Genetic Analysis Workshop 19 (GAW19).
Affiliation(s)
- Jinli He
- Department of Statistics, School of Mathematical Sciences, Heilongjiang University and Heilongjiang Provincial Key Laboratory of the Theory and Computation of Complex Systems, Harbin, 150080, China
| | - Weijun Ma
- Department of Statistics, School of Mathematical Sciences, Heilongjiang University and Heilongjiang Provincial Key Laboratory of the Theory and Computation of Complex Systems, Harbin, 150080, China
| | - Ying Zhou
- Department of Statistics, School of Mathematical Sciences, Heilongjiang University and Heilongjiang Provincial Key Laboratory of the Theory and Computation of Complex Systems, Harbin, 150080, China.
|
16
|
Identification of Aberrantly Expressed lncRNAs Involved in Orthodontic Force Using a Subpathway Strategy. Computational and Mathematical Methods in Medicine 2019; 2019:9250129. [PMID: 31565070 PMCID: PMC6745140 DOI: 10.1155/2019/9250129] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/20/2019] [Accepted: 07/07/2019] [Indexed: 01/25/2023]
Abstract
Background The aim of the study was to identify key long noncoding RNAs (lncRNAs) and related subpathways in the periodontal ligament tissue following orthodontic force. Methods We adopted a novel subpathway strategy to identify lncRNA competitively regulated functions and the key competitive lncRNAs in periodontal ligament disorders after undergoing orthodontic force. To begin with, patients undergoing orthodontic treatment in our hospital were enrolled in our research. The lncRNA-mRNA relationships were established through shared predicted miRNAs, using the hypergeometric test, Jaccard coefficient standardization, and the Pearson coefficient to determine the valid interaction relationships. After embedding the screened lncRNA interactions into pathways, the significant subpathways were recognized by lenient distance and Wallenius approximation methods to calculate the false discovery rate value of each subpathway. Results The lncRNA-mRNA intersections, including 263 lncRNAs, 1,599 mRNAs, and 3,762 interacting pairs, were obtained. The mRNAs were further enriched into various candidate pathways such as the PI3K-Akt signaling pathway. Several subpathways were screened, including the PI3K-Akt signaling pathway, 04510_1 focal adhesion, and p53 signaling pathway. The network of pathway-lncRNA-mRNA was constructed. Several key lncRNAs including DNAJC3-AS1, WDFY3-AS2, LINC00482, and DLEU2 were screened. Conclusions DNAJC3-AS1, WDFY3-AS2, LINC00482, and DLEU2, as aberrantly expressed lncRNAs involved in orthodontic force, might play crucial roles in periodontal ligament disease pathogenesis.
|
17
|
Guan BG, Cai XX. Abnormal sub-pathways competitively regulated by lncRNAs contribute to postmenopausal osteoporosis. Exp Ther Med 2019; 17:2894-2900. [PMID: 30936959 PMCID: PMC6434238 DOI: 10.3892/etm.2019.7326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2018] [Accepted: 02/04/2019] [Indexed: 11/21/2022] Open
Abstract
Abnormal sub-pathways competitively regulated by long non-coding RNAs (lncRNAs) for postmenopausal osteoporosis (PO) based on integration of lncRNA-mRNA expression data and pathway network topologies were investigated. Interesting lncRNA-mRNA pairs were selected by the Pearson's correlation coefficient (PCC) algorithm on the basis of lncRNA-miRNA and miRNA-mRNA interactions and gene expression profiles. Then, lncRNAs in interesting pairs were embedded into pathway graphs as signature nodes by linking to their regulated mRNAs, and lncRNA competitively regulated pathways (LCRPs) were obtained for PO patients. Moreover, sub-pathways were detected dependent on the shortest distance similarity and the pathway topology. The abnormal sub-pathways were determined utilizing the Wallenius approximation methods through evaluating the statistical significance of sub-pathways. In total 75 interesting lncRNA-mRNA pairs (representing 17 lncRNAs and 74 mRNAs) were identified. Subsequently, 42 LCRPs were extracted from pathway graphs by signature lncRNA regulated mRNAs. Moreover, 14 abnormal sub-pathways with P<0.05 were obtained between PO patients and controls, such as sub-pathways of PI3K-Akt signaling pathway and long-term potentiation. This finding may facilitate understanding the molecular mechanism of PO, and point to a new direction for identifying potential biomarkers for treatment and prevention of the disease.
Affiliation(s)
- Bing-Gang Guan
- Spine Surgery, Tianjin Hospital, Tianjin 300211, P.R. China
| | - Xiao-Xi Cai
- Department of Orthopedics, Huadong Hospital Affiliated to Fudan University, Shanghai 200040, P.R. China
|
18
|
A case-control collapsing analysis identifies retinal dystrophy genes associated with ophthalmic disease in patients with no pathogenic ABCA4 variants. Genet Med 2019; 21:2336-2344. [PMID: 30926958 DOI: 10.1038/s41436-019-0495-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 05/30/2018] [Accepted: 03/12/2019] [Indexed: 12/12/2022] Open
Abstract
PURPOSE Variants in the ABCA4 gene are causal for a variety of retinal dystrophy phenotypes, including Stargardt disease (STGD1). However, 15% of patients who present with symptoms compatible with STGD1/ABCA4 disease do not have identifiable causal ABCA4 variants. We hypothesized that a case-control collapsing analysis in ABCA4-negative patients with compatible symptoms would provide an objective measure to identify additional disease genes. METHODS We performed a genome-wide enrichment analysis of "qualifying variants" (ultrarare variants predicted to impact protein function) in protein-coding genes in 79 unrelated cases and 9028 unrelated controls. RESULTS Despite modest sample size, two known retinal dystrophy genes, PRPH2 and CRX, achieved study-wide significance (p < 1.33 × 10⁻⁶) under a dominant disease model, and eight additional known retinal dystrophy genes achieved nominal significance (p < 0.05). Across these ten genes, the excess of qualifying variants explained up to 36.8% of affected individuals. Furthermore, under a recessive model, the cone-rod dystrophy gene CERKL approached study-wide significance. CONCLUSION Our results indicate that case-control collapsing analyses can efficiently identify pathogenic variants in genes in non-ABCA4 retinal dystrophies. The genome-wide collapsing analysis framework is an objective discovery method particularly suitable in settings with overlapping disease phenotypes.
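The per-gene comparison behind a collapsing analysis can be sketched as a 2×2 carrier-count test. The example below is a minimal illustration, not the authors' pipeline: it assumes a one-sided Fisher's exact test (a common choice for this design, not confirmed by the abstract), the function name `fisher_exact_greater` is hypothetical, and the carrier counts are invented. Only the cohort sizes (79 cases, 9028 controls) and the study-wide threshold come from the abstract.

```python
from math import comb


def fisher_exact_greater(case_carriers: int, case_total: int,
                         control_carriers: int, control_total: int) -> float:
    """One-sided Fisher's exact p-value for an excess of carriers among cases."""
    n_total = case_total + control_total
    n_carriers = case_carriers + control_carriers
    # Number of ways to choose which individuals are cases.
    denom = comb(n_total, case_total)
    # Sum the hypergeometric probabilities of seeing at least the observed
    # number of carriers among the cases.
    upper = min(n_carriers, case_total)
    tail = sum(
        comb(n_carriers, k) * comb(n_total - n_carriers, case_total - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / denom


# Hypothetical gene: 5 of 79 cases and 10 of 9028 controls carry a qualifying variant.
p_value = fisher_exact_greater(5, 79, 10, 9028)
print(f"p = {p_value:.2e}, study-wide significant: {p_value < 1.33e-6}")
```

Exact integer arithmetic via `math.comb` avoids a dependency on a statistics library; for routine use, an established implementation of Fisher's exact test would be the more usual choice.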
|
19
|
Zheng Y, Sun S, Yu M, Fu X. Identification of potential hub‐lncRNAs in ischemic stroke based on Subpathway‐LNCE method. J Cell Biochem 2019; 120:12832-12842. [PMID: 30882937 DOI: 10.1002/jcb.28554] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 05/24/2018] [Revised: 12/13/2018] [Accepted: 01/10/2019] [Indexed: 12/13/2022]
Affiliation(s)
- Yanhua Zheng
- The First Department of Neurology Weifang People's Hospital Weifang Shandong China
| | - Shaopeng Sun
- The First Department of Neurology Weifang People's Hospital Weifang Shandong China
| | - Miao Yu
- The First Department of Neurology Weifang People's Hospital Weifang Shandong China
| | - Xiuxin Fu
- The First Department of Neurology Weifang People's Hospital Weifang Shandong China
|
20
|
Wu X, Sun L, Wang Z. Identification of lncRNA competitively regulated subpathways in myocardial infarction. Exp Ther Med 2019; 17:3041-3046. [PMID: 30936975 PMCID: PMC6434249 DOI: 10.3892/etm.2019.7320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2018] [Accepted: 02/06/2019] [Indexed: 11/05/2022] Open
Abstract
The functions of long non-coding RNAs (lncRNAs) in myocardial infarction (MI) remain largely unknown. Thus, we used the subpathway-LINCE method to characterize the potential roles of lncRNAs in MI. Candidate lncRNA-mRNA interactions were obtained from miRNA-mRNA interactions and lncRNA-miRNA interactions. Then the lncRNA and mRNA co-expression relationship pairs (LncGenePairs) were screened from the lncRNA and mRNA intersections, which were extracted through candidate lncRNA-mRNA interactions and sample gene expression profiles. The lncRNAs in LncGenePairs were embedded into pathway graphs as nodes through linking to their regulated mRNAs, which resulted in condition-specific lncRNA competitively regulated signal pathways (csLncRPs). Finally, the csLncRPs were analyzed using lenient distance similarity to obtain the lncRNA competitively regulated subpathways. Based on the statistical significance of signal subpathways, lncRNA-mRNA networks were constructed, in which hub lncRNAs were selected. A total of 65 lncRNA competitively regulated subpathways and 13 hub lncRNAs were obtained, which were associated with a risk of MI. Identifying lncRNA competitively regulated subpathways not only provides potential lncRNA biomarkers for MI, but also helps the understanding of the pathogenesis of MI.
Affiliation(s)
- Xia Wu
- Department of Geriatrics, Daqing Oilfield General Hospital, Daqing, Heilongjiang 163316, P.R. China
| | - Lili Sun
- Department of Geriatrics, Daqing Oilfield General Hospital, Daqing, Heilongjiang 163316, P.R. China
| | - Ziliang Wang
- Department of Cardiovascular Medicine, Daqing People's Hospital, Daqing, Heilongjiang 163316, P.R. China
|
21
|
Han DM. Sub-pathway based approach to systematically track candidate sub-pathway biomarkers for heart failure. Exp Ther Med 2019; 17:3162-3168. [PMID: 30936989 PMCID: PMC6434253 DOI: 10.3892/etm.2019.7319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2018] [Accepted: 02/06/2019] [Indexed: 11/23/2022] Open
Abstract
Identification of potential novel biomarkers for heart failure was undertaken using a sub-pathway based method. First, a heart failure-relevant dataset, reference pathways, and lncRNA-miRNA-mRNA interactions were collected. Second, the informative pathways were extracted using KEGG pathways and the mRNAs in the PCC-weighted lncRNA-mRNA interactions. Third, lncRNA-regulated sub-pathways were dissected after construction of condition-specific lncRNA competitively regulated pathways (LCRP). To detect crucial heart failure-relevant lncRNAs, degree analysis was conducted for all nodes within the LCRP. Finally, the significance of candidate sub-pathways was assessed to identify the significant sub-pathways. There were 44 lncRNAs, 165 mRNAs and 224 co-expressed interactions. After mapping the 165 mRNAs onto the reference pathways, 56 informative pathways were obtained, which were then embedded into undirected graphs, and 44 lncRNAs were inserted into the pathway graphs to construct the condition-specific LCRP. According to degree distribution, 4 hub lncRNAs were selected: ERVK13-1, YLPM1, PDXDC2P, and LINC00482. Based on the LCRP information, a total of 36 sub-pathways mediated by lncRNAs participated in 40 complete pathways. Among these 40 pathways, we mainly concentrated on the top three sub-pathways, including a sub-part of the MAPK signaling pathway, an important sub-part of the ErbB signaling pathway, and a part of the chemokine signaling pathway. In the top 3 significant sub-pathways, the gene AKT3 was simultaneously regulated by ERVK13-1, YLPM1, and PDXDC2P. Sub-pathways including the MAPK signaling pathway and hub lncRNAs (ERVK13-1, YLPM1, and PDXDC2P) may play an important role in heart failure.
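The degree analysis mentioned in this abstract can be illustrated with a small sketch that ranks lncRNA nodes by the number of edges they have in an undirected network. The function name `hub_nodes`, the `top_n` cutoff, and the toy edge list are assumptions for illustration; the abstract does not state how the degree cutoff for hubs was chosen.

```python
from collections import Counter


def hub_nodes(edges, candidates, top_n):
    """Rank candidate nodes by degree in an undirected edge list.

    top_n is an assumed cutoff; ties are broken alphabetically.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ranked = sorted(candidates, key=lambda n: (-degree[n], n))
    return [(n, degree[n]) for n in ranked[:top_n]]


# Made-up lncRNA-mRNA edges, not data from the study.
edges = [
    ("lnc1", "mA"), ("lnc1", "mB"), ("lnc1", "mC"),
    ("lnc2", "mA"), ("lnc2", "mB"),
    ("lnc3", "mC"),
]
hubs = hub_nodes(edges, {"lnc1", "lnc2", "lnc3"}, top_n=2)
```

Here `hubs` lists the two best-connected lncRNAs with their degrees.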
Affiliation(s)
- Dong-Mei Han
- Department of Cardiology, General Hospital of Daqing Oil Field, Daqing, Heilongjiang 163000, P.R. China
| |
|
22
|
The impact of a fine-scale population stratification on rare variant association test results. PLoS One 2018; 13:e0207677. [PMID: 30521541 PMCID: PMC6283567 DOI: 10.1371/journal.pone.0207677] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 07/27/2018] [Accepted: 11/05/2018] [Indexed: 12/28/2022] Open
Abstract
Population stratification is a well-known confounding factor in both common and rare variant association analyses. Rare variants tend to be more geographically clustered than common variants, because of their more recent origin. However, it is not yet clear if population stratification at a very fine scale (neighboring administrative regions within a country) would lead to statistical bias in rare variant analyses. This problem is important because convenience controls from external studies are commonly included to increase the power to detect genetic associations. We studied through simulation the impact of a fine-scale population structure on different rare variant association strategies, assessing type I error and power. We showed that principal component analysis (PCA) based methods of adjustment for population stratification adequately corrected type I error inflation at the largest geographical scales, but not at the finest scales. We also showed in our simulations that adding controls increased power, as expected, but considerably less so when controls were drawn from another population.
|
23
|
Fischer ST, Jiang Y, Broadaway KA, Conneely KN, Epstein MP. Powerful and robust cross-phenotype association test for case-parent trios. Genet Epidemiol 2018; 42:447-458. [PMID: 29460449 PMCID: PMC6013339 DOI: 10.1002/gepi.22116] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/03/2017] [Revised: 01/05/2018] [Accepted: 01/08/2018] [Indexed: 12/17/2022]
Abstract
There has been increasing interest in identifying genes within the human genome that influence multiple diverse phenotypes. In the presence of pleiotropy, joint testing of these phenotypes is not only biologically meaningful but also statistically more powerful than univariate analysis of each separate phenotype accounting for multiple testing. Although many cross-phenotype association tests exist, the majority of such methods assume samples composed of unrelated subjects and therefore are not applicable to family-based designs, including the valuable case-parent trio design. In this paper, we describe a robust gene-based association test of multiple phenotypes collected in a case-parent trio study. Our method is based on the kernel distance covariance (KDC) method, where we first construct a similarity matrix for multiple phenotypes and a similarity matrix for genetic variants in a gene; we then test the dependency between the two similarity matrices. The method is applicable to either common variants or rare variants in a gene, and resulting tests from the method are by design robust to confounding due to population stratification. We evaluated our method through simulation studies and observed that the method is substantially more powerful than standard univariate testing of each separate phenotype. We also applied our method to phenotypic and genotypic data collected in case-parent trios as part of the Genetics of Kidneys in Diabetes (GoKinD) study and identified a genome-wide significant gene demonstrating cross-phenotype effects that was not identified using standard univariate approaches.
Affiliation(s)
- S. Taylor Fischer
- Department of Human Genetics and Center for Computational and Quantitative Genetics, Emory University, Atlanta, GA
| | - Yunxuan Jiang
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA
| | - K. Alaine Broadaway
- Department of Human Genetics and Center for Computational and Quantitative Genetics, Emory University, Atlanta, GA
| | - Karen N. Conneely
- Department of Human Genetics and Center for Computational and Quantitative Genetics, Emory University, Atlanta, GA
| | - Michael P. Epstein
- Department of Human Genetics and Center for Computational and Quantitative Genetics, Emory University, Atlanta, GA
| |
|
24
|
Satten GA, Kong M, Datta S. Multisample adjusted U-statistics that account for confounding covariates. Stat Med 2018; 37:3357-3372. [PMID: 29923344 DOI: 10.1002/sim.7825] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/30/2017] [Revised: 02/11/2018] [Accepted: 04/19/2018] [Indexed: 01/19/2023]
Abstract
Multisample U-statistics encompass a wide class of test statistics that allow the comparison of 2 or more distributions. U-statistics are especially powerful because they can be applied to both numeric and nonnumeric data, eg, ordinal and categorical data where a pairwise similarity or distance-like measure between categories is available. However, when comparing the distribution of a variable across 2 or more groups, observed differences may be due to confounding covariates. For example, in a case-control study, the distribution of exposure in cases may differ from that in controls entirely because of variables that are related to both exposure and case status and are distributed differently among case and control participants. We propose to use individually reweighted data (ie, using the stratification score for retrospective data or the propensity score for prospective data) to construct adjusted U-statistics that can test the equality of distributions across 2 (or more) groups in the presence of confounding covariates. Asymptotic normality of our adjusted U-statistics is established and a closed form expression of their asymptotic variance is presented. The utility of our approach is demonstrated through simulation studies, as well as in an analysis of data from a case-control study conducted among African-Americans, comparing whether the similarity in haplotypes (ie, sets of adjacent genetic loci inherited from the same parent) occurring in a case and a control participant differs from the similarity in haplotypes occurring in 2 control participants.
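The idea of a reweighted two-sample U-statistic can be sketched as a weighted average of a pairwise kernel over all case-control pairs. This is an illustrative simplification, not the estimator from the paper: the weights are assumed to be supplied by the caller (for example from a fitted propensity or stratification score), the Mann-Whitney kernel is one possible choice, and the asymptotic variance is not computed. The names `weighted_u` and `mann_whitney_kernel` are hypothetical.

```python
def mann_whitney_kernel(x, y):
    """1 if x < y, 0.5 on ties, 0 otherwise."""
    if x < y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def weighted_u(xs, ys, wx, wy, kernel):
    """Weighted two-sample U-statistic: sum(w_i v_j h(x_i, y_j)) / sum(w_i v_j)."""
    num = 0.0
    den = 0.0
    for x, a in zip(xs, wx):
        for y, b in zip(ys, wy):
            num += a * b * kernel(x, y)
            den += a * b
    return num / den


# Toy values; weights of 1 recover the ordinary (unadjusted) U-statistic.
cases = [1.0, 2.0, 3.0]
controls = [2.0, 4.0]
unweighted = weighted_u(cases, controls, [1, 1, 1], [1, 1], mann_whitney_kernel)
reweighted = weighted_u(cases, controls, [1, 1, 2], [1, 1], mann_whitney_kernel)
```

Up-weighting the third case, which is larger than one of the controls, lowers the statistic, which shows how reweighting shifts the comparison between groups.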
Affiliation(s)
- Glen A Satten
- Division of Reproductive Health, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
| | - Maiying Kong
- Department of Bioinformatics and Biostatistics, SPHIS, University of Louisville, Louisville, Kentucky, USA
| | - Somnath Datta
- Department of Biostatistics, University of Florida, Gainesville, Florida, USA
| |
|
25
|
Su YR, Di C, Bien S, Huang L, Dong X, Abecasis G, Berndt S, Bezieau S, Brenner H, Caan B, Casey G, Chang-Claude J, Chanock S, Chen S, Connolly C, Curtis K, Figueiredo J, Gala M, Gallinger S, Harrison T, Hoffmeister M, Hopper J, Huyghe JR, Jenkins M, Joshi A, Le Marchand L, Newcomb P, Nickerson D, Potter J, Schoen R, Slattery M, White E, Zanke B, Peters U, Hsu L. A Mixed-Effects Model for Powerful Association Tests in Integrative Functional Genomics. Am J Hum Genet 2018; 102:904-919. [PMID: 29727690 PMCID: PMC5986723 DOI: 10.1016/j.ajhg.2018.03.019] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 08/23/2017] [Accepted: 03/15/2018] [Indexed: 01/05/2023] Open
Abstract
Genome-wide association studies (GWASs) have successfully identified thousands of genetic variants for many complex diseases; however, these variants explain only a small fraction of the heritability. Recently, genetic association studies that leverage external transcriptome data have received much attention and shown promise for discovering novel variants. One such approach, PrediXcan, is to use predicted gene expression through genetic regulation. However, there are limitations in this approach. The predicted gene expression may be biased, resulting from regularized regression applied to moderately sample-sized reference studies. Further, some variants can individually influence disease risk through alternative functional mechanisms besides expression. Thus, testing only the association of predicted gene expression as proposed in PrediXcan will potentially lose power. To tackle these challenges, we consider a unified mixed effects model that formulates the association of intermediate phenotypes such as imputed gene expression through fixed effects, while allowing residual effects of individual variants to be random. We consider a set-based score testing framework, MiST (mixed effects score test), and propose two data-driven combination approaches to jointly test for the fixed and random effects. We establish the asymptotic distributions, which enable rapid calculation of p values for genome-wide analyses, and provide p values for fixed and random effects separately to enhance interpretability over GWASs. Extensive simulations demonstrate that our approaches are more powerful than existing ones. We apply our approach to a large-scale GWAS of colorectal cancer and identify two genes, POU5F1B and ATF1, which would have otherwise been missed by PrediXcan, after adjusting for all known loci.
Affiliation(s)
- Yu-Ru Su
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
| | - Chongzhi Di
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Stephanie Bien
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Licai Huang
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Xinyuan Dong
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
| | - Goncalo Abecasis
- Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
| | - Sonja Berndt
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20850, USA
| | - Stephane Bezieau
- Service de Génétique Médicale Centre Hospitalier Universitaire (CHU) Nantes, Nantes 44093, France
| | - Hermann Brenner
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg 69120, Germany
| | - Bette Caan
- Division of Research, Kaiser Permanente Medical Care Program of Northern California, Oakland, CA 94612, USA
| | - Graham Casey
- Public Health Sciences Division, University of Virginia, Charlottesville, VA 22908, USA
| | - Jenny Chang-Claude
- Division of Cancer Epidemiology, German Cancer Research Center, Heidelberg 69009, Germany
| | - Stephen Chanock
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20850, USA
| | - Sai Chen
- Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Charles Connolly
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Keith Curtis
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Jane Figueiredo
- Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
| | - Manish Gala
- Division of Gastroenterology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Steven Gallinger
- Department of Surgery, Mount Sinai Hospital, Toronto, ON M5G 1X5, Canada
| | - Tabitha Harrison
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Michael Hoffmeister
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg 69120, Germany
| | - John Hopper
- Melbourne School of Population Health, The University of Melbourne, Carlton, VIC 3010, Australia
| | - Jeroen R Huyghe
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Mark Jenkins
- Melbourne School of Population Health, The University of Melbourne, Carlton, VIC 3010, Australia
| | - Amit Joshi
- Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA
| | - Loic Le Marchand
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI 96813, USA
| | - Polly Newcomb
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Department of Epidemiology, University of Washington School of Public Health, Seattle, WA 98109, USA
| | | | - John Potter
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Department of Epidemiology, University of Washington School of Public Health, Seattle, WA 98109, USA
| | - Robert Schoen
- Department of Medicine and Epidemiology, University of Pittsburgh Medical Center, Pittsburgh, PA 15213, USA
| | - Martha Slattery
- Department of Internal Medicine, University of Utah Health Sciences Center, Salt Lake City, UT 84132, USA
| | - Emily White
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Department of Epidemiology, University of Washington School of Public Health, Seattle, WA 98109, USA
| | - Brent Zanke
- Division of Hematology, Faculty of Medicine, The University of Ottawa, Ottawa, ON K1Y 4E9, Canada
| | - Ulrike Peters
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Department of Epidemiology, University of Washington School of Public Health, Seattle, WA 98109, USA
| | - Li Hsu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
| |
|
26
|
Rouillard AD, Hurle MR, Agarwal P. Systematic interrogation of diverse Omic data reveals interpretable, robust, and generalizable transcriptomic features of clinically successful therapeutic targets. PLoS Comput Biol 2018; 14:e1006142. [PMID: 29782487 PMCID: PMC5983857 DOI: 10.1371/journal.pcbi.1006142] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 11/21/2017] [Revised: 06/01/2018] [Accepted: 04/13/2018] [Indexed: 11/19/2022] Open
Abstract
Target selection is the first and pivotal step in drug discovery. An incorrect choice may not manifest itself for many years after hundreds of millions of research dollars have been spent. We collected a set of 332 targets that succeeded or failed in phase III clinical trials, and explored whether Omic features describing the target genes could predict clinical success. We obtained features from the recently published comprehensive resource: Harmonizome. Nineteen features appeared to be significantly correlated with phase III clinical trial outcomes, but only 4 passed validation schemes that used bootstrapping or modified permutation tests to assess feature robustness and generalizability while accounting for target class selection bias. We also used classifiers to perform multivariate feature selection and found that classifiers with a single feature performed as well in cross-validation as classifiers with more features (AUROC = 0.57 and AUPR = 0.81). The two predominantly selected features were mean mRNA expression across tissues and standard deviation of expression across tissues, where successful targets tended to have lower mean expression and higher expression variance than failed targets. This finding supports the conventional wisdom that it is favorable for a target to be present in the tissue(s) affected by a disease and absent from other tissues. Overall, our results suggest that it is feasible to construct a model integrating interpretable target features to inform target selection. We anticipate deeper insights and better models in the future, as researchers can reuse the data we have provided to improve methods for handling sample biases and learn more informative features. Code, documentation, and data for this study have been deposited on GitHub at https://github.com/arouillard/omic-features-successful-targets.
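For readers unfamiliar with the AUROC figure quoted in this abstract, the metric can be computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as one half. The sketch below is a generic pure-Python illustration with made-up scores; it is not the evaluation code from the authors' repository, and the function name `auroc` is an assumption.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparisons (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up classifier scores: 1 = successful target, 0 = failed target.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
value = auroc(scores, labels)
```

A value of 0.5 corresponds to random ranking, so the reported AUROC of 0.57 indicates only modest separation between successful and failed targets.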
Affiliation(s)
| | - Mark R. Hurle
- Computational Biology, GSK, Collegeville, PA, United States of America
| | - Pankaj Agarwal
- Computational Biology, GSK, Collegeville, PA, United States of America
| |
|
27
|
Yu Y, Hu H, Bohlender RJ, Hu F, Chen JS, Holt C, Fowler J, Guthery SL, Scheet P, Hildebrandt MAT, Yandell M, Huff CD. XPAT: a toolkit to conduct cross-platform association studies with heterogeneous sequencing datasets. Nucleic Acids Res 2018; 46:e32. [PMID: 29294048 PMCID: PMC5888834 DOI: 10.1093/nar/gkx1280] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 07/06/2017] [Revised: 12/07/2017] [Accepted: 12/20/2017] [Indexed: 12/12/2022] Open
Abstract
High-throughput sequencing data are increasingly being made available to the research community for secondary analyses, providing new opportunities for large-scale association studies. However, heterogeneity in target capture and sequencing technologies often introduces strong technological stratification biases that overwhelm subtle signals of association in studies of complex traits. Here, we introduce the Cross-Platform Association Toolkit, XPAT, which provides a suite of tools designed to support and conduct large-scale association studies with heterogeneous sequencing datasets. XPAT includes tools to support cross-platform aware variant calling, quality control filtering, gene-based association testing and rare variant effect size estimation. To evaluate the performance of XPAT, we conducted case-control association studies for three diseases, including 783 breast cancer cases, 272 ovarian cancer cases, 205 Crohn disease cases and 3507 shared controls (including 1722 females) using sequencing data from multiple sources. XPAT greatly reduced Type I error inflation in the case-control analyses, while replicating many previously identified disease-gene associations. We also show that association tests conducted with XPAT using cross-platform data have comparable performance to tests using matched platform data. XPAT enables new association studies that combine existing sequencing datasets to identify genetic loci associated with common diseases and other complex traits.
Affiliation(s)
- Yao Yu
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Hao Hu
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Ryan J Bohlender
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Fulan Hu
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Department of Epidemiology, Public Health College, Harbin Medical University, Harbin, Heilongjiang 150081, China
| | - Jiun-Sheng Chen
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX 77030, USA
| | - Carson Holt
- Eccles Institute of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - Jerry Fowler
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Stephen L Guthery
- Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, UT 84132, USA
| | - Paul Scheet
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Michelle A T Hildebrandt
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Mark Yandell
- Eccles Institute of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - Chad D Huff
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| |
|
28
|
Yu Y, Hu H, Chen JS, Hu F, Fowler J, Scheet P, Zhao H, Huff CD. Integrated case-control and somatic-germline interaction analyses of melanoma susceptibility genes. Biochim Biophys Acta Mol Basis Dis 2018; 1864:2247-2254. [PMID: 29317335 DOI: 10.1016/j.bbadis.2018.01.007] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 11/03/2017] [Revised: 12/20/2017] [Accepted: 01/04/2018] [Indexed: 12/18/2022]
Abstract
While a number of genes have been implicated in melanoma susceptibility, the role of protein-coding variation in melanoma development and progression remains underexplored. To better characterize the role of germline coding variation in melanoma, we conducted a whole-exome case-control and somatic-germline interaction study involving 322 skin cutaneous melanoma cases from The Cancer Genome Atlas and 3607 controls of European ancestry. We controlled for cross-platform technological stratification using XPAT and conducted gene-based association tests using VAAST 2. Four established melanoma susceptibility genes achieved nominal statistical significance: MC1R (p = .0014), MITF (p = .0165), BRCA2 (p = .0206), and MTAP (p = .0393). We also observed a suggestive association for FANCA (p = .002), a gene previously implicated in melanoma survival. The association signal for BRCA2 was driven primarily by likely gene disrupting (LGD) variants, with an Odds Ratio (OR) of 5.62 (95% Confidence Interval (CI) 1.03-30.1). In contrast, the association signals for MC1R and MITF were driven primarily by predicted pathogenic missense variants, with estimated ORs of 1.4 to 3.0 for MC1R and 4.1 for MITF. MTAP exhibited an excess of both LGD and predicted damaging missense variants among cases, with ORs of 5.62 and 3.72, respectively, although neither category was significant. For individuals with known or predicted damaging variants, age of disease onset was significantly lower for two of the four genes, MC1R (p = .005) and MTAP (p = .035). In an analysis of germline carrier status and overlapping copy number alterations, we observed no evidence to support a two-hit model of carcinogenesis in any of the four genes. Although MC1R carriers were represented proportionally among the four molecular tumor subtypes, these individuals accounted for 69% of ultraviolet (UV) radiation mutational signatures among triple-wild type tumors (p = .040), highlighting the increased sensitivity to UV exposure among individuals with loss-of-function variants in MC1R.
Affiliation(s)
- Yao Yu
- Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Hao Hu
- Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Jiun-Sheng Chen
- Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX, USA
| | - Fulan Hu
- Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Epidemiology, Public Health College, Harbin Medical University, Harbin, Heilongjiang 150081, China
| | - Jerry Fowler
- Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Paul Scheet
- Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Hua Zhao
- Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Chad D Huff
- Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
| |
|
29
|
Computational Inferring of Risk Subpathways Mediated by Dysfunctional Non-coding RNAs. Advances in Experimental Medicine and Biology 2018; 1094:87-95. [DOI: 10.1007/978-981-13-0719-5_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
|
30
|
Integrating gene and lncRNA expression to infer subpathway activity for tumor analyses. Oncotarget 2017; 8:111433-111443. [PMID: 29340065 PMCID: PMC5762333 DOI: 10.18632/oncotarget.22811] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/14/2017] [Accepted: 11/16/2017] [Indexed: 02/01/2023] Open
Abstract
LncRNAs acting as miRNA sponges to indirectly regulate mRNAs represent a novel layer of gene regulation; it is therefore necessary to integrate lncRNA and gene expression levels to interpret tumor biological mechanisms. In this study, we developed a lncRNA-gene integrated strategy to infer functional activities for tumor analyses at the subpathway level. In this strategy, we reconstructed subpathway graphs by embedding lncRNA components and considered the expression levels of both genes and lncRNAs to infer subpathway activities for each tumor sample. The activities were then applied to three aspects of tumor analysis. First, the subpathway activities across tumor samples of five tumor types were analyzed, and it was observed that the samples with consistent subpathway activities were derived from the same or similar tumor types. The subpathway activities could also stratify samples into several subtypes with different clinical characteristics, e.g., survival status. Second, the subpathway activities between tumor and normal samples were analyzed, and the comparative results showed that subpathway activities displayed more specificity than entire-pathway activities. Finally, based on the subpathway activities, we identified prognostic subpathways for lung cancer. Our subpathway-based signatures shared significant overlap with enrichment analysis results and displayed predictive power in the independent testing sets. In conclusion, our integrated strategy provided a framework to infer subpathway activities for tumor analyses and identify subpathway signatures for clinical use.
|
31
|
Lutz SM, Fingerlin TE, Hokanson JE, Lange C. A general approach to testing for pleiotropy with rare and common variants. Genet Epidemiol 2017; 41:163-170. [PMID: 27900789 PMCID: PMC5472207 DOI: 10.1002/gepi.22011] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 01/19/2016] [Revised: 08/01/2016] [Accepted: 09/19/2016] [Indexed: 12/22/2022]
Abstract
Through genome-wide association studies, numerous genes have been shown to be associated with multiple phenotypes. To determine the overlap of genetic susceptibility of correlated phenotypes, one can apply multivariate regression or dimension reduction techniques, such as principal components analysis, and test for the association with the principal components of the phenotypes rather than the individual phenotypes. However, as these approaches test whether there is a genetic effect for at least one of the phenotypes, a significant test result does not necessarily imply pleiotropy. Recently, a method called Pleiotropy Estimation and Test Bootstrap (PET-B) has been proposed to specifically test for pleiotropy (i.e., that two normally distributed phenotypes are both associated with the single nucleotide polymorphism of interest). Although the method examines the genetic overlap between the two quantitative phenotypes, the extension to binary phenotypes, three or more phenotypes, and rare variants is not straightforward. We provide two approaches to formally test this pleiotropic relationship in multiple scenarios. These approaches depend on permuting the phenotypes of interest and comparing the set of observed P-values to the set of permuted P-values in relation to the origin (e.g., a vector of zeros) either using the Hausdorff metric or a cutoff-based approach. These approaches are appropriate for categorical and quantitative phenotypes, more than two phenotypes, common variants and rare variants. We evaluate these approaches under various simulation scenarios and apply them to the COPDGene study, a case-control study of chronic obstructive pulmonary disease in current and former smokers.
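The Hausdorff metric named in this abstract measures how far two finite point sets are from each other: it is the largest distance from any point in one set to its nearest neighbor in the other set. The sketch below shows only the metric itself on toy points. How the authors combine it with the observed and permuted P-value sets and the origin is not spelled out here, so treating each point as a vector of P-values (one coordinate per phenotype) is an assumption, as are the function names.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Toy 2-D points standing in for vectors of P-values (hypothetical data).
observed = [(0.0, 0.0), (1.0, 0.0)]
permuted = [(0.0, 0.0), (0.0, 2.0)]
distance = hausdorff(observed, permuted)
```

The directed distances differ (1.0 from `observed` to `permuted`, 2.0 in the other direction), which is why the symmetric version takes the maximum of the two.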
Affiliation(s)
- Sharon M Lutz
- Department of Biostatistics, University of Colorado, Anschutz Medical Campus, Aurora, CO, USA
| | - Tasha E Fingerlin
- Department of Biostatistics, University of Colorado, Anschutz Medical Campus, Aurora, CO, USA
- Center for Genes, Environment, and Health, National Jewish Health, Denver, CO, USA
| | - John E Hokanson
- Department of Epidemiology, University of Colorado, Anschutz Medical Campus, Aurora, CO, USA
| | - Christoph Lange
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| |
|
32
|
Sha Q, Zhang K, Zhang S. A Nonparametric Regression Approach to Control for Population Stratification in Rare Variant Association Studies. Sci Rep 2016; 6:37444. [PMID: 27857226 PMCID: PMC5114546 DOI: 10.1038/srep37444] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 09/19/2016] [Accepted: 10/28/2016] [Indexed: 01/31/2023] Open
Abstract
Recently, there has been increasing interest in detecting associations between rare variants and complex traits. Rare variant association studies usually need large sample sizes due to the rarity of the variants, and large sample sizes typically require combining information from different geographic locations within and across countries. Although several statistical methods have been developed to control for population stratification in common variant association studies, these methods do not necessarily control for population stratification in rare variant association studies. Thus, new statistical methods that can control for population stratification in rare variant association studies are needed. In this article, we propose a principal component based nonparametric regression (PC-nonp) approach to control for population stratification in rare variant association studies. Our simulations show that the proposed PC-nonp can control for population stratification well in all scenarios, while existing methods fail to control for population stratification in at least some scenarios. Simulations also show that PC-nonp's robustness to population stratification does not reduce power. Furthermore, we illustrate our proposed method using whole genome sequencing data from Genetic Analysis Workshop 18 (GAW18).
Affiliation(s)
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
| | - Kui Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
| | - Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
|
33
|
Block-based association tests for rare variants using Kullback–Leibler divergence. J Hum Genet 2016; 61:965-975. [DOI: 10.1038/jhg.2016.90] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2015] [Revised: 05/03/2016] [Accepted: 06/17/2016] [Indexed: 11/09/2022]
|
34
|
Shi X, Xu Y, Zhang C, Feng L, Sun Z, Han J, Su F, Zhang Y, Li C, Li X. Subpathway-LNCE: Identify dysfunctional subpathways competitively regulated by lncRNAs through integrating lncRNA-mRNA expression profile and pathway topologies. Oncotarget 2016; 7:69857-69870. [PMID: 27634882 PMCID: PMC5342520 DOI: 10.18632/oncotarget.12005] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 03/24/2016] [Accepted: 09/02/2016] [Indexed: 12/14/2022] Open
Abstract
Recently, studies have reported that long noncoding RNAs (lncRNAs) can act as modulators of mRNAs by competitively binding to microRNAs (miRNAs) and have relevance to tumorigenesis as well as other diseases. Identifying lncRNA competitively regulated subpathways can not only provide insight into the initiation and progression of disease but also help in understanding the functional roles of lncRNAs in the disease context. Here, we present an effective method, Subpathway-LNCE, which was specifically designed to identify lncRNA competitively regulated functions; the functional roles of these competitively regulating lncRNAs have not been well characterized in diseases. Moreover, the method integrates lncRNA-mRNA expression profiles and pathway topologies. Using prostate cancer datasets and LUAD datasets, we confirmed the effectiveness of our method in identifying disease-associated dysfunctional subpathways that are regulated by lncRNAs. By analyzing the kidney renal clear cell carcinoma related lncRNA competitively regulated subpathway network, we show that Subpathway-LNCE can help uncover key disease lncRNAs. Furthermore, we demonstrated that our method is reproducible and robust. Subpathway-LNCE provides a flexible tool to identify lncRNA competitively regulated signal subpathways underlying a given condition and helps to elucidate the functional roles of lncRNAs in various states. Subpathway-LNCE has been developed as an R package freely available at https://cran.rstudio.com/web/packages/SubpathwayLNCE/.
Affiliation(s)
- Xinrui Shi
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Yanjun Xu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Chunlong Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Li Feng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Zeguo Sun
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Junwei Han
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Fei Su
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Yunpeng Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Chunquan Li
- Department of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, 163319, China
| | - Xia Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
|
35
|
Darst BF, Engelman CD. Transmission and decorrelation methods for detecting rare variants using sequencing data from related individuals. BMC Proc 2016; 10:203-207. [PMID: 27980637 PMCID: PMC5133523 DOI: 10.1186/s12919-016-0031-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/03/2023] Open
Abstract
BACKGROUND Advances in whole genome sequencing have enabled the investigation of rare variants, which could explain some of the missing heritability that genome-wide association studies are unable to detect. Most methods to detect associations with rare variants are developed for unrelated individuals; however, several methods exist that utilize family studies and could have better power to detect such associations. METHODS Using whole genome sequencing data and simulated phenotypes provided by the organizers of the Genetic Analysis Workshop 19 (GAW19), we compared family-based methods that test for associations between rare and common variants with a quantitative trait. This was done using 2 fairly novel methods: family-based association test for rare variants (FBAT-RV), which is a transmission-based method that utilizes the transmission of genetic information from parent to offspring; and Minimum p value Optimized Nuisance parameter Score Test Extended to Relatives (MONSTER), which is a decorrelation method that instead attempts to adjust for relatedness using a regression-based method. We also considered family-based association test linear combination (FBAT-LC) and FBAT-Min P, which are slightly older methods that do not allow for the weighting of rare or common variants, but contrast some of the limitations of FBAT-RV. RESULTS MONSTER had much higher overall power than FBAT-RV and FBAT-Min P. Interestingly, FBAT-LC had similar overall power as MONSTER. MONSTER had the highest power for a gene accounting for a larger percent of the phenotypic variance, whereas MONSTER and FBAT-LC both had the highest power for a gene accounting for moderate variance. FBAT-LC had the highest power for a gene accounting for the least variance. CONCLUSIONS Based on the simulated data from GAW19, MONSTER and FBAT-LC were the most powerful of the methods assessed. However, there are limitations to each of these methods that should be carefully considered when conducting an analysis of rare variants in related individuals. This emphasizes the need for methods that can incorporate the advantages of each of these methods into 1 family-based association test for rare variants.
Affiliation(s)
- Burcu F. Darst
- University of Wisconsin, Madison, WI USA
- Department of Population Health Sciences, University of Wisconsin School of Medicine and Public Health, Madison, WI USA
| | - Corinne D. Engelman
- University of Wisconsin, Madison, WI USA
- Department of Population Health Sciences, University of Wisconsin School of Medicine and Public Health, Madison, WI USA
|
36
|
Prokopenko D, Hecker J, Silverman EK, Pagano M, Nöthen MM, Dina C, Lange C, Fier HL. Utilizing the Jaccard index to reveal population stratification in sequencing data: a simulation study and an application to the 1000 Genomes Project. Bioinformatics 2015; 32:1366-72. [PMID: 26722118 DOI: 10.1093/bioinformatics/btv752] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 06/24/2015] [Accepted: 12/19/2015] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Population stratification is one of the major sources of confounding in genetic association studies, potentially causing false-positive and false-negative results. Here, we present a novel approach for the identification of population substructure in high-density genotyping data/next generation sequencing data. The approach exploits the co-appearances of rare genetic variants in individuals. The method can be applied to all available genetic loci and is computationally fast. Using sequencing data from the 1000 Genomes Project, the features of the approach are illustrated and compared to existing methodology (i.e. EIGENSTRAT). We examine the effects of different cutoffs for the minor allele frequency on the performance of the approach. We find that our approach works particularly well for genetic loci with very small minor allele frequencies. The results suggest that the inclusion of rare-variant data/sequencing data in our approach provides a much higher resolution picture of population substructure than can be obtained with existing methodology. Furthermore, in simulation studies, we find scenarios where our method was able to control the type 1 error more precisely and showed higher power. CONTACT dmitry.prokopenko@uni-bonn.de SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
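The "co-appearance of rare variants" idea can be illustrated with a pairwise Jaccard index: for two individuals, the number of rare variants both carry divided by the number carried by at least one of them. The sketch below is a minimal illustration under that reading; the variant identifiers, the function names, and the convention of returning 0 when neither individual carries a rare variant are assumptions, and whatever the authors do with the resulting matrix afterwards is not shown.

```python
def jaccard(a, b):
    """Jaccard index of two sets of rare-variant identifiers."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two non-carriers
    return len(a & b) / len(union)


def jaccard_matrix(carriers):
    """Pairwise similarity; carriers[i] is the set of rare variants in individual i."""
    n = len(carriers)
    return [[jaccard(carriers[i], carriers[j]) for j in range(n)] for i in range(n)]


# Hypothetical toy data: three individuals and the rare variants each carries.
carriers = [{"v1", "v2", "v3"}, {"v2", "v3", "v4"}, {"v9"}]
sim = jaccard_matrix(carriers)
```

Individuals from the same subpopulation would be expected to share more rare variants and therefore score higher against each other than against individuals from other subpopulations.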
Affiliation(s)
| | - Julian Hecker
- Institute of Genomic Mathematics, University of Bonn, Bonn, Germany
| | | | - Marcello Pagano
- Department of Biostatistics, Harvard School of Public Health, Boston, USA
| | - Markus M Nöthen
- Institute of Human Genetics, University of Bonn, Bonn, Germany
| | - Christian Dina
- Institut National de la Santé et de la Recherche Médicale (INSERM) Unité Mixte de Recherche (UMR) 1087, l'institut du thorax, Nantes, France, Centre National de la Recherche Scientifique (CNRS) UMR 6291, l'institut du thorax, Nantes, France, Université de Nantes, l'institut du thorax, Nantes, France and Centre Hospitalier Universitaire (CHU) de Nantes, l'institut du thorax, Service de Cardiologie, Nantes, France
| | - Christoph Lange
- Channing Division of Network Medicine, Brigham and Women's Hospital, Department of Biostatistics, Harvard School of Public Health, Boston, USA
| | - Heide Loehlein Fier
- Institute of Genomic Mathematics, University of Bonn, Bonn, Germany, Department of Biostatistics, Harvard School of Public Health, Boston, USA
|
37
|
Genome-Wide Association Study of Staphylococcus aureus Carriage in a Community-Based Sample of Mexican-Americans in Starr County, Texas. PLoS One 2015; 10:e0142130. [PMID: 26569114 PMCID: PMC4646511 DOI: 10.1371/journal.pone.0142130] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 08/04/2015] [Accepted: 10/16/2015] [Indexed: 02/07/2023] Open
Abstract
Staphylococcus aureus is the number one cause of hospital-acquired infections. Understanding host pathogen interactions is paramount to the development of more effective treatment and prevention strategies. Therefore, whole exome sequence and chip-based genotype data were used to conduct rare variant and genome-wide association analyses in a Mexican-American cohort from Starr County, Texas to identify genes and variants associated with S. aureus nasal carriage. Unlike most studies of S. aureus that are based on hospitalized populations, this study used a representative community sample. Two nasal swabs were collected from participants (n = 858) 11–17 days apart between October 2009 and December 2013, screened for the presence of S. aureus, and then classified as either persistent, intermittent, or non-carriers. The chip-based and exome sequence-based single variant association analyses identified 1 genome-wide significant region (KAT2B) for intermittent and 11 regions suggestively associated with persistent or intermittent S. aureus carriage. We also report top findings from gene-based burden analyses of rare functional variation. Notably, we observed marked differences between signals associated with persistent and intermittent carriage. In single variant analyses of persistent carriage, 7 of 9 genes in suggestively associated regions and all 5 top gene-based findings are associated with cell growth or tight junction integrity or are structural constituents of the cytoskeleton, suggesting that variation in genes associated with persistent carriage impact cellular integrity and morphology.
|
38
|
Lee S, Fuchsberger C, Kim S, Scott L. An efficient resampling method for calibrating single and gene-based rare variant association analysis in case-control studies. Biostatistics 2015; 17:1-15. [PMID: 26363037 DOI: 10.1093/biostatistics/kxv033] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 12/16/2014] [Accepted: 08/10/2015] [Indexed: 11/14/2022] Open
Abstract
For aggregation tests of genes or regions, the set of included variants often have small total minor allele counts (MACs), and this is particularly true when the most deleterious sets of variants are considered. When MAC is low, commonly used asymptotic tests are not well calibrated for binary phenotypes and can have conservative or anti-conservative results and potential power loss. Empirical p-values obtained via resampling methods are computationally costly for highly significant p-values and the results can be conservative due to the discrete nature of resampling tests. Based on the observation that only the individuals containing minor alleles contribute to the score statistics, we develop an efficient resampling method for single and multiple variant score-based tests that can adjust for covariates. Our method can improve computational efficiency >1000-fold over conventional resampling for low MAC variant sets. We ameliorate the conservativeness of results through the use of mid-p-values. Using the estimated minimum achievable p-value for each test, we calibrate QQ plots and provide an effective number of tests. In analysis of a case-control study with deep exome sequence, we demonstrate that our methods are both well calibrated and also reduce computation time significantly compared with resampling methods.
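The role of the mid-p-value mentioned above can be shown on a discrete null distribution. A conventional resampling p-value counts every null statistic at least as large as the observed one, whereas the mid-p-value counts ties with weight one half, which makes it less conservative when the statistic takes few distinct values. The sketch below uses the standard definition of the mid-p-value; the function names and the toy null statistics are illustrative and are not taken from the paper's software.

```python
def conventional_p_value(observed, null_stats):
    """Share of null statistics greater than or equal to the observed one."""
    return sum(1 for s in null_stats if s >= observed) / len(null_stats)


def mid_p_value(observed, null_stats):
    """P(T > t_obs) + 0.5 * P(T = t_obs), estimated from null statistics."""
    greater = sum(1 for s in null_stats if s > observed)
    equal = sum(1 for s in null_stats if s == observed)
    return (greater + 0.5 * equal) / len(null_stats)


# Hypothetical discrete null distribution, as arises with low minor allele counts.
null_stats = [0, 0, 1, 1, 2, 3]
p_conv = conventional_p_value(2, null_stats)
p_mid = mid_p_value(2, null_stats)
```

With these toy values the tie at the observed statistic is what separates the two results, and ties become more frequent as the minor allele count falls.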
Affiliation(s)
- Seunggeun Lee
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Christian Fuchsberger
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Sehee Kim
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Laura Scott
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
|
39
|
Zhou YH. Pathway analysis for RNA-Seq data using a score-based approach. Biometrics 2015; 72:165-74. [PMID: 26259845 DOI: 10.1111/biom.12372] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/01/2014] [Revised: 06/01/2015] [Accepted: 06/01/2015] [Indexed: 11/27/2022]
Abstract
A variety of pathway/gene-set approaches have been proposed to provide evidence of higher-level biological phenomena in the association of expression with experimental condition or clinical outcome. Among these approaches, it has been repeatedly shown that resampling methods are far preferable to approaches that implicitly assume independence of genes. However, few approaches have been optimized for the specific characteristics of RNA-Seq transcription data, in which mapped tags produce discrete counts with varying library sizes, and with potential outliers or skewness patterns that violate parametric assumptions. We describe transformations to RNA-Seq data to improve power for linear associations with outcome and flexibly handle normalization factors. Using these transformations or alternate transformations, we apply recently developed null approximations to quadratic form statistics for both self-contained and competitive pathway testing. The approach provides a convenient integrated platform for RNA-Seq pathway testing. We demonstrate that the approach provides appropriate type I error control without actual permutation and is powerful under many settings in comparison to competing approaches. Pathway analysis of data from a study of F344 vs. HIV1Tg rats, and of sex differences in lymphoblastoid cell lines from humans, strongly supports the biological interpretability of the findings.
Affiliation(s)
- Yi-Hui Zhou
- Bioinformatics Research Center, Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina, U.S.A
|
40
|
Testing in Microbiome-Profiling Studies with MiRKAT, the Microbiome Regression-Based Kernel Association Test. Am J Hum Genet 2015; 96:797-807. [PMID: 25957468 DOI: 10.1016/j.ajhg.2015.04.003] [Citation(s) in RCA: 194] [Impact Index Per Article: 21.6] [Received: 01/21/2015] [Accepted: 04/07/2015] [Indexed: 01/05/2023] Open
Abstract
High-throughput sequencing technology has enabled population-based studies of the role of the human microbiome in disease etiology and exposure response. Distance-based analysis is a popular strategy for evaluating the overall association between microbiome diversity and outcome, wherein the phylogenetic distance between individuals' microbiome profiles is computed and tested for association via permutation. Despite their practical popularity, distance-based approaches suffer from important challenges, especially in selecting the best distance and extending the methods to alternative outcomes, such as survival outcomes. We propose the microbiome regression-based kernel association test (MiRKAT), which directly regresses the outcome on the microbiome profiles via the semi-parametric kernel machine regression framework. MiRKAT allows for easy covariate adjustment and extension to alternative outcomes while non-parametrically modeling the microbiome through a kernel that incorporates phylogenetic distance. It uses a variance-component score statistic to test for the association with analytical p value calculation. The model also allows simultaneous examination of multiple distances, alleviating the problem of choosing the best distance. Our simulations demonstrated that MiRKAT provides correctly controlled type I error and adequate power in detecting overall association. "Optimal" MiRKAT, which considers multiple candidate distances, is robust in that it suffers from little power loss in comparison to when the best distance is used and can achieve tremendous power gain in comparison to when a poor distance is chosen. Finally, we applied MiRKAT to real microbiome datasets to show that microbial communities are associated with smoking and with fecal protease levels after confounders are controlled for.
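A kernel that "incorporates phylogenetic distance" can be built from a pairwise distance matrix by double-centering its squared entries, and a variance-component score statistic then takes the quadratic form r'Kr in the outcome residuals r. The sketch below illustrates that construction under simplifying assumptions: an intercept-only model with no covariates, no correction for negative eigenvalues of the kernel, no variance scaling of the statistic, and a permutation p-value swapped in for the analytical p-value described in the abstract. All function names are hypothetical.

```python
import random


def distance_to_kernel(D):
    """Double-center squared distances: K = -1/2 * J * D^2 * J, J = I - 11'/n."""
    n = len(D)
    sq = [[D[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    col = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[-0.5 * (sq[i][j] - row[i] - col[j] + grand) for j in range(n)]
            for i in range(n)]


def score_statistic(y, K):
    """Quadratic form r'Kr with residuals from an intercept-only model."""
    mean = sum(y) / len(y)
    r = [v - mean for v in y]
    n = len(y)
    return sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n))


def permutation_p(y, K, n_perm=999, seed=0):
    """Permutation p-value (a stand-in for the analytical calculation)."""
    rng = random.Random(seed)
    q_obs = score_statistic(y, K)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if score_statistic(y_perm, K) >= q_obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy example: four samples placed at 0, 1, 2, 3 on a line.
coords = [0.0, 1.0, 2.0, 3.0]
D = [[abs(a - b) for b in coords] for a in coords]
K = distance_to_kernel(D)
q = score_statistic([1.0, 2.0, 3.0, 4.0], K)
```

Examining several candidate distances, as the "optimal" variant in the abstract does, would amount to repeating this construction for each distance matrix and combining the results, a step that is not sketched here.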
|
41
|
Abney M. Permutation testing in the presence of polygenic variation. Genet Epidemiol 2015; 39:249-58. [PMID: 25758362 DOI: 10.1002/gepi.21893] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 09/05/2014] [Revised: 01/09/2015] [Accepted: 01/26/2015] [Indexed: 01/08/2023]
Abstract
This article discusses problems with and solutions to performing valid permutation tests for quantitative trait loci in the presence of polygenic effects. Although permutation testing is a popular approach for determining statistical significance of a test statistic with an unknown distribution--for instance, the maximum of multiple correlated statistics or some omnibus test statistic for a gene, gene-set, or pathway--naive application of permutations may result in an invalid test. The risk of performing an invalid permutation test is particularly acute in complex trait mapping where polygenicity may combine with a structured population resulting from the presence of families, cryptic relatedness, admixture, or population stratification. I give both analytical derivations and a conceptual understanding of why typical permutation procedures fail and suggest an alternative permutation-based algorithm, MVNpermute, that succeeds. In particular, I examine the case where a linear mixed model is used to analyze a quantitative trait and show that both phenotype and genotype permutations may result in an invalid permutation test. I provide a formula that predicts the amount of inflation of the type 1 error rate depending on the degree of misspecification of the covariance structure of the polygenic effect and the heritability of the trait. I validate this formula by doing simulations, showing that the permutation distribution matches the theoretical expectation, and that my suggested permutation-based test obtains the correct null distribution. Finally, I discuss situations where naive permutations of the phenotype or genotype are valid and the applicability of the results to other test statistics.
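The reason naive permutation fails here is that correlated observations are not exchangeable. A general remedy, sketched below, is to transform the residuals so they are uncorrelated, permute those, and transform back, so that each permuted trait vector keeps the assumed covariance structure. This is only a simplified illustration of that idea: it assumes a known covariance matrix, an intercept-only mean estimated by the plain average, and hand-written Cholesky routines; it is not the MVNpermute algorithm itself, which also has to handle covariates.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L L' = S for a symmetric positive-definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def mvn_permute(y, sigma, rng):
    """One permuted trait vector that keeps covariance sigma (simplified)."""
    n = len(y)
    mean = sum(y) / n                      # intercept-only mean (assumption)
    resid = [v - mean for v in y]
    L = cholesky(sigma)
    z = forward_solve(L, resid)            # decorrelate the residuals
    rng.shuffle(z)                         # permute the exchangeable values
    return [mean + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


# Hypothetical covariance for three individuals, the first two related.
sigma = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
y_star = mvn_permute([1.0, 2.0, 4.0], sigma, random.Random(1))
```

When the covariance is the identity matrix this reduces to an ordinary permutation of the trait values, which is consistent with the abstract's remark that naive permutation is valid in some situations.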
Affiliation(s)
- Mark Abney
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
|
42
|
Turkmen AS, Yan Z, Hu YQ, Lin S. Kullback-Leibler distance methods for detecting disease association with rare variants from sequencing data. Ann Hum Genet 2015; 79:199-208. [PMID: 25875492 DOI: 10.1111/ahg.12103] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 06/03/2014] [Accepted: 12/07/2014] [Indexed: 11/26/2022]
Abstract
Because next-generation sequencing technology can rapidly genotype most genetic variation across the genome, there is considerable interest in investigating the effects of rare variants on complex diseases. In this paper, we propose four Kullback-Leibler distance-based Tests (KLTs) for detecting genotypic differences between cases and controls. There are several features that set the proposed tests apart from existing ones. First, by explicitly considering and comparing the distributions of genotypes, existence of variants with opposite directional effects does not compromise the power of KLTs. Second, it is not necessary to set a threshold for rare variants as the KL definition makes it reasonable to consider rare and common variants together without worrying about the contribution from one type overshadowing the other. Third, KLTs are robust to null variants thanks to a built-in noise fighting mechanism. Finally, correlation among variants is taken into account implicitly so the KLTs work well regardless of the underlying LD structure. Through extensive simulations, we demonstrated good performance of KLTs compared to the sum of squared score test (SSU) and optimal sequence kernel association test (SKAT-O). Moreover, application to the Dallas Heart Study data illustrates the feasibility and performance of KLTs in a realistic setting.
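Comparing genotype distributions directly, rather than summing allele counts, is what makes a distance of this kind indifferent to the direction of effect. The sketch below computes a symmetrized Kullback-Leibler distance between the case and control genotype distributions at a single variant. It is not one of the four KLTs, whose exact definitions are not given in the abstract; the 0/1/2 genotype coding, the pseudocount of 0.5 used to avoid empty categories, and the function names are all assumptions.

```python
import math


def genotype_freqs(genotypes, pseudocount=0.5):
    """Smoothed frequencies of genotypes coded 0, 1, 2 (minor-allele counts)."""
    counts = [pseudocount] * 3
    for g in genotypes:
        counts[g] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(cases, controls):
    """Symmetrized KL distance between case and control genotype distributions."""
    p = genotype_freqs(cases)
    q = genotype_freqs(controls)
    return kl(p, q) + kl(q, p)


# Hypothetical genotypes at one variant.
cases = [0, 0, 1, 2, 1]
controls = [0, 0, 0, 0, 1]
d = symmetric_kl(cases, controls)
```

Swapping the case and control labels leaves the distance unchanged, so a variant enriched in controls contributes as much as one enriched in cases.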
Affiliation(s)
- Asuman S Turkmen
- Statistics Department, The Ohio State University, Columbus, OH, USA; The Ohio State University, Newark, OH, USA
|
43
|
Wang M, Lin S. Detecting associations of rare variants with common diseases: collapsing or haplotyping? Brief Bioinform 2015; 16:759-68. [PMID: 25596401 DOI: 10.1093/bib/bbu050] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 09/26/2014] [Indexed: 01/11/2023] Open
Abstract
In recent years, a myriad of new statistical methods have been proposed for detecting associations of rare single-nucleotide variants (SNVs) with common diseases. These methods can be generally classified as 'collapsing' or 'haplotyping' based. The former is the predominant class, composed of most of the rare variant association methods proposed to date. However, recent works have suggested that haplotyping-based methods may offer advantages and can even be more powerful than collapsing methods in certain situations. In this article, we review and compare collapsing- versus haplotyping-based methods/software in terms of both power and type I error. For collapsing methods, we consider three approaches: Combined Multivariate and Collapsing, Sequence Kernel Association Test and Family-Based Association Test (FBAT): the first two are population based and are among the most popular; the last test is family based, a modification from the popular FBAT to accommodate rare SNVs. For haplotyping-based methods, we include Logistic Bayesian Lasso (LBL) for population data and family-based LBL (famLBL) for family (trio) data. These two methods are selected, as they can be used to test association for specific rare and common haplotypes. Our results show that haplotype methods can be more powerful than collapsing methods if there are interacting SNVs leading to larger haplotype effects. Even if only common SNVs are genotyped, haplotype methods can still detect specific rare haplotypes that tag rare causal SNVs. As expected, family-based methods are robust, whereas population-based methods are susceptible, to population substructure. However, the population-based haplotype approach appears to have smaller inflation of type I error than its collapsing counterparts.
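For readers unfamiliar with the term, "collapsing" means replacing the many rare variants in a region with a single per-person summary before testing. The sketch below shows only that summary step in its simplest indicator form, followed by a carrier-by-status table. It is not the full Combined Multivariate and Collapsing test, which also involves the common variants and a multivariate test; the data layout and function names are assumptions.

```python
def collapse(genotype_rows, rare_idx):
    """1 if the individual carries a minor allele at any listed rare variant."""
    return [1 if any(row[j] > 0 for j in rare_idx) else 0 for row in genotype_rows]


def carrier_table(indicator, is_case):
    """2x2 counts: rows control/case (0/1), columns non-carrier/carrier (0/1)."""
    table = [[0, 0], [0, 0]]
    for carrier, status in zip(indicator, is_case):
        table[status][carrier] += 1
    return table


# Hypothetical data: rows are individuals, columns are variants;
# columns 0 and 1 are treated as rare, column 2 as common.
genotypes = [[0, 1, 0], [0, 0, 0], [2, 0, 0], [0, 0, 1]]
is_case = [1, 0, 1, 0]
indicator = collapse(genotypes, rare_idx=[0, 1])
table = carrier_table(indicator, is_case)
```

The indicator discards which variant was carried, which is the information a haplotype-based method would instead try to use.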
|
44
|
Satten GA, Biswas S, Papachristou C, Turkmen A, König IR. Population-based association and gene by environment interactions in Genetic Analysis Workshop 18. Genet Epidemiol 2014; 38 Suppl 1:S49-56. [PMID: 25112188 DOI: 10.1002/gepi.21825] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/09/2023]
Abstract
In the past decade, genome-wide association studies have been successful in identifying genetic loci that play a role in many complex diseases. Despite this, it has become clear that for many traits, investigation of single common variants does not give a complete picture of the genetic contribution to the phenotype. Therefore a number of new approaches are currently being investigated to further the search for susceptibility loci or regions. We summarize the contributions to Genetic Analysis Workshop 18 (GAW18) that concern this search using methods for population-based association analysis. Many of the members of our GAW18 working group made use of data types that have only recently become available through the use of next-generation sequencing technologies, with many focusing on the investigation of rare variants instead of or in combination with common variants. Some contributors used a haplotype-based approach, which to date has been used relatively infrequently but may become more important for analyzing rare variant association data. Others analyzed gene-gene or gene-environment interactions, where novel statistical approaches were needed to make the best use of the available information without requiring an excessive computational burden. GAW18 provided participants with the chance to make use of state-of-the-art data, statistical techniques, and technology. We report here some of the experiences and conclusions that were reached by workshop participants who analyzed the GAW18 data as a population-based association study.
Affiliation(s)
- Glen A Satten
- Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
|
45
|
Jiang Y, Conneely KN, Epstein MP. Flexible and robust methods for rare-variant testing of quantitative traits in trios and nuclear families. Genet Epidemiol 2014; 38:542-51. [PMID: 25044337 DOI: 10.1002/gepi.21839] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 03/24/2014] [Revised: 05/21/2014] [Accepted: 05/29/2014] [Indexed: 11/07/2022]
Abstract
Most rare-variant association tests for complex traits are applicable only to population-based or case-control resequencing studies. There are fewer rare-variant association tests for family-based resequencing studies, which is unfortunate because pedigrees possess many attractive characteristics for such analyses. Family-based studies can be more powerful than their population-based counterparts due to increased genetic load and further enable the implementation of rare-variant association tests that, by design, are robust to confounding due to population stratification. With this in mind, we propose a rare-variant association test for quantitative traits in families; this test integrates the QTDT approach of Abecasis et al. into the kernel-based SNP association test KMFAM of Schifano et al. The resulting within-family test enjoys the many benefits of the kernel framework for rare-variant association testing, including rapid evaluation of P-values and preservation of power when a region harbors rare causal variation that acts in different directions on phenotype. Additionally, by design, this within-family test is robust to confounding due to population stratification. Although within-family association tests are generally less powerful than their counterparts that use all genetic information, we show that we can recover much of this power (although still ensuring robustness to population stratification) using a straightforward screening procedure. Our method accommodates covariates and allows for missing parental genotype data, and we have written software implementing the approach in R for public use.
Affiliation(s)
- Yunxuan Jiang
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States of America
|
46
|
Hu H, Roach JC, Coon H, Guthery SL, Voelkerding KV, Margraf RL, Durtschi JD, Tavtigian SV, Shankaracharya, Wu W, Scheet P, Wang S, Xing J, Glusman G, Hubley R, Li H, Garg V, Moore B, Hood L, Galas DJ, Srivastava D, Reese MG, Jorde LB, Yandell M, Huff CD. A unified test of linkage analysis and rare-variant association for analysis of pedigree sequence data. Nat Biotechnol 2014; 32:663-9. [PMID: 24837662 PMCID: PMC4157619 DOI: 10.1038/nbt.2895] [Citation(s) in RCA: 86] [Impact Index Per Article: 8.6] [Received: 10/17/2013] [Accepted: 04/04/2014] [Indexed: 01/02/2023]
Abstract
High-throughput sequencing of related individuals has become an important tool for studying human disease. However, owing to technical complexity and lack of available tools, most pedigree-based sequencing studies rely on an ad hoc combination of suboptimal analyses. Here we present pedigree-VAAST (pVAAST), a disease-gene identification tool designed for high-throughput sequence data in pedigrees. pVAAST uses a sequence-based model to perform variant and gene-based linkage analysis. Linkage information is then combined with functional prediction and rare variant case-control association information in a unified statistical framework. pVAAST outperformed linkage and rare-variant association tests in simulations and identified disease-causing genes from whole-genome sequence data in three human pedigrees with dominant, recessive and de novo inheritance patterns. The approach is robust to incomplete penetrance and locus heterogeneity and is applicable to a wide variety of genetic traits. pVAAST maintains high power across studies ranging from monogenic, high-penetrance phenotypes in a single pedigree to highly polygenic, common phenotypes involving hundreds of pedigrees.
Affiliation(s)
- Hao Hu
- Department of Epidemiology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA
- Jared C Roach
- Institute for Systems Biology, Seattle, Washington, USA
- Hilary Coon
- Department of Psychiatry, University of Utah, Salt Lake City, Utah, USA
- Stephen L Guthery
- Department of Pediatrics, University of Utah, Salt Lake City, Utah, USA
- Karl V Voelkerding
- [1] Department of Pathology, University of Utah School of Medicine, Salt Lake City, Utah, USA. [2] ARUP Institute for Clinical and Experimental Pathology, ARUP Laboratories, Salt Lake City, Utah, USA
- Rebecca L Margraf
- ARUP Institute for Clinical and Experimental Pathology, ARUP Laboratories, Salt Lake City, Utah, USA
- Jacob D Durtschi
- ARUP Institute for Clinical and Experimental Pathology, ARUP Laboratories, Salt Lake City, Utah, USA
- Sean V Tavtigian
- Department of Oncological Sciences, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Shankaracharya
- Department of Epidemiology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA
- Wilfred Wu
- Department of Human Genetics and USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, Utah, USA
- Paul Scheet
- Department of Epidemiology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA
- Shuoguo Wang
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, New Jersey, USA
- Jinchuan Xing
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, New Jersey, USA
- Robert Hubley
- Institute for Systems Biology, Seattle, Washington, USA
- Hong Li
- Institute for Systems Biology, Seattle, Washington, USA
- Vidu Garg
- [1] Department of Pediatrics, The Ohio State University, Columbus, Ohio, USA. [2] Center for Cardiovascular and Pulmonary Research, Research Institute at Nationwide Children's Hospital, Columbus, Ohio, USA
- Barry Moore
- Department of Human Genetics and USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, Utah, USA
- Leroy Hood
- Institute for Systems Biology, Seattle, Washington, USA
- David J Galas
- [1] Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg. [2] Pacific Northwest Diabetes Research Institute, Seattle, Washington, USA
- Deepak Srivastava
- Gladstone Institute of Cardiovascular Disease and University of California, San Francisco, San Francisco, California, USA
- Lynn B Jorde
- Department of Human Genetics and USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, Utah, USA
- Mark Yandell
- Department of Human Genetics and USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, Utah, USA
- Chad D Huff
- Department of Epidemiology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA
47
Shen X, Espin-Garcia O, Qiu X, Brhane Y, Liu G, Xu W. Haplotype approach for association analysis on hypertension. BMC Proc 2014; 8:S57. [PMID: 25519392 PMCID: PMC4143719 DOI: 10.1186/1753-6561-8-s1-s57] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/11/2023] Open
Abstract
We applied a gene-based haplotype approach to the genome-wide association analysis of hypertension using Genetic Analysis Workshop 18 data for unrelated individuals. Associations between single-nucleotide polymorphisms and clinical outcome were first assessed, and haplotypes were then constructed based on the gene information and the linkage disequilibrium plot. Extensive haplotype analysis was also conducted for the whole of chromosome 3. We found 1 block from the ULK4 gene and 2 blocks from the LOC64690 gene that were significantly associated with hypertension.
Affiliation(s)
- Xiaowei Shen
- Department of Biostatistics, Princess Margaret Cancer Centre, 610 University Avenue, Toronto, Ontario, Canada M5G 2M9; Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, Canada N2L 4G1
- Osvaldo Espin-Garcia
- Department of Biostatistics, Princess Margaret Cancer Centre, 610 University Avenue, Toronto, Ontario, Canada M5G 2M9; Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, Canada N2L 4G1
- Xin Qiu
- Department of Biostatistics, Princess Margaret Cancer Centre, 610 University Avenue, Toronto, Ontario, Canada M5G 2M9
- Yonathan Brhane
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, 60 Murray Street, Toronto, Ontario, Canada M5T 3L9
- Geoffrey Liu
- Ontario Cancer Institute/Princess Margaret Cancer Centre, 610 University Avenue, Toronto, Ontario, Canada M5G 2M9; Dalla Lana School of Public Health, University of Toronto, 155 College Street, Toronto, Ontario, Canada M5T 3M7
- Wei Xu
- Department of Biostatistics, Princess Margaret Cancer Centre, 610 University Avenue, Toronto, Ontario, Canada M5G 2M9; Dalla Lana School of Public Health, University of Toronto, 155 College Street, Toronto, Ontario, Canada M5T 3M7
48
Faino A, Powell A, Williams A, Silveira L. Identifying rare variants associated with hypertension using the C-alpha test. BMC Proc 2014; 8:S56. [PMID: 25519391 PMCID: PMC4143634 DOI: 10.1186/1753-6561-8-s1-s56] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022] Open
Abstract
Important rare variants may be near significantly associated common variants based on genetic distance. For this reason, we conducted an analysis of rare variants informed by tests of single-marker association at loci with common variants. We identified highly significant common variants within chromosome 3, as well as rare variants around these locations. Based on a predetermined window size, we then analyzed these rare variants with the C-alpha test to determine significant associations with hypertension. We found significant rare variants around common variants; however, the C-alpha test was sensitive to the specified window size. When comparing markers in genes to markers not in genes, we found that markers not in genes had more significant C-alpha test p values than markers in genes.
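The windowed analysis described in this abstract can be illustrated with a minimal sketch. It assumes the commonly presented form of the C-alpha statistic, T = Σ[(y_i − n_i·p0)² − n_i·p0·(1 − p0)], with a binomial variance and a one-sided normal approximation; the function names, the `(position, case_copies, total_copies)` layout, and the window logic are hypothetical and are not taken from the article. Pooling of singletons, which full implementations typically perform, is omitted.

```python
from math import comb, erfc, sqrt

def variants_in_window(variants, center, half_width):
    """Keep variants whose position lies within +/- half_width of center.

    `variants` is a list of (position, case_copies, total_copies) tuples;
    this layout is an assumption made for the example.
    """
    return [v for v in variants if abs(v[0] - center) <= half_width]

def c_alpha(variants, p0):
    """Return (T, variance, z, one-sided p) for the selected rare variants.

    p0 is the proportion of cases in the sample. Under the null, the number
    of copies of each variant seen in cases is Binomial(n_i, p0).
    """
    t = 0.0
    var = 0.0
    for _, y, n in variants:
        null_var = n * p0 * (1 - p0)
        t += (y - n * p0) ** 2 - null_var
        # Variance of this variant's contribution under the binomial null.
        for u in range(n + 1):
            prob = comb(n, u) * p0 ** u * (1 - p0) ** (n - u)
            var += ((u - n * p0) ** 2 - null_var) ** 2 * prob
    z = t / sqrt(var) if var > 0 else 0.0
    p_value = 0.5 * erfc(z / sqrt(2))  # upper-tail normal probability
    return t, var, z, p_value

# Hypothetical data: (position, copies observed in cases, total copies).
example = [(100, 4, 4), (150, 0, 4), (900, 2, 4)]
window = variants_in_window(example, center=120, half_width=50)
t_stat, variance, z_score, p_val = c_alpha(window, p0=0.5)
```

Because the set of variants entering the statistic changes with `half_width`, rerunning the sketch with different window sizes shows directly how the result can depend on that choice.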
Affiliation(s)
- Anna Faino
- Department of Biostatistics and Bioinformatics, National Jewish Health, 1400 Jackson Street, Denver, Colorado 80206, USA
- Amber Powell
- Department of Biostatistics and Bioinformatics, National Jewish Health, 1400 Jackson Street, Denver, Colorado 80206, USA
- André Williams
- Department of Biostatistics and Bioinformatics, National Jewish Health, 1400 Jackson Street, Denver, Colorado 80206, USA
- Lori Silveira
- Department of Biostatistics and Bioinformatics, National Jewish Health, 1400 Jackson Street, Denver, Colorado 80206, USA
49
Kinnamon DD, Martin ER. Valid Monte Carlo permutation tests for genetic case-control studies with missing genotypes. Genet Epidemiol 2014; 38:325-44. [PMID: 24723341 PMCID: PMC6391735 DOI: 10.1002/gepi.21805] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/22/2013] [Revised: 12/30/2013] [Accepted: 02/28/2014] [Indexed: 02/04/2023]
Abstract
Monte Carlo permutation tests can be formally constructed by choosing a set of permutations of individual indices and a real-valued test statistic measuring the association between genotypes and affection status. In this paper, we develop a rigorous theoretical framework for verifying the validity of these tests when there are missing genotypes. We begin by specifying a nonparametric probability model for the observed genotype data in a genetic case-control study with unrelated subjects. Under this model and some minimal assumptions about the test statistic, we establish that the resulting Monte Carlo permutation test is exact level α if (1) the chosen set of permutations of individual indices is a group under composition and (2) the distribution of the observed genotype score matrix under the null hypothesis does not change if the assignment of individuals to rows is shuffled according to an arbitrary permutation in this set. We apply these conditions to show that frequently used Monte Carlo permutation tests based on the set of all permutations of individual indices are guaranteed to be exact level α only for missing data processes satisfying a rather restrictive additional assumption. However, if the missing data process depends on covariates that are all identified and recorded, we also show that Monte Carlo permutation tests based on the set of permutations within strata of individuals with identical covariate values are exact level α. Our theoretical results are verified and supplemented by simulations for a variety of missing data processes and test statistics.
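The within-strata permutation scheme mentioned in this abstract can be sketched roughly as follows. This is an illustrative example only, not the authors' implementation: the function names, the use of `None` for a missing genotype, and the mean-difference statistic are assumptions made for the example. Shuffling affection status within each covariate stratum is used here as an equivalent way of permuting individual indices within strata.

```python
import random
from collections import defaultdict

def mean_diff(status, scores):
    """Absolute case-control difference in mean genotype score.

    Missing genotypes (None) are skipped; this statistic is only an example.
    """
    cases = [g for s, g in zip(status, scores) if s == 1 and g is not None]
    controls = [g for s, g in zip(status, scores) if s == 0 and g is not None]
    if not cases or not controls:
        return 0.0
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def stratified_permutation_pvalue(status, scores, strata, statistic,
                                  n_perm=999, seed=0):
    """Monte Carlo p-value with labels permuted only within strata."""
    rng = random.Random(seed)
    observed = statistic(status, scores)

    # Group individual indices by their covariate stratum.
    groups = defaultdict(list)
    for i, stratum in enumerate(strata):
        groups[stratum].append(i)

    hits = 0
    for _ in range(n_perm):
        permuted = list(status)
        for idx in groups.values():
            shuffled = [status[i] for i in idx]
            rng.shuffle(shuffled)
            for i, value in zip(idx, shuffled):
                permuted[i] = value
        if statistic(permuted, scores) >= observed:
            hits += 1
    # Counting the observed arrangement keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: 1 = case, 0 = control; None marks a missing genotype.
status = [1, 1, 0, 0, 1, 0, 1, 0]
scores = [2, 1, 0, None, 2, 0, None, 1]
strata = ["a", "a", "a", "a", "b", "b", "b", "b"]
p_value = stratified_permutation_pvalue(status, scores, strata, mean_diff)
```

If every individual sits in their own stratum, each permutation is the identity and the returned p-value is exactly 1, which is a quick sanity check on the stratification logic.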
Affiliation(s)
- Daniel D. Kinnamon
- Division of Human Genetics, Department of Internal Medicine, The Ohio State University Wexner Medical Center, Columbus, OH, USA
- Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Eden R. Martin
- Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
50
Yan S, Li Y. BETASEQ: a powerful novel method to control type-I error inflation in partially sequenced data for rare variant association testing. Bioinformatics 2014; 30:480-7. [PMID: 24336643 DOI: 10.1093/bioinformatics/btt719] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/14/2022]
Abstract
SUMMARY: Despite its great capability to detect rare variant associations, next-generation sequencing is still prohibitively expensive when applied to large samples. In case-control studies, it is thus appealing to sequence only a subset of cases to discover variants and then genotype the identified variants in controls and the remaining cases, under the reasonable assumption that causal variants are usually enriched among cases. However, this approach leads to inflated type-I error if analyzed naively for rare variant association. Several methods have been proposed in recent literature to control type-I error at the cost of either excluding some sequenced cases or correcting the genotypes of discovered rare variants. All of these approaches suffer from a certain extent of information loss and are therefore underpowered. We propose a novel method (BETASEQ), which corrects inflation of type-I error by supplementing pseudo-variants while keeping the original sequence and genotype data intact. Extensive simulations and real data analysis demonstrate that, in most practical situations, BETASEQ leads to higher testing power than existing approaches with guaranteed (controlled or conservative) type-I error. AVAILABILITY AND IMPLEMENTATION: BETASEQ and associated R files, including documentation and examples, are available at http://www.unc.edu/~yunmli/betaseq
Affiliation(s)
- Song Yan
- Department of Biostatistics, University of North Carolina, 3101 McGavran-Greenberg Hall, Chapel Hill, NC 27599, USA, Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA and Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599, USA