1
Barrie W, Yang Y, Irving-Pease EK, Attfield KE, Scorrano G, Jensen LT, Armen AP, Dimopoulos EA, Stern A, Refoyo-Martinez A, Pearson A, Ramsøe A, Gaunitz C, Demeter F, Jørkov MLS, Møller SB, Springborg B, Klassen L, Hyldgård IM, Wickmann N, Vinner L, Korneliussen TS, Allentoft ME, Sikora M, Kristiansen K, Rodriguez S, Nielsen R, Iversen AKN, Lawson DJ, Fugger L, Willerslev E. Elevated genetic risk for multiple sclerosis emerged in steppe pastoralist populations. Nature 2024; 625:321-328. [PMID: 38200296 PMCID: PMC10781639 DOI: 10.1038/s41586-023-06618-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 09/21/2022] [Accepted: 09/06/2023] [Indexed: 01/12/2024]
Abstract
Multiple sclerosis (MS) is a neuro-inflammatory and neurodegenerative disease that is most prevalent in Northern Europe. Although it is known that inherited risk for MS is located within or in close proximity to immune-related genes, it is unknown when, where and how this genetic risk originated. Here, by using a large ancient genome dataset from the Mesolithic period to the Bronze Age, along with new Medieval and post-Medieval genomes, we show that the genetic risk for MS rose among pastoralists from the Pontic steppe and was brought into Europe by the Yamnaya-related migration approximately 5,000 years ago. We further show that these MS-associated immunogenetic variants underwent positive selection both within the steppe population and later in Europe, probably driven by pathogenic challenges coinciding with changes in diet, lifestyle and population density. This study highlights the critical importance of the Neolithic period and Bronze Age as determinants of modern immune responses and their subsequent effect on the risk of developing MS in a changing environment.
Affiliation(s)
- William Barrie
  - Department of Zoology, University of Cambridge, Cambridge, UK
  - Department of Genetics, University of Cambridge, Cambridge, UK
- Yaoling Yang
  - Department of Statistical Sciences, School of Mathematics, University of Bristol, Bristol, UK
  - MRC Integrative Epidemiology Unit, Population Health Sciences, University of Bristol, Bristol, UK
- Evan K Irving-Pease
  - Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Kathrine E Attfield
  - Oxford Centre for Neuroinflammation, Nuffield Department of Clinical Neurosciences, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Gabriele Scorrano
  - Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Lise Torp Jensen
  - Oxford Centre for Neuroinflammation, Nuffield Department of Clinical Neurosciences, John Radcliffe Hospital, University of Oxford, Oxford, UK
  - Department of Clinical Medicine, Aarhus University Hospital, Aarhus, Denmark
- Angelos P Armen
  - Oxford Centre for Neuroinflammation, Nuffield Department of Clinical Neurosciences, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Aaron Stern
  - Departments of Integrative Biology and Statistics, University of California, Berkeley, Berkeley, CA, USA
- Alba Refoyo-Martinez
  - Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Alice Pearson
  - Department of Genetics, University of Cambridge, Cambridge, UK
- Abigail Ramsøe
  - Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Charleen Gaunitz
  - Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Fabrice Demeter
  - Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
  - Eco-anthropologie (EA), Muséum National d'Histoire Naturelle, CNRS, Université de Paris, Musée de l'Homme, Paris, France
- Marie Louise S Jørkov
  - Laboratory of Biological Anthropology, Department of Forensic Medicine, University of Copenhagen, Copenhagen, Denmark
- Lutz Klassen
  - Museum Østdanmark-Djursland og Randers, Randers, Denmark
- Lasse Vinner
  - Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Morten E Allentoft
  - Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
  - Trace and Environmental DNA (TrEnD) Laboratory, School of Molecular and Life Sciences, Curtin University, Perth, Western Australia, Australia
- Martin Sikora
  - Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Kristian Kristiansen
  - Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
  - Department of Historical Studies, University of Gothenburg, Gothenburg, Sweden
- Santiago Rodriguez
  - MRC Integrative Epidemiology Unit, Population Health Sciences, University of Bristol, Bristol, UK
- Rasmus Nielsen
  - Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
  - Departments of Integrative Biology and Statistics, University of California, Berkeley, Berkeley, CA, USA
- Astrid K N Iversen
  - Oxford Centre for Neuroinflammation, Nuffield Department of Clinical Neurosciences, John Radcliffe Hospital, University of Oxford, Oxford, UK
  - Nuffield Department of Clinical Neurosciences, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Daniel J Lawson
  - Department of Statistical Sciences, School of Mathematics, University of Bristol, Bristol, UK
  - MRC Integrative Epidemiology Unit, Population Health Sciences, University of Bristol, Bristol, UK
- Lars Fugger
  - Oxford Centre for Neuroinflammation, Nuffield Department of Clinical Neurosciences, John Radcliffe Hospital, University of Oxford, Oxford, UK
  - Department of Clinical Medicine, Aarhus University Hospital, Aarhus, Denmark
  - MRC Human Immunology Unit, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Eske Willerslev
  - Department of Zoology, University of Cambridge, Cambridge, UK
  - Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
  - MARUM Center for Marine Environmental Sciences and Faculty of Geosciences, University of Bremen, Bremen, Germany
2
Witte J, Foraita R, Didelez V. Multiple imputation and test-wise deletion for causal discovery with incomplete cohort data. Stat Med 2022; 41:4716-4743. [PMID: 35908775 DOI: 10.1002/sim.9535] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/09/2021] [Revised: 06/12/2022] [Accepted: 07/11/2022] [Indexed: 11/08/2022]
Abstract
Causal discovery algorithms estimate causal graphs from observational data. This can provide a valuable complement to analyses focusing on the causal relation between individual treatment-outcome pairs. Constraint-based causal discovery algorithms rely on conditional independence testing when building the graph. Until recently, these algorithms have been unable to handle missing values. In this article, we investigate two alternative solutions: test-wise deletion and multiple imputation. We establish necessary and sufficient conditions for the recoverability of causal structures under test-wise deletion, and argue that multiple imputation is more challenging in the context of causal discovery than for estimation. We conduct an extensive comparison by simulating from benchmark causal graphs: as one might expect, we find that test-wise deletion and multiple imputation both clearly outperform list-wise deletion and single imputation. Crucially, our results further suggest that multiple imputation is especially useful in settings with a small number of either Gaussian or discrete variables, but when the dataset contains a mix of both neither method is uniformly best. The methods we compare include random forest imputation and a hybrid procedure combining test-wise deletion and multiple imputation. An application to data from the IDEFICS cohort study on diet- and lifestyle-related diseases in European children serves as an illustrating example.
Affiliation(s)
- Janine Witte
  - Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany
  - Faculty of Mathematics and Computer Science, University of Bremen, Bremen, Germany
- Ronja Foraita
  - Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany
- Vanessa Didelez
  - Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany
  - Faculty of Mathematics and Computer Science, University of Bremen, Bremen, Germany
3
Saxe GN, Bickman L, Ma S, Aliferis C. Mental health progress requires causal diagnostic nosology and scalable causal discovery. Front Psychiatry 2022; 13:898789. [PMID: 36458123 PMCID: PMC9705733 DOI: 10.3389/fpsyt.2022.898789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Accepted: 10/10/2022] [Indexed: 11/17/2022] [Open access]
Abstract
Nine hundred and seventy million individuals across the globe are estimated to carry the burden of a mental disorder. Limited progress has been achieved in alleviating this burden over decades of effort, compared to progress achieved for many other medical disorders. Progress on outcome improvement for all medical disorders, including mental disorders, requires research capable of discovering causality at sufficient scale and speed, and a diagnostic nosology capable of encoding the causal knowledge that is discovered. Accordingly, the field's guiding paradigm limits progress by maintaining: (a) a diagnostic nosology (DSM-5) with a profound lack of causality; (b) a misalignment between mental health etiologic research and nosology; (c) an over-reliance on clinical trials beyond their capabilities; and (d) a limited adoption of newer methods capable of discovering the complex etiology of mental disorders. We detail feasible directions forward, to achieve greater levels of progress on improving outcomes for mental disorders, by: (a) the discovery of knowledge on the complex etiology of mental disorders with application of Causal Data Science methods; and (b) the encoding of the etiological knowledge that is discovered within a causal diagnostic system for mental disorders.
Affiliation(s)
- Glenn N Saxe
  - Department of Child and Adolescent Psychiatry, New York University Grossman School of Medicine, New York, NY, United States
- Leonard Bickman
  - Ontrak Health, Inc., Henderson, NV, United States
  - Department of Psychology, Florida International University, Miami, FL, United States
- Sisi Ma
  - Program in Data Science, Department of Medicine, Clinical and Translational Science Institute, Institute for Health Informatics, School of Medicine, University of Minnesota, Minneapolis, MN, United States
- Constantin Aliferis
  - Program in Data Science, Department of Medicine, Clinical and Translational Science Institute, Institute for Health Informatics, School of Medicine, University of Minnesota, Minneapolis, MN, United States
4
Saxe GN, Ma S, Morales LJ, Galatzer-Levy IR, Aliferis C, Marmar CR. Computational causal discovery for post-traumatic stress in police officers. Transl Psychiatry 2020; 10:233. [PMID: 32778671 PMCID: PMC7417525 DOI: 10.1038/s41398-020-00910-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/29/2019] [Revised: 06/15/2020] [Accepted: 06/18/2020] [Indexed: 11/09/2022] [Open access]
Abstract
This article reports on a study that aimed to elucidate the complex etiology of post-traumatic stress (PTS) in a longitudinal cohort of police officers by applying rigorous computational causal discovery (CCD) methods to observational data. An existing observational data set was used, which comprised a sample of 207 police officers who were recruited upon entry to police academy training. Participants were evaluated on a comprehensive set of clinical, self-report, genetic, neuroendocrine and physiological measures at baseline during academy training and were then re-evaluated at 12 months after training was completed. A data-processing pipeline, the Protocol for Computational Causal Discovery in Psychiatry (PCCDP), was applied to this data set to determine a causal model for PTS severity. A causal model of 146 variables and 345 bivariate relations was discovered. This model revealed 5 direct causes and 83 causal pathways (of four steps or fewer) to PTS at 12 months of police service. Direct causes included single-nucleotide polymorphisms (SNPs) for the Histidine Decarboxylase (HDC) and Mineralocorticoid Receptor (MR) genes, acoustic startle in the context of low perceived threat during training, peritraumatic distress to incident exposure during the first year of service, and general symptom severity during training at 1 year of service. The application of CCD methods can determine variables and pathways related to the complex etiology of PTS in a cohort of police officers. This knowledge may inform new approaches to treatment and prevention of critical incident related PTS.
Affiliation(s)
- Glenn N. Saxe
  - Department of Child and Adolescent Psychiatry, New York University School of Medicine, New York, NY, USA
- Sisi Ma
  - Institute of Health Informatics, University of Minnesota School of Medicine, Minneapolis, MN, USA
- Leah J. Morales
  - Perlmutter Cancer Center, New York University School of Medicine, New York, NY, USA
- Isaac R. Galatzer-Levy
  - Department of Psychiatry, New York University School of Medicine, New York, NY, USA
- Constantin Aliferis
  - Institute of Health Informatics, University of Minnesota School of Medicine, Minneapolis, MN, USA
- Charles R. Marmar
  - Department of Psychiatry, New York University School of Medicine, New York, NY, USA
5
Saad MN, Mabrouk MS, Eldeib AM, Shaker OG. Comparative study for haplotype block partitioning methods - Evidence from chromosome 6 of the North American Rheumatoid Arthritis Consortium (NARAC) dataset. PLoS One 2019; 13:e0209603. [PMID: 30596705 PMCID: PMC6312333 DOI: 10.1371/journal.pone.0209603] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/08/2018] [Accepted: 12/07/2018] [Indexed: 11/19/2022] [Open access]
Abstract
Haplotype-based methods compete with “one-SNP-at-a-time” approaches as the preferred choice for association studies. Chromosome 6 contains most of the known genetic biomarkers for rheumatoid arthritis (RA) and therefore serves as a benchmark for testing haplotype methods. The aim of this study is to test the North American Rheumatoid Arthritis Consortium (NARAC) dataset to find out whether haplotype block methods or single-locus approaches alone can sufficiently identify the significant single nucleotide polymorphisms (SNPs) associated with RA, and whether a single haplotype block method suffices for partitioning chromosome 6 of the NARAC dataset. In the NARAC dataset, chromosome 6 comprises 35,574 SNPs for 2,062 individuals (868 cases, 1,194 controls). An individual SNP approach and three haplotype block methods were applied to the NARAC dataset to identify the RA biomarkers. The three haplotype partitioning methods were the confidence interval test (CIT), the four gamete test (FGT), and the solid spine of linkage disequilibrium (SSLD). P-values after stringent Bonferroni correction for multiple testing were used to assess the strength of association between the genetic variants and RA susceptibility. Moreover, the block size (in base pairs (bp) and number of SNPs included), number of blocks, percentage of SNPs not covered by the block method, percentage of significant blocks out of the total number of blocks, and number of significant haplotypes and SNPs were used to compare the three haplotype block methods. The individual SNP, CIT, FGT, and SSLD methods detected 432, 1,086, 1,099, and 1,322 associated SNPs, respectively. Each method identified significant SNPs that were not detected by any other method (individual SNP: 12, FGT: 37, CIT: 55, and SSLD: 189 SNPs). All three haplotype block methods discovered 916 SNPs in common, and 367 SNPs were discovered by both the haplotype block methods and the individual SNP approach. The P-values of these 367 SNPs were lower than those of the SNPs uniquely detected by only one method. The 367 SNPs detected by all the methods represent promising candidates for RA susceptibility and should be further investigated for the European population. A hybrid technique including the four methods should be applied to detect the significant SNPs associated with RA for chromosome 6 of the NARAC dataset. Moreover, if only one method is to be selected, the SSLD method may be preferred for its benefits.
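As a rough illustration of how stringent a Bonferroni correction is at this scale, the sketch below divides a family-wise error rate by the number of tests. The significance level of 0.05 and the choice to count one test per SNP (35,574) are assumptions for illustration; the abstract does not state the level used, and the haplotype block methods would involve a different number of tests. The function names and example P-values are hypothetical.

```python
from typing import List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def bonferroni_significant(p_values: List[float], alpha: float, n_tests: int) -> List[bool]:
    """Flag each P-value that survives the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [p < threshold for p in p_values]


# Assumed values: alpha = 0.05 and one test per chromosome 6 SNP in NARAC.
ALPHA = 0.05
N_SNPS = 35_574

threshold = bonferroni_threshold(ALPHA, N_SNPS)
flags = bonferroni_significant([1e-7, 1e-5, 0.04], ALPHA, N_SNPS)
print(f"per-SNP threshold: {threshold:.3e}")
print(flags)
```

Under these assumptions the per-SNP threshold is on the order of 1e-6, so a nominal P-value of 0.04 or even 1e-5 would not count as significant.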
Affiliation(s)
- Mohamed N. Saad
  - Biomedical Engineering Department, Faculty of Engineering, Minia University, Minia, Egypt
- Mai S. Mabrouk
  - Biomedical Engineering Department, Faculty of Engineering, Misr University for Science and Technology (MUST), 6th of October City, Egypt
- Ayman M. Eldeib
  - Systems and Biomedical Engineering Department, Faculty of Engineering, Cairo University, Giza, Egypt
- Olfat G. Shaker
  - Medical Biochemistry and Molecular Biology Department, Faculty of Medicine, Cairo University, Cairo, Egypt
6
Kang M, Park J, Kim DC, Biswas AK, Liu C, Gao J. Multi-Block Bipartite Graph for Integrative Genomic Analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:1350-1358. [PMID: 27429442 DOI: 10.1109/tcbb.2016.2591521] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
Human diseases involve a sequence of complex interactions between multiple biological processes. In particular, multiple genomic data such as Single Nucleotide Polymorphism (SNP), Copy Number Variation (CNV), DNA Methylation (DM), and their interactions simultaneously play an important role in human diseases. However, despite the widely known complex multi-layer biological processes and increased availability of the heterogeneous genomic data, most research has considered only a single type of genomic data. Furthermore, recent integrative genomic studies for the multiple genomic data have also been facing difficulties due to the high-dimensionality and complexity, especially when considering their intra- and inter-block interactions. In this paper, we introduce a novel multi-block bipartite graph and its inference methods, MB2I and sMB2I, for the integrative genomic study. The proposed methods not only integrate multiple genomic data but also incorporate intra/inter-block interactions by using a multi-block bipartite graph. In addition, the methods can be used to predict quantitative traits (e.g., gene expression, survival time) from the multi-block genomic data. The performance was assessed by simulation experiments that implement practical situations. We also applied the method to the human brain data of psychiatric disorders. The experimental results were analyzed by maximum edge biclique and biclustering, and biological findings were discussed.
8
Goff DC, Romero K, Paul J, Mercedes Perez-Rodriguez M, Crandall D, Potkin SG. Biomarkers for drug development in early psychosis: Current issues and promising directions. Eur Neuropsychopharmacol 2016; 26:923-37. [PMID: 27005595 DOI: 10.1016/j.euroneuro.2016.01.009] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 09/15/2015] [Revised: 01/20/2016] [Accepted: 01/23/2016] [Indexed: 12/14/2022]
Abstract
A major goal of current research in schizophrenia is to understand the biology underlying onset and early progression and to develop interventions that modify these processes. Biomarkers can play a critical role in identifying disease state, factors contributing to underlying progression, as well as predicting and monitoring response to treatment. Once biomarker-based therapeutics are established, biomarkers can guide treatment selection. It is increasingly clear that a wide range of potential biomarkers should be examined in schizophrenia, given the large number of genetic and environmental factors that have been identified as risk factors. New models for analysis of biomarkers are needed that represent the central nervous system as a highly complex, dynamic, and interactive system. Many tools are available with which to study relevant brain chemistry, but most are indirect measures and represent only a small fraction of the potential etiologic factors contributing to the molecular, structural and functional components of schizophrenia. This review represents the work of the International Society for CNS Clinical Trials and Methodology (ISCTM) Biomarkers Working Group. It discusses advantages and disadvantages of different categories of biomarkers and provides a summary of evidence that biomarkers representing inflammation, oxidative stress, endocannabinoids, glucocorticoid, and biogenic amines systems are dysregulated and potentially interactive in early phase schizophrenia. As has been recently demonstrated in several neurodevelopmental and neurodegenerative disorders, a multi-modal, longitudinal strategy involving a diverse array of biomarkers and new approaches to statistical modeling are needed to improve early interventions based on the fuller understanding.
Affiliation(s)
- Jeffrey Paul
  - Astellas Pharma Global Development, Northbrook, IL, USA
9
Attur M, Statnikov A, Samuels J, Li Z, Alekseyenko AV, Greenberg JD, Krasnokutsky S, Rybak L, Lu QA, Todd J, Zhou H, Jordan JM, Kraus VB, Aliferis CF, Abramson SB. Plasma levels of interleukin-1 receptor antagonist (IL1Ra) predict radiographic progression of symptomatic knee osteoarthritis. Osteoarthritis Cartilage 2015; 23:1915-24. [PMID: 26521737 PMCID: PMC4630783 DOI: 10.1016/j.joca.2015.08.006] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Received: 12/04/2014] [Revised: 07/21/2015] [Accepted: 08/18/2015] [Indexed: 02/02/2023]
Abstract
OBJECTIVE: Pro- and anti-inflammatory mediators, such as IL-1β and IL1Ra, are produced by joint tissues in osteoarthritis (OA), where they may contribute to pathogenesis. We examined whether inflammatory events occurring within joints are reflected in plasma of patients with symptomatic knee osteoarthritis (SKOA).
DESIGN: 111 SKOA subjects with medial disease completed a 24-month prospective study of clinical and radiographic progression, with clinical assessment and specimen collection at 6-month intervals. The plasma biochemical marker IL1Ra was assessed at baseline and 18 months; other plasma biochemical markers were assessed only at 18 months, including IL-1β, TNFα, VEGF, IL-6, IL-6Rα, IL-17A, IL-17A/F, IL-17F, CRP, sTNF-RII, and MMP-2.
RESULTS: In cross-sectional studies, WOMAC (total, pain, function) and plasma IL1Ra were modestly associated with radiographic severity after adjustment for age, gender and body mass index (BMI). In addition, elevation of plasma IL1Ra predicted joint space narrowing (JSN) at 24 months. BMI did associate with progression in some but not all analyses. Causal graph analysis indicated a positive association of IL1Ra with JSN; an interaction between IL1Ra and BMI suggested either that BMI influences IL1Ra or that a hidden confounder influences both BMI and IL1Ra. Other protein biomarkers examined in this study did not associate with radiographic progression or severity.
CONCLUSIONS: Plasma levels of IL1Ra were modestly associated with the severity and progression of SKOA in a causal fashion, independent of other risk factors. The findings may be useful in the search for prognostic biomarkers and development of disease-modifying OA drugs.
Affiliation(s)
- M Attur
  - Division of Rheumatology, New York University School of Medicine, New York, NY 10016, USA
  - Department of Medicine, NYU School of Medicine, USA
- A Statnikov
  - Division of Translational Medicine, NYU School of Medicine, USA
  - Center for Health Informatics and Bioinformatics (CHIBI), NYU School of Medicine, USA
- J Samuels
  - Division of Rheumatology, New York University School of Medicine, New York, NY 10016, USA
  - Department of Medicine, NYU School of Medicine, USA
- Z Li
  - Department of Medicine, NYU School of Medicine, USA
- A V Alekseyenko
  - Division of Translational Medicine, NYU School of Medicine, USA
  - Center for Health Informatics and Bioinformatics (CHIBI), NYU School of Medicine, USA
- J D Greenberg
  - Division of Rheumatology, New York University School of Medicine, New York, NY 10016, USA
  - Department of Medicine, NYU School of Medicine, USA
- S Krasnokutsky
  - Division of Rheumatology, New York University School of Medicine, New York, NY 10016, USA
  - Department of Medicine, NYU School of Medicine, USA
- L Rybak
  - Department of Radiology, NYU School of Medicine, New York, NY, USA
- Q A Lu
  - Singulex, Inc., Alameda, CA 94502, USA
- J Todd
  - Singulex, Inc., Alameda, CA 94502, USA
- H Zhou
  - Division of Translational Medicine, NYU School of Medicine, USA
  - Center for Health Informatics and Bioinformatics (CHIBI), NYU School of Medicine, USA
- J M Jordan
  - Thurston Arthritis Research Center, University of North Carolina, Chapel Hill, NC 27599, USA
- V B Kraus
  - Duke Molecular Physiology Institute and Division of Rheumatology, Duke University School of Medicine, Durham, NC 27701, USA
- C F Aliferis
  - Department of Medicine, NYU School of Medicine, USA
  - Center for Health Informatics and Bioinformatics (CHIBI), NYU School of Medicine, USA
  - Department of Pathology, NYU School of Medicine, New York, NY, USA
- S B Abramson
  - Division of Rheumatology, New York University School of Medicine, New York, NY 10016, USA
  - Department of Medicine, NYU School of Medicine, USA
  - Department of Pathology, NYU School of Medicine, New York, NY, USA
10
Niel C, Sinoquet C, Dina C, Rocheleau G. A survey about methods dedicated to epistasis detection. Front Genet 2015; 6:285. [PMID: 26442103 PMCID: PMC4564769 DOI: 10.3389/fgene.2015.00285] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Received: 05/13/2015] [Accepted: 08/27/2015] [Indexed: 12/25/2022] [Open access]
Abstract
During the past decade, findings of genome-wide association studies (GWAS) improved our knowledge and understanding of disease genetics. To date, thousands of SNPs have been associated with diseases and other complex traits. Statistical analysis typically looks for association between a phenotype and a SNP taken individually via single-locus tests. However, geneticists admit this is an oversimplified approach to tackle the complexity of underlying biological mechanisms. Interaction between SNPs, namely epistasis, must be considered. Unfortunately, epistasis detection gives rise to analytic challenges since analyzing every SNP combination is at present impractical at a genome-wide scale. In this review, we will present the main strategies recently proposed to detect epistatic interactions, along with their operating principle. Some of these methods are exhaustive, such as multifactor dimensionality reduction, likelihood ratio-based tests or receiver operating characteristic curve analysis; some are non-exhaustive, such as machine learning techniques (random forests, Bayesian networks) or combinatorial optimization approaches (ant colony optimization, computational evolution system).
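The scaling problem mentioned in this abstract can be made concrete with a short count of SNP combinations. This is a back-of-the-envelope sketch, not taken from the survey: the panel size of 500,000 SNPs is a hypothetical round figure, and the function name is an assumption.

```python
import math


def n_combinations(n_snps: int, order: int) -> int:
    """Number of distinct SNP sets of the given interaction order."""
    return math.comb(n_snps, order)


# Hypothetical genome-wide panel size, chosen only for illustration.
N_SNPS = 500_000

pairs = n_combinations(N_SNPS, 2)    # two-way interactions
triples = n_combinations(N_SNPS, 3)  # three-way interactions
print(f"pairs:   {pairs:,}")
print(f"triples: {triples:,}")
```

For this assumed panel, the pairwise count alone is roughly 1.25 × 10^11, which suggests why exhaustive testing is costly and why the non-exhaustive strategies listed in the abstract search only part of the space.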
Affiliation(s)
- Clément Niel
  - Computer Science Institute of Nantes-Atlantic (Lina), Centre National de la Recherche Scientifique UMR 6241, Ecole Polytechnique de l'Université de Nantes, Nantes, France
- Christine Sinoquet
  - Computer Science Institute of Nantes-Atlantic (Lina), Centre National de la Recherche Scientifique UMR 6241, University of Nantes, Nantes, France
- Christian Dina
  - Institut du Thorax, Institut National de la Santé et de la Recherche Médicale UMR 1087, Centre National de la Recherche Scientifique UMR 6291, University of Nantes, Nantes, France
- Ghislain Rocheleau
  - European Genomic Institute for Diabetes FR3508, Centre National de la Recherche Scientifique UMR 8199, Lille 2 University, Lille, France
11
Ray B, Henaff M, Ma S, Efstathiadis E, Peskin ER, Picone M, Poli T, Aliferis CF, Statnikov A. Information content and analysis methods for multi-modal high-throughput biomedical data. Sci Rep 2014; 4:4411. [PMID: 24651673 PMCID: PMC3961740 DOI: 10.1038/srep04411] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 11/26/2013] [Accepted: 02/27/2014] [Indexed: 01/30/2023] [Open access]
Abstract
The spectrum of modern molecular high-throughput assaying includes diverse technologies such as microarray gene expression, miRNA expression, proteomics, DNA methylation, among many others. Now that these technologies have matured and become increasingly accessible, the next frontier is to collect "multi-modal" data for the same set of subjects and conduct integrative, multi-level analyses. While multi-modal data does contain distinct biological information that can be useful for answering complex biology questions, its value for predicting clinical phenotypes and contributions of each type of input remain unknown. We obtained 47 datasets/predictive tasks that in total span over 9 data modalities and executed analytic experiments for predicting various clinical phenotypes and outcomes. First, we analyzed each modality separately using uni-modal approaches based on several state-of-the-art supervised classification and feature selection methods. Then, we applied integrative multi-modal classification techniques. We have found that gene expression is the most predictively informative modality. Other modalities such as protein expression, miRNA expression, and DNA methylation also provide highly predictive results, which are often statistically comparable but not superior to gene expression data. Integrative multi-modal analyses generally do not increase predictive signal compared to gene expression data.
Affiliation(s)
- Bisakha Ray
  - Center for Health Informatics and Bioinformatics, New York University Langone Medical Center, New York, NY, USA
- Mikael Henaff
  - Center for Health Informatics and Bioinformatics, New York University Langone Medical Center, New York, NY, USA
  - Department of Computer Science, New York University, NY, USA
- Sisi Ma
  - Center for Health Informatics and Bioinformatics, New York University Langone Medical Center, New York, NY, USA
- Efstratios Efstathiadis
  - Center for Health Informatics and Bioinformatics, New York University Langone Medical Center, New York, NY, USA
- Eric R. Peskin
  - Center for Health Informatics and Bioinformatics, New York University Langone Medical Center, New York, NY, USA
- Marco Picone
  - Department of Information Engineering, University of Parma, Parma, Italy
  - MultiMed Srl, Cremona, Italy
- Tito Poli
  - Maxillofacial Surgery Section of the Head and Neck Department, University Hospital of Parma, Parma, Italy
- Constantin F. Aliferis
  - Center for Health Informatics and Bioinformatics, New York University Langone Medical Center, New York, NY, USA
  - Department of Pathology, New York University School of Medicine, New York, NY, USA
- Alexander Statnikov
  - Center for Health Informatics and Bioinformatics, New York University Langone Medical Center, New York, NY, USA
  - Department of Medicine, New York University School of Medicine, New York, NY, USA
13
|
Tin A, Astor BC, Boerwinkle E, Hoogeveen RC, Coresh J, Kao WHL. Genome-wide association study identified the human leukocyte antigen region as a novel locus for plasma beta-2 microglobulin. Hum Genet 2013; 132:619-27. [PMID: 23417110 DOI: 10.1007/s00439-013-1274-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 10/07/2012] [Accepted: 02/06/2013] [Indexed: 01/11/2023]
Abstract
Beta-2 microglobulin (B2M) is a component of the major histocompatibility complex (MHC) class I molecule and has been studied as a biomarker of kidney function, cardiovascular diseases and mortality. Little is known about the genes influencing its levels directly or through glomerular filtration rate (GFR). We conducted a genome-wide association study of plasma B2M levels in 6738 European Americans from the Atherosclerosis Risk in Communities study to identify novel loci for B2M and assessed its association with known estimated GFR (eGFR) loci. We identified 2 genome-wide significant loci. One was in the human leukocyte antigen (HLA) region on chromosome 6 (lowest p value = 1.8 × 10⁻²³ for rs9264638). At this locus, 6 index SNPs accounted for 3.2% of log(B2M) variance, and their association with B2M could largely be explained by imputed classical alleles of the MHC class I genes: HLA-A, HLA-B, or HLA-C. The index SNPs at this locus were not associated with eGFR based on serum creatinine (eGFRcr). The other locus of B2M was on chromosome 12 (rs3184504 at SH2B3, beta = 0.02, p value = 3.1 × 10⁻⁸), which was previously implicated as an eGFR locus. In conclusion, although B2M is known to be a component of MHC class I molecule, the association between HLA class I alleles and plasma B2M levels in a community-based population is novel. The identification of the two novel loci for B2M extends our understanding of its metabolism and informs its use as a kidney filtration biomarker.
Affiliation(s)
- Adrienne Tin
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA.
|
14
|
Hoffman S, Podgurski A. The use and misuse of biomedical data: is bigger really better? Am J Law Med 2013; 39:497-538. [PMID: 24494442 DOI: 10.1177/009885881303900401] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 06/03/2023]
Abstract
Very large biomedical research databases, containing electronic health records (EHR) and genomic data from millions of patients, have been heralded recently for their potential to accelerate scientific discovery and produce dramatic improvements in medical treatments. Research enabled by these databases may also lead to profound changes in law, regulation, social policy, and even litigation strategies. Yet, is "big data" necessarily better data? This paper makes an original contribution to the legal literature by focusing on what can go wrong in the process of biomedical database research and what precautions are necessary to avoid critical mistakes. We address three main reasons for approaching such research with care and being cautious in relying on its outcomes for purposes of public policy or litigation. First, the data contained in biomedical databases is surprisingly likely to be incorrect or incomplete. Second, systematic biases, arising from both the nature of the data and the preconceptions of investigators, are serious threats to the validity of research results, especially in answering causal questions. Third, data mining of biomedical databases makes it easier for individuals with political, social, or economic agendas to generate ostensibly scientific but misleading research findings for the purpose of manipulating public opinion and swaying policymakers. In short, this paper sheds much-needed light on the problems of credulous and uninformed acceptance of research results derived from biomedical databases. An understanding of the pitfalls of big data analysis is of critical importance to anyone who will rely on or dispute its outcomes, including lawyers, policymakers, and the public at large. The Article also recommends technical, methodological, and educational interventions to combat the dangers of database errors and abuses.
Affiliation(s)
- Sharona Hoffman
- Law-Medicine Center, Case Western Reserve University School of Law, USA
|
15
|
Statnikov A, Alekseyenko AV, Li Z, Henaff M, Perez-Perez GI, Blaser MJ, Aliferis CF. Microbiomic signatures of psoriasis: feasibility and methodology comparison. Sci Rep 2013; 3:2620. [PMID: 24018484 PMCID: PMC3965359 DOI: 10.1038/srep02620] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Received: 07/01/2013] [Accepted: 08/22/2013] [Indexed: 01/21/2023] Open
Abstract
Psoriasis is a common chronic inflammatory disease of the skin. We sought to use bacterial community abundance data to assess the feasibility of developing multivariate molecular signatures for differentiation of cutaneous psoriatic lesions, clinically unaffected contralateral skin from psoriatic patients, and similar cutaneous loci in matched healthy control subjects. Using 16S rRNA high-throughput DNA sequencing, we assayed the cutaneous microbiome for 51 such matched specimen triplets including subjects of both genders, different age groups, ethnicities and multiple body sites. None of the subjects had recently received relevant treatments or antibiotics. We found that molecular signatures for the diagnosis of psoriasis result in significant accuracy ranging from 0.75 to 0.89 AUC, depending on the classification task. We also found a significant effect of DNA sequencing and downstream analysis protocols on the accuracy of molecular signatures. Our results demonstrate that it is feasible to develop accurate molecular signatures for the diagnosis of psoriasis from microbiomic data.
Affiliation(s)
- Alexander Statnikov
- Center for Health Informatics and Bioinformatics (CHIBI), New York University Langone Medical Center, New York, New York
- Department of Medicine, New York University School of Medicine, New York, New York
- Alexander V. Alekseyenko
- Center for Health Informatics and Bioinformatics (CHIBI), New York University Langone Medical Center, New York, New York
- Department of Medicine, New York University School of Medicine, New York, New York
- Zhiguo Li
- Center for Health Informatics and Bioinformatics (CHIBI), New York University Langone Medical Center, New York, New York
- Mikael Henaff
- Center for Health Informatics and Bioinformatics (CHIBI), New York University Langone Medical Center, New York, New York
- Guillermo I. Perez-Perez
- Department of Medicine, New York University School of Medicine, New York, New York
- Department of Microbiology, New York University School of Medicine, New York, New York
- Martin J. Blaser
- Department of Medicine, New York University School of Medicine, New York, New York
- Department of Microbiology, New York University School of Medicine, New York, New York
- Medical Service, Department of Veterans Affairs New York Harbor Healthcare System, New York, New York
- Constantin F. Aliferis
- Center for Health Informatics and Bioinformatics (CHIBI), New York University Langone Medical Center, New York, New York
- Department of Pathology, New York University School of Medicine, New York, New York
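Note on the AUC figures in the abstract above (0.75 to 0.89): AUC can be read as the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition, not the authors' evaluation pipeline; the function name `auc_from_scores`, the labels, and the scores are hypothetical and chosen only for illustration.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: 1 = psoriatic lesion, 0 = control skin (illustrative only).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(round(auc_from_scores(labels, scores), 3))  # 8 of 9 pairs ranked correctly -> 0.889
```

The quadratic loop is kept for readability; for large sample sizes a rank-based computation gives the same value more efficiently.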
|
17
|
Russu A, Malovini A, Puca AA, Bellazzi R. Stochastic model search with binary outcomes for genome-wide association studies. J Am Med Inform Assoc 2012; 19:e13-20. [PMID: 22534080 PMCID: PMC3392850 DOI: 10.1136/amiajnl-2011-000741] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/26/2022] Open
Abstract
Objective: The spread of case–control genome-wide association studies (GWASs) has stimulated the development of new variable selection methods and predictive models. We introduce a novel Bayesian model search algorithm, Binary Outcome Stochastic Search (BOSS), which addresses the model selection problem when the number of predictors far exceeds the number of binary responses. Materials and methods: Our method is based on a latent variable model that links the observed outcomes to the underlying genetic variables. A Markov Chain Monte Carlo approach is used for model search and to evaluate the posterior probability of each predictor. Results: BOSS is compared with three established methods (stepwise regression, logistic lasso, and elastic net) in a simulated benchmark. Two real case studies are also investigated: a GWAS on the genetic bases of longevity, and the type 2 diabetes study from the Wellcome Trust Case Control Consortium. Simulations show that BOSS achieves higher precisions than the reference methods while preserving good recall rates. In both experimental studies, BOSS successfully detects genetic polymorphisms previously reported to be associated with the analyzed phenotypes. Discussion: BOSS outperforms the other methods in terms of F-measure on simulated data. In the two real studies, BOSS successfully detects biologically relevant features, some of which are missed by univariate analysis and the three reference techniques. Conclusion: The proposed algorithm is an advance in the methodology for model selection with a large number of features. Our simulated and experimental results showed that BOSS proves effective in detecting relevant markers while providing a parsimonious model.
Affiliation(s)
- Alberto Russu
- Department of Industrial and Information Engineering, University of Pavia, Pavia, Italy.
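Note on the precision, recall, and F-measure comparison in the abstract above: for a variable selection method, these metrics compare the set of selected markers against the set of truly associated markers. The sketch below assumes the balanced F1 variant (the abstract does not say which F-measure was used) and is not the BOSS evaluation code; the function name `precision_recall_f1` and the marker identifiers are hypothetical.

```python
def precision_recall_f1(selected, relevant):
    """Return (precision, recall, F1) for selected features versus the truly relevant ones."""
    selected, relevant = set(selected), set(relevant)
    true_positives = len(selected & relevant)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy case: 4 markers selected, 3 of them among 6 truly associated markers.
selected_markers = ["snp1", "snp2", "snp3", "snp9"]
causal_markers = ["snp1", "snp2", "snp3", "snp4", "snp5", "snp6"]
precision, recall, f1 = precision_recall_f1(selected_markers, causal_markers)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
# precision=0.75 recall=0.50 F1=0.60
```

A parsimonious model, as described in the abstract, tends to favor precision; the F-measure makes the trade-off with recall explicit in a single number.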
|