1
Li S, Cheng L, Zhang T, Zhao H, Li J. Graph-guided Bayesian matrix completion for ocean sound speed field reconstruction. J Acoust Soc Am 2023; 153:689. [PMID: 36732248 DOI: 10.1121/10.0017064] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/16/2022] [Accepted: 01/09/2023] [Indexed: 06/18/2023]
Abstract
Reconstructing the ocean sound speed field (SSF) from limited and noisy measurements/estimates is crucial for many ocean acoustic applications, including underwater tomography, target localization/tracking, and communications. Classical reconstruction methods include deterministic approaches (e.g., spline interpolation) and geostatistical methods (e.g., kriging). They are closely linked to linear regression and Gaussian process regression in the machine learning (ML) literature: all of them can be viewed as supervised regression models that learn a mapping from geographical locations to sound speed values. From this unified ML perspective, theoretical analysis indicates that classical reconstruction methods have several drawbacks, such as sensitivity to noise and high computational cost. To overcome these drawbacks, and inspired by recent advances in graph machine learning, we introduce graph-guided Bayesian low-rank matrix completion (LRMC) for fine-scale and accurate ocean SSF reconstruction. In particular, a more general graph-guided LRMC model is proposed that encompasses the state-of-the-art one as a special case. The proposed model and the associated inference algorithm simultaneously exploit the global (low-rankness) and local (graph structure) information of ocean sound speed data, thus striking a favorable balance between reconstruction accuracy and computational complexity. Numerical results on real-life ocean SSF data demonstrate the encouraging performance of the proposed approaches.
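A minimal sketch of the "local (graph structure)" idea, assuming the common choice of a graph Laplacian L = D - W built from symmetric, non-negative weights W between grid points. This is a generic illustration, not the authors' Bayesian model: it checks that the quadratic form x^T L x equals the weighted sum of squared differences between connected entries, which is why penalizing it favours sound speeds that vary smoothly over the graph. The weights and sound speed values are hypothetical.

```python
def laplacian(weights):
    """Build L = D - W from a symmetric weight matrix given as nested lists."""
    n = len(weights)
    lap = [[-weights[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # degree D_ii = sum_j W_ij, so L_ii = D_ii - W_ii
        lap[i][i] = sum(weights[i]) - weights[i][i]
    return lap


def quadratic_form(lap, x):
    """Compute x^T L x."""
    n = len(x)
    return sum(x[i] * lap[i][j] * x[j] for i in range(n) for j in range(n))


def pairwise_smoothness(weights, x):
    """Compute sum over i < j of W_ij * (x_i - x_j)^2."""
    n = len(x)
    return sum(weights[i][j] * (x[i] - x[j]) ** 2
               for i in range(n) for j in range(i + 1, n))


# Hypothetical chain of three grid points and sound speeds in m/s.
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 2.0],
     [0.0, 2.0, 0.0]]
x = [1500.0, 1502.0, 1501.0]
print(quadratic_form(laplacian(W), x), pairwise_smoothness(W, x))  # 6.0 6.0
```

Both functions give the same number; the pairwise form makes explicit that only differences between neighbouring points are penalized.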
Affiliation(s)
- Siyuan Li
- College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, 310027, China
- Lei Cheng
- College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, 310027, China
- Ting Zhang
- College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, 310027, China
- Hangfang Zhao
- College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, 310027, China
- Jianlong Li
- College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, 310027, China
2
Ye F, Cho H, Rouayheb SE. Mechanisms for Hiding Sensitive Genotypes with Information-Theoretic Privacy. IEEE Trans Inf Theory 2022; 68:4090-4105. [PMID: 37283781 PMCID: PMC10243750 DOI: 10.1109/tit.2022.3156276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Motivated by the growing availability of personal genomics services, we study an information-theoretic privacy problem that arises when sharing genomic data: a user wants to share his or her genome sequence while keeping the genotypes at certain positions hidden, which could otherwise reveal critical health-related information. A straightforward solution of erasing (masking) the chosen genotypes does not ensure privacy, because the correlation between nearby positions can leak the masked genotypes. We introduce an erasure-based privacy mechanism with perfect information-theoretic privacy, whereby the released sequence is statistically independent of the sensitive genotypes. Our mechanism can be interpreted as a locally-optimal greedy algorithm for a given processing order of sequence positions, where utility is measured by the number of positions released without erasure. We show that finding an optimal order is NP-hard in general and provide an upper bound on the optimal utility. For sequences from hidden Markov models, a standard modeling approach in genetics, we propose an efficient algorithmic implementation of our mechanism with complexity polynomial in sequence length. Moreover, we illustrate the robustness of the mechanism by bounding the privacy leakage from erroneous prior distributions. Our work is a step towards more rigorous control of privacy in genomic data sharing.
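To see why masking alone can leak, here is a small sketch assuming a hypothetical two-state Markov chain for adjacent positions (the transition probabilities are illustrative, not taken from the paper). It computes the mutual information, in bits, between a masked position X_i and its released neighbour X_{i+1}. Any value above zero means the neighbour still carries information about the hidden genotype, whereas perfect privacy in the sense described above requires the released sequence to be independent of it.

```python
import math


def mutual_information(prior, transition):
    """I(X_i; X_{i+1}) in bits for one step of a Markov chain.

    prior[a]         = P(X_i = a)
    transition[a][b] = P(X_{i+1} = b | X_i = a)
    """
    states = range(len(prior))
    # Marginal distribution of the released neighbour X_{i+1}.
    marginal = [sum(prior[a] * transition[a][b] for a in states) for b in states]
    mi = 0.0
    for a in states:
        for b in states:
            joint = prior[a] * transition[a][b]
            if joint > 0:
                mi += joint * math.log2(joint / (prior[a] * marginal[b]))
    return mi


prior = [0.5, 0.5]
correlated = [[0.9, 0.1], [0.1, 0.9]]   # nearby positions strongly correlated
independent = [[0.5, 0.5], [0.5, 0.5]]  # neighbour unrelated to X_i
print(round(mutual_information(prior, correlated), 3))  # 0.531
print(mutual_information(prior, independent))           # 0.0
```

Only in the independent case does erasing X_i hide it completely; with correlation, a mechanism must also decide which neighbouring positions to erase.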
Affiliation(s)
- Fangwei Ye
- Department of Electrical and Computer Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Hyunghoon Cho
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Salim El Rouayheb
- Department of Electrical and Computer Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
3
Sherpa S, Kebaïli C, Rioux D, Guéguen M, Renaud J, Després L. Population decline at distribution margins: Assessing extinction risk in the last glacial relictual but still functional metapopulation of a European butterfly. Divers Distrib 2021. [DOI: 10.1111/ddi.13460] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Affiliation(s)
- Stéphanie Sherpa
- Laboratoire d'Ecologie Alpine UMR CNRS‐UGA‐USMB 5553 Université Grenoble Alpes Grenoble Cedex 9 France
- Caroline Kebaïli
- Laboratoire d'Ecologie Alpine UMR CNRS‐UGA‐USMB 5553 Université Grenoble Alpes Grenoble Cedex 9 France
- Parc Naturel Régional du Haut Jura Lajoux France
- Delphine Rioux
- Laboratoire d'Ecologie Alpine UMR CNRS‐UGA‐USMB 5553 Université Grenoble Alpes Grenoble Cedex 9 France
- Maya Guéguen
- Laboratoire d'Ecologie Alpine UMR CNRS‐UGA‐USMB 5553 Université Grenoble Alpes Grenoble Cedex 9 France
- Julien Renaud
- Laboratoire d'Ecologie Alpine UMR CNRS‐UGA‐USMB 5553 Université Grenoble Alpes Grenoble Cedex 9 France
- Laurence Després
- Laboratoire d'Ecologie Alpine UMR CNRS‐UGA‐USMB 5553 Université Grenoble Alpes Grenoble Cedex 9 France
4
Fan M, Zhang Y, Fu Z, Xu M, Wang S, Xie S, Gao X, Wang Y, Li L. A deep matrix completion method for imputing missing histological data in breast cancer by integrating DCE-MRI radiomics. Med Phys 2021; 48:7685-7697. [PMID: 34724248 DOI: 10.1002/mp.15316] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/24/2021] [Revised: 10/13/2021] [Accepted: 10/14/2021] [Indexed: 02/01/2023]
Abstract
PURPOSE Clinical indicators of histological information are important for breast cancer treatment and operational decision making, but these histological data suffer from frequent missing values due to various experimental/clinical reasons. The limited amount of histological information from breast cancer samples impedes the accuracy of data imputation. The purpose of this study was to impute missing histological data, including Ki-67 expression level, luminal A subtype, and histological grade, by integrating tumor radiomics. METHODS To this end, a deep matrix completion (DMC) method was proposed for imputing missing histological data using nonmissing features composed of histological and tumor radiomics (termed radiohistological features). DMC finds a latent nonlinear association between radiohistological features across all samples and samples for all the features. Radiomic features of morphologic, statistical, and texture were extracted from dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) inside the tumor. Experiments on missing histological data imputation were performed with a variable number of features and missing data rates. The performance of the DMC method was compared with those of the nonnegative matrix factorization (NMF) and collaborative filtering (MCF)-based data imputation methods. The area under the curve (AUC) was used to assess the performance of missing histological data imputation. RESULTS By integrating radiomics from DCE-MRI, the DMC method showed significantly better performance in terms of AUC than that using only histological data. Additionally, DMC using 120 radiomic features showed an optimal prediction performance (AUC = 0.793), which was better than the NMF (AUC = 0.756) and MCF methods (AUC = 0.706; corrected p = 0.001). The DMC method consistently performed better than the NMF and MCF methods with a variable number of radiomic features and missing data rates. 
CONCLUSIONS DMC improves imputation performance by integrating tumor histological and radiomics data. This study transforms latent imaging-scale patterns for interactions with molecular-scale histological information and is promising in the tumor characterization and management of patients.
Affiliation(s)
- Ming Fan
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou, China
- You Zhang
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou, China
- Zhenyu Fu
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou, China
- Maosheng Xu
- Department of Radiology, First Affiliated Hospital of Zhejiang Chinese Medical University, Hangzhou, China
- Shiwei Wang
- Department of Radiology, First Affiliated Hospital of Zhejiang Chinese Medical University, Hangzhou, China
- Sangma Xie
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou, China
- Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Yue Wang
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, USA
- Lihua Li
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou, China
5
Wu H, Wang X, Chu M, Xiang R, Zhou K. FRMC: a fast and robust method for the imputation of scRNA-seq data. RNA Biol 2021; 18:172-181. [PMID: 34459719 DOI: 10.1080/15476286.2021.1960688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
The high resolution of single-cell transcriptome sequencing allows researchers to observe gene expression profiles at the single-cell level, offering numerous possibilities for subsequent biomedical investigation. However, the high rate of missing values in gene-cell expression matrices, an unavoidable technical consequence of insufficient RNA input, severely hampers the accuracy of downstream analysis. To address this problem, a faster, more stable, and more accurate imputation method is needed, one that not only recovers the missing data but also facilitates subsequent analysis of biological mechanisms. Existing imputation methods each have drawbacks and limitations: some require an assumed data distribution, some cannot distinguish between technical and biological zeros, and some have poor computational performance. In this paper, we present FRMC, a novel imputation software for single-cell RNA-seq data, which introduces a fast and accurate approximation method for singular value thresholding. Experiments demonstrate that FRMC not only precisely distinguishes 'true zeros' from dropout events and correctly imputes missing values attributable to technical noise, but also effectively enhances intracellular and intergenic connections and achieves accurate clustering of cells in biological applications. In summary, FRMC can be a powerful tool for analysing single-cell data because it delivers biological significance, accuracy, and speed simultaneously. FRMC is implemented in Python and is freely accessible to non-commercial users on GitHub: https://github.com/HUST-DataMan/FRMC.
Affiliation(s)
- Honglong Wu
- Wuhan National Laboratory for Optoelectronics, Huazhong University of Science & Technology, Wuhan, Hubei, China; BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
- Xuebin Wang
- BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
- Mengtian Chu
- BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
- Ruizhi Xiang
- BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
- Ke Zhou
- Wuhan National Laboratory for Optoelectronics, Huazhong University of Science & Technology, Wuhan, Hubei, China
6
Chu BB, Sobel EM, Wasiolek R, Ko S, Sinsheimer JS, Zhou H, Lange K. A fast Data-Driven method for genotype imputation, phasing, and local ancestry inference: MendelImpute.jl. Bioinformatics 2021; 37:4756-4763. [PMID: 34289008 PMCID: PMC8665755 DOI: 10.1093/bioinformatics/btab489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2021] [Revised: 05/18/2021] [Accepted: 07/19/2021] [Indexed: 11/12/2022]
Abstract
MOTIVATION Current methods for genotype imputation and phasing exploit the volume of data in haplotype reference panels and rely on hidden Markov models. Existing programs all have essentially the same imputation accuracy, are computationally intensive, and generally require pre-phasing the typed markers. RESULTS We introduce a novel data-mining method for genotype imputation and phasing that substitutes highly efficient linear algebra routines for hidden Markov model calculations. This strategy, embodied in our Julia program MendelImpute.jl, avoids explicit assumptions about recombination and population structure while delivering similar prediction accuracy, better memory usage, and an order of magnitude or better run-times compared to the fastest competing method. MendelImpute operates on both dosage data and unphased genotype data and simultaneously imputes missing genotypes and phase at both the typed and untyped SNPs. Finally, MendelImpute naturally extends to global and local ancestry estimation and lends itself to new strategies for data compression and hence faster data transport and sharing. AVAILABILITY Software, documentation, and scripts to reproduce our results are available from https://github.com/OpenMendel/MendelImpute.jl. SUPPLEMENTARY INFORMATION Supplementary data are available from Bioinformatics online.
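A brute-force sketch of the haplotype-pair idea, assuming (as we read the approach) that a genotype vector coded 0/1/2 is approximated by the sum of two reference haplotypes coded 0/1. The best pair on the typed sites then fills in the missing sites and, as a by-product, gives a phasing. This is a simplified, single-window Python illustration with hypothetical data, not MendelImpute.jl's Julia implementation, which the abstract describes as relying on highly efficient linear algebra routines rather than enumerating pairs one by one.

```python
from itertools import combinations_with_replacement


def best_haplotype_pair(genotype, haplotypes):
    """Return indices (i, j) minimising squared error on observed genotype entries.

    genotype:   list of 0/1/2 dosages, with None marking an untyped site
    haplotypes: reference panel, each a list of 0/1 alleles
    """
    best, best_err = None, float("inf")
    for i, j in combinations_with_replacement(range(len(haplotypes)), 2):
        err = sum((g - haplotypes[i][k] - haplotypes[j][k]) ** 2
                  for k, g in enumerate(genotype) if g is not None)
        if err < best_err:
            best, best_err = (i, j), err
    return best


def impute(genotype, haplotypes):
    """Fill untyped sites from the best pair and return the pair as the phase."""
    i, j = best_haplotype_pair(genotype, haplotypes)
    filled = [g if g is not None else haplotypes[i][k] + haplotypes[j][k]
              for k, g in enumerate(genotype)]
    return filled, (haplotypes[i], haplotypes[j])


# Hypothetical reference panel and one sample with an untyped second site.
reference = [[0, 0, 1, 1],
             [1, 0, 0, 1],
             [1, 1, 1, 0]]
sample = [1, None, 1, 2]
print(impute(sample, reference))  # ([1, 0, 1, 2], ([0, 0, 1, 1], [1, 0, 0, 1]))
```

Enumerating all pairs costs quadratic time in the panel size, which is exactly the kind of cost a linear-algebra formulation is meant to reduce.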
Affiliation(s)
- Benjamin B Chu
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, USA
- Eric M Sobel
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, USA; Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, USA
- Rory Wasiolek
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, USA
- Seyoon Ko
- Department of Biostatistics, Fielding School of Public Health at UCLA, Los Angeles, USA
- Janet S Sinsheimer
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, USA; Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, USA; Department of Biostatistics, Fielding School of Public Health at UCLA, Los Angeles, USA
- Hua Zhou
- Department of Biostatistics, Fielding School of Public Health at UCLA, Los Angeles, USA
- Kenneth Lange
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, USA; Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, USA
7
Gain C, François O. LEA 3: Factor models in population genetics and ecological genomics with R. Mol Ecol Resour 2021; 21:2738-2748. [PMID: 33638893 DOI: 10.1111/1755-0998.13366] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/17/2020] [Revised: 01/21/2021] [Accepted: 02/23/2021] [Indexed: 12/12/2022]
Abstract
A major objective of evolutionary biology is to understand the processes by which organisms have adapted to various environments, and to predict the response of organisms to new or future conditions. The availability of large genomic and environmental data sets provides an opportunity to address those questions, and the R package LEA has been introduced to facilitate population and ecological genomic analyses in this context. By using latent factor models, the program computes ancestry coefficients from population genetic data and performs genotype-environment association analyses with correction for unobserved confounding variables. In this study, we present new functionalities of LEA, which include imputation of missing genotypes, fast algorithms for latent factor mixed models using multivariate predictors for genotype-environment association studies, population differentiation tests for admixed or continuous populations, and estimation of genetic offset based on climate models. The new functionalities are implemented in version 3.1 and higher releases of the package. Using simulated and real data sets, our study provides evaluations and examples of applications, outlining important practical considerations when analysing ecological genomic data in R.
Affiliation(s)
- Clément Gain
- Centre National de la Recherche Scientifique, Grenoble INP, TIMC-IMAG CNRS UMR 5525, Université Grenoble-Alpes, Grenoble, France
- Olivier François
- Centre National de la Recherche Scientifique, Grenoble INP, TIMC-IMAG CNRS UMR 5525, Université Grenoble-Alpes, Grenoble, France
8
Hasan MK, Alam MA, Roy S, Dutta A, Jawad MT, Das S. Missing value imputation affects the performance of machine learning: A review and analysis of the literature (2010–2021). Inform Med Unlocked 2021. [DOI: 10.1016/j.imu.2021.100799] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/18/2022]
9
10
Sherpa S, Blum MGB, Després L. Cold adaptation in the Asian tiger mosquito's native range precedes its invasion success in temperate regions. Evolution 2019; 73:1793-1808. [DOI: 10.1111/evo.13801] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 03/18/2019] [Revised: 06/06/2019] [Accepted: 06/14/2019] [Indexed: 12/25/2022]
Affiliation(s)
- Stéphanie Sherpa
- Université Grenoble Alpes CNRS, UMR 5553 LECA F‐38000 Grenoble France
- Michael G. B. Blum
- Université Grenoble Alpes CNRS, UMR 5525 TIMC‐IMAG F‐38000 Grenoble France
- Laurence Després
- Université Grenoble Alpes CNRS, UMR 5553 LECA F‐38000 Grenoble France
11
Chi EC, Li T. Matrix completion from a computational statistics perspective. WIREs Comput Stat 2019. [DOI: 10.1002/wics.1469] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/07/2022]
Affiliation(s)
- Eric C. Chi
- Department of Statistics North Carolina State University Raleigh North Carolina
- Tianxi Li
- Department of Statistics University of Virginia Charlottesville Virginia
12
Zhou H, Sinsheimer JS, Bates DM, Chu BB, German CA, Ji SS, Keys KL, Kim J, Ko S, Mosher GD, Papp JC, Sobel EM, Zhai J, Zhou JJ, Lange K. OPENMENDEL: a cooperative programming project for statistical genetics. Hum Genet 2019; 139:61-71. [PMID: 30915546 DOI: 10.1007/s00439-019-02001-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/26/2018] [Accepted: 03/15/2019] [Indexed: 01/06/2023]
Abstract
Statistical methods for genome-wide association studies (GWAS) continue to improve. However, the increasing volume and variety of genetic and genomic data make computational speed and ease of data manipulation mandatory in future software. In our view, a collaborative effort of statistical geneticists is required to develop open source software targeted to genetic epidemiology. Our attempt to meet this need is called the OPENMENDEL project (https://openmendel.github.io). It aims to (1) enable interactive and reproducible analyses with informative intermediate results, (2) scale to big data analytics, (3) embrace parallel and distributed computing, (4) adapt to rapid hardware evolution, (5) allow cloud computing, (6) allow integration of varied genetic data types, and (7) foster easy communication between clinicians, geneticists, statisticians, and computer scientists. This article reviews and makes recommendations to the genetic epidemiology community in the context of the OPENMENDEL project.
Affiliation(s)
- Hua Zhou
- Department of Biostatistics, UCLA Fielding School of Public Health, Los Angeles, USA
- Janet S Sinsheimer
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, USA
- Douglas M Bates
- Department of Statistics, University of Wisconsin, Madison, USA
- Benjamin B Chu
- Department of Biomathematics, David Geffen School of Medicine at UCLA, Los Angeles, USA
- Christopher A German
- Department of Biostatistics, UCLA Fielding School of Public Health, Los Angeles, USA
- Sarah S Ji
- Department of Biostatistics, UCLA Fielding School of Public Health, Los Angeles, USA
- Kevin L Keys
- Department of Medicine, University of California, San Francisco, USA
- Juhyun Kim
- Department of Biostatistics, UCLA Fielding School of Public Health, Los Angeles, USA
- Seyoon Ko
- Department of Statistics, Seoul National University, Seoul, South Korea
- Gordon D Mosher
- Departments of Statistics and Computer Science, University of California, Riverside, USA
- Jeanette C Papp
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, USA
- Eric M Sobel
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, USA
- Jing Zhai
- Department of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson, USA
- Jin J Zhou
- Department of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson, USA
- Kenneth Lange
- Department of Biomathematics, David Geffen School of Medicine at UCLA, Los Angeles, USA
13
Carpentier A, Klopp O, Löffler M, Nickl R. Adaptive confidence sets for matrix completion. Bernoulli 2018. [DOI: 10.3150/17-bej933] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
14
Sherpa S, Rioux D, Goindin D, Fouque F, François O, Després L. At the Origin of a Worldwide Invasion: Unraveling the Genetic Makeup of the Caribbean Bridgehead Populations of the Dengue Vector Aedes aegypti. Genome Biol Evol 2018; 10:56-71. [PMID: 29267872 PMCID: PMC5758905 DOI: 10.1093/gbe/evx267] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Accepted: 12/15/2017] [Indexed: 12/21/2022]
Abstract
Human-driven global environmental changes have considerably increased the risk of biological invasions, especially the spread of human parasites and their vectors. Among exotic species that have major impacts on public health, the dengue fever mosquito Aedes aegypti originating from Africa has spread worldwide during the last three centuries. Although considerable progress has been recently made in understanding the history of this invasion, the respective roles of human and abiotic factors in shaping patterns of genetic diversity remain largely unexplored. Using a genome-wide sample of genetic variants (3,530 ddRAD SNPs), we analyzed the genetic structure of Ae. aegypti populations in the Caribbean, the first introduced territories in the Americas. Fourteen populations were sampled in Guyane and in four islands of the Antilles that differ in climatic conditions, intensity of urbanization, and vector control history. The genetic diversity in the Caribbean was low (He = 0.14–0.17), as compared with a single African collection from Benin (He = 0.26) and site-frequency spectrum analysis detected an ancient bottleneck dating back ∼300 years ago, supporting a founder event during the introduction of Ae. aegypti. Evidence for a more recent bottleneck may be related to the eradication program undertaken on the American continent in the 1950s. Among 12 loci detected as FST-outliers, two were located in candidate genes for insecticide resistance (cytochrome P450 and voltage-gated sodium channel). Genome–environment association tests identified additional loci associated with human density and/or deltamethrin resistance. Our results highlight the high impact of human pressures on the demographic history and genetic variation of Ae. aegypti Caribbean populations.
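The heterozygosity values quoted above can be read with a short worked example, assuming He denotes Nei's expected heterozygosity, 1 - Σ p_k², averaged over biallelic SNPs (the study may use a sample-size-corrected variant). Genotypes below are hypothetical alternate-allele counts (0, 1 or 2) per diploid individual.

```python
def expected_heterozygosity(genotypes_per_locus):
    """Mean of 1 - sum(p_k^2) over biallelic loci.

    genotypes_per_locus: list of loci; each locus is a list of alternate-allele
    counts (0, 1 or 2) per diploid individual, with None for a missing call.
    """
    values = []
    for calls in genotypes_per_locus:
        observed = [g for g in calls if g is not None]
        if not observed:
            continue  # no information at this locus
        p = sum(observed) / (2 * len(observed))  # alternate-allele frequency
        values.append(1.0 - p ** 2 - (1.0 - p) ** 2)  # equals 2p(1 - p)
    return sum(values) / len(values)


# Hypothetical data: one polymorphic locus and one fixed locus.
loci = [[0, 1, 2, 1], [0, 0, 0, None]]
print(expected_heterozygosity(loci))  # 0.25
```

Fixed loci contribute zero, so averaging over many invariant or low-frequency SNPs pulls He down, which is one reason values from different SNP panels are compared with care.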
Affiliation(s)
- Stéphanie Sherpa
- Laboratoire d'Ecologie Alpine, CNRS UMR 5553, Université Grenoble Alpes, Grenoble, France
- Delphine Rioux
- Laboratoire d'Ecologie Alpine, CNRS UMR 5553, Université Grenoble Alpes, Grenoble, France
- Daniella Goindin
- Laboratoire d'Entomologie Médicale, Institut Pasteur de Guadeloupe, Les Abymes, France
- Florence Fouque
- Special Programme for Research and Training in Tropical Diseases, World Health Organization, Geneva, Switzerland
- Olivier François
- Laboratoire Techniques de l'Ingénierie Médicale et de la Complexité, CNRS UMR 5525, Université Grenoble Alpes, Grenoble, France
- Laurence Després
- Laboratoire d'Ecologie Alpine, CNRS UMR 5553, Université Grenoble Alpes, Grenoble, France
15
Chi EC, Hu L, Saibaba AK, Rao AUK. Going Off the Grid: Iterative Model Selection for Biclustered Matrix Completion. J Comput Graph Stat 2018. [DOI: 10.1080/10618600.2018.1482763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
Affiliation(s)
- Eric C. Chi
- Department of Statistics, North Carolina State University, Raleigh, NC
- Liuyi Hu
- Department of Statistics, North Carolina State University, Raleigh, NC
- Arvind K. Saibaba
- Department of Mathematics, North Carolina State University, Raleigh, NC
- Arvind U. K. Rao
- Department of Computational Medicine and Bioinformatics, Department of Radiation Oncology, University of Michigan, Ann Arbor, MI
16
Louzoun Y, Alter I, Gragert L, Albrecht M, Maiers M. Modeling coverage gaps in haplotype frequencies via Bayesian inference to improve stem cell donor selection. Immunogenetics 2017; 70:279-292. [DOI: 10.1007/s00251-017-1040-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/11/2017] [Accepted: 10/23/2017] [Indexed: 11/24/2022]
17
Abstract
Many statistical learning methods, such as matrix completion, matrix regression, and multiple response regression, estimate a matrix of parameters. Nuclear norm regularization is frequently employed to achieve shrinkage and low-rank solutions. To minimize a nuclear norm regularized loss function, a vital and the most time-consuming step is singular value thresholding, which seeks the singular values of a large matrix that exceed a threshold, together with their associated singular vectors. Currently, MATLAB lacks a function for singular value thresholding. Its built-in svds function computes the top r singular values/vectors by the Lanczos iterative method but is efficient only for sparse matrix input, whereas the aforementioned statistical learning algorithms perform singular value thresholding on dense but structured matrices. To address this issue, we provide a MATLAB wrapper function, svt, that implements singular value thresholding. It encompasses both top singular value decomposition and thresholding, handles both large sparse matrices and structured matrices, and reduces the computational cost of matrix learning algorithms.
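The thresholding operation itself is short to state: with Y = U diag(s) V^T, it returns U diag(max(s - τ, 0)) V^T, keeping only the singular values above τ. The Python/NumPy sketch below is not the svt MATLAB function; it uses a dense full SVD, so it deliberately omits the Lanczos-based savings for large sparse or structured matrices that the wrapper is designed to provide.

```python
import numpy as np


def singular_value_threshold(y, tau):
    """Proximal operator of tau * nuclear norm: shrink every singular value by tau."""
    u, s, vt = np.linalg.svd(y, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    keep = s_shrunk > 0  # only singular values exceeding tau survive
    # Scale the kept left singular vectors column-wise, then recombine.
    return (u[:, keep] * s_shrunk[keep]) @ vt[keep, :]


y = np.diag([3.0, 1.0, 0.5])
shrunk = singular_value_threshold(y, 0.8)
print(np.round(shrunk, 6))  # approximately diag(2.2, 0.2, 0.0): rank drops from 3 to 2
```

Because only singular values above τ matter, a large-scale implementation needs just the top few singular triplets, which is where an iterative method such as Lanczos pays off.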
Affiliation(s)
- Cai Li
- North Carolina State University
- Hua Zhou
- University of California, Los Angeles
18
19
Zhu L, Guo WL, Lu C, Huang DS. Collaborative Completion of Transcription Factor Binding Profiles via Local Sensitive Unified Embedding. IEEE Trans Nanobioscience 2016; 15:946-958. [PMID: 27845669 DOI: 10.1109/tnb.2016.2625823] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
Although newly available ChIP-seq data provide immense opportunities for comparative study of regulatory activities across different biological conditions, cost, time, or sample material availability mean that researchers cannot always obtain binding profiles for every protein in every sample of interest, which considerably limits the power of integrative studies. Recently, by leveraging related information from measured data, Ernst et al. proposed ChromImpute for predicting additional ChIP-seq and other types of datasets, and demonstrated that the imputed signal tracks accurately approximate the experimentally measured signals and could thereby potentially enhance the power of integrative analysis. Despite the success of ChromImpute, in this paper we reexamine its learning process and show that its performance may degrade substantially, and it may sometimes even fail to output a prediction, when the available data are scarce. This limitation could hurt its applicability to important predictive tasks, such as the imputation of TF binding data. To alleviate this problem, we propose a novel method called Local Sensitive Unified Embedding (LSUE) for imputing new ChIP-seq datasets. In LSUE, the ChIP-seq data compendium is fused by mapping proteins, samples, and genomic positions simultaneously into a Euclidean space, making their underlying associations directly evaluable with simple calculations. In contrast to ChromImpute, which mainly uses local correlations between available datasets, LSUE can better estimate the overall data structure by formulating the representation learning of all involved entities as a single unified optimization problem. In addition, a novel form of local sensitive low-rank regularization is proposed to further improve the performance of LSUE. Experimental evaluations on the ENCODE TF ChIP-seq data illustrate the performance of the proposed model.
The code of LSUE is available at https://github.com/ekffar/LSUE.
20
SparRec: An effective matrix completion framework of missing data imputation for GWAS. Sci Rep 2016; 6:35534. [PMID: 27762341 PMCID: PMC5071878 DOI: 10.1038/srep35534] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 03/31/2016] [Accepted: 09/30/2016] [Indexed: 11/08/2022]
Abstract
Genome-wide association studies present computational challenges for missing data imputation, while advances in genotyping technologies are generating datasets with large sample sizes in which sample sets are genotyped on multiple SNP chips. We present a new imputation framework, SparRec (Sparse Recovery), with the following properties: (1) The optimization models of SparRec, based on low rank and a low number of co-clusters of matrices, differ from current statistical methods. While our low-rank matrix completion (LRMC) model is similar to Mendel-Impute, our matrix co-clustering factorization (MCCF) model is completely new. (2) Like other matrix completion methods, SparRec can be flexibly applied to missing data imputation for large meta-analyses in which different cohorts are genotyped on different sets of SNPs, even when no reference panel is available. This kind of meta-analysis is very challenging for current statistics-based methods. (3) SparRec performs consistently and achieves high recovery accuracy even when the missing data rate is as high as 90%. Compared with Mendel-Impute, our low-rank-based method achieves similar accuracy and efficiency, while the co-clustering-based method has an advantage in running time. The testing results show that SparRec has significant advantages and competitive performance over existing state-of-the-art statistical methods, including Beagle and fastPhase.
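The abstract says only that SparRec's LRMC model is similar to Mendel-Impute; it does not state the exact objective. As a hedged stand-in, the sketch below fills a genotype matrix by iterative rank-constrained SVD, a scheme often called hard-impute. The matrix has individuals in rows and SNPs in columns, coded 0/1/2, with NaN marking missing entries. The function name, the mean initialization, and the final rounding step are our own assumptions. The co-clustering (MCCF) model is not shown.

```python
import numpy as np


def hard_impute_genotypes(G, rank, n_iter=200, tol=1e-8):
    """Fill NaN entries of an individuals x SNPs genotype matrix (coded 0/1/2)
    by alternating between a rank-`rank` SVD fit and re-imposing the observed
    genotypes ("hard-impute"). Returns dosages rounded to {0, 1, 2}."""
    G = np.asarray(G, dtype=float)
    missing = np.isnan(G)
    # start from per-SNP means of the observed genotypes
    X = np.where(missing, np.nanmean(G, axis=0), G)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X_new = np.where(missing, low_rank, G)  # observed entries stay fixed
        converged = np.linalg.norm(X_new - X) <= tol * np.linalg.norm(X)
        X = X_new
        if converged:
            break
    # map continuous estimates back to valid genotype codes
    return np.clip(np.rint(X), 0, 2)
```

Rounding at the end is one simple way to return valid genotype calls. A dosage-based analysis could skip it and keep the continuous estimates.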
21
Cai T, Cai TT, Zhang A. Structured Matrix Completion with Applications to Genomic Data Integration. J Am Stat Assoc 2016; 111:621-633. [PMID: 28042188 PMCID: PMC5198844 DOI: 10.1080/01621459.2015.1021005] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 04/01/2014] [Revised: 01/01/2015] [Indexed: 10/23/2022]
Abstract
Matrix completion has attracted significant recent attention in many fields, including statistics, applied mathematics, and electrical engineering. The current literature on matrix completion focuses primarily on independent sampling models, under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix is observed. We provide theoretical justification for the proposed SMC method and derive lower bounds for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite samples under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extents of genomic measurement, which enables us to construct more accurate prediction rules for ovarian cancer survival.
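In the structured-missingness setting above, one block of rows and one block of columns are fully observed, and the remaining block is missing. Write the matrix as [[A11, A12], [A21, A22]] with A22 missing. If the matrix is exactly low-rank and the observed corner A11 has the same rank as the whole matrix, then A22 = A21 pinv(A11) A12. We understand SMC-type estimators to be built around this identity. The published method adds rank selection by thresholding, plus error guarantees, neither of which this sketch reproduces. The NumPy sketch below uses a hypothetical function name.

```python
import numpy as np


def complete_missing_block(A11, A12, A21, rank):
    """Estimate A22 in the block matrix [[A11, A12], [A21, A22]] when only the
    first rows (A11, A12) and first columns (A11, A21) are observed.
    Uses A22 ~ A21 @ pinv_r(A11) @ A12 with a rank-r truncated pseudoinverse."""
    U, s, Vt = np.linalg.svd(A11, full_matrices=False)
    r = min(rank, int(np.sum(s > 1e-12 * s[0])))  # ignore numerically zero singular values
    A11_pinv = (Vt[:r].T / s[:r]) @ U[:, :r].T     # V_r diag(1/s_r) U_r^T
    return A21 @ A11_pinv @ A12
```

The truncation to `rank` matters for approximately low-rank data. Inverting small, noise-dominated singular values of A11 would amplify noise in the estimate of A22.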
Affiliation(s)
- Tianxi Cai
- Professor of Biostatistics, Department of Biostatistics, Harvard University, Boston, MA
- T Tony Cai
- Dorothy Silberberg Professor of Statistics, Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA
- Anru Zhang
- Student, Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA
22
Hodos RA, Kidd BA, Khader S, Readhead BP, Dudley JT. In silico methods for drug repurposing and pharmacology. WILEY INTERDISCIPLINARY REVIEWS. SYSTEMS BIOLOGY AND MEDICINE 2016; 8:186-210. [PMID: 27080087 PMCID: PMC4845762 DOI: 10.1002/wsbm.1337] [Citation(s) in RCA: 179] [Impact Index Per Article: 22.4] [Received: 11/30/2015] [Revised: 02/08/2016] [Accepted: 02/11/2016] [Indexed: 12/18/2022]
Abstract
Data in the biological, chemical, and clinical domains are accumulating at ever-increasing rates and have the potential to accelerate and inform drug development in new ways. The challenge and the opportunity now lie in developing analytic tools that transform these often complex and heterogeneous data into testable hypotheses and actionable insights. This is the aim of computational pharmacology, which uses in silico techniques to better understand and predict how drugs affect biological systems; this can in turn improve clinical use, avoid unwanted side effects, and guide the selection and development of better treatments. One exciting application of computational pharmacology is drug repurposing: finding new uses for existing drugs. This strategy is already yielding many promising candidates and has the potential to improve the efficiency of the drug development process and to reach patient populations with previously unmet needs, such as those with rare diseases. While current techniques in computational pharmacology and drug repurposing often focus on a single data modality, such as gene expression or drug-target interactions, we argue that methods such as matrix factorization, which can integrate data within and across diverse data types, have the potential to improve predictive performance and provide a fuller picture of a drug's pharmacological action.
Affiliation(s)
- Rachel A Hodos
- New York University and Icahn School of Medicine at Mt. Sinai, New York, NY
- Brian A Kidd
- Icahn School of Medicine at Mt. Sinai, New York, NY
23
Imputing Genotypes in Biallelic Populations from Low-Coverage Sequence Data. Genetics 2015; 202:487-95. [PMID: 26715670 DOI: 10.1534/genetics.115.182071] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 08/18/2015] [Accepted: 12/16/2015] [Indexed: 12/31/2022]
Abstract
Low-coverage next-generation sequencing methodologies are routinely employed to genotype large populations. Missing data in these populations manifest both as missing markers and markers with incomplete allele recovery. False homozygous calls at heterozygous sites resulting from incomplete allele recovery confound many existing imputation algorithms. These types of systematic errors can be minimized by incorporating depth-of-sequencing read coverage into the imputation algorithm. Accordingly, we developed Low-Coverage Biallelic Impute (LB-Impute) to resolve missing data issues. LB-Impute uses a hidden Markov model that incorporates marker read coverage to determine variable emission probabilities. Robust, highly accurate imputation results were reliably obtained with LB-Impute, even at extremely low (<1×) average per-marker coverage. This finding will have implications for the design of genotype imputation algorithms in the future. LB-Impute is publicly available on GitHub at https://github.com/dellaporta-laboratory/LB-Impute.
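The key idea above is that emission probabilities should depend on read depth. With only a few reads, a heterozygous site can easily show reads from just one allele. The sketch below assumes a simple binomial read-sampling model with a per-read error rate. This is our assumption, not necessarily LB-Impute's exact emission model. It shows how coverage changes the genotype posterior at a single marker. The HMM transition structure is omitted, and the function names and default prior are hypothetical.

```python
from math import comb


def read_emission(state, n_ref, n_alt, error=0.01):
    """P(n_ref reference and n_alt alternate reads | true genotype state),
    under an assumed binomial read-sampling model with per-read error rate."""
    # probability that a single read shows the reference allele A
    p_ref = {"AA": 1.0 - error, "AB": 0.5, "BB": error}[state]
    n = n_ref + n_alt
    return comb(n, n_ref) * p_ref ** n_ref * (1.0 - p_ref) ** n_alt


def genotype_posterior(n_ref, n_alt, prior=(0.25, 0.5, 0.25), error=0.01):
    """Posterior over AA/AB/BB at one marker, combining an assumed prior with
    the depth-aware emission probabilities."""
    states = ("AA", "AB", "BB")
    weights = [pr * read_emission(st, n_ref, n_alt, error) for pr, st in zip(prior, states)]
    total = sum(weights)
    return {st: w / total for st, w in zip(states, weights)}
```

Under these assumed priors, a single reference read leaves AA and AB almost equally likely (about 0.495 versus 0.5). Twelve reference reads make AA overwhelmingly likely. A caller that ignored depth would report AA in both cases, which produces the false homozygous calls at heterozygous sites described above.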
24
Abstract
Matrix completion discriminant analysis (MCDA) is designed for semi-supervised learning where the rate of missingness is high and predictors vastly outnumber cases. MCDA operates by mapping class labels to the vertices of a regular simplex. With c classes, these vertices are arranged on the surface of the unit sphere in (c - 1)-dimensional Euclidean space. Because all pairs of vertices are equidistant, the classes are treated symmetrically. To assign unlabeled cases to classes, the data are entered into a large matrix (cases along rows and predictors along columns) that is augmented by vertex coordinates stored in the last c - 1 columns. Once the matrix is constructed, its missing entries can be filled in by matrix completion. To carry out matrix completion, one minimizes a sum of squares plus a nuclear norm penalty. The simplest solution invokes an MM algorithm and singular value decomposition. The penalty tuning constant can be chosen by cross-validation on randomly withheld case labels. Once the matrix is completed, an unlabeled case is assigned to the class vertex closest to the point deposited in its last c - 1 columns. A variety of examples drawn from the statistical literature demonstrate that MCDA is competitive on traditional problems and outperforms alternatives on large-scale problems.
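The MCDA procedure described above can be sketched in NumPy under a few assumptions. The regular simplex is built by centering and normalizing the standard basis of R^c. The sum-of-squares plus nuclear-norm problem is solved with the soft-thresholded SVD iteration known as soft-impute, which is one MM-style scheme for this objective. The authors' exact algorithm, stopping rule, and cross-validation of the tuning constant `lam` are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def simplex_vertices(c):
    """Return a (c, c-1) array whose rows are c unit vectors in R^(c-1)
    with all pairwise distances equal (a regular simplex)."""
    E = np.eye(c) - 1.0 / c                        # centred standard basis of R^c
    E /= np.linalg.norm(E, axis=1, keepdims=True)  # put the vertices on the unit sphere
    _, _, Vt = np.linalg.svd(E)                    # rows of E span a (c-1)-dim subspace
    return E @ Vt[: c - 1].T                       # coordinates within that subspace


def soft_impute(X, lam, n_iter=1000, tol=1e-9):
    """Minimise 0.5 * ||P_obs(X - Z)||_F^2 + lam * ||Z||_* by iterating:
    fill missing cells from Z, then soft-threshold singular values by lam."""
    obs = ~np.isnan(X)
    Z = np.zeros_like(X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(np.where(obs, X, Z), full_matrices=False)
        Z_new = (U * np.maximum(s - lam, 0.0)) @ Vt
        done = np.linalg.norm(Z_new - Z) <= tol * max(1.0, np.linalg.norm(Z))
        Z = Z_new
        if done:
            break
    return Z


def mcda_predict(features, labels, c, lam):
    """labels[i] in {0, ..., c-1} for labelled cases and -1 for unlabelled ones."""
    V = simplex_vertices(c)
    label_cols = np.full((features.shape[0], c - 1), np.nan)
    known = labels >= 0
    label_cols[known] = V[labels[known]]           # labelled cases get their vertex
    Z = soft_impute(np.hstack([features, label_cols]), lam)
    pts = Z[:, -(c - 1):]                          # completed vertex coordinates
    dists = ((pts[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)                    # nearest class vertex
```

In a small two-class check with some labels withheld, the completed label column pulls each withheld case toward the vertex of the cases that share its feature pattern.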
Affiliation(s)
- Tong Tong Wu
- Associate Professor in the Departments of Biostatistics and Computational Biology, University of Rochester, NY 14642
- Kenneth Lange
- Professor of Biomathematics, Human Genetics, and Statistics at the University of California, Los Angeles, CA 90095
25
Wang Y, Wylie T, Stothard P, Lin G. Whole genome SNP genotype piecemeal imputation. BMC Bioinformatics 2015; 16:340. [PMID: 26498158 PMCID: PMC4619096 DOI: 10.1186/s12859-015-0770-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/19/2015] [Accepted: 10/09/2015] [Indexed: 11/10/2022]
Abstract
Background Despite ongoing reductions in the cost of sequencing technologies, whole-genome SNP genotype imputation is often used as an alternative way of obtaining abundant SNP genotypes for genome-wide association studies. Several existing genotype imputation methods can be efficient for this purpose while achieving various levels of imputation accuracy. Recent empirical results have shown that two-step imputation may improve accuracy by first imputing low-density genotyped study animals to a medium-density array and then to the target density. We are interested in building a series of staircase arrays that lead from the low-density array to the high-density array, or even to the whole genome, such that genotype imputation along these staircases achieves the highest accuracy. Results For genotype imputation from a lower density to a higher density, we first show how to select untyped SNPs to construct a medium-density array. Next, we determine, for each selected SNP, the untyped SNPs to be imputed in the add-one two-step imputation, and finally we show how the clusters of imputed genotypes are pieced together into the final imputation result. We design extensive empirical experiments using several hundred sequenced and genotyped animals to demonstrate that our novel two-step piecemeal imputation always achieves an improvement over one-step imputation by the state-of-the-art methods Beagle and FImpute. Using two-step piecemeal imputation, we present some preliminary success on whole-genome SNP genotype imputation for genotyped animals via a series of staircase arrays. Conclusions From a low SNP density to the whole genome, intermediate pseudo-arrays can be computationally constructed by selecting the most informative SNPs for untyped SNP genotype imputation. Such pseudo-array staircases impute more accurately than classic one-step imputation.
Affiliation(s)
- Yining Wang
- Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada.
- Tim Wylie
- Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada; currently with Department of Computer Science, University of Texas - Rio Grande Valley, Edinburg, 78539, Texas, USA.
- Paul Stothard
- Department of Agricultural, Food, and Nutritional Science, University of Alberta, Edmonton, T6G 2C8, Alberta, Canada.
- Guohui Lin
- Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada.
26
Cahsai A, Anagnostopoulos C, Triantafillou P. Scalable Data Quality for Big Data: The Pythia Framework for Handling Missing Values. BIG DATA 2015; 3:159-172. [PMID: 27442958 DOI: 10.1089/big.2015.0002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Solving the missing-value (MV) problem with small estimation errors in large-scale data environments is a notoriously resource-demanding task. The most widely used MV imputation approaches are computationally expensive because their cost depends explicitly on the volume and dimension of the data. Moreover, as datasets and their user communities continue to grow, the problem can only be exacerbated. To deal with this problem, in our previous work we introduced a novel framework called Pythia, which employs a number of distributed data nodes (cohorts), each of which contains a partition of the original dataset. To perform MV imputation, Pythia uses specific machine and statistical learning structures (signatures) to select the most appropriate subset of cohorts, which then run a missing value substitution algorithm (MVA) locally. This selection relies on the principle that a particular subset of cohorts holds the most relevant partition of the dataset. In addition, because Pythia uses only part of the dataset for imputation and accesses different cohorts in parallel, it improves efficiency, scalability, and accuracy compared with a single machine (called Godzilla) that uses the entire massive dataset to answer imputation requests. This article extends our previous work; in particular, we investigate the robustness of the Pythia framework and show that Pythia is independent of the choice of MVA and signature construction algorithm. To this end, we consider two well-known MVAs (namely, K-nearest neighbor and expectation-maximization imputation), as well as two machine and neural computational learning signature construction algorithms based on adaptive vector quantization and competitive learning. We perform comprehensive experiments to assess the performance of Pythia against Godzilla and showcase the benefits of this framework.
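The selection-then-impute flow described above can be simulated in a single process. The sketch assumes the following. Each cohort's signature is a few prototypes learned by winner-take-all competitive learning. A query is routed to the cohort whose nearest prototype is closest on the query's observed features. K-nearest-neighbor imputation then runs only on the selected cohort's rows. This is not Pythia's distributed implementation; every function name and parameter here is hypothetical.

```python
import numpy as np


def competitive_learning_signature(data, n_protos=2, lr=0.1, epochs=20, seed=0):
    """Summarise one cohort by prototypes learned with winner-take-all
    competitive learning (online vector quantisation)."""
    rng = np.random.default_rng(seed)
    protos = data[rng.choice(len(data), n_protos, replace=False)].astype(float)
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            w = np.argmin(((protos - x) ** 2).sum(axis=1))  # winning prototype
            protos[w] += lr * (x - protos[w])               # move it toward x
    return protos


def select_cohorts(query, signatures, n_select=1):
    """Rank cohorts by the distance from the query's observed features to each
    cohort's nearest prototype; return the indices of the closest cohorts."""
    obs = ~np.isnan(query)
    scores = [np.min(np.linalg.norm(p[:, obs] - query[obs], axis=1)) for p in signatures]
    return [int(i) for i in np.argsort(scores)[:n_select]]


def knn_impute(query, data, k=3):
    """Fill the query's missing features with the mean of its k nearest rows."""
    obs = ~np.isnan(query)
    d = np.linalg.norm(data[:, obs] - query[obs], axis=1)
    nearest = data[np.argsort(d)[:k]]
    out = query.copy()
    out[~obs] = nearest[:, ~obs].mean(axis=0)
    return out


def cohort_based_impute(query, cohorts, signatures, n_select=1, k=3):
    """Route the query to the best-matching cohorts, then impute locally."""
    chosen = select_cohorts(query, signatures, n_select)
    pool = np.vstack([cohorts[i] for i in chosen])  # only the selected partitions
    return knn_impute(query, pool, k)
```

Because only the routed partitions are searched, the k-NN cost scales with the size of the selected cohorts rather than with the whole dataset. This mirrors, in miniature, the efficiency argument made above.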
Affiliation(s)
- Atoshum Cahsai
- School of Computing Science, University of Glasgow, Glasgow, United Kingdom
- Peter Triantafillou
- School of Computing Science, University of Glasgow, Glasgow, United Kingdom
27
Tiesinga P, Bakker R, Hill S, Bjaalie JG. Feeding the human brain model. Curr Opin Neurobiol 2015; 32:107-14. [DOI: 10.1016/j.conb.2015.02.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 12/17/2014] [Revised: 02/06/2015] [Accepted: 02/06/2015] [Indexed: 10/23/2022]
28
Singer M, Pachter L. Controlling for conservation in genome-wide DNA methylation studies. BMC Genomics 2015; 16:420. [PMID: 26024968 PMCID: PMC4448855 DOI: 10.1186/s12864-015-1604-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/15/2015] [Accepted: 05/01/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND A commonplace analysis in high-throughput DNA methylation studies is the comparison of methylation extent between different functional regions, computed by averaging methylation states within region types and then comparing averages between regions. For example, it has been reported that methylation is more prevalent in coding regions than in their neighboring introns or UTRs, leading to hypotheses about novel forms of epigenetic regulation. RESULTS We have identified and characterized a bias in these seemingly straightforward comparisons that results in the false detection of differences in methylation intensities across region types. This bias arises from differences in conservation rates rather than in methylation rates, and it is broadly present in the published literature. When controlling for conservation at coding start sites, the differences in DNA methylation rates disappear. Moreover, a re-evaluation of methylation rates at intron-exon junctions reveals that the magnitude of previously reported differences is greatly exaggerated. We introduce two correction methods to address this bias, an inference-based matrix completion algorithm and an averaging approach, each tailored to a different underlying biological question. We evaluate how analysis using these corrections affects the detection of differences in DNA methylation across functional boundaries. CONCLUSIONS We report here on a bias in comparative DNA methylation studies that originates in conservation rate differences and manifests itself in the false discovery of differences in DNA methylation intensities and their extents. We have characterized this bias and its broad implications, and we show how to control for it so as to enable the study of a variety of biological questions.
Affiliation(s)
- Meromit Singer
- Division of Computer Science, University of California at Berkeley, 94720, Berkeley, CA, USA; current address: Broad Institute of MIT and Harvard, 415 Main Street, 02142, Cambridge, MA, USA.
- Lior Pachter
- Division of Computer Science, University of California at Berkeley, 94720, Berkeley, CA, USA; Department of Mathematics, University of California at Berkeley, 94720, Berkeley, CA, USA; Department of Molecular and Cell Biology, University of California at Berkeley, 94720, Berkeley, CA, USA.
29
Chen W, Schaid DJ. PedBLIMP: extending linear predictors to impute genotypes in pedigrees. Genet Epidemiol 2014; 38:531-41. [PMID: 25044249 DOI: 10.1002/gepi.21838] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/09/2014] [Revised: 05/15/2014] [Accepted: 05/19/2014] [Indexed: 12/13/2022]
Abstract
Recently, Wen and Stephens (Wen and Stephens [2010] Ann Appl Stat 4(3):1158-1182) proposed a linear predictor, called BLIMP, that uses conditional multivariate normal moments to impute genotypes with accuracy similar to current state-of-the-art methods. One novelty is that it regularizes the estimated covariance matrix using a model from population genetics. We extended this multivariate-moment approach to impute genotypes in pedigrees. Our proposed method, PedBLIMP, uses both the linkage disequilibrium (LD) information estimated from external panel data and the pedigree structure or identity-by-descent (IBD) information. The proposed method was evaluated on a pedigree design in which some individuals were genotyped with dense markers and the rest with sparse markers. We found that incorporating the pedigree/IBD information can improve imputation accuracy compared with BLIMP. Because rare variants usually have low LD with other single-nucleotide polymorphisms (SNPs), incorporating pedigree/IBD information substantially improved imputation accuracy for rare variants. We also compared PedBLIMP with IMPUTE2 and GIGI. The results show that when sparse markers fall within a certain density range, our method can outperform both IMPUTE2 and GIGI.
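BLIMP's core step, as described above, is the conditional mean of a multivariate normal. Missing dosages are predicted from observed ones using the mean and covariance estimated from a reference panel. The sketch below shows that step for a single individual. The diagonal shrinkage and the ridge term are generic stand-ins for BLIMP's population-genetics-based regularization. PedBLIMP's pedigree/IBD covariance is not modeled, and the function names are hypothetical.

```python
import numpy as np


def estimate_moments(panel, shrink=0.0):
    """Mean and covariance of reference-panel genotype dosages (rows = people).
    `shrink` pulls the covariance toward its diagonal: an assumed, generic
    regulariser, not BLIMP's population-genetics-based one."""
    mu = panel.mean(axis=0)
    S = np.cov(panel, rowvar=False)
    S = (1.0 - shrink) * S + shrink * np.diag(np.diag(S))
    return mu, S


def conditional_mean_impute(g, mu, S, ridge=1e-6):
    """Impute NaN dosages in g by E[g_m | g_o] = mu_m + S_mo S_oo^{-1} (g_o - mu_o)."""
    g = np.asarray(g, dtype=float)
    m = np.isnan(g)                                  # missing SNPs
    o = ~m                                           # observed SNPs
    S_oo = S[np.ix_(o, o)] + ridge * np.eye(o.sum())  # small ridge for stability
    out = g.copy()
    out[m] = mu[m] + S[np.ix_(m, o)] @ np.linalg.solve(S_oo, g[o] - mu[o])
    return out
```

When a missing SNP is in perfect LD with an observed one in the panel, the conditional mean copies the observed deviation from the mean. When LD is weak, the imputed value stays near the panel mean, which matches the abstract's point that rare variants with low LD benefit from additional pedigree/IBD information.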
Affiliation(s)
- Wenan Chen
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America
30
Pasaniuc B, Zaitlen N, Shi H, Bhatia G, Gusev A, Pickrell J, Hirschhorn J, Strachan DP, Patterson N, Price AL. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics 2014; 30:2906-14. [PMID: 24990607 DOI: 10.1093/bioinformatics/btu416] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Indexed: 12/17/2022]
Abstract
MOTIVATION Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. RESULTS In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case-control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of [Formula: see text] association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. 
We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. AVAILABILITY AND IMPLEMENTATION The software package is publicly available at http://bogdan.bioinformatics.ucla.edu/software/. CONTACT bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu SUPPLEMENTARY INFORMATION Supplementary materials are available at Bioinformatics online.
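The Gaussian imputation from summary statistics described above can be written as a conditional-mean formula on z-scores. Let Sigma_tt be the LD among typed SNPs and Sigma_ut the LD between untyped and typed SNPs, both taken from a reference panel. The imputed scores are then z_u = Sigma_ut (Sigma_tt + lam I)^{-1} z_t. The ridge term lam is our assumed way of accounting for a finite reference panel. The quality estimate r2 = diag(Sigma_ut (Sigma_tt + lam I)^{-1} Sigma_tu) is likewise part of the sketch. The function name is hypothetical, and this is not the released software.

```python
import numpy as np


def impute_summary_z(z_typed, ld_tt, ld_ut, lam=0.1):
    """Gaussian imputation of association z-scores at untyped SNPs.

    z_typed: z-scores at typed SNPs, shape (n_t,)
    ld_tt:   LD (correlation) among typed SNPs from a reference panel, (n_t, n_t)
    ld_ut:   LD between untyped and typed SNPs, (n_u, n_t)
    lam:     ridge added to ld_tt's diagonal (assumed finite-panel correction)
    Returns imputed z-scores and an r^2-style imputation-quality estimate."""
    A = ld_tt + lam * np.eye(ld_tt.shape[0])
    z_untyped = ld_ut @ np.linalg.solve(A, z_typed)
    # diag(ld_ut A^{-1} ld_ut^T): share of z variance explained by typed SNPs
    r2 = np.einsum("ij,ji->i", ld_ut, np.linalg.solve(A, ld_ut.T))
    return z_untyped, r2
```

For one typed SNP with z = 5 and LD r = 0.8 to an untyped SNP, lam = 0 gives an imputed z of 4.0 with r2 = 0.64. A positive lam shrinks the imputed score toward zero, a conservative choice when LD is estimated from a small panel.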
Affiliation(s)
- Bogdan Pasaniuc
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK
- Noah Zaitlen
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK
- Huwenbo Shi
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK
- Gaurav Bhatia
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK
- Alexander Gusev
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK
- Joseph Pickrell
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK
- Joel Hirschhorn
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK
| | - David P Strachan
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK
| | - Nick Patterson
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK
| | - Alkes L Price
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, 02115, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, 02115, Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, 02142, Department of Genetics Harvard Medical School, Boston, MA, 02115 and Division of Population Health Sciences and Education, St George's, University of London, UK Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, 90024, Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, 90024, Department of Medicine, Lung Biology Center, University of California San Francisco, San Francisco, 94143, Program in Genetic Epidemiology and Statistical Genetics, 
Har
31
Lange K, Chi EC, Zhou H. A Brief Survey of Modern Optimization for Statisticians. Int Stat Rev 2014; 82:46-70. [PMID: 25242858 PMCID: PMC4166522 DOI: 10.1111/insr.12022] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 09/10/2012] [Accepted: 04/20/2013] [Indexed: 11/30/2022]
Abstract
Modern computational statistics is turning more and more to high-dimensional optimization to handle the deluge of big data. Once a model is formulated, its parameters can be estimated by optimization. Because model parsimony is important, models routinely include nondifferentiable penalty terms such as the lasso. This sober reality complicates minimization and maximization. Our broad survey stresses a few important principles in algorithm design. Rather than view these principles in isolation, it is more productive to mix and match them. A few well chosen examples illustrate this point. Algorithm derivation is also emphasized, and theory is downplayed, particularly the abstractions of the convex calculus. Thus, our survey should be useful and accessible to a broad audience.
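The abstract notes that a nondifferentiable lasso penalty complicates optimization. One standard remedy is cyclic coordinate descent. Each one-dimensional subproblem then has a closed-form soft-thresholding solution, so the kink in the penalty is handled exactly. This is a minimal sketch under three assumptions: the objective is ½‖y − Xβ‖² + λ‖β‖₁, plain Python lists stand in for a numerical library, and the function names are hypothetical. It is not code from the survey.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Proximal map of lam*|b|: shrink z toward zero by lam, clipping at zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of n rows with p entries each; y is a list of length n.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = [float(v) for v in y]  # residual y - X b (b starts at zero)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the fit
            # x_j^T (partial residual with coordinate j's contribution added back)
            rho = sum(X[i][j] * r[i] for i in range(n)) + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta  # keep the residual in sync with b
                b[j] = new_bj
    return b
```

Take an orthonormal design (the 3 × 3 identity), y = [3, 0.5, -2] and λ = 1. Each coefficient is then simply soft-thresholded, which gives [2.0, 0.0, -1.0]. The middle coefficient is set exactly to zero, which is the parsimony the lasso is used for.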
Affiliation(s)
- Kenneth Lange
- Departments of Biomathematics, Human Genetics, and Statistics University of California Los Angeles, CA 90095-1766
- Eric C Chi
- Department of Human Genetics University of California Los Angeles, CA 90095
- Hua Zhou
- Department of Statistics North Carolina State University Raleigh, NC 27695-8203
32
Lange K, Chi EC, Zhou H. Rejoinder. Int Stat Rev 2014. [DOI: 10.1111/insr.12030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Affiliation(s)
- Kenneth Lange
- Departments of Biomathematics and Statistics; University of California; Los Angeles CA 90095-1766 USA
- Department of Human Genetics; University of California; Los Angeles CA 90095-1766 USA
- Eric C. Chi
- Department of Human Genetics; University of California; Los Angeles CA 90095-1766 USA
- Hua Zhou
- Department of Statistics; North Carolina State University; Raleigh NC 27695-8203 USA
33
Zhang L, Pei YF, Fu X, Lin Y, Wang YP, Deng HW. FISH: fast and accurate diploid genotype imputation via segmental hidden Markov model. Bioinformatics 2014; 30:1876-83. [PMID: 24618466 DOI: 10.1093/bioinformatics/btu143] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 12/29/2022]
Abstract
MOTIVATION Fast and accurate genotype imputation is necessary for facilitating gene-mapping studies, especially with the ever-increasing numbers of both common and rare variants generated by high-throughput sequencing experiments. However, most existing imputation approaches suffer from either inaccurate results or heavy computational demands. RESULTS In this article, aiming to perform fast and accurate genotype imputation, we propose a novel method to impute diploid genotypes. Specifically, we extend a hidden Markov model that is widely used to describe haplotype structures, but we model hidden states on single reference haplotypes rather than on pairs of haplotypes. Consequently, the computational complexity is linear in the number of reference haplotypes. We further develop an algorithm, 'merge-and-recover (MAR)', to speed up the calculation. Working on a compact representation of segmental reference haplotypes, the MAR algorithm always calculates an exact form of the transition probabilities regardless of how the segments are partitioned. Both simulation studies and real-data analyses demonstrated that our proposed method was comparable to most existing popular methods in terms of imputation accuracy but was much more efficient computationally. The MAR algorithm can further speed up the calculation several-fold without loss of accuracy. The proposed method will be useful in large-scale imputation studies with a large number of reference subjects. AVAILABILITY The implemented multi-threading software FISH is freely available for academic use at https://sites.google.com/site/lzhanghomepage/FISH.
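The linear-complexity claim rests on the choice of hidden state. The sketch below is a simplified haploid copying model built on three assumptions: the hidden state is one reference haplotype, switches happen with a uniform probability `rho`, and mismatches have a symmetric probability `eps`. All names and parameters are hypothetical. This is not the FISH model, which imputes diploid genotypes and adds the MAR segmentation. It only shows why a state space of single haplotypes lets each forward and backward step cost O(K) for K reference haplotypes. A general K-state transition matrix would cost O(K²) per site.

```python
def forward_backward(ref, obs, rho=0.01, eps=0.01):
    """Posterior over which reference haplotype is copied at each site.

    ref: K reference haplotypes, each a list of 0/1 alleles over M sites.
    obs: target haplotype, a list of 0, 1 or None (missing) of length M.
    Transition: stay with prob (1 - rho), else jump to a uniformly chosen haplotype.
    """
    K, M = len(ref), len(obs)

    def emit(k, m):
        if obs[m] is None:  # a missing site carries no information
            return 1.0
        return 1.0 - eps if ref[k][m] == obs[m] else eps

    # Forward pass, normalized per site to avoid underflow; O(K) per site.
    a = [emit(k, 0) / K for k in range(K)]
    s = sum(a)
    alpha = [[x / s for x in a]]
    for m in range(1, M):
        prev = alpha[-1]
        total = sum(prev)
        a = [emit(k, m) * ((1.0 - rho) * prev[k] + rho * total / K) for k in range(K)]
        s = sum(a)
        alpha.append([x / s for x in a])

    # Backward pass with the same structured transition; also O(K) per site.
    beta = [[1.0] * K for _ in range(M)]
    for m in range(M - 2, -1, -1):
        nxt = [emit(k, m + 1) * beta[m + 1][k] for k in range(K)]
        jump = rho * sum(nxt) / K
        b = [(1.0 - rho) * nxt[k] + jump for k in range(K)]
        s = sum(b)
        beta[m] = [x / s for x in b]

    post = []
    for m in range(M):
        p = [alpha[m][k] * beta[m][k] for k in range(K)]
        s = sum(p)
        post.append([x / s for x in p])
    return post


def impute_missing(ref, obs, rho=0.01, eps=0.01):
    """Fill each None with the allele carrying the larger posterior weight."""
    post = forward_backward(ref, obs, rho, eps)
    out = []
    for m, allele in enumerate(obs):
        if allele is None:
            p1 = sum(post[m][k] for k in range(len(ref)) if ref[k][m] == 1)
            out.append(1 if p1 >= 0.5 else 0)
        else:
            out.append(allele)
    return out
```

Consider two reference haplotypes, one all 0 and one all 1. A target observed as [1, 1, None, 1, 1] is then completed as [1, 1, 1, 1, 1].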
Affiliation(s)
- Lei Zhang
- School of Public Health, Xi'an Jiaotong University, Shaanxi, China, Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, USA and Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, China
- Yu-Fang Pei
- School of Public Health, Xi'an Jiaotong University, Shaanxi, China, Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, USA and Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, China
- Xiaoying Fu
- School of Public Health, Xi'an Jiaotong University, Shaanxi, China, Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, USA and Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, China
- Yong Lin
- School of Public Health, Xi'an Jiaotong University, Shaanxi, China, Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, USA and Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, China
- Yu-Ping Wang
- School of Public Health, Xi'an Jiaotong University, Shaanxi, China, Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, USA and Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, China
- Hong-Wen Deng
- School of Public Health, Xi'an Jiaotong University, Shaanxi, China, Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, USA and Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, China
34
Lange K, Papp JC, Sinsheimer JS, Sobel EM. Next Generation Statistical Genetics: Modeling, Penalization, and Optimization in High-Dimensional Data. Annu Rev Stat Appl 2014; 1:279-300. [PMID: 24955378 PMCID: PMC4062304 DOI: 10.1146/annurev-statistics-022513-115638] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 05/30/2023]
Abstract
Statistical genetics is undergoing the same transition to big data that all branches of applied statistics are experiencing. With the advent of inexpensive DNA sequencing, the transition is only accelerating. This brief review highlights some modern techniques with recent successes in statistical genetics. These include: (a) lasso penalized regression and association mapping, (b) ethnic admixture estimation, (c) matrix completion for genotype and sequence data, (d) the fused lasso and copy number variation, (e) haplotyping, (f) estimation of relatedness, (g) variance components models, and (h) rare variant testing. For more than a century, genetics has been both a driver and beneficiary of statistical theory and practice. This symbiotic relationship will persist for the foreseeable future.
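Item (c) refers to matrix completion for genotype data. The generic idea can be illustrated on a subjects × SNPs matrix with missing calls stored as None. The sketch fits a low-rank factorization U Vᵀ to the observed entries by alternating ridge regressions (alternating least squares) and reads the missing entries off the fit. ALS is swapped in here for brevity and is not necessarily the algorithm the review discusses. The function names and the `rank`, `lam` and `n_iter` defaults are hypothetical. The code uses pure Python to stay dependency-free.

```python
import random


def solve(A, b):
    """Solve the small square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def als_complete(Y, rank=2, lam=0.1, n_iter=100, seed=0):
    """Fill the None entries of Y with a rank-`rank` fit U V^T (hypothetical helper).

    Alternates ridge regressions: each row factor is refitted to that row's
    observed entries given V, then each column factor given U. lam > 0 keeps
    every small system positive definite, even for an all-missing row.
    """
    rng = random.Random(seed)
    n, p = len(Y), len(Y[0])
    U = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(n)]
    V = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(p)]
    # Observed (index, value) pairs per row and per column.
    rows = [[(j, Y[i][j]) for j in range(p) if Y[i][j] is not None] for i in range(n)]
    cols = [[(i, Y[i][j]) for i in range(n) if Y[i][j] is not None] for j in range(p)]

    def update(F, G, observed):
        # Ridge solution F[i] = (G_o^T G_o + lam I)^{-1} G_o^T y_o for each i.
        for i, obs in enumerate(observed):
            A = [[lam if a == c else 0.0 for c in range(rank)] for a in range(rank)]
            rhs = [0.0] * rank
            for j, y in obs:
                g = G[j]
                for a in range(rank):
                    rhs[a] += g[a] * y
                    for c in range(rank):
                        A[a][c] += g[a] * g[c]
            F[i] = solve(A, rhs)

    for _ in range(n_iter):
        update(U, V, rows)
        update(V, U, cols)
    # Keep observed entries; predict missing ones from the low-rank fit.
    return [[Y[i][j] if Y[i][j] is not None
             else sum(U[i][a] * V[j][a] for a in range(rank))
             for j in range(p)] for i in range(n)]
```

Rounding the filled values to the nearest of 0, 1 and 2 would turn them into hard genotype calls. That step is omitted here.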
Affiliation(s)
- Kenneth Lange
- Depts of Biomathematics, Human Genetics, and Statistics, UCLA
- Janet S. Sinsheimer
- Depts of Biomathematics, Human Genetics, Statistics, and Biostatistics, UCLA
35
Zhang L, Choi HJ, Estrada K, Leo PJ, Li J, Pei YF, Zhang Y, Lin Y, Shen H, Liu YZ, Liu Y, Zhao Y, Zhang JG, Tian Q, Wang YP, Han Y, Ran S, Hai R, Zhu XZ, Wu S, Yan H, Liu X, Yang TL, Guo Y, Zhang F, Guo YF, Chen Y, Chen X, Tan L, Zhang L, Deng FY, Deng H, Rivadeneira F, Duncan EL, Lee JY, Han BG, Cho NH, Nicholson GC, McCloskey E, Eastell R, Prince RL, Eisman JA, Jones G, Reid IR, Sambrook PN, Dennison EM, Danoy P, Yerges-Armstrong LM, Streeten EA, Hu T, Xiang S, Papasian CJ, Brown MA, Shin CS, Uitterlinden AG, Deng HW. Multistage genome-wide association meta-analyses identified two new loci for bone mineral density. Hum Mol Genet 2013; 23:1923-33. [PMID: 24249740 DOI: 10.1093/hmg/ddt575] [Citation(s) in RCA: 116] [Impact Index Per Article: 10.5] [Indexed: 01/20/2023]
Abstract
Aiming to identify novel genetic variants and to confirm previously identified genetic variants associated with bone mineral density (BMD), we conducted a three-stage genome-wide association (GWA) meta-analysis in 27 061 study subjects. Stage 1 meta-analyzed seven GWA samples and 11 140 subjects for BMDs at the lumbar spine, hip and femoral neck, followed by a Stage 2 in silico replication of 33 SNPs in 9258 subjects, and by a Stage 3 de novo validation of three SNPs in 6663 subjects. Combining evidence from all the stages, we have identified two novel loci that have not been reported previously at the genome-wide significance (GWS; 5.0 × 10⁻⁸) level: 14q24.2 (rs227425, P-value 3.98 × 10⁻¹³, SMOC1) in the combined sample of males and females and 21q22.13 (rs170183, P-value 4.15 × 10⁻⁹, CLDN14) in the female-specific sample. The two newly identified SNPs were also significant in the GEnetic Factors for OSteoporosis consortium (GEFOS, n = 32 960) summary results. We have also independently confirmed 13 previously reported loci at the GWS level: 1p36.12 (ZBTB40), 1p31.3 (GPR177), 4p16.3 (FGFRL1), 4q22.1 (MEPE), 5q14.3 (MEF2C), 6q25.1 (C6orf97, ESR1), 7q21.3 (FLJ42280, SHFM1), 7q31.31 (FAM3C, WNT16), 8q24.12 (TNFRSF11B), 11p15.3 (SOX6), 11q13.4 (LRP5), 13q14.11 (AKAP11) and 16q24 (FOXL1). Gene expression analysis in osteogenic cells implied potential functional association of the two candidate genes (SMOC1 and CLDN14) in bone metabolism. Our findings independently confirm previously identified biological pathways underlying bone metabolism and contribute to the discovery of novel pathways, thus providing valuable insights into the intervention and treatment of osteoporosis.
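The abstract quotes a genome-wide significance level of 5.0 × 10⁻⁸. That level is conventionally motivated as a Bonferroni correction of α = 0.05 for roughly one million independent common-variant tests. The abstract states only the value, so this rationale is background added here, not a claim from the paper. The following checks use only figures from the abstract. The P-values of the two new loci both clear the threshold, and the three stage sizes add up to the stated total:

```latex
\begin{aligned}
\alpha_{\mathrm{GWS}} &= \frac{0.05}{10^{6}} = 5.0 \times 10^{-8},\\
3.98 \times 10^{-13} &< \alpha_{\mathrm{GWS}}, \qquad 4.15 \times 10^{-9} < \alpha_{\mathrm{GWS}},\\
11\,140 + 9\,258 + 6\,663 &= 27\,061.
\end{aligned}
```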
Affiliation(s)
- Lei Zhang
- Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, China
36
Lange K, Papp JC, Sinsheimer JS, Sripracha R, Zhou H, Sobel EM. Mendel: the Swiss army knife of genetic analysis programs. Bioinformatics 2013; 29:1568-70. [PMID: 23610370 DOI: 10.1093/bioinformatics/btt187] [Citation(s) in RCA: 93] [Impact Index Per Article: 8.5] [Indexed: 01/08/2023]
Abstract
Mendel is one of the few statistical genetics packages that provide a full spectrum of gene mapping methods, ranging from parametric linkage in large pedigrees to genome-wide association with rare variants. Our latest additions to Mendel anticipate and respond to the needs of the genetics community. Compared with earlier versions, Mendel is faster and easier to use and has a wider range of applications. Supported platforms include Linux, MacOS and Windows. AVAILABILITY Free from www.genetics.ucla.edu/software/mendel.
Affiliation(s)
- Kenneth Lange
- Department of Human Genetics and Department of Biomathematics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA.