1
Emani PS, Geradi MN, Gürsoy G, Grasty MR, Miranker A, Gerstein MB. Assessing and mitigating privacy risks of sparse, noisy genotypes by local alignment to haplotype databases. Genome Res 2023; 33:gr.278322.123. [PMID: 38097386; PMCID: PMC10760520; DOI: 10.1101/gr.278322.123]
Abstract
Single nucleotide polymorphisms (SNPs) from omics data create a reidentification risk for individuals and their relatives. Although the ability of thousands of SNPs (especially rare ones) to identify individuals has been repeatedly shown, the availability of small sets of noisy genotypes, from environmental DNA samples or functional genomics data, motivated us to quantify their informativeness. We present a computational tool suite, termed Privacy Leakage by Inference across Genotypic HMM Trajectories (PLIGHT), using population-genetics-based hidden Markov models (HMMs) of recombination and mutation to find piecewise alignment of small, noisy SNP sets to reference haplotype databases. We explore cases in which query individuals are either known to be in the database, or not, and consider several genotype queries, including those from environmental sample swabs from known individuals and from simulated "mosaics" (two-individual composites). Using PLIGHT on a database with ∼5000 haplotypes, we find for common, noise-free SNPs that only ten are sufficient to identify individuals, ∼20 can identify both components in two-individual mosaics, and 20-30 can identify first-order relatives. Using noisy environmental-sample-derived SNPs, PLIGHT identifies individuals in a database using ∼30 SNPs. Even when the individuals are not in the database, local genotype matches allow for some phenotypic information leakage based on coarse-grained SNP imputation. Finally, by quantifying privacy leakage from sparse SNP sets, PLIGHT helps determine the value of selectively sanitizing released SNPs without explicit assumptions about population membership or allele frequency. To make this practical, we provide a sanitization tool to remove the most identifying SNPs from genomic data.
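For intuition, the identification risk described above can be sketched as a naive per-individual scoring of a sparse, noisy genotype query against a database. This is a toy independent-sites stand-in for PLIGHT's recombination-aware HMM alignment; the function name and error model are illustrative assumptions, not the tool's actual interface.

```python
import math

def rank_candidates(query_snps, genotype_db, error_rate=0.05):
    """Score each database individual against a sparse, noisy genotype query.

    query_snps: dict mapping SNP id -> observed genotype (0/1/2).
    genotype_db: dict mapping individual id -> {snp_id: genotype}.
    Each observation matches the stored genotype with probability
    1 - error_rate; missing SNPs count as mismatches. This treats sites as
    independent, unlike PLIGHT's recombination-aware HMM.
    Returns individual ids sorted by descending log-likelihood.
    """
    scores = {}
    for person, genotypes in genotype_db.items():
        ll = 0.0
        for snp, observed in query_snps.items():
            match = genotypes.get(snp) == observed
            ll += math.log(1 - error_rate if match else error_rate / 2)
        scores[person] = ll
    return sorted(scores, key=scores.get, reverse=True)
```

Even with a handful of SNPs, the true individual's score separates quickly from the rest, which is the effect the paper quantifies rigorously.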
Affiliation(s)
- Prashant S Emani
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Maya N Geradi
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Gamze Gürsoy
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Monica R Grasty
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Andrew Miranker
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Mark B Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Department of Computer Science, Yale University, New Haven, Connecticut 06520, USA
- Department of Statistics and Data Science, Yale University, New Haven, Connecticut 06520, USA
2
Jin Y, Terhorst J. The solution surface of the Li-Stephens haplotype copying model. Algorithms Mol Biol 2023; 18:12. [PMID: 37559098; PMCID: PMC10410957; DOI: 10.1186/s13015-023-00237-z]
Abstract
The Li-Stephens (LS) haplotype copying model forms the basis of a number of important statistical inference procedures in genetics. LS is a probabilistic generative model which supposes that a sampled chromosome is an imperfect mosaic of other chromosomes found in a population. In the frequentist setting, which is the focus of this paper, the output of LS is a "copying path" through chromosome space. The behavior of LS depends crucially on two user-specified parameters, θ and ρ, which are respectively interpreted as the rates of mutation and recombination. However, because LS is not based on a realistic model of ancestry, the precise connection between these parameters and the biological phenomena they represent is unclear. Here, we offer an alternative perspective, which considers θ and ρ as tuning parameters, and seeks to understand their impact on the LS output. We derive an algorithm which, for a given dataset, efficiently partitions the (θ, ρ) plane into regions where the output of the algorithm is constant, thereby enumerating all possible solutions to the LS model in one go. We extend this approach to the "diploid LS" model commonly used for phasing. We demonstrate the usefulness of our method by studying the effects of changing θ and ρ when using LS for common bioinformatic tasks. Our findings indicate that using the conventional (i.e., population-scaled) values for θ and ρ produces near-optimal results for imputation, but may systematically inflate switch error in the case of phasing diploid genotypes.
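For readers unfamiliar with the model, a minimal haploid LS Viterbi decoder can be sketched as below, with θ as a per-site miscopy probability and ρ as a per-site switch probability. The transition and emission forms here are simplified placeholders, not the exact parameterization analyzed in the paper.

```python
import numpy as np

def ls_viterbi(query, panel, rho=0.1, theta=0.01):
    """Viterbi copying path for a haploid Li-Stephens HMM (simplified).

    query: length-N array of alleles (0/1); panel: K x N array of reference
    haplotypes. rho: per-site switch probability; theta: per-site miscopy
    (mutation) probability. Returns the most probable sequence of panel
    indices copied at each site.
    """
    K, N = panel.shape
    # Emission log-probabilities: match -> 1 - theta, mismatch -> theta.
    log_emit = np.log(np.where(panel == query[None, :], 1.0 - theta, theta))
    # Uniform prior over the first copied haplotype.
    score = np.log(np.full(K, 1.0 / K)) + log_emit[:, 0]
    back = np.zeros((N, K), dtype=int)
    stay, switch = np.log(1.0 - rho), np.log(rho / K)
    for j in range(1, N):
        # Switch probability is uniform over targets, so the best switch
        # always comes from the single highest-scoring previous state.
        best_prev = int(score.argmax())
        jump = score[best_prev] + switch
        stay_score = score + stay
        back[j] = np.where(stay_score >= jump, np.arange(K), best_prev)
        score = np.maximum(stay_score, jump) + log_emit[:, j]
    # Trace back the copying path from the best final state.
    path = [int(score.argmax())]
    for j in range(N - 1, 0, -1):
        path.append(int(back[j, path[-1]]))
    return path[::-1]
```

Shrinking ρ makes switching costlier and the mosaic coarser; growing θ tolerates more mismatches before a switch pays off, which is exactly the (θ, ρ)-dependence the paper's solution-surface algorithm enumerates exhaustively.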
Affiliation(s)
- Yifan Jin
- Department of Statistics, University of Michigan, 1085 South University Avenue, Ann Arbor, MI, 48103, USA
- Jonathan Terhorst
- Department of Statistics, University of Michigan, 1085 South University Avenue, Ann Arbor, MI, 48103, USA
3
Sanaullah A, Zhi D, Zhang S. Minimal positional substring cover is a haplotype threading alternative to Li and Stephens model. Genome Res 2023; 33:1007-1014. [PMID: 37316352; PMCID: PMC10538481; DOI: 10.1101/gr.277673.123]
Abstract
The Li and Stephens (LS) hidden Markov model (HMM) models the process of reconstructing a haplotype as a mosaic copy of haplotypes in a reference panel. For small panels, the probabilistic parameterization of LS enables modeling the uncertainties of such mosaics. However, LS becomes inefficient when the sample size is large because of its linear time complexity. Recently, the PBWT, an efficient data structure capturing local haplotype matching among haplotypes, was proposed to offer a fast method for obtaining an optimal (Viterbi) solution to the LS HMM. Previously, we introduced the minimal positional substring cover (MPSC) problem as an alternative formulation of LS whose objective is to cover a query haplotype by a minimum number of segments from haplotypes in a reference panel. The MPSC formulation allows the generation of a haplotype threading in time independent of sample size (O(N)). This allows haplotype threading on very large biobank-scale panels on which the LS model is infeasible. Here, we present new results on the solution space of the MPSC. In addition, we derive a number of optimal algorithms for MPSC, including solution enumeration, the length-maximal MPSC, and h-MPSC solutions. In doing so, our algorithms reveal the solution space of LS for large panels. We show that our method is informative in terms of revealing the characteristics of biobank-scale datasets and can improve genotype imputation.
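The cover objective can be illustrated with a naive greedy sketch: starting from the left, repeatedly copy the longest exact match available anywhere in the panel. This quadratic-time illustration (with a hypothetical function name) conveys the MPSC objective only; the paper's algorithms reach the same goal in O(N) using the PBWT.

```python
def greedy_mpsc(query, panel):
    """Greedily cover `query` with segments copied from `panel` haplotypes.

    query: allele sequence; panel: list of equal-length allele sequences.
    At each position, extend the longest exact match available in the panel.
    Returns a list of (haplotype_index, start, end) segments, end exclusive.
    A sketch of the cover objective, not the paper's PBWT-based algorithms.
    """
    n = len(query)
    cover, pos = [], 0
    while pos < n:
        best_hap, best_end = -1, pos
        for h, hap in enumerate(panel):
            end = pos
            # Extend the exact match between query and haplotype h.
            while end < n and hap[end] == query[end]:
                end += 1
            if end > best_end:
                best_hap, best_end = h, end
        if best_hap == -1:  # no haplotype matches even one allele here
            raise ValueError(f"query allele at site {pos} absent from panel")
        cover.append((best_hap, pos, best_end))
        pos = best_end
    return cover
```

The number of segments returned is the quantity MPSC minimizes; the solution-space results in the paper characterize all covers achieving that minimum, not just this single greedy one.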
Affiliation(s)
- Ahsan Sanaullah
- Department of Computer Science, University of Central Florida, Orlando, Florida 32816, USA
- Degui Zhi
- Center for AI and Genome Informatics, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, Florida 32816, USA
4
Sanaullah A, Zhi D, Zhang S. Minimal Positional Substring Cover: A Haplotype Threading Alternative to Li & Stephens Model. bioRxiv 2023:2023.01.04.522803. [PMID: 36711469; PMCID: PMC9881975; DOI: 10.1101/2023.01.04.522803]
Abstract
The Li & Stephens (LS) hidden Markov model (HMM) models the process of reconstructing a haplotype as a mosaic copy of haplotypes in a reference panel (haplotype threading). For small panels, the probabilistic parameterization of LS enables modeling the uncertainties of such mosaics, and has been the foundational model for haplotype phasing and imputation. However, LS becomes inefficient when sample size is large (tens of thousands to millions) because of its linear time complexity (O(MN), where M is the number of haplotypes and N is the number of sites in the panel). Recently the PBWT, an efficient data structure capturing the local haplotype matching among haplotypes, was proposed to offer fast methods for giving some optimal solution (Viterbi) to the LS HMM. But the solution space of LS for large panels is still elusive. Previously, we introduced the Minimal Positional Substring Cover (MPSC) problem as an alternative formulation of LS whose objective is to cover a query haplotype by a minimum number of segments from haplotypes in a reference panel. The MPSC formulation allows the generation of a haplotype threading in time constant to sample size (O(N)). This allows haplotype threading on very large biobank-scale panels on which the LS model is infeasible. Here we present new results on the solution space of the MPSC, first identifying the property that any MPSC will have a set of required regions, and then proposing an MPSC graph. In addition, we derived a number of optimal algorithms for MPSC, including solution enumerations, the length-maximal MPSC, and h-MPSC solutions. In doing so, our algorithms reveal the solution space of LS for large panels. Even though we only solved an extreme case of LS where the emission probability is 0, our algorithms can be made more robust by PBWT smoothing. We show that our method is informative in terms of revealing the characteristics of biobank-scale data sets and can improve genotype imputation.
Affiliation(s)
- Ahsan Sanaullah
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Degui Zhi
- Center for AI and Genome Informatics, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
5
De Marino A, Mahmoud AA, Bose M, Bircan KO, Terpolovsky A, Bamunusinghe V, Bohn S, Khan U, Novković B, Yazdi PG. A comparative analysis of current phasing and imputation software. PLoS One 2022; 17:e0260177. [PMID: 36260643; PMCID: PMC9581364; DOI: 10.1371/journal.pone.0260177]
Abstract
Whole-genome data has become significantly more accessible over the last two decades. This can largely be attributed both to reduced sequencing costs and to imputation models that make it possible to obtain nearly whole-genome data from less expensive genotyping methods, such as microarray chips. Although there are many different approaches to imputation, the hidden Markov model (HMM) remains the most widely used. In this study, we compared the latest versions of the most popular HMM-based tools for phasing and imputation: Beagle5.4, Eagle2.4.1, Shapeit4, Impute5 and Minimac4. We benchmarked them on four input datasets with three levels of chip density. We assessed each imputation software on the basis of accuracy, speed and memory usage, and showed how the choice of imputation accuracy metric can result in different interpretations. The highest average concordance rate was achieved by Beagle5.4, followed by Impute5 and Minimac4, using a reference-based approach during phasing and the highest-density chip. IQS and R2 metrics revealed that Impute5 and Minimac4 obtained better results for low-frequency markers, while Beagle5.4 remained more accurate for common markers (MAF > 5%). Computational load, as measured by run time, was lower for Beagle5.4 than for Minimac4 and Impute5, while Minimac4 utilized the least memory of the imputation tools we compared. Shapeit4 used the least memory of the phasing tools examined with genotype chip data, while Eagle2.4.1 used the least memory when phasing WGS data. Finally, we determined the combinations of phasing software, imputation software, and reference panel best suited to different situations and analysis needs, and created an automated pipeline that lets users create customized chips designed to optimize their imputation results.
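The point that metric choice changes the interpretation is easy to see with the simplest metric. Below is a minimal concordance-rate sketch (not the exact IQS or dosage-R2 definitions used in the study) and a toy case showing why concordance flatters imputation at rare variants.

```python
def concordance_rate(true_gt, imputed_gt):
    """Fraction of genotype calls (0/1/2 allele dosages) that agree exactly."""
    matches = sum(t == i for t, i in zip(true_gt, imputed_gt))
    return matches / len(true_gt)

# At a rare variant, naively imputing everyone as homozygous reference already
# scores high concordance -- which is why allele-frequency-aware metrics such
# as IQS or dosage r^2 are preferred for low-frequency markers.
truth   = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # one rare carrier among ten samples
imputed = [0] * 10                         # "impute" the major allele everywhere
print(concordance_rate(truth, imputed))    # 0.9 despite missing the only carrier
```

A correlation-based metric would score this imputation at zero, which matches the study's finding that rankings flip between concordance and IQS/R2 for low-frequency markers.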
Affiliation(s)
- Adriano De Marino
- Research & Development, SelfDecode, Miami, FL, United States of America
- Madhuchanda Bose
- Research & Development, SelfDecode, Miami, FL, United States of America
- Sandra Bohn
- Research & Development, SelfDecode, Miami, FL, United States of America
- Umar Khan
- Research & Development, SelfDecode, Miami, FL, United States of America
- Biljana Novković
- Research & Development, SelfDecode, Miami, FL, United States of America
- Puya G. Yazdi
- Research & Development, SelfDecode, Miami, FL, United States of America
6
Huang Y, Ringbauer H. hapCon: Estimating Contamination of Ancient Genomes by Copying from Reference Haplotypes. Bioinformatics 2022; 38:3768-3777. [PMID: 35695771; PMCID: PMC9344841; DOI: 10.1093/bioinformatics/btac390]
Abstract
Motivation: Human ancient DNA (aDNA) studies have surged in recent years, revolutionizing the study of the human past. Typically, aDNA is preserved poorly, making such data prone to contamination from other human DNA. Therefore, it is important to rule out substantial contamination before proceeding to downstream analysis. As most aDNA samples can only be sequenced to low coverages (<1× average depth), computational methods that can robustly estimate contamination in the low-coverage regime are needed. However, the ultra-low-coverage regime (0.1× and below) remains challenging for existing approaches.
Results: We present a new method to estimate contamination in aDNA for male modern humans. It utilizes a Li & Stephens haplotype copying model for haploid X chromosomes, with mismatches modeled as errors or contamination. We assessed this new approach, hapCon, on simulated and down-sampled empirical aDNA data. Our experiments demonstrate that hapCon outperforms a commonly used tool for estimating male X contamination (ANGSD), with substantially lower variance and narrower confidence intervals, especially in the low-coverage regime. We found that hapCon provides useful contamination estimates for coverages as low as 0.1× for SNP capture data (1240k) and 0.02× for whole-genome sequencing data, substantially extending the coverage limit of previous male X chromosome-based contamination estimation methods. Our experiments demonstrate that hapCon has little bias for contamination up to 25-30% as long as the contaminating source is specified within continental genetic variation, and that its application range extends to human aDNA as old as ∼45,000 years and to various global ancestries.
Availability and implementation: hapCon is available as part of the Python package hapROH, which is hosted on the Python Package Index (https://pypi.org/project/hapROH) and can be installed via pip. The documentation provides example use cases as blueprints for custom applications (https://haproh.readthedocs.io/en/latest/hapCon.html). The program can analyze either BAM files or pileup files produced with samtools. An implementation using Python and C is deposited at https://github.com/hyl317/hapROH.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yilei Huang
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, 04103, Germany
- Harald Ringbauer
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, 04103, Germany
7
Sanaullah A, Zhi D, Zhang S. d-PBWT: dynamic positional Burrows-Wheeler transform. Bioinformatics 2021; 37:2390-2397. [PMID: 33624749; DOI: 10.1093/bioinformatics/btab117]
Abstract
Motivation: Durbin's positional Burrows-Wheeler transform (PBWT) is a scalable data structure for haplotype matching. It has been successfully applied to identical-by-descent (IBD) segment identification and genotype imputation. Once the PBWT of a haplotype panel is constructed, it supports efficient retrieval of all shared long segments among all individuals (long matches) and efficient query between an external haplotype and the panel. However, the standard PBWT is an array-based static data structure and does not support dynamic updates of the panel.
Results: Here, we generalize the static PBWT to a dynamic data structure, d-PBWT, where the reverse prefix sorting at each position is stored with linked lists. We also developed efficient algorithms for insertion and deletion of individual haplotypes. In addition, we verified that d-PBWT can support all algorithms of PBWT. In doing so, we systematically investigated variations of set maximal match and long match query algorithms: while they all have average-case time complexity independent of database size, they have different worst-case complexities and dependencies on additional data structures.
Availability: The benchmarking code is available at genome.ucf.edu/d-PBWT.
Supplementary information: Supplementary materials are available at Bioinformatics online.
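As background, the static array-based structure that d-PBWT generalizes can be sketched as follows: at each site, haplotype indices are stably partitioned by their allele, maintaining the reverse-prefix sorted order, so haplotypes with long matches ending at a site become adjacent. This is a simplified sketch of the construction for binary panels; d-PBWT replaces these arrays with linked lists to support insertion and deletion.

```python
def pbwt_prefix_arrays(panel):
    """Positional prefix arrays of the static, array-based PBWT.

    panel: list of equal-length binary haplotype sequences.
    Returns a list whose entry k gives the haplotype indices sorted by their
    reversed prefixes ending just before site k; adjacent indices in each
    array are the haplotypes sharing the longest matches ending there.
    """
    M = len(panel)
    N = len(panel[0]) if M else 0
    a = list(range(M))           # sorted order before any site is processed
    arrays = [a[:]]
    for k in range(N):
        zeros, ones = [], []
        # Stable partition by the allele at site k preserves the
        # reverse-prefix order within each allele class.
        for idx in a:
            (zeros if panel[idx][k] == 0 else ones).append(idx)
        a = zeros + ones
        arrays.append(a[:])
    return arrays
```

Each update is O(M) per site, giving O(MN) construction overall; the dynamic version's contribution is updating this sorted order incrementally when a single haplotype is inserted or deleted.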
Affiliation(s)
- Ahsan Sanaullah
- Department of Computer Science, University of Central Florida, Orlando, FL, USA
- Degui Zhi
- Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL, USA
8
Alanko J, Bannai H, Cazaux B, Peterlongo P, Stoye J. Finding all maximal perfect haplotype blocks in linear time. Algorithms Mol Biol 2020; 15:2. [PMID: 32055252; PMCID: PMC7008532; DOI: 10.1186/s13015-020-0163-6]
Abstract
Recent large-scale community sequencing efforts allow the identification, at an unprecedented level of detail, of genomic regions that show signatures of natural selection. Traditional methods for identifying such regions from individuals' haplotype data, however, require excessive computing time and therefore are not applicable to current datasets. In 2019, Cunha et al. (Advances in Bioinformatics and Computational Biology: 11th Brazilian Symposium on Bioinformatics, BSB 2018, Niterói, Brazil, October 30 - November 1, 2018; doi:10.1007/978-3-030-01722-4_3) suggested the maximal perfect haplotype block as a very simple combinatorial pattern, forming the basis of a new method to perform rapid genome-wide selection scans. The algorithm they presented for identifying these blocks, however, had a worst-case running time quadratic in the genome length, and it was posed as an open problem whether an optimal, linear-time algorithm exists. In this paper we give two algorithms that achieve this time bound: one conceptually very simple, using suffix trees, and a second using the positional Burrows-Wheeler transform that is also very efficient in practice.
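To make the pattern concrete, a maximal perfect haplotype block is a set of at least two haplotypes that are identical over a column interval which cannot be widened without breaking that identity. The brute-force enumeration below (a hypothetical O(M·N^2)-and-worse sketch, nothing like the paper's linear-time suffix-tree and PBWT algorithms) illustrates the definition directly.

```python
from collections import defaultdict

def maximal_perfect_blocks(panel):
    """Naive enumeration of maximal perfect haplotype blocks.

    A block (K, i, j) is a tuple K of >= 2 haplotype indices that are
    identical over columns [i, j]. Grouping by the substring makes K
    row-maximal automatically; we then keep only column-maximal blocks,
    i.e. those that cannot be widened left or right.
    """
    M, N = len(panel), len(panel[0])
    blocks = set()
    for i in range(N):
        for j in range(i, N):
            groups = defaultdict(list)
            for h in range(M):
                groups[tuple(panel[h][i:j + 1])].append(h)
            for members in groups.values():
                if len(members) < 2:
                    continue
                K = tuple(members)
                # Column-maximal: widening in either direction must break
                # the identity of the haplotypes in K.
                left_ok = i == 0 or len({panel[h][i - 1] for h in K}) > 1
                right_ok = j == N - 1 or len({panel[h][j + 1] for h in K}) > 1
                if left_ok and right_ok:
                    blocks.add((K, i, j))
    return blocks
```

Since the number of blocks is itself linear in the input size, the existence of a linear-time enumeration (the paper's result) is plausible; this sketch merely pins down what is being enumerated.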
Affiliation(s)
- Jarno Alanko
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Hideo Bannai
- Department of Informatics, Kyushu University, Fukuoka, Japan
- Bastien Cazaux
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Jens Stoye
- Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
9
Inferring whole-genome histories in large population datasets. Nat Genet 2019; 51:1330-1338. [PMID: 31477934; PMCID: PMC6726478; DOI: 10.1038/s41588-019-0483-y]
Abstract
Inferring the full genealogical history of a set of DNA sequences is a core problem in evolutionary biology, because this history encodes information about the events and forces that have influenced a species. However, current methods are limited, and the most accurate techniques are able to process no more than a hundred samples. As datasets that consist of millions of genomes are now being collected, there is a need for scalable and efficient inference methods to fully utilize these resources. Here we introduce an algorithm that not only infers whole-genome histories with accuracy comparable to the state of the art, but also processes four orders of magnitude more sequences. The approach also provides an 'evolutionary encoding' of the data, enabling efficient calculation of relevant statistics. We apply the method to human data from the 1000 Genomes Project, Simons Genome Diversity Project and UK Biobank, showing that the inferred genealogies are rich in biological signal and efficient to process.
10
Novak AM, Garrison E, Paten B. A graph extension of the positional Burrows-Wheeler transform and its applications. Algorithms Mol Biol 2017; 12:18. [PMID: 28702075; PMCID: PMC5505026; DOI: 10.1186/s13015-017-0109-9]
Abstract
We present a generalization of the positional Burrows-Wheeler transform, or PBWT, to genome graphs, which we call the gPBWT. A genome graph is a collapsed representation of a set of genomes described as a graph. In a genome graph, a haplotype corresponds to a restricted form of walk. The gPBWT is a compressible representation of a set of these graph-encoded haplotypes that allows for efficient subhaplotype match queries. We give efficient algorithms for gPBWT construction and query operations. As a demonstration, we use the gPBWT to quickly count the number of haplotypes consistent with random walks in a genome graph, and with the paths taken by mapped reads; results suggest that haplotype consistency information can be practically incorporated into graph-based read mappers. We estimate that, with the gPBWT, on the order of 100,000 diploid genomes, including all forms of structural variation, could be stored and made searchable for haplotype queries using a single large compute node.
Affiliation(s)
- Adam M. Novak
- Genomics Institute, University of California Santa Cruz, CBSE, 501C Engineering 2, MS: CBSE, 1156 High St., Santa Cruz, CA 95064 USA
- Erik Garrison
- Wellcome Trust Sanger Institute, Cambridge, CB10 1SA UK
- Benedict Paten
- Genomics Institute, University of California Santa Cruz, CBSE, 501C Engineering 2, MS: CBSE, 1156 High St., Santa Cruz, CA 95064 USA
11
Novembre J, Peter BM. Recent advances in the study of fine-scale population structure in humans. Curr Opin Genet Dev 2016; 41:98-105. [PMID: 27662060; DOI: 10.1016/j.gde.2016.08.007]
Abstract
Empowered by modern genotyping and large samples, population structure can be accurately described and quantified even when it explains only a fraction of a percent of total genetic variance. This is especially relevant and interesting for humans, where fine-scale population structure can both confound disease-mapping studies and reveal the history of migration and divergence that shaped our species' diversity. Here we review notable recent advances in the detection, use, and understanding of population structure. Our review addresses multiple areas where substantial progress is being made: improved statistics and models for better capturing differentiation, admixture, and the spatial distribution of variation; computational speed-ups that allow methods to scale to modern data; and advances in haplotypic modeling that have wide-ranging consequences for the analysis of population structure. We conclude by outlining four important open challenges: the limitations of discrete population models, uncertainty in individual origins, the incorporation of both fine-scale structure and ancient DNA in parametric models, and the development of efficient computational tools, particularly for haplotype-based methods.
Affiliation(s)
- John Novembre
- Department of Human Genetics, University of Chicago, IL 60636, United States; Department of Ecology and Evolutionary Biology, University of Chicago, IL 60636, United States
- Benjamin M Peter
- Department of Human Genetics, University of Chicago, IL 60636, United States