51
Matsunami M, Koganebuchi K, Imamura M, Ishida H, Kimura R, Maeda S. Fine-Scale Genetic Structure and Demographic History in the Miyako Islands of the Ryukyu Archipelago. Mol Biol Evol 2021; 38:2045-2056. [PMID: 33432348 PMCID: PMC8097307 DOI: 10.1093/molbev/msab005] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/21/2022]
Abstract
The Ryukyu Archipelago lies southwest of the main Japanese islands and comprises dozens of islands, grouped into the Miyako Islands, Yaeyama Islands, and Okinawa Islands. Principal component analysis of genome-wide single-nucleotide polymorphisms has revealed genetic differentiation among these island groups. However, a detailed population structure analysis of the Ryukyu Archipelago has not yet been completed. We obtained genomic DNA samples from 1,240 individuals living in the Miyako Islands and genotyped 665,326 single-nucleotide polymorphisms to infer population history within the Miyako Islands, including Miyakojima, Irabu, and Ikema islands. Haplotype-based analysis showed that the populations of the Miyako Islands were divided into three subpopulations: northeast Miyakojima, southwest Miyakojima, and Irabu/Ikema. Haplotype sharing and D statistics showed that the Irabu/Ikema subpopulation received gene flow different from that received by the Miyakojima subpopulations, which may be related to the historically attested immigration during the Gusuku period (900-500 BP). Coalescent-based demographic inference suggests that the Irabu/Ikema population first split from the ancestral Ryukyu population about 41 generations ago, followed by the Miyako southwest population about 16 generations ago; the remaining ancestral Ryukyu population then differentiated into two populations (Miyako northeast and Okinawajima) about seven generations ago. Such genetic information is useful for explaining the population history of modern Miyako people and must be taken into account when performing disease association studies.
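The D statistics mentioned above contrast two discordant allele-sharing patterns, ABBA and BABA, on a four-population tree (((P1, P2), P3), outgroup). The following is a minimal Python sketch of Patterson's D. It assumes per-site derived-allele frequencies are already available for each population. The function name, input layout, and example frequencies are hypothetical, and the sketch is not the authors' pipeline:

```python
from typing import Sequence


def patterson_d(p1: Sequence[float], p2: Sequence[float],
                p3: Sequence[float], outgroup: Sequence[float]) -> float:
    """Patterson's D for the tree (((P1, P2), P3), outgroup).

    Each argument holds derived-allele frequencies at the same SNPs.
    Frequency-weighted ABBA and BABA counts are summed over sites;
    D > 0 means P2 shares more derived alleles with P3 than P1 does.
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("all populations must be measured at the same sites")
    abba = 0.0
    baba = 0.0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        # ABBA: P1 ancestral, P2 and P3 derived, outgroup ancestral.
        abba += (1 - a) * b * c * (1 - o)
        # BABA: P1 and P3 derived, P2 ancestral, outgroup ancestral.
        baba += a * (1 - b) * c * (1 - o)
    total = abba + baba
    if total == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / total


# Hypothetical frequencies at three SNPs; population labels are placeholders only.
miyakojima = [0.10, 0.40, 0.20]
irabu_ikema = [0.30, 0.40, 0.50]
candidate_source = [0.60, 0.50, 0.70]
outgroup_freqs = [0.00, 0.00, 0.00]
print(round(patterson_d(miyakojima, irabu_ikema, candidate_source, outgroup_freqs), 3))
```

In practice, significance is usually assessed with a block jackknife over genomic windows. This sketch omits that step.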
Affiliation(s)
- Masatoshi Matsunami
- Department of Advanced Genomic and Laboratory Medicine, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
- Kae Koganebuchi
- Advanced Medical Research Center, Faculty of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
- Minako Imamura
- Department of Advanced Genomic and Laboratory Medicine, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan; Division of Clinical Laboratory and Blood Transfusion, University of the Ryukyus Hospital, Nishihara-Cho, Japan
- Hajime Ishida
- Department of Human Biology and Anatomy, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
- Ryosuke Kimura
- Department of Human Biology and Anatomy, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
- Shiro Maeda
- Department of Advanced Genomic and Laboratory Medicine, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan; Division of Clinical Laboratory and Blood Transfusion, University of the Ryukyus Hospital, Nishihara-Cho, Japan
52
Fang L, Zhao T, Hu Y, Si Z, Zhu X, Han Z, Liu G, Wang S, Ju L, Guo M, Mei H, Wang L, Qi B, Wang H, Guan X, Zhang T. Divergent improvement of two cultivated allotetraploid cotton species. Plant Biotechnol J 2021; 19:1325-1336. [PMID: 33448110 PMCID: PMC8313128 DOI: 10.1111/pbi.13547] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 11/03/2020] [Revised: 12/24/2020] [Accepted: 01/03/2021] [Indexed: 05/21/2023]
Abstract
Interspecific genomic variation can provide a genetic basis for local adaptation and domestication. Several studies have demonstrated the role of interspecific haplotypes and introgressions in adaptive traits, but few have addressed their role in improving agronomic traits. Two allotetraploid Gossypium species originating from the Americas, Gossypium barbadense (Gb) and G. hirsutum (Gh), are cultivated independently. Here, by sequencing and comparing one GWAS panel of 229 Gb accessions and two GWAS panels comprising 491 Gh accessions, we found that most associated loci or functional haplotypes for agronomic traits were highly divergent, reflecting strong divergent improvement of Gb and Gh. Using a comprehensive interspecific haplotype map, we found that six interspecific introgressions from Gh to Gb were significantly associated with the phenotypic performance of Gb and could explain 5%-40% of the phenotypic variation in yield and fibre quality. In addition, three introgressions overlapped with six associated loci in Gb, indicating that these introgression regions were under further selection and became stabilized during improvement. A single interspecific introgression often carried yield-increasing potential but reduced fibre quality, or the opposite, making it difficult to improve yield and fibre quality simultaneously. Our study not only demonstrates the importance of interspecific functional haplotypes and introgressions in the divergent improvement of Gb and Gh, but also supports their potential value in further human-mediated hybridization or precision breeding.
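The 5%-40% figure above is a proportion of phenotypic variance explained. Code a single marker as introgression absent (0) or present (1). Under a simple linear model, the variance it explains is the squared Pearson correlation between marker and phenotype. The minimal sketch below rests on that assumption. It ignores population structure and kinship, which a real GWAS model would include. The names and data are hypothetical:

```python
from statistics import fmean
from typing import Sequence


def variance_explained(marker: Sequence[float], phenotype: Sequence[float]) -> float:
    """R^2 of a one-marker least-squares regression (squared Pearson r)."""
    if len(marker) != len(phenotype) or len(marker) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = fmean(marker), fmean(phenotype)
    sxy = sum((x - mx) * (y - my) for x, y in zip(marker, phenotype))
    sxx = sum((x - mx) ** 2 for x in marker)
    syy = sum((y - my) ** 2 for y in phenotype)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker or a constant trait explains nothing
    return sxy * sxy / (sxx * syy)


# Hypothetical accessions: 1 = carries a Gh-derived introgression, 0 = does not.
introgression = [0, 0, 0, 1, 1, 1]
trait_values = [4.1, 3.8, 4.3, 4.9, 5.2, 4.6]
print(f"{variance_explained(introgression, trait_values):.0%} of variance explained")
```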
Affiliation(s)
- Lei Fang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Ting Zhao
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Yan Hu
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Zhanfeng Si
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Xiefei Zhu
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Zegang Han
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Guizhen Liu
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Henan Province Seed Station, Zhengzhou, China
- Sen Wang
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Institute of Food Crops, Jiangsu Academy of Agricultural Sciences, Nanjing, China
- Longzhen Ju
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Menglan Guo
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Huan Mei
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Luyao Wang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Bowen Qi
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Heng Wang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Xueying Guan
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Tianzhen Zhang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
53
Rapid detection of identity-by-descent tracts for mega-scale datasets. Nat Commun 2021; 12:3546. [PMID: 34112768 PMCID: PMC8192555 DOI: 10.1038/s41467-021-22910-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/03/2019] [Accepted: 04/01/2021] [Indexed: 01/08/2023]
Abstract
The ability to identify genome segments that are identical by descent (IBD) is part of standard workflows in both statistical and population genetics. However, traditional methods for finding local IBD across all pairs of individuals scale poorly, which has limited their adoption for very large datasets. Here, we present iLASH, an algorithm based on similarity-detection techniques that shows equal or improved accuracy in simulations compared with current leading methods and speeds up analysis by several orders of magnitude on genomic datasets, making IBD estimation tractable for millions of individuals. We apply iLASH to the PAGE dataset of ~52,000 multi-ethnic participants, including several founder populations with elevated IBD sharing, identifying IBD segments in ~3 minutes per chromosome compared with over 6 days for a state-of-the-art algorithm. iLASH enables efficient analysis of very large-scale datasets, as we demonstrate by computing IBD across the UK Biobank (~500,000 individuals), detecting 12.9 billion pairwise connections.
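The abstract describes iLASH as building on similarity-detection techniques. One such technique is MinHash, which estimates the Jaccard similarity of two sets from short signatures. That lets candidate IBD windows be flagged without aligning every pair site by site. The sketch below is a generic illustration under assumed inputs: haplotypes as 0/1 strings over one window, position-tagged k-mer shingles, and 64 hash functions. All names and parameters are hypothetical, and the sketch does not reproduce iLASH's actual windowing or hashing scheme:

```python
from __future__ import annotations

import hashlib


def shingles(haplotype: str, k: int) -> set[str]:
    """Position-tagged k-mers, so identical alleles at different sites differ."""
    if len(haplotype) < k:
        raise ValueError("haplotype window shorter than shingle length")
    return {f"{i}:{haplotype[i:i + k]}" for i in range(len(haplotype) - k + 1)}


def _seeded_hash(seed: int, token: str) -> int:
    # One member of a hash family, selected by the seed.
    digest = hashlib.blake2b(f"{seed}|{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(tokens: set[str], num_hashes: int = 64) -> list[int]:
    """Minimum hash value of the set under each of num_hashes seeded hashes."""
    return [min(_seeded_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of agreeing minima estimates the Jaccard index (intersection over union)."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b)


hap1 = "0110100111010011"
hap2 = "0110100111010111"  # differs from hap1 at a single site
s1, s2 = shingles(hap1, 4), shingles(hap2, 4)
print(exact_jaccard(s1, s2), estimated_jaccard(minhash_signature(s1), minhash_signature(s2)))
```

Locality-sensitive hashing schemes typically go one step further. They split each signature into bands and compare only pairs that collide in at least one band. That step is what avoids touching all pairs of individuals.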
54
Carress H, Lawson DJ, Elhaik E. Population genetic considerations for using biobanks as international resources in the pandemic era and beyond. BMC Genomics 2021; 22:351. [PMID: 34001009 PMCID: PMC8127217 DOI: 10.1186/s12864-021-07618-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/16/2020] [Accepted: 04/14/2021] [Indexed: 12/11/2022]
Abstract
Recent years have seen the rise of genomic biobanks and mega-scale meta-analyses of genomic data, which promise to reveal the genetic underpinnings of health and disease. However, the over-representation of Europeans in genomic studies not only limits the global understanding of disease risk but also inhibits viable research into the genomic differences between carriers and patients. Whilst the community has agreed that more diverse samples are required, it is not enough to blindly increase diversity; diversity must be quantified, compared and annotated to yield insight. Genetic annotations from separate biobanks must be comparable and computable, and must be usable without access to raw data because of privacy concerns. Comparability is key both for routine research and for international comparison in response to pandemics. Here, we evaluate the appropriateness of the most common genomic tools used to depict population structure in a standardized and comparable manner. The end goal is to reduce the effects of confounding and to learn from genuine variation in genetic effects on phenotypes across populations, which will improve the value of biobanks (locally and internationally), increase the accuracy of association analyses and inform developmental efforts.
Affiliation(s)
- Hannah Carress
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
- Daniel John Lawson
- School of Mathematics and Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Eran Elhaik
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK; Department of Biology, Lund University, Lund, Sweden
55
Strom SP, Hossain WA, Grigorian M, Li M, Fierro J, Scaringe W, Yen HY, Teguh M, Liu J, Gao H, Butler MG. A Streamlined Approach to Prader-Willi and Angelman Syndrome Molecular Diagnostics. Front Genet 2021; 12:608889. [PMID: 34046054 PMCID: PMC8148043 DOI: 10.3389/fgene.2021.608889] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/21/2020] [Accepted: 03/23/2021] [Indexed: 11/13/2022]
Abstract
Establishing or ruling out a molecular diagnosis of Prader–Willi or Angelman syndrome (PWS/AS) presents unique challenges because of the variety of genetic alterations that can lead to these conditions. Point mutations, copy number changes, uniparental isodisomy (i-UPD) 15 of two subclasses (segmental or total isodisomy), uniparental heterodisomy (h-UPD), and defects in the chromosome 15 imprinting center can all cause PWS/AS. Here, we outline a combined approach using whole-exome sequencing (WES) and DNA methylation data obtained with methylation-sensitive multiplex ligation-dependent probe amplification (MLPA). This approach establishes both the diagnosis and the disease mechanism with high sensitivity, uses current standard-of-care technology, and is more efficient than serial methods. The authors encourage the use of this approach in the clinical setting to confirm and establish the diagnosis and the underlying genetic defect. Identifying the defect can account for secondary genetic conditions seen in individuals with isodisomy 15, informing surveillance and allowing counseling with more accurate recurrence risks. This streamlined approach would also identify individuals with similar presentations caused by other gene disorders or cytogenetic anomalies, such as Rett syndrome or microdeletions.
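The combined approach above works because each mechanism leaves a different joint signature in methylation, copy number, and heterozygosity across 15q11-q13. The Python sketch below encodes a simplified version of that reasoning for illustration only. The function names, the 1% heterozygosity cut-off, and the VCF-style genotype encoding ('0/1', '1|1', './.') are assumptions. This is not a validated clinical algorithm:

```python
from typing import Iterable


def heterozygosity_rate(genotypes: Iterable[str]) -> float:
    """Fraction of called genotypes (e.g. '0/1', '1|1') that are heterozygous."""
    called = [g.replace("|", "/") for g in genotypes if "." not in g]
    if not called:
        raise ValueError("no called genotypes in region")
    het = sum(1 for g in called if len(set(g.split("/"))) > 1)
    return het / len(called)


def likely_mechanism(methylation_abnormal: bool, copy_number: int,
                     het_rate: float, het_cutoff: float = 0.01) -> str:
    """Hypothetical triage of MS-MLPA plus WES evidence for 15q11-q13."""
    if not methylation_abnormal:
        # AS caused by a UBE3A point mutation keeps a normal methylation pattern.
        return "methylation normal: PWS unlikely; check UBE3A variants for AS"
    if copy_number < 2:
        return "deletion"
    if het_rate <= het_cutoff:
        # Two identical parental copies leave no heterozygous sites.
        return "isodisomy (total or segmental; check extent of homozygosity)"
    # Biparental-looking heterozygosity cannot separate these without more data.
    return "heterodisomy or imprinting-center defect (parental samples needed)"


# Hypothetical WES calls across the region for one patient.
calls = ["0/0", "1/1", "0/0", "./.", "1|1"]
print(likely_mechanism(True, 2, heterozygosity_rate(calls)))
```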
Affiliation(s)
- Waheeda A Hossain
- Department of Psychiatry and Behavioral Sciences and Pediatrics, University of Kansas Medical Center, Kansas City, KS, United States
- Mickey Li
- Fulgent Genetics, Temple City, CA, United States
- Hai-Yun Yen
- Fulgent Genetics, Temple City, CA, United States
- Joanna Liu
- Fulgent Genetics, Temple City, CA, United States
- Harry Gao
- Fulgent Genetics, Temple City, CA, United States
- Merlin G Butler
- Department of Psychiatry and Behavioral Sciences and Pediatrics, University of Kansas Medical Center, Kansas City, KS, United States
56
Bulik CM, Thornton LM, Parker R, Kennedy H, Baker JH, MacDermod C, Guintivano J, Cleland L, Miller AL, Harper L, Larsen JT, Yilmaz Z, Grove J, Sullivan PF, Petersen LV, Jordan J, Kennedy MA, Martin NG. The Eating Disorders Genetics Initiative (EDGI): study protocol. BMC Psychiatry 2021; 21:234. [PMID: 33947359 PMCID: PMC8097919 DOI: 10.1186/s12888-021-03212-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/17/2021] [Accepted: 04/13/2021] [Indexed: 12/04/2022]
Abstract
BACKGROUND The Eating Disorders Genetics Initiative (EDGI) is an international investigation exploring the role of genes and environment in anorexia nervosa, bulimia nervosa, and binge-eating disorder. METHODS A total of 14,500 individuals with eating disorders and 1,500 controls will be included from the United States (US), Australia (AU), New Zealand (NZ), and Denmark (DK). In the US, AU, and NZ, participants will complete comprehensive online phenotyping and will submit a saliva sample for genotyping. In DK, individuals with eating disorders will be identified through the National Patient Register, and genotyping will use bloodspots archived from birth. A genome-wide association study will be conducted within EDGI and via meta-analysis with other data from the Eating Disorders Working Group of the Psychiatric Genomics Consortium (PGC-ED). DISCUSSION EDGI represents the largest genetic study of eating disorders ever conducted and is designed to rapidly advance the study of the genetics of the three major eating disorders (anorexia nervosa, bulimia nervosa, and binge-eating disorder). We will explicate the genetic architecture of eating disorders relative to each other and to other psychiatric and metabolic disorders and traits. Our goal is for EDGI to deliver "actionable" findings that can be transformed into clinically meaningful insights. TRIAL REGISTRATION EDGI is a registered clinical trial: clinicaltrials.gov NCT04378101.
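The protocol plans a GWAS within EDGI and a meta-analysis with PGC-ED data, but the abstract does not name the meta-analysis model. A common default for combining per-SNP effects is fixed-effect inverse-variance weighting. The Python sketch below assumes that model. Its function name and example numbers are hypothetical:

```python
from __future__ import annotations

import math
from typing import Sequence


def fixed_effect_meta(betas: Sequence[float], ses: Sequence[float]) -> tuple[float, float, float]:
    """Inverse-variance weighted fixed-effect meta-analysis for one SNP.

    Returns (pooled beta, pooled standard error, two-sided p-value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]  # more precise cohorts weigh more
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, p


# Hypothetical per-cohort log-odds ratios and standard errors for a single SNP.
beta, se, p = fixed_effect_meta([0.12, 0.08, 0.15], [0.04, 0.05, 0.06])
print(f"beta={beta:.3f} se={se:.3f} p={p:.2e}")
```

If effects are expected to differ between cohorts, for example across the participating countries, a random-effects model could be preferred instead.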
Affiliation(s)
- Cynthia M. Bulik
- Department of Psychiatry, University of North Carolina at Chapel Hill, CB #7160, 101 Manning Drive, Chapel Hill, NC 27599-7160 USA
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, PO Box 281, SE-171 77 Stockholm, Sweden
- Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Laura M. Thornton
- Department of Psychiatry, University of North Carolina at Chapel Hill, CB #7160, 101 Manning Drive, Chapel Hill, NC 27599-7160 USA
- Richard Parker
- QIMR Berghofer Medical Research Institute, Locked Bag 2000, Royal Brisbane Hospital, Herston, QLD 4029 Australia
- Hannah Kennedy
- Department of Psychological Medicine, University of Otago, Christchurch, New Zealand
- Jessica H. Baker
- Department of Psychiatry, University of North Carolina at Chapel Hill, CB #7160, 101 Manning Drive, Chapel Hill, NC 27599-7160 USA
- Casey MacDermod
- Department of Psychiatry, University of North Carolina at Chapel Hill, CB #7160, 101 Manning Drive, Chapel Hill, NC 27599-7160 USA
- Jerry Guintivano
- Department of Psychiatry, University of North Carolina at Chapel Hill, CB #7160, 101 Manning Drive, Chapel Hill, NC 27599-7160 USA
- Lana Cleland
- Department of Psychological Medicine, University of Otago, Christchurch, New Zealand
- Allison L. Miller
- Department of Pathology and Biomedical Science, University of Otago, Christchurch, New Zealand
- Lauren Harper
- Department of Psychiatry, University of North Carolina at Chapel Hill, CB #7160, 101 Manning Drive, Chapel Hill, NC 27599-7160 USA
- Janne T. Larsen
- National Centre for Register-based Research, Aarhus BSS, Aarhus University, Aarhus, Denmark
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Zeynep Yilmaz
- National Centre for Register-based Research, Aarhus BSS, Aarhus University, Aarhus, Denmark
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Jakob Grove
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Center for Genomics and Personalized Medicine, CGPM, and Center for Integrative Sequencing, iSEQ, Aarhus, Denmark
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Patrick F. Sullivan
- Department of Psychiatry, University of North Carolina at Chapel Hill, CB #7160, 101 Manning Drive, Chapel Hill, NC 27599-7160 USA
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, PO Box 281, SE-171 77 Stockholm, Sweden
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Liselotte V. Petersen
- National Centre for Register-based Research, Aarhus BSS, Aarhus University, Aarhus, Denmark
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Jennifer Jordan
- Department of Psychological Medicine, University of Otago, Christchurch, New Zealand
- Martin A. Kennedy
- Department of Pathology and Biomedical Science, University of Otago, Christchurch, New Zealand
- Nicholas G. Martin
- QIMR Berghofer Medical Research Institute, Locked Bag 2000, Royal Brisbane Hospital, Herston, QLD 4029 Australia
57
Verzegnazzi AL, Dos Santos IG, Krause MD, Hufford M, Frei UK, Campbell J, Almeida VC, Zuffo LT, Boerman N, Lübberstedt T. Major locus for spontaneous haploid genome doubling detected by a case-control GWAS in exotic maize germplasm. Theor Appl Genet 2021; 134:1423-1434. [PMID: 33543310 DOI: 10.1007/s00122-021-03780-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/07/2020] [Accepted: 01/19/2021] [Indexed: 06/12/2023]
Abstract
A major locus for spontaneous haploid genome doubling was detected by a case-control GWAS in exotic maize germplasm. Combining the doubled haploid (DH) breeding method with this locus leads to segregation distortion in genomic regions of chromosome 5. Temperate maize (Zea mays L.) breeding programs often rely on limited genetic diversity, which can be expanded by incorporating exotic germplasm. The study had three aims. The first was to characterize inbred lines derived from the tropical BS39 population by different breeding methods. The second was to identify genomic regions showing segregation distortion in lines derived by the DH process using spontaneous haploid genome doubling (SHGD). The third was to use case-control association mapping to identify loci controlling SHGD. Four sets were used: BS39_DH and BS39_SSD were derived from the BS39 population by the DH and single-seed descent (SSD) methods, and BS39 × A427_DH and BS39 × A427_SSD were derived from the cross between BS39 and A427. A total of 663 inbred lines were genotyped. Analyses of gene diversity and genetic differentiation in the DH sets provided evidence of an SHGD locus near the centromere of chromosome 5. The case-control GWAS for the DH set also pinpointed this locus. Haplotype-sharing analysis showed an almost exclusive (close to 100%) contribution of the A427 genome in the same region of chromosome 5 in BS39 × A427_DH, presumably because an allele in this region affects SHGD. This locus enables DH line production in exotic populations without colchicine or other artificial haploid genome doubling.
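Segregation distortion in DH lines from a biparental cross can be checked marker by marker. Without distortion, lines derived from an F1 such as BS39 × A427 are expected to carry either parental allele in a 1:1 ratio. The minimal chi-square sketch below assumes allele counts per marker are already tabulated. The function name and counts are hypothetical. This is not the gene-diversity and differentiation analysis the authors used:

```python
from __future__ import annotations

import math


def segregation_test(count_a: int, count_b: int, expected_a: float = 0.5) -> tuple[float, float]:
    """Chi-square goodness-of-fit (1 df) of allele counts against an expected ratio."""
    n = count_a + count_b
    if n == 0 or not 0.0 < expected_a < 1.0:
        raise ValueError("need observations and an expected proportion in (0, 1)")
    exp_a = n * expected_a
    exp_b = n - exp_a
    chi2 = (count_a - exp_a) ** 2 / exp_a + (count_b - exp_b) ** 2 / exp_b
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p


# Hypothetical marker near the chr5 centromere in a BS39 × A427 DH set:
# number of lines carrying the A427 allele versus the BS39 allele.
chi2, p = segregation_test(count_a=95, count_b=5)
print(f"chi2={chi2:.1f}, p={p:.1e}")
```

For DH lines drawn directly from a population such as BS39, the expected proportion would instead be the population allele frequency, passed through `expected_a`.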
Affiliation(s)
- Matthew Hufford
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, USA
- Vinícius Costa Almeida
- Department of General Biology, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Leandro Tonello Zuffo
- Department of Plant Sciences, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
58
Mosaad YM, Hammad A, AlHarrass MF, Sallam R, Shouma A, Hammad E, Ahmed EO, Abdel-Azeem HA, Sherif D, Fawzy I, Elbahnasawy A, Abdel Twab H. ARID5B rs10821936 and rs10994982 gene polymorphism and susceptibility to juvenile systemic lupus erythematosus and lupus nephritis. Lupus 2021; 30:1226-1232. [PMID: 33888010 DOI: 10.1177/09612033211010338] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
BACKGROUND The prevalence of SLE and the spectrum of its clinical manifestations vary widely among races and geographical populations. OBJECTIVE To investigate the possible role of the ARID5B rs10821936 and rs10994982 polymorphisms as risk factors for the development of SLE in children (jSLE) and to evaluate their relationship with clinical manifestations, especially lupus nephritis (LN). METHODS DNA extraction and real-time PCR genotyping of ARID5B rs10821936 and rs10994982 were performed for 104 children with jSLE and 282 healthy controls. RESULTS The C allele and C-containing genotypes (CC, CT and CC+CT) of ARID5B rs10821936 were more frequent in children with SLE (C allele: p = 0.009, OR = 1.56; CC: p = 0.037, OR = 2.35; CT: p = 0.016, OR = 1.81; CC+CT: p = 0.008, OR = 1.88). ARID5B rs10994982 alleles, genotypes and haplotypes were not associated with jSLE (p > 0.05). Neither ARID5B rs10821936 nor rs10994982 genotypes were significantly associated with LN, with proliferative versus non-proliferative LN, or with biopsy grade (p > 0.05). CONCLUSION The ARID5B rs10821936 SNP may be a susceptibility risk factor for juvenile SLE in the studied cohort of Egyptian children.
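Odds ratios like those reported above compare the odds of carrying the risk allele in cases with the odds in controls. The minimal sketch below computes an allelic odds ratio with a Woolf (log-based) 95% confidence interval. The counts are hypothetical, not the study's data. The abstract also does not state which test produced its p-values, so none is computed here:

```python
from __future__ import annotations

import math


def allelic_odds_ratio(case_risk: int, case_other: int, ctrl_risk: int, ctrl_other: int,
                       z: float = 1.959963984540054) -> tuple[float, float, float]:
    """Odds ratio and Woolf (log-scale) confidence interval from allele counts."""
    if min(case_risk, case_other, ctrl_risk, ctrl_other) <= 0:
        raise ValueError("zero cell; a continuity correction would be needed")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical allele counts: risk (C) vs other allele in cases and in controls.
or_, lower, upper = allelic_odds_ratio(120, 80, 250, 250)
print(f"OR={or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```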
Affiliation(s)
- Youssef M Mosaad
- Clinical Immunology Unit, Clinical Pathology Department, Faculty of Medicine, Mansoura University, Mansoura, Egypt
- Ayman Hammad
- Pediatric Nephrology Unit, Department of Pediatrics, Faculty of Medicine, Mansoura University, Mansoura, Egypt
- Mohamed F AlHarrass
- Clinical Immunology Unit, Clinical Pathology Department, Faculty of Medicine, Mansoura University, Mansoura, Egypt
- Rehab Sallam
- Rheumatology and Rehabilitation Department, Faculty of Medicine, Mansoura University, Mansoura, Egypt
- Amany Shouma
- Pediatric Cardiology Unit, Department of Pediatrics, Faculty of Medicine, Mansoura University, Mansoura, Egypt
- Enas Hammad
- Rheumatology and Rehabilitation Department, Faculty of Medicine, Mansoura University, Mansoura, Egypt
- Engy Osman Ahmed
- Pediatric Pulmonology and Allergy Unit, Department of Pediatrics, Faculty of Medicine, Mansoura University, Mansoura, Egypt
- Heba A Abdel-Azeem
- Dermatology, Andrology & STDs, Faculty of Medicine, Mansoura University, Mansoura, Egypt
- Doaa Sherif
- Microbiology Department, Faculty of Medicine, Mansoura University, Mansoura, Egypt
- Iman Fawzy
- Laboratory Medicine Department, Mansoura Fever Hospital, Ministry of Health, Mansoura, Egypt
- Amany Elbahnasawy
- Rheumatology and Rehabilitation Department, Faculty of Medicine, Mansoura University, Mansoura, Egypt
- Hosam Abdel Twab
- Clinical Immunology Unit, Clinical Pathology Department, Faculty of Medicine, Mansoura University, Mansoura, Egypt
59
Belbin GM, Cullina S, Wenric S, Soper ER, Glicksberg BS, Torre D, Moscati A, Wojcik GL, Shemirani R, Beckmann ND, Cohain A, Sorokin EP, Park DS, Ambite JL, Ellis S, Auton A, Bottinger EP, Cho JH, Loos RJF, Abul-Husn NS, Zaitlen NA, Gignoux CR, Kenny EE. Toward a fine-scale population health monitoring system. Cell 2021; 184:2068-2083.e11. [PMID: 33861964 DOI: 10.1016/j.cell.2021.03.034] [Citation(s) in RCA: 69] [Impact Index Per Article: 23.0] [Received: 09/23/2019] [Revised: 11/18/2020] [Accepted: 03/12/2021] [Indexed: 12/22/2022]
Abstract
Understanding population health disparities is an essential component of equitable precision health efforts. Epidemiology research often relies on definitions of race and ethnicity, but these population labels may not adequately capture disease burdens and environmental factors impacting specific sub-populations. Here, we propose a framework for repurposing data from electronic health records (EHRs) in concert with genomic data to explore the demographic ties that can impact disease burdens. Using data from a diverse biobank in New York City, we identified 17 communities sharing recent genetic ancestry. We observed 1,177 health outcomes that were statistically associated with a specific group and demonstrated significant differences in the segregation of genetic variants contributing to Mendelian diseases. We also demonstrated that fine-scale population structure can impact the prediction of complex disease risk within groups. This work reinforces the utility of linking genomic data to EHRs and provides a framework toward fine-scale monitoring of population health.
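The abstract reports 17 communities sharing recent genetic ancestry but does not describe how they were delineated. The Python sketch below is a deliberately simple stand-in. It links individuals whose total pairwise IBD sharing reaches a hypothetical centimorgan threshold, then reports connected components using a union-find structure. The threshold, names, and data are assumptions. Analyses of this kind often use community-detection algorithms on weighted IBD networks instead, because a single spurious link is enough to merge two components:

```python
from __future__ import annotations

from typing import Iterable


def ibd_components(n: int, pairs: Iterable[tuple[int, int, float]],
                   min_cm: float = 10.0) -> list[list[int]]:
    """Group individuals 0..n-1 into connected components of an IBD graph.

    An edge joins i and j when their shared IBD length (cM) is >= min_cm.
    """
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, j, shared_cm in pairs:
        if shared_cm >= min_cm:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    groups: dict[int, list[int]] = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


# Hypothetical pairwise IBD totals: (individual i, individual j, shared cM).
edges = [(0, 1, 22.5), (1, 2, 14.0), (3, 4, 3.1), (4, 5, 31.0)]
print(ibd_components(6, edges))
```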
Affiliation(s)
- Gillian M Belbin
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Sinead Cullina
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Stephane Wenric
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Emily R Soper
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Benjamin S Glicksberg
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Denis Torre
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Arden Moscati
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Genevieve L Wojcik
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Ruhollah Shemirani
- Information Science Institute, University of Southern California, Marina del Rey, CA 90089, USA
- Noam D Beckmann
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Ariella Cohain
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Elena P Sorokin
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Danny S Park
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Jose-Luis Ambite
- Information Science Institute, University of Southern California, Marina del Rey, CA 90089, USA
- Steve Ellis
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Adam Auton
- Department of Genetics, Albert Einstein College of Medicine, New York, NY 10461, USA
- Erwin P Bottinger
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Judy H Cho
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Ruth J F Loos
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Noura S Abul-Husn
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Noah A Zaitlen
- Department of Neurology, University of California, Los Angeles, Los Angeles, CA 90033, USA
- Christopher R Gignoux
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
60
Sapin E, Keller MC. Novel Approach for Parallelizing Pairwise Comparison Problems as Applied to Detecting Segments Identical By Decent in Whole-Genome Data. Bioinformatics 2021; 37:2121-2125. [PMID: 33705528 PMCID: PMC8352502 DOI: 10.1093/bioinformatics/btab084] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/01/2020] [Revised: 11/09/2020] [Accepted: 03/09/2021] [Indexed: 11/13/2022]
Abstract
MOTIVATION Pairwise comparison problems arise in many areas of science. In genomics, datasets are already large and getting larger, so operations that require pairwise comparisons (either on pairs of SNPs or on pairs of individuals) are extremely computationally challenging. We propose a generic algorithm for addressing pairwise comparison problems that breaks a large problem (of order n² comparisons) into multiple smaller ones (each of order n comparisons), allowing for massive parallelization. RESULTS We demonstrated that this approach is very efficient for calling identical by descent (IBD) segments between all pairs of individuals in the UK Biobank dataset, with a 250-fold savings in time and 750-fold savings in memory over the standard approach to detecting such segments across the full dataset. This efficiency should extend to other methods of IBD calling and, more generally, to other pairwise comparison tasks in genomics or other areas of science.
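One generic way to split an all-pairs job into independent tasks is sketched below in Python. It is an illustrative block decomposition under our own assumptions, not necessarily the decomposition used by Sapin and Keller, and the helper names (`make_blocks`, `make_tasks`, `pairs_for_task`) are hypothetical. Individuals are cut into k contiguous blocks and each task owns one unordered pair of blocks; with blocks of roughly √n individuals, each task covers on the order of n pairs, which matches the "each of order n comparisons" scaling described above.

```python
from itertools import combinations


def make_blocks(n, k):
    """Split individual indices 0..n-1 into at most k contiguous blocks."""
    if n == 0:
        return []
    size = -(-n // k)  # ceiling division so that no index is dropped
    return [list(range(i, min(i + size, n))) for i in range(0, n, size)]


def make_tasks(blocks):
    """One independent task per unordered block pair (a, b) with a <= b."""
    return [(a, b) for a in range(len(blocks)) for b in range(a, len(blocks))]


def pairs_for_task(blocks, a, b):
    """Enumerate the individual pairs that task (a, b) is responsible for."""
    if a == b:
        return list(combinations(blocks[a], 2))  # pairs inside one block
    return [(i, j) for i in blocks[a] for j in blocks[b]]  # pairs across two blocks


if __name__ == "__main__":
    blocks = make_blocks(10, 3)
    tasks = make_tasks(blocks)
    all_pairs = [p for a, b in tasks for p in pairs_for_task(blocks, a, b)]
    print(len(tasks), len(all_pairs))  # 6 45
```

Because every task only reads the genotypes of its two blocks, tasks can run on separate machines with memory proportional to the block size rather than to n.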
Affiliation(s)
- Emmanuel Sapin
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO, USA
- Matthew C Keller
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO, USA
61
Finke K, Kourakos M, Brown G, Dang HT, Tan SJS, Simons YB, Ramdas S, Schäffer AA, Kember RL, Bućan M, Mathieson S. Ancestral haplotype reconstruction in endogamous populations using identity-by-descent. PLoS Comput Biol 2021; 17:e1008638. [PMID: 33635861 PMCID: PMC7946327 DOI: 10.1371/journal.pcbi.1008638] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/01/2020] [Revised: 03/10/2021] [Accepted: 12/15/2020] [Indexed: 12/24/2022]
Abstract
In this work we develop a novel algorithm for reconstructing the genomes of ancestral individuals, given genotype or sequence data from contemporary individuals and an extended pedigree of family relationships. A pedigree with complete genomes for every individual enables the study of allele frequency dynamics and haplotype diversity across generations, including deviations from neutrality such as transmission distortion. When studying heritable diseases, ancestral haplotypes can be used to augment genome-wide association studies and track disease inheritance patterns. The building blocks of our reconstruction algorithm are segments of Identity-By-Descent (IBD) shared between two or more genotyped individuals. The method alternates between identifying a source for each IBD segment and assembling IBD segments placed within each ancestral individual. Unlike previous approaches, our method is able to accommodate complex pedigree structures with hundreds of individuals genotyped at millions of SNPs. We apply our method to an Old Order Amish pedigree from Lancaster, Pennsylvania, whose founders came to North America from Europe during the early 18th century. The pedigree includes 1338 individuals from the past 12 generations, 394 with genotype data. The motivation for reconstruction is to understand the genetic basis of diseases segregating in the family through tracking haplotype transmission over time. Using our algorithm thread, we are able to reconstruct an average of 224 ancestral individuals per chromosome. For these ancestral individuals, on average we reconstruct 79% of their haplotypes. We also identify a region on chromosome 16 that is difficult to reconstruct—we find that this region harbors a short Amish-specific copy number variation and the gene HYDIN. thread was developed for endogamous populations, but can be applied to any extensive pedigree with the recent generations genotyped. 
We anticipate that this type of practical ancestral reconstruction will become more common and necessary to understand rare and complex heritable diseases in extended families. When analyzing complex heritable traits, genomic data from many generations of an extended family increases the amount of information available for statistical inference. However, typically only genomic data from the recent generations of a pedigree are available, as ancestral individuals are deceased. In this work we present an algorithm, called thread, for reconstructing the genomes of ancestral individuals, given a complex pedigree and genomic data from the recent generations. Previous approaches have not been able to accommodate large datasets (both in terms of sites and individuals), made simplifying assumptions about pedigree structure, or did not tie reconstructed sequences back to specific individuals. We apply thread to a complex Old Order Amish pedigree of 1338 individuals, 394 with genotype data.
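The step of identifying a source for each IBD segment starts from pedigree structure: a segment shared by several genotyped individuals can only have been inherited from an ancestor common to all of them. A minimal Python sketch of that candidate search follows. It assumes a hypothetical `parents` dictionary mapping each individual to a (father, mother) tuple, keeps only the most recent common ancestors, and ignores the case where one carrier is itself an ancestor of another; thread's actual procedure, which alternates between source assignment and haplotype assembly, is not reproduced here.

```python
def ancestors(person, parents):
    """All ancestors of `person`; `parents` maps id -> (father, mother), None if unknown."""
    seen = set()
    stack = [p for p in parents.get(person, ()) if p is not None]
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(q for q in parents.get(p, ()) if q is not None)
    return seen


def candidate_sources(carriers, parents):
    """Most recent ancestors shared by every genotyped carrier of one IBD segment."""
    common = None
    for c in carriers:
        anc = ancestors(c, parents)
        common = anc if common is None else common & anc
    common = common or set()
    # drop any candidate that is itself an ancestor of another candidate
    return {c for c in common
            if not any(c in ancestors(d, parents) for d in common if d != c)}


if __name__ == "__main__":
    # hypothetical pedigree: first cousins C1 and C2 with grandparents G1 and G2
    parents = {
        "G1": ("GG1", None),
        "P1": ("G1", "G2"), "P2": ("G1", "G2"),
        "C1": ("P1", "S1"), "C2": ("P2", "S2"),
    }
    print(sorted(candidate_sources(["C1", "C2"], parents)))  # ['G1', 'G2']
```

In a pedigree with hundreds of individuals, the candidate set for each segment is what a reconstruction method would then have to narrow down using the other segments.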
Affiliation(s)
- Kelly Finke
- Department of Computer Science, Swarthmore College, Swarthmore, Pennsylvania, United States of America
- Department of Biology, Swarthmore College, Swarthmore, Pennsylvania, United States of America
- Michael Kourakos
- Department of Computer Science, Swarthmore College, Swarthmore, Pennsylvania, United States of America
- Gabriela Brown
- Department of Computer Science, Swarthmore College, Swarthmore, Pennsylvania, United States of America
- Huyen Trang Dang
- Department of Computer Science, Bryn Mawr College, Bryn Mawr, Pennsylvania, United States of America
- Shi Jie Samuel Tan
- Department of Computer Science, Haverford College, Haverford, Pennsylvania, United States of America
- Yuval B. Simons
- Department of Genetics, Stanford University, Stanford, California, United States of America
- Shweta Ramdas
- Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Alejandro A. Schäffer
- Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Rachel L. Kember
- Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Maja Bućan
- Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Sara Mathieson
- Department of Computer Science, Haverford College, Haverford, Pennsylvania, United States of America
62
Jain A, Sharma D, Bajaj A, Gupta V, Scaria V. Founder variants and population genomes-Toward precision medicine. Adv Genet 2021; 107:121-152. [PMID: 33641745 DOI: 10.1016/bs.adgen.2020.11.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/04/2023]
Abstract
Human migration and community-specific cultural practices have contributed to founder events and to the enrichment of variants associated with genetic diseases. While many founder events in isolated populations have remained uncharacterized, the application of genomics in clinical settings, as well as in population-scale studies in recent years, has provided an unprecedented push towards the identification of founder variants associated with human health and disease. The discovery and characterization of founder variants could have far-reaching implications not only for understanding the history or genealogy of a disease, but also for implementing evidence-based policies and genetic testing frameworks. This in turn enables precise diagnosis and prevention, as a step towards precision medicine. This review provides an overview of founder variants along with the methods and resources cataloging them. We also discuss the public health implications and examples of prevalent disease-associated founder variants in specific populations.
Affiliation(s)
- Abhinav Jain
- CSIR-Institute of Genomics and Integrative Biology, New Delhi, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- Disha Sharma
- CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Anjali Bajaj
- CSIR-Institute of Genomics and Integrative Biology, New Delhi, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- Vishu Gupta
- CSIR-Institute of Genomics and Integrative Biology, New Delhi, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- Vinod Scaria
- CSIR-Institute of Genomics and Integrative Biology, New Delhi, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India.
63
Naseri A, Tang K, Geng X, Shi J, Zhang J, Shakya P, Liu X, Zhang S, Zhi D. Personalized genealogical history of UK individuals inferred from biobank-scale IBD segments. BMC Biol 2021; 19:32. [PMID: 33593342 PMCID: PMC7888130 DOI: 10.1186/s12915-021-00964-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/06/2020] [Accepted: 01/19/2021] [Indexed: 01/27/2023]
Abstract
BACKGROUND The genealogical histories of individuals within populations are of interest to studies that aim to uncover both detailed pedigree information and overall quantitative population demographic histories. However, the quantitative analysis of individual genealogical histories has been hampered by incomplete pedigree records and by the absence of objective, quantitative details in pedigree information. Although complete pedigree information for most individuals is difficult to track beyond a few generations, it is possible to describe a person's genealogical history using their genetic relatives revealed by identity by descent (IBD) segments: long genomic segments shared by two individuals within a population that are identical because they were inherited from common ancestors. When modern biobanks collect genotype information for a significant fraction of a population, a person's dense genetic connections can be traced using such IBD segments, offering opportunities to characterize individuals in the context of the underlying populations. Here, we conducted an individual-centric analysis of IBD segments among the UK Biobank participants, who represent 0.7% of the UK population. RESULTS We made a high-quality call set of IBD segments over 5 cM among all 500,000 UK Biobank participants. On average, one UK individual shares IBD segments with 14,000 UK Biobank participants, whom we refer to as "relatives." Using these segments, approximately 80% of a person's genome can be imputed. We subsequently propose genealogical descriptors based on the genetic connections of relative cohorts of individuals sharing at least one IBD segment and show that such descriptors offer important information about one's genetic makeup, personal genealogical history, and social behavior. Through analysis of the numbers of relatives sharing segments of different lengths, we identified a group, potentially British Jews, that has a distinct pattern of familial expansion history.
Finally, using the enrichment of relatives in one's neighborhood, we identified regional variations in personal preference for living closer to one's extended family. CONCLUSIONS Our analysis revealed genetic makeup, personal genealogical history, and social behaviors at the population scale, opening possibilities for further studies of individuals' genetic connections in biobank data.
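Two of the quantities in this abstract, the number of "relatives" per person and the share of a person's genome covered by IBD segments, can be illustrated with a small Python sketch. The input format (one `(id1, id2, start_cm, end_cm)` tuple per segment), the helper names, and the per-chromosome coverage calculation are assumptions for illustration, not the authors' pipeline; only the 5 cM threshold comes from the abstract. The union of one person's segments along a chromosome is used here as a rough proxy for how much of it relatives could help impute.

```python
from collections import defaultdict


def relative_counts(segments, min_cm=5.0):
    """Number of distinct partners sharing >= 1 segment of at least min_cm.

    segments: iterable of (id1, id2, start_cm, end_cm) tuples.
    """
    partners = defaultdict(set)
    for a, b, start, end in segments:
        if end - start >= min_cm:
            partners[a].add(b)
            partners[b].add(a)
    return {person: len(others) for person, others in partners.items()}


def covered_fraction(intervals, chrom_length_cm):
    """Fraction of a chromosome covered by the union of (start_cm, end_cm) intervals."""
    covered = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:   # gap: close the previous merged interval
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:                                    # overlap: extend the merged interval
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / chrom_length_cm
```

For example, `relative_counts([("x", "y", 0.0, 10.0), ("x", "z", 5.0, 8.0)])` counts `y` as a relative of `x` but not `z`, whose only segment is 3 cM long.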
Affiliation(s)
- Ardalan Naseri
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Kecong Tang
- Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
- Xin Geng
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Junjie Shi
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Jing Zhang
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Pramesh Shakya
- Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
- Xiaoming Liu
- USF Genomics, College of Public Health, University of South Florida, Tampa, FL, 33612, USA
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA.
- Degui Zhi
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
- Center for Precision Health, School of Biomedical Informatics, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
64
Kling D, Phillips C, Kennett D, Tillmar A. Investigative genetic genealogy: Current methods, knowledge and practice. Forensic Sci Int Genet 2021; 52:102474. [PMID: 33592389 DOI: 10.1016/j.fsigen.2021.102474] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 10/02/2020] [Revised: 01/12/2021] [Accepted: 01/27/2021] [Indexed: 12/15/2022]
Abstract
Investigative genetic genealogy (IGG) has emerged as a new, rapidly growing field of forensic science. We describe the process whereby dense SNP data, commonly comprising more than half a million markers, are employed to infer distant relationships, that is, degrees of relatedness beyond that of first cousins. We review how methods of relationship matching and SNP analysis on an enlarged scale are used in a forensic setting to identify a suspect in a criminal investigation or a missing person. There is currently a strong need in forensic genetics not only to understand the underlying models used to infer relatedness but also to fully explore the DNA technologies and data used in IGG. This review brings together many of these topics and examines their effectiveness and operational limits, while suggesting future directions for their forensic validation. We further investigated the methods used by the major direct-to-consumer (DTC) genetic ancestry testing companies and submitted a questionnaire in which providers of forensic genetic genealogy summarized their operations and services. Although most of the DTC market, and genetic genealogy in general, relies on undisclosed, proprietary algorithms, we review the current knowledge where information has been discussed and published more openly.
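To see why relationships beyond first cousins are hard to resolve, it helps to look at how expected sharing decays. Under the standard pedigree model without inbreeding, full nth cousins have kinship φ = 4^-(n+1), are expected to share a fraction 4^-n of the genome identical by descent on one haplotype, and are relatives of degree 2n + 1. The Python sketch below turns this into expected shared centimorgans; the 3,500 cM genome length is an assumed round number, and real sharing scatters widely around these means, with small segments further lost to detection thresholds.

```python
def expected_cousin_sharing(n, genome_cm=3500.0):
    """Expected degree, IBD fraction and shared cM for full nth cousins.

    Assumes an outbred pedigree; genome_cm is an assumed total autosomal map length.
    """
    if n < 1:
        raise ValueError("n must be >= 1 (first cousins or more distant)")
    kinship = 4.0 ** -(n + 1)      # phi for full nth cousins
    ibd_fraction = 4 * kinship     # k1 = 4 * phi when k2 = 0, i.e. 4**-n
    degree = 2 * n + 1             # first cousins are third-degree relatives
    return degree, ibd_fraction, ibd_fraction * genome_cm


if __name__ == "__main__":
    for n in range(1, 5):
        degree, frac, cm = expected_cousin_sharing(n)
        print(f"{n}C: degree {degree}, fraction {frac:.5f}, ~{cm:.1f} cM")
```

Each extra cousin generation cuts expected sharing fourfold, so by third or fourth cousins the expected total is only a few tens of centimorgans, split across few segments.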
Affiliation(s)
- Daniel Kling
- Department of Forensic Genetics and Forensic Toxicology, National Board of Forensic Medicine, Linköping, Sweden; Department of Forensic Sciences, Oslo University Hospital, Oslo, Norway.
- Christopher Phillips
- Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Santiago de Compostela, Spain.
- Debbie Kennett
- Research Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, United Kingdom
- Andreas Tillmar
- Department of Forensic Genetics and Forensic Toxicology, National Board of Forensic Medicine, Linköping, Sweden; Department of Biomedical and Clinical Sciences, Faculty of Medicine and Health Sciences, Linköping University, Linköping, Sweden
65
Petty LE, Phillippi-Falkenstein K, Kubisch HM, Raveendran M, Harris RA, Vallender EJ, Huff CD, Bohm RP, Rogers J, Below JE. Pedigree reconstruction and distant pairwise relatedness estimation from genome sequence data: A demonstration in a population of rhesus macaques (Macaca mulatta). Mol Ecol Resour 2021; 21:1333-1346. [PMID: 33386679 PMCID: PMC8247968 DOI: 10.1111/1755-0998.13317] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/19/2019] [Revised: 11/13/2020] [Accepted: 12/07/2020] [Indexed: 12/30/2022]
Abstract
A primary challenge in the analysis of free‐ranging animal populations is the accurate estimation of relatedness among individuals. Many aspects of population analysis rely on knowledge of relatedness patterns, including socioecology, demography, heritability and gene mapping analyses, wildlife conservation and the management of breeding colonies. Methods for determining relatedness using genome‐wide data have improved our ability to determine kinship and reconstruct pedigrees in humans. However, methods for reconstructing complex pedigree structures and estimating distant relatedness (beyond third‐degree) have not been widely applied to other species. We sequenced the genomes of 150 male rhesus macaques from the Tulane National Primate Research Center colony to estimate pairwise relatedness, reconstruct closely related pedigrees, estimate more distant relationships and augment colony records. Methods for determining relatedness developed for human genetic data were applied and evaluated in the analysis of nonhuman primates, including identity‐by‐descent‐based methods for pedigree reconstruction and shared segment‐based inference of more distant relatedness. We compared the genotype‐based pedigrees and estimated relationships to available colony pedigree records and found high concordance (95.5% agreement) between expected and identified relationships for close relatives. In addition, we detected distant relationships not captured in colony records, including some as distant as twelfth‐degree. Furthermore, while deep sequence coverage is preferable, we show that this approach can also provide valuable information when only low‐coverage (5×) sequence data is available. Our findings demonstrate the value of these methods for determination of relatedness in various animal populations, with diverse applications to conservation biology, evolutionary and ecological research and biomedical studies.
Affiliation(s)
- Lauren E Petty
- Vanderbilt Genetics Institute and Department of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- H Michael Kubisch
- Division of Veterinary Medicine, Tulane National Primate Research Center, Covington, LA, USA
- Muthuswamy Raveendran
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- R Alan Harris
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Eric J Vallender
- Division of Veterinary Medicine, Tulane National Primate Research Center, Covington, LA, USA; Department of Psychiatry and Human Behavior, University of Mississippi Medical Center, Jackson, MS, USA
- Chad D Huff
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Rudolf P Bohm
- Division of Veterinary Medicine, Tulane National Primate Research Center, Covington, LA, USA
- Jeffrey Rogers
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Jennifer E Below
- Vanderbilt Genetics Institute and Department of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
66
Quinodoz M, Peter VG, Bedoni N, Royer Bertrand B, Cisarova K, Salmaninejad A, Sepahi N, Rodrigues R, Piran M, Mojarrad M, Pasdar A, Ghanbari Asad A, Sousa AB, Coutinho Santos L, Superti-Furga A, Rivolta C. AutoMap is a high performance homozygosity mapping tool using next-generation sequencing data. Nat Commun 2021; 12:518. [PMID: 33483490 PMCID: PMC7822856 DOI: 10.1038/s41467-020-20584-4] [Citation(s) in RCA: 76] [Impact Index Per Article: 25.3] [Received: 03/17/2020] [Accepted: 12/09/2020] [Indexed: 12/11/2022]
Abstract
Homozygosity mapping is a powerful method for identifying mutations in patients with recessive conditions, especially in consanguineous families or isolated populations. Historically, it has been used in conjunction with genotypes from highly polymorphic markers, such as DNA microsatellites or common SNPs. Traditional software performs rather poorly with data from Whole Exome Sequencing (WES) and Whole Genome Sequencing (WGS), which are now extensively used in medical genetics. We developed AutoMap, a tool available both as a web-based service and as downloadable software, for performing homozygosity mapping directly on VCF (Variant Call Format) calls from WES or WGS projects. Following a training step on WES data from 26 consanguineous families and a validation procedure on a matched cohort, our method shows higher overall performance than eight existing tools. Most importantly, when tested on real cases with a negative molecular diagnosis from an internal set, AutoMap detected three gene-disease and multiple variant-disease associations that were previously unrecognized, projecting clear benefits for both molecular diagnosis and research activities in medical genetics. Homozygosity mapping is a useful tool for identifying candidate mutations in recessive conditions; however, its application to next-generation sequencing data has been sub-optimal. Here, the authors present AutoMap, which efficiently identifies runs of homozygosity in whole exome/genome sequencing data.
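The core idea of homozygosity mapping on VCF calls, finding long stretches of consecutive variants with no heterozygous genotype, can be sketched in a few lines of Python. This is a deliberately simplified scan, not AutoMap's trained algorithm: it assumes per-chromosome `(position, GT)` input with GT strings such as `0/1` or `1|1`, skips missing calls, ends a run at the first heterozygous call, and uses hypothetical thresholds `min_variants` and `min_length_bp`.

```python
def homozygous_runs(calls, min_variants=10, min_length_bp=1_000_000):
    """Return (start, end, n_variants) for maximal runs with no heterozygous call.

    calls: (position, genotype) tuples sorted by position on one chromosome,
    with genotype given as a VCF GT string, e.g. '0/1', '1/1' or '1|1'.
    """
    runs, current = [], []

    def close_run(run):
        # keep the run only if it is long enough in both variant count and span
        if len(run) >= min_variants and run[-1] - run[0] >= min_length_bp:
            runs.append((run[0], run[-1], len(run)))

    for pos, gt in calls:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue                      # missing call: neither extends nor breaks a run
        if len(set(alleles)) == 1:
            current.append(pos)           # homozygous call extends the current run
        else:
            close_run(current)            # heterozygous call ends the current run
            current = []
    close_run(current)
    return runs
```

A strict "one heterozygous call breaks the run" rule is fragile with real sequencing errors, which is one reason dedicated tools tolerate occasional heterozygous calls and tune their thresholds on training data.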
Affiliation(s)
- Mathieu Quinodoz
- Institute of Molecular and Clinical Ophthalmology Basel (IOB), Basel, Switzerland; Department of Ophthalmology, University of Basel, Basel, Switzerland; Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Virginie G Peter
- Institute of Molecular and Clinical Ophthalmology Basel (IOB), Basel, Switzerland; Department of Ophthalmology, University of Basel, Basel, Switzerland; Department of Genetics and Genome Biology, University of Leicester, Leicester, UK; Institute of Experimental Pathology, Lausanne University Hospital (CHUV), Lausanne, Switzerland
- Nicola Bedoni
- Service of Medical Genetics, Lausanne University Hospital (CHUV), Lausanne, Switzerland
- Béryl Royer Bertrand
- Service of Medical Genetics, Lausanne University Hospital (CHUV), Lausanne, Switzerland
- Katarina Cisarova
- Service of Medical Genetics, Lausanne University Hospital (CHUV), Lausanne, Switzerland
- Arash Salmaninejad
- Department of Medical Genetics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Neda Sepahi
- Noncommunicable Diseases Research Center, Fasa University of Sciences, Fasa, Iran
- Raquel Rodrigues
- Department of Medical Genetics, Hospital Santa Maria, Centro Hospitalar Universitário Lisboa Norte (CHULN), Lisbon Academic Medical Center (CAML), Lisbon, Portugal
- Mehran Piran
- Noncommunicable Diseases Research Center, Fasa University of Sciences, Fasa, Iran; Bioinformatics and Computational Biology Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Majid Mojarrad
- Department of Medical Genetics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Alireza Pasdar
- Department of Medical Genetics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran; Division of Applied Medicine, Medical School, University of Aberdeen, Aberdeen, UK
- Ali Ghanbari Asad
- Noncommunicable Diseases Research Center, Fasa University of Sciences, Fasa, Iran
- Ana Berta Sousa
- Department of Medical Genetics, Hospital Santa Maria, Centro Hospitalar Universitário Lisboa Norte (CHULN), Lisbon Academic Medical Center (CAML), Lisbon, Portugal; Medical Faculty, Lisbon University, Lisbon, Portugal
- Andrea Superti-Furga
- Service of Medical Genetics, Lausanne University Hospital (CHUV), Lausanne, Switzerland
- Carlo Rivolta
- Institute of Molecular and Clinical Ophthalmology Basel (IOB), Basel, Switzerland; Department of Ophthalmology, University of Basel, Basel, Switzerland; Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
67
Qiao Y, Sannerud JG, Basu-Roy S, Hayward C, Williams AL. Distinguishing pedigree relationships via multi-way identity by descent sharing and sex-specific genetic maps. Am J Hum Genet 2021; 108:68-83. [PMID: 33385324 PMCID: PMC7820736 DOI: 10.1016/j.ajhg.2020.12.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/30/2019] [Accepted: 12/07/2020] [Indexed: 12/31/2022]
Abstract
The proportion of samples with one or more close relatives in a genetic dataset increases rapidly with sample size, necessitating relatedness modeling and enabling pedigree-based analyses. Despite this, relatives are generally unreported, and current inference methods typically detect only the degree of relatedness of sample pairs and not pedigree relationships. We developed CREST, an accurate and fast method that identifies the pedigree relationships of close relatives. CREST utilizes identity by descent (IBD) segments shared between a pair of samples and their mutual relatives, leveraging the fact that sharing rates among these individuals differ across pedigree configurations. Furthermore, CREST exploits the profound differences in sex-specific genetic maps to classify pairs as maternally or paternally related (e.g., paternal half-siblings) using the locations of autosomal IBD segments shared between the pair. In simulated data, CREST correctly classifies 91.5%-100% of grandparent-grandchild (GP) pairs, 80.0%-97.5% of avuncular (AV) pairs, and 75.5%-98.5% of half-sibling (HS) pairs, compared to PADRE's rates of 38.5%-76.0% of GP, 60.5%-92.0% of AV, and 73.0%-95.0% of HS pairs. Turning to the real 20,032-sample Generation Scotland (GS) dataset, CREST identified seven pedigrees with incorrect relationship types or maternal/paternal parent sexes, five of which we confirmed as mistakes, and two with uncertain relationships. After correcting these, CREST correctly determines relationship types for 93.5% of GP, 97.7% of AV, and 92.2% of HS pairs that have sufficient mutual relative data, and the parent sex in 100% of HS and 99.6% of GP pairs; it completes this analysis in 2.8 h, including IBD detection, using eight threads.
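CREST's maternal/paternal classification rests on the fact that the same physical stretch of DNA can have quite different genetic lengths under the male and female maps. The Python sketch below shows only that building block: converting a segment's base-pair coordinates to centimorgans by linear interpolation on two maps. The map anchor points are invented toy values, and the step that turns such differences into a classification is not reproduced.

```python
from bisect import bisect_right


def bp_to_cm(pos, map_bp, map_cm):
    """Linearly interpolate a genetic position (cM) from sorted map anchors."""
    if pos <= map_bp[0]:
        return map_cm[0]
    if pos >= map_bp[-1]:
        return map_cm[-1]
    i = bisect_right(map_bp, pos) - 1  # last anchor at or before pos
    frac = (pos - map_bp[i]) / (map_bp[i + 1] - map_bp[i])
    return map_cm[i] + frac * (map_cm[i + 1] - map_cm[i])


def segment_cm(start_bp, end_bp, map_bp, map_cm):
    """Genetic length of one IBD segment under a given map."""
    return bp_to_cm(end_bp, map_bp, map_cm) - bp_to_cm(start_bp, map_bp, map_cm)


if __name__ == "__main__":
    # toy anchors on one chromosome (hypothetical values, not a real map)
    anchors_bp = [0, 10_000_000, 20_000_000, 30_000_000]
    male_cm = [0.0, 15.0, 17.0, 32.0]    # toy pattern: recombination nearer the ends
    female_cm = [0.0, 10.0, 22.0, 33.0]  # toy pattern: more recombination mid-chromosome
    for name, cm in (("male", male_cm), ("female", female_cm)):
        print(name, segment_cm(10_000_000, 20_000_000, anchors_bp, cm))
```

With these toy maps, the same 10 Mb segment spans 2 cM under the male map and 12 cM under the female map; that kind of contrast is the signal a sex-aware method can exploit.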
Affiliation(s)
- Ying Qiao
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Jens G Sannerud
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Sayantani Basu-Roy
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Caroline Hayward
- MRC Human Genetics Unit, MRC Institute of Genetic and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
- Amy L Williams
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA.
68
Infantile onset Sandhoff disease: clinical manifestation and a novel common mutation in Thai patients. BMC Pediatr 2021; 21:22. [PMID: 33407268 PMCID: PMC7789739 DOI: 10.1186/s12887-020-02481-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/18/2020] [Accepted: 12/20/2020] [Indexed: 11/30/2022]
Abstract
Background Sandhoff disease (SD) is an autosomal recessive lysosomal storage disorder resulting in accumulation of GM2 ganglioside, particularly in neuronal cells. The disorder is caused by deficiency of β-hexosaminidase B (HEX-B) due to pathogenic variants of the human HEXB gene. Method This study describes the clinical features and the biochemical and genetic defects among Thai patients with infantile SD during 2008–2019. Results Five unrelated Thai patients presenting with developmental regression, axial hypotonia, seizures, exaggerated startle response to noise, and macular cherry red spot were confirmed to have infantile SD based on deficient HEX enzyme activities and biallelic variants of the HEXB gene. In addition, an uncommon presenting feature, cardiac defect, was observed in one patient. All the patients died in early childhood. Plasma total HEX and HEX-B activities were severely deficient. Sequencing analysis of the HEXB gene identified two variants, c.1652G>A (p.Cys551Tyr) and a novel variant c.761T>C (p.Leu254Ser), accounting for 90% and 10% of the mutant alleles found, respectively. The results of in silico analysis using multiple bioinformatics tools agreed that p.Cys551Tyr and p.Leu254Ser are likely pathogenic variants. Molecular modelling suggested that p.Cys551Tyr disrupts a disulfide bond, leading to protein destabilization, while p.Leu254Ser changes the secondary structure from helix to coil and disturbs the conformation of the enzyme's active site. Genome-wide SNP array analysis showed no significant relatedness among the five affected individuals. These two variants were not present in control individuals. The prevalence of infantile SD in the Thai population is estimated at 1 in 1,458,521 and the carrier frequency at 1 in 604. Conclusion The study suggests that SD likely represents the most common subtype of rare infantile GM2 gangliosidosis identified among Thai patients.
We describe, for the first time, a potential common variant in HEXB in Thai patients with infantile-onset SD. The data can aid rapid molecular confirmation of infantile SD, starting with the hotspot variant, and support the use of expanded carrier testing. Supplementary Information The online version contains supplementary material available at 10.1186/s12887-020-02481-3.
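The prevalence and carrier figures quoted above are consistent with Hardy-Weinberg proportions for a single recessive allele class: if prevalence = q², the carrier frequency is 2pq with p = 1 - q. A quick Python check, assuming random mating and treating all pathogenic HEXB alleles as one allele class:

```python
import math


def carrier_frequency(prevalence):
    """Heterozygote frequency 2pq implied by a recessive prevalence q**2 (HWE)."""
    q = math.sqrt(prevalence)  # combined frequency of pathogenic alleles
    p = 1.0 - q
    return 2.0 * p * q


if __name__ == "__main__":
    freq = carrier_frequency(1 / 1_458_521)
    print(f"carrier frequency about 1 in {1 / freq:.0f}")  # about 1 in 604
```

Here q is about 1 in 1,208, so carriers (about 2q) are roughly 1 in 604, matching the estimate in the abstract.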
69
Zhang W, Luo C, Scossa F, Zhang Q, Usadel B, Fernie AR, Mei H, Wen W. A phased genome based on single sperm sequencing reveals crossover pattern and complex relatedness in tea plants. Plant J 2021; 105:197-208. [PMID: 33118252 DOI: 10.1111/tpj.15051] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 08/14/2020] [Revised: 10/19/2020] [Accepted: 10/22/2020] [Indexed: 05/27/2023]
Abstract
For diploid organisms that are highly heterozygous, a phased haploid genome can greatly aid in functional genomic, population genetic and breeding studies. Based on the genome sequencing of 135 single sperm cells of the elite tea cultivar 'Fudingdabai', we herein phased the genome of Camellia sinensis, one of the most popular beverage crops worldwide. High-resolution genetic and recombination maps of Fudingdabai were constructed, which revealed that crossover (CO) positions were frequently located at the 5' and 3' ends of annotated genes, while CO distributions across the genome were random. The low CO frequency in tea can be explained by strong CO interference, and CO simulation revealed that the proportion of interference-insensitive COs ranged from 5.2% to 11.7%. We furthermore developed a method to infer the relatedness between tea accessions and detected complex kinship and genetic signatures among 106 tea accessions. Among them, 59 accessions were closely related to Fudingdabai, and 31 of these were first-degree relatives. We additionally identified genes displaying allele-specific expression patterns between the two haplotypes of Fudingdabai and genes displaying significantly different expression levels between Fudingdabai and other haplotypes. These results lay the foundation for further investigation of genetic and epigenetic factors underpinning the regulation of gene expression and provide insights into the evolution of tea plants as well as a valuable genetic resource for future breeding efforts.
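In single-sperm data, each haploid sperm carries one of the donor plant's two haplotypes at every heterozygous site, so crossovers appear as switches in haplotype origin along a chromosome. A minimal Python sketch of locating them follows, assuming each SNP has already been assigned to haplotype "A" or "B" and treating runs shorter than a hypothetical `min_run` as genotyping errors; the study's actual pipeline is not described in enough detail here to reproduce.

```python
def crossover_intervals(positions, haplotypes, min_run=3):
    """Locate crossovers in one sperm cell from per-SNP haplotype labels.

    positions: sorted SNP coordinates; haplotypes: matching 'A'/'B' labels.
    Returns (left_bp, right_bp) intervals that each contain one inferred crossover.
    """
    # collapse consecutive identical labels into runs [label, first_index, last_index]
    runs = []
    for i, label in enumerate(haplotypes):
        if runs and runs[-1][0] == label:
            runs[-1][2] = i
        else:
            runs.append([label, i, i])
    # drop short runs (likely genotyping errors), then merge neighbours that now agree
    kept = []
    for label, first, last in runs:
        if last - first + 1 < min_run:
            continue
        if kept and kept[-1][0] == label:
            kept[-1][2] = last
        else:
            kept.append([label, first, last])
    # each remaining label switch brackets a crossover between two SNPs
    return [(positions[left[2]], positions[right[1]]) for left, right in zip(kept, kept[1:])]
```

The crossover can only be placed between the last SNP of one run and the first SNP of the next, so resolution depends on marker density; pooling such intervals across many sperm cells is what yields a recombination map.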
Affiliation(s)
- Weiyi Zhang
- Key Laboratory of Horticultural Plant Biology (MOE), College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan, 430070, China
- Cheng Luo
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Federico Scossa
- Max-Planck-Institute of Molecular Plant Physiology, Am Muehlenberg 1, Potsdam-Golm, 14476, Germany
- Council for Agricultural Research and Economics, Research Center for Genomics and Bioinformatics, Via Ardeatina 546, Rome, 00178, Italy
- Qinghua Zhang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Björn Usadel
- Institute for Biological Data Science, Heinrich Heine University, Düsseldorf, Germany
- Institute of Bio- and Geosciences, IBG-4: Bioinformatics, CEPLAS, Forschungszentrum Jülich, Leo-Brandt-Straße, Jülich, 52425, Germany
- Alisdair R Fernie
- Max-Planck-Institute of Molecular Plant Physiology, Am Muehlenberg 1, Potsdam-Golm, 14476, Germany
- Center of Plant Systems Biology and Biotechnology, Plovdiv, 4000, Bulgaria
- Hanwei Mei
- Shanghai Agrobiological Gene Center, Shanghai, 201106, China
- Weiwei Wen
- Key Laboratory of Horticultural Plant Biology (MOE), College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan, 430070, China
70
Naseri A, Shi J, Lin X, Zhang S, Zhi D. RAFFI: Accurate and fast familial relationship inference in large scale biobank studies using RaPID. PLoS Genet 2021; 17:e1009315. [PMID: 33476339 PMCID: PMC7853505 DOI: 10.1371/journal.pgen.1009315] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/11/2020] [Revised: 02/02/2021] [Accepted: 12/18/2020] [Indexed: 11/19/2022]
Abstract
Inference of relationships from whole-genome genetic data of a cohort is a crucial prerequisite for genome-wide association studies. Typically, relationships are inferred by computing the kinship coefficients (ϕ) and the genome-wide probability of zero IBD sharing (π0) among all pairs of individuals. Current leading methods are based on pairwise comparisons, which may not scale up to very large cohorts (e.g., sample size >1 million). Here, we propose an efficient relationship inference method, RAFFI. RAFFI first leverages the efficient RaPID method to call IBD segments, then estimates ϕ and π0 from the detected IBD segments. This inference is achieved by a data-driven approach that adjusts the estimation based on phasing quality and genotyping quality. Using simulations, we showed that RAFFI is robust against phasing/genotyping errors, admixture events, and varying marker densities, and achieves higher accuracy than KING, the current leading method, especially for more distant relatives. When applied to the phased UK Biobank data with ~500K individuals, RAFFI is approximately 18 times faster than KING. We expect that RAFFI will offer fast and accurate relatedness inference for even larger cohorts.
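How ϕ and π0 follow from IBD segments can be shown with a short Python sketch. It assumes segments have already been called by some IBD detector and uses the standard identities ϕ = k1/4 + k2/2 and π0 = k0, where k0, k1 and k2 are the genome fractions sharing zero, one or two alleles IBD; degree labels use KING-style cut-offs. Treating any position covered by two or more segments as IBD2 is a simplification, the function names are hypothetical, and this is not RAFFI's estimator: it omits the phasing- and genotyping-quality adjustment described above.

```python
from typing import Iterable, List, Tuple

# KING-style kinship cut-offs: 2^-1.5, 2^-2.5, 2^-3.5, 2^-4.5.
DEGREE_CUTOFFS: List[Tuple[float, str]] = [
    (2 ** -1.5, "duplicate/MZ twin"),
    (2 ** -2.5, "1st degree"),
    (2 ** -3.5, "2nd degree"),
    (2 ** -4.5, "3rd degree"),
]


def ibd_fractions(
    segments: Iterable[Tuple[str, float, float]], genome_cm: float
) -> Tuple[float, float, float]:
    """Return (k0, k1, k2): genome fractions sharing 0, 1 or 2 alleles IBD.

    segments: (chromosome, start_cM, end_cM) for every IBD segment found
        between any haplotype of person A and any haplotype of person B.
    Simplification: a position covered by one segment counts as IBD1 and a
    position covered by two or more segments counts as IBD2.
    """
    # Sweep over segment endpoints, tracking how many segments cover each point.
    events = sorted(
        (chrom, pos, delta)
        for chrom, start, end in segments
        for pos, delta in ((start, 1), (end, -1))
    )
    ibd1 = ibd2 = 0.0
    depth = 0
    prev_chrom, prev_pos = None, 0.0
    for chrom, pos, delta in events:
        if chrom == prev_chrom and depth > 0:
            span = pos - prev_pos
            if depth == 1:
                ibd1 += span
            else:
                ibd2 += span
        depth += delta
        prev_chrom, prev_pos = chrom, pos

    k1, k2 = ibd1 / genome_cm, ibd2 / genome_cm
    return 1.0 - k1 - k2, k1, k2


def kinship_and_pi0(segments, genome_cm: float) -> Tuple[float, float]:
    """Kinship coefficient phi = k1/4 + k2/2 and pi0 = k0."""
    k0, k1, k2 = ibd_fractions(segments, genome_cm)
    return k1 / 4 + k2 / 2, k0


def relationship_degree(phi: float) -> str:
    """Map a kinship coefficient to a relationship degree."""
    for cutoff, label in DEGREE_CUTOFFS:
        if phi > cutoff:
            return label
    return "unrelated"
```

A toy full-sibling pair (k1 = 0.5, k2 = 0.25) gives ϕ = 0.25 and π0 = 0.25, whereas a parent-offspring pair has the same ϕ but π0 = 0; that is why both statistics are needed to separate the two relationships.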
Affiliation(s)
- Ardalan Naseri
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Junjie Shi
- Department of Computer Science, Rice University, Houston, Texas, United States of America
- Xihong Lin
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- Department of Statistics, Harvard University, Cambridge, Massachusetts, United States of America
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, Florida, United States of America
- Degui Zhi
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Center for Precision Health, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
71
Sole-Navais P, Bacelis J, Helgeland Ø, Modzelewska D, Vaudel M, Flatley C, Andreassen O, Njølstad PR, Muglia LJ, Johansson S, Zhang G, Jacobsson B. Autozygosity mapping and time-to-spontaneous delivery in Norwegian parent-offspring trios. Hum Mol Genet 2020; 29:3845-3858. [PMID: 33291140 PMCID: PMC7861013 DOI: 10.1093/hmg/ddaa255] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/03/2020] [Revised: 11/21/2020] [Accepted: 11/24/2020] [Indexed: 11/18/2022]
Abstract
Parental genetic relatedness may lead to adverse health and fitness outcomes in the offspring. However, the degree to which it affects human delivery timing is unknown. We use genotype data from ≃25 000 parent-offspring trios from the Norwegian Mother, Father and Child Cohort Study to optimize runs of homozygosity (ROH) calling by maximizing the correlation between parental genetic relatedness and offspring ROHs. We then estimate the effect of maternal, paternal and fetal autozygosity and that of autozygosity mapping (common segments and gene burden test) on the timing of spontaneous onset of delivery. The correlation between parental genetic relatedness and offspring ROH called with a variety of parameters ranged between −0.2 and 0.6, revealing the importance of the minimum number of genetic variants included in an ROH and the use of genetic distance. Compared with predefined parameters, the optimized parameters yielded a ≃45% higher correlation between parental genetic relatedness and offspring ROH. We found no evidence of an effect of maternal, paternal or fetal overall autozygosity on spontaneous delivery timing. Yet, through autozygosity mapping, we identified three maternal loci (the TBC1D1, SIGLECs and EDN1 gene regions) that reduce the median time-to-spontaneous onset of delivery by ≃2–5% (P-value < 2.3 × 10−6). We also found suggestive evidence of a fetal locus at 3q22.2, near the RYK gene region (P-value = 2.0 × 10−6). Autozygosity mapping may provide new insights into the genetic determinants of delivery timing beyond traditional genome-wide association studies, but particular and rigorous attention should be given to ROH calling parameter selection.
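The two calling parameters highlighted above (minimum number of variants per ROH and a length threshold in genetic distance) can be seen in a deliberately naive ROH caller. This Python sketch assumes genotypes coded as allele dosages (0/2 homozygous, 1 heterozygous) with a genetic map position in cM per SNP; the defaults and function names are hypothetical, and unlike dedicated tools such as PLINK's `--homozyg` it tolerates no heterozygous or missing calls inside a run.

```python
from typing import List, Sequence, Tuple


def call_roh(
    cm: Sequence[float],
    genotypes: Sequence[int],
    min_snps: int = 50,
    min_cm: float = 1.0,
) -> List[Tuple[float, float]]:
    """Call runs of homozygosity on one chromosome.

    cm: genetic map position (cM) of each SNP, sorted ascending.
    genotypes: allele dosage per SNP (0 or 2 = homozygous, 1 = heterozygous).
    Returns (start_cM, end_cM) for every run of consecutive homozygous SNPs
    containing at least min_snps markers and spanning at least min_cm.
    """
    runs = []
    start = None
    # A trailing sentinel heterozygote closes any run still open at the end.
    for i, g in enumerate(list(genotypes) + [1]):
        if g != 1 and start is None:
            start = i
        elif g == 1 and start is not None:
            end = i - 1
            n_snps = end - start + 1
            length = cm[end] - cm[start]
            if n_snps >= min_snps and length >= min_cm:
                runs.append((cm[start], cm[end]))
            start = None
    return runs


def f_roh(runs: Sequence[Tuple[float, float]], genome_cm: float) -> float:
    """Fraction of the genome (in cM) covered by ROH."""
    return sum(end - start for start, end in runs) / genome_cm
```

Tightening either `min_snps` or `min_cm` drops short runs and lowers `f_roh`; sweeping both against parental relatedness is, in spirit, the optimization the abstract describes.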
Affiliation(s)
- Pol Sole-Navais
- Department of Obstetrics and Gynecology, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg 41685, Sweden
- Jonas Bacelis
- Department of Obstetrics and Gynecology, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg 41685, Sweden
- Øyvind Helgeland
- Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5020 Bergen, Norway; Division of Health Data and Digitalization, Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo 0213, Norway
- Dominika Modzelewska
- Department of Obstetrics and Gynecology, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg 41685, Sweden
- Marc Vaudel
- Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5020 Bergen, Norway; Department of Pediatrics and Adolescents, Haukeland University Hospital, Bergen 5021, Norway
- Christopher Flatley
- Division of Health Data and Digitalization, Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo 0213, Norway
- Ole Andreassen
- NORMENT, University of Oslo, Oslo 0450, Norway; Division of Mental Health and Addiction, Oslo University Hospital, Oslo 0450, Norway; Department of Psychiatry, University of California San Diego, San Diego, CA 92093, USA
- Pål R Njølstad
- Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5020 Bergen, Norway; Department of Pediatrics and Adolescents, Haukeland University Hospital, Bergen 5021, Norway
- Louis J Muglia
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA; Division of Human Genetics, The Center for Prevention of Preterm Birth, Perinatal Institute, March of Dimes Prematurity Research Center Ohio Collaborative, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45267, USA
- Stefan Johansson
- Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5020 Bergen, Norway; Center for Medical Genetics, Haukeland University Hospital, Bergen 5021, Norway
- Ge Zhang
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA; Division of Human Genetics, The Center for Prevention of Preterm Birth, Perinatal Institute, March of Dimes Prematurity Research Center Ohio Collaborative, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45267, USA
- Bo Jacobsson
- Department of Obstetrics and Gynecology, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg 41685, Sweden; Division of Health Data and Digitalization, Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo 0213, Norway; Department of Obstetrics and Gynecology, Sahlgrenska University Hospital, Gothenburg 41685, Sweden
72
Nait Saada J, Kalantzis G, Shyr D, Cooper F, Robinson M, Gusev A, Palamara PF. Identity-by-descent detection across 487,409 British samples reveals fine scale population structure and ultra-rare variant associations. Nat Commun 2020; 11:6130. [PMID: 33257650 PMCID: PMC7704644 DOI: 10.1038/s41467-020-19588-x] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 04/02/2020] [Accepted: 10/02/2020] [Indexed: 12/14/2022]
Abstract
Detection of identical-by-descent (IBD) segments provides a fundamental measure of genetic relatedness and plays a key role in a wide range of analyses. We develop FastSMC, an IBD detection algorithm that combines a fast heuristic search with accurate coalescent-based likelihood calculations. FastSMC enables biobank-scale detection and dating of IBD segments within the past several thousand years. We apply FastSMC to 487,409 UK Biobank samples and detect ~214 billion IBD segments transmitted by shared ancestors within the past 1500 years, obtaining a fine-grained picture of genetic relatedness in the UK. Sharing of common ancestors strongly correlates with geographic distance, enabling the use of genomic data to localize a sample's birth coordinates with a median error of 45 km. We seek evidence of recent positive selection by identifying loci with unusually strong shared ancestry and detect 12 genome-wide significant signals. We devise an IBD-based test for association between phenotype and ultra-rare loss-of-function variation, identifying 29 association signals in 7 blood-related traits.
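Why segment length carries information about the age of a shared ancestor can be shown with a back-of-the-envelope Python sketch. It uses the textbook approximation that, for a common ancestor t generations back (2t meioses separating the two haplotypes), IBD segment lengths in Morgans are roughly exponential with rate 2t, so the mean length is 50/t cM. This is a crude stand-in, not FastSMC's coalescent likelihood: it ignores the detection length threshold, demography and length bias, and the function names are hypothetical.

```python
from typing import Sequence


def expected_length_cm(t: float) -> float:
    """Mean IBD segment length (cM) for a common ancestor t generations ago.

    Lengths in Morgans ~ Exponential(rate = 2t), so the mean is 1/(2t) M.
    """
    return 100.0 / (2.0 * t)


def mle_generations(lengths_cm: Sequence[float]) -> float:
    """Maximum-likelihood t from segment lengths under the same model.

    For n exponential lengths with rate 2t, the MLE is t = n / (2 * sum_M).
    Detection thresholds are ignored, which biases real-data estimates.
    """
    total_morgans = sum(lengths_cm) / 100.0
    return len(lengths_cm) / (2.0 * total_morgans)
```

Under this model, ancestors 10 generations back leave segments of about 5 cM on average, while ancestors 100 generations back leave segments of about 0.5 cM, which is why dating deep sharing needs a likelihood model rather than length alone.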
Affiliation(s)
- Derek Shyr
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Fergus Cooper
- Department of Computer Science, University of Oxford, Oxford, UK
- Martin Robinson
- Department of Computer Science, University of Oxford, Oxford, UK
- Alexander Gusev
- Brigham & Women's Hospital, Division of Genetics, Boston, MA, 02215, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Pier Francesco Palamara
- Department of Statistics, University of Oxford, Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
73
Zaidi AA, Mathieson I. Demographic history mediates the effect of stratification on polygenic scores. eLife 2020; 9:e61548. [PMID: 33200985 PMCID: PMC7758063 DOI: 10.7554/elife.61548] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Received: 07/29/2020] [Accepted: 11/16/2020] [Indexed: 12/13/2022]
Abstract
Population stratification continues to bias the results of genome-wide association studies (GWAS). When these results are used to construct polygenic scores, even subtle biases can cumulatively lead to large errors. To study the effect of residual stratification, we simulated GWAS under realistic models of demographic history. We show that when population structure is recent, it cannot be corrected using principal components of common variants because they are uninformative about recent history. Consequently, polygenic scores are biased in that they recapitulate environmental structure. Principal components calculated from rare variants or identity-by-descent segments can correct this stratification for some types of environmental effects. While family-based studies are immune to stratification, the hybrid approach of ascertaining variants in GWAS but reestimating effect sizes in siblings reduces but does not eliminate stratification. We show that the effect of population stratification depends not only on allele frequencies and environmental structure but also on demographic history.
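The point that subtle biases add up follows from a polygenic score being a sum over many SNPs. The toy Python calculation below uses hypothetical numbers and assumes a trait with no true genetic effects, the same small stratification-induced bias δ in every effect estimate, and a fixed allele-frequency difference Δp between two groups. The expected spurious score gap is then 2 × M × δ × Δp, growing linearly with the number of SNPs M. It illustrates the arithmetic only and is not the simulation framework used in the paper.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], betas: Sequence[float]) -> float:
    """Weighted sum of allele dosages (0-2) and per-SNP effect estimates."""
    if len(dosages) != len(betas):
        raise ValueError("dosages and betas must have the same length")
    return sum(d * b for d, b in zip(dosages, betas))


def expected_score(freqs: Sequence[float], betas: Sequence[float]) -> float:
    """Mean score in a population: each SNP contributes 2 * p * beta."""
    return sum(2 * p * b for p, b in zip(freqs, betas))


def stratification_gap(n_snps: int, bias: float, p_a: float, p_b: float) -> float:
    """Spurious mean-score gap between two groups for a null trait.

    Every effect estimate carries the same small bias (true effects are 0),
    and the groups differ in allele frequency by p_b - p_a at every SNP.
    """
    betas = [bias] * n_snps
    return expected_score([p_b] * n_snps, betas) - expected_score([p_a] * n_snps, betas)
```

With 1,000 SNPs, δ = 0.001 and Δp = 0.05 the gap is 0.1 score units; with 10,000 SNPs it is 1.0, even though no SNP has any true effect.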
Affiliation(s)
- Arslan A Zaidi
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
- Iain Mathieson
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
74
Browning SR, Browning BL. Probabilistic Estimation of Identity by Descent Segment Endpoints and Detection of Recent Selection. Am J Hum Genet 2020; 107:895-910. [PMID: 33053335 PMCID: PMC7553009 DOI: 10.1016/j.ajhg.2020.09.010] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/15/2020] [Accepted: 09/25/2020] [Indexed: 12/18/2022]
Abstract
Most methods for fast detection of identity by descent (IBD) segments report identity by state segments without any quantification of the uncertainty in the endpoints and lengths of the IBD segments. We present a method for determining the posterior probability distribution of IBD segment endpoints. Our approach accounts for genotype errors, recent mutations, and gene conversions which disrupt DNA sequence identity within IBD segments, and it can be applied to large cohorts with whole-genome sequence or SNP array data. We find that our method's estimates of uncertainty are well calibrated for homogeneous samples. We quantify endpoint uncertainty for 77.7 billion IBD segments from 408,883 individuals of white British ancestry in the UK Biobank, and we use these IBD segments to find regions showing evidence of recent natural selection. We show that many spurious selection signals are eliminated by the use of unbiased estimates of IBD segment endpoints and a pedigree-based genetic map. Eleven of the twelve regions with the greatest evidence for recent selection in our scan have been identified as selected in previous analyses using different approaches. Our computationally efficient method for quantifying IBD segment endpoint uncertainty is implemented in the open source ibd-ends software package.
Affiliation(s)
- Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Brian L Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA; Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA
75
Samuels DC, Below JE, Ness S, Yu H, Leng S, Guo Y. Alternative Applications of Genotyping Array Data Using Multivariant Methods. Trends Genet 2020; 36:857-867. [PMID: 32773169 PMCID: PMC7572808 DOI: 10.1016/j.tig.2020.07.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/30/2020] [Revised: 07/08/2020] [Accepted: 07/09/2020] [Indexed: 10/23/2022]
Abstract
Genotyping microarray technology, which can genotype millions of single-nucleotide variants simultaneously, was one of the forerunners of the high-throughput genomics revolution. Owing to its speed, low cost and high throughput, the genotyping array has found lasting application in genome-wide association studies (GWAS), which have accumulated an enormous amount of data. Driven by continual manufacturing upgrades and analytical innovation, unconventional applications of genotyping array data have emerged to address more diverse genetic problems, promising to boost genetic research into human diseases by re-mining the rich accumulated data. Here, we review several unconventional genotyping array analysis techniques that have been built on the idea of large-scale multivariant analysis and provide empirical application examples. These unconventional applications of genotyping arrays include polygenic scores, runs of homozygosity (ROH)/heterozygosity ratio, distant pedigree computation, and mitochondrial DNA (mtDNA) copy number inference.
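One of the metrics listed above, the heterozygosity ratio, is commonly defined as the number of heterozygous calls divided by the number of non-reference homozygous calls. A minimal Python sketch, assuming genotypes coded as non-reference allele counts (0, 1, 2) with `None` for missing calls; the function name is hypothetical, and the review's own analyses may apply additional variant filters.

```python
from collections import Counter
from typing import Iterable, Optional


def het_hom_ratio(dosages: Iterable[Optional[int]]) -> float:
    """Heterozygous / non-reference homozygous call count for one sample.

    dosages: non-reference allele count per variant (0, 1 or 2);
        None marks a missing call and is skipped.
    """
    counts = Counter(g for g in dosages if g is not None)
    if counts[2] == 0:
        raise ValueError("no non-reference homozygous calls")
    return counts[1] / counts[2]
```

For the calls `[0, 1, 1, 2, None, 1, 2, 0]` there are three heterozygous and two non-reference homozygous calls, giving a ratio of 1.5.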
Affiliation(s)
- David C Samuels
- Department of Molecular Physiology and Biophysics, Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37232, USA
- Jennifer E Below
- Division of Genetic Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Scott Ness
- Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA
- Hui Yu
- Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA
- Shuguang Leng
- Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA
- Yan Guo
- Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA
76
Hwang LD, Tubbs JD, Luong J, Lundberg M, Moen GH, Wang G, Warrington NM, Sham PC, Cuellar-Partida G, Evans DM. Estimating indirect parental genetic effects on offspring phenotypes using virtual parental genotypes derived from sibling and half sibling pairs. PLoS Genet 2020; 16:e1009154. [PMID: 33104719 PMCID: PMC7646364 DOI: 10.1371/journal.pgen.1009154] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 03/03/2020] [Revised: 11/05/2020] [Accepted: 09/28/2020] [Indexed: 02/03/2023]
Abstract
Indirect parental genetic effects may be defined as the influence of parental genotypes on offspring phenotypes over and above that which results from the transmission of genes from parents to their children. However, given the relative paucity of large-scale family-based cohorts around the world, it is difficult to demonstrate parental genetic effects on human traits, particularly at individual loci. In this manuscript, we illustrate how parental genetic effects on offspring phenotypes, including late-onset conditions, can in principle be estimated at individual loci using large-scale genome-wide association study (GWAS) data, even in the absence of parental genotypes. Our strategy involves creating “virtual” mothers and fathers by estimating the genotypic dosages of parental genotypes using physically genotyped data from relative pairs. We then utilize the expected dosages of the parents, and the actual genotypes of the offspring relative pairs, to perform conditional genetic association analyses to obtain asymptotically unbiased estimates of maternal, paternal and offspring genetic effects. We apply our approach to 19,066 sibling pairs from the UK Biobank and show that a polygenic score consisting of imputed parental educational attainment SNP dosages is strongly related to offspring educational attainment even after correcting for offspring genotype at the same loci. We develop a freely available web application that quantifies the power of our approach using closed-form asymptotic solutions. We implement our methods in a user-friendly software package, IMPISH (IMputing Parental genotypes In Siblings and Half Siblings), which allows users to quickly and efficiently impute parental genotypes across the genome in large genome-wide datasets and then use these estimated dosages in downstream linear mixed model association analyses. We conclude that imputing parental genotypes from relative pairs may provide a useful adjunct to existing large-scale genetic studies of parents and their offspring.
Indirect parental genetic effects may be defined as the influence of parental genotypes on offspring phenotypes over and above that which results from the transmission of genes from parents to children. Estimating indirect parental genetic effects on offspring outcomes at the genotype level has been challenging because it requires large-scale, individual-level genotypes from both parents and their offspring, and there is a paucity of cohorts around the world with this information. Here we present a new approach to estimate indirect parental genetic effects without the requirement of physically genotyped parents. Our method creates virtual parental genotypes based on the genotypes of offspring pairs, and then uses these virtual genotypes in downstream genetic association analyses. We developed a software package, “IMPISH”, that allows users to impute virtual parental genotypes in their own genome-wide datasets and then use these in downstream genome-wide association analyses, as well as a series of power calculators to estimate the power to detect indirect parental genetic effects on offspring phenotypes. We apply our method to educational attainment data from the UK Biobank and show that indirect parental genetic effects are related to offspring educational attainment even after correcting for offspring genotype at the same loci.
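The “virtual parent” idea can be illustrated for a single biallelic SNP. Enumerate the nine possible parental genotype pairs, weight each by its Hardy-Weinberg prior and by the Mendelian probability of the two observed sibling genotypes, and take the posterior mean parental dosage. This brute-force Python sketch assumes a known allele frequency p strictly between 0 and 1, no genotyping error, and full siblings only; the function names are hypothetical and it is not the IMPISH implementation. Because sibling genotypes alone cannot distinguish the mother from the father, the two parents are exchangeable here and share the same expected dosage.

```python
from itertools import product
from typing import Tuple


def hwe_prior(p: float) -> Tuple[float, float, float]:
    """Genotype probabilities for dosage 0, 1, 2 under Hardy-Weinberg."""
    q = 1.0 - p
    return (q * q, 2 * p * q, p * p)


def child_prob(g_child: int, g_mother: int, g_father: int) -> float:
    """P(child dosage | parental dosages) under Mendelian transmission."""
    a = g_mother / 2  # probability the mother transmits the counted allele
    b = g_father / 2  # probability the father transmits the counted allele
    return {0: (1 - a) * (1 - b), 1: a * (1 - b) + (1 - a) * b, 2: a * b}[g_child]


def expected_parental_dosage(g_sib1: int, g_sib2: int, p: float) -> float:
    """Posterior mean dosage of one parent given two full-sibling genotypes."""
    prior = hwe_prior(p)
    num = den = 0.0
    for gm, gf in product(range(3), repeat=2):
        # Siblings are conditionally independent given the parents.
        w = (prior[gm] * prior[gf]
             * child_prob(g_sib1, gm, gf) * child_prob(g_sib2, gm, gf))
        num += gm * w
        den += w
    return num / den
```

For example, if both siblings are homozygous for the counted allele and p = 0.5, each parent must carry at least one copy and the expected parental dosage is 5/3. Dosages of this kind can then replace observed parental genotypes in a conditional association model.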
Affiliation(s)
- Liang-Dar Hwang
- The University of Queensland Diamantina Institute, The University of Queensland, Brisbane, Australia
- Justin D. Tubbs
- Department of Psychiatry, The University of Hong Kong, Hong Kong SAR, China
- Justin Luong
- The University of Queensland Diamantina Institute, The University of Queensland, Brisbane, Australia
- Mischa Lundberg
- The University of Queensland Diamantina Institute, The University of Queensland, Brisbane, Australia
- Transformational Bioinformatics, Commonwealth Scientific and Industrial Research Organisation, Sydney, New South Wales, Australia
- Gunn-Helen Moen
- The University of Queensland Diamantina Institute, The University of Queensland, Brisbane, Australia
- Institute of Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway
- Population Health Science, Bristol Medical School, University of Bristol, Bristol, United Kingdom
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
- Geng Wang
- The University of Queensland Diamantina Institute, The University of Queensland, Brisbane, Australia
- Nicole M. Warrington
- The University of Queensland Diamantina Institute, The University of Queensland, Brisbane, Australia
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
- Pak C. Sham
- Department of Psychiatry, The University of Hong Kong, Hong Kong SAR, China
- Centre for PanorOmic Sciences, The University of Hong Kong, Hong Kong SAR, China
- Centre of Brain and Cognitive Sciences, The University of Hong Kong, Hong Kong SAR, China
- Gabriel Cuellar-Partida
- The University of Queensland Diamantina Institute, The University of Queensland, Brisbane, Australia
- 23andMe Inc, Sunnyvale, California, United States of America
- David M. Evans
- The University of Queensland Diamantina Institute, The University of Queensland, Brisbane, Australia
- Medical Research Council Integrative Epidemiology Unit at the University of Bristol, Bristol, United Kingdom
77
Tuazon AMDA, Lott P, Bohórquez M, Benavides J, Ramirez C, Criollo A, Estrada-Florez A, Mateus G, Velez A, Carmona J, Olaya J, Garcia E, Polanco-Echeverry G, Stultz J, Alvarez C, Tapia T, Ashton-Prolla P, Vega A, Lazaro C, Tornero E, Martinez-Bouzas C, Infante M, De La Hoya M, Diez O, Browning BL, Rannala B, Teixeira MR, Carvallo P, Echeverry M, Carvajal-Carmona LG. Haplotype analysis of the internationally distributed BRCA1 c.3331_3334delCAAG founder mutation reveals a common ancestral origin in Iberia. Breast Cancer Res 2020; 22:108. [PMID: 33087180 PMCID: PMC7579869 DOI: 10.1186/s13058-020-01341-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/01/2020] [Accepted: 09/16/2020] [Indexed: 12/02/2022]
Abstract
Background: The BRCA1 c.3331_3334delCAAG founder mutation has been reported in hereditary breast and ovarian cancer families from multiple Hispanic groups. We aimed to evaluate BRCA1 c.3331_3334delCAAG haplotype diversity in cases of European, African, and Latin American ancestry. Methods: BC mutation carrier cases from Colombia (n = 32), Spain (n = 13), Portugal (n = 2), Chile (n = 10), Africa (n = 1), and Brazil (n = 2) were genotyped with genome-wide single nucleotide polymorphism (SNP) arrays to evaluate haplotype diversity around BRCA1 c.3331_3334delCAAG. Additional Portuguese (n = 13) and Brazilian (n = 18) BC mutation carriers were genotyped for 15 informative SNPs surrounding BRCA1. Data were phased using SHAPEIT2, and identical-by-descent regions were determined using BEAGLE and GERMLINE. DMLE+ was used to date the mutation in Colombia and Iberia. Results: The haplotype reconstruction revealed a shared 264.4-kb region among carriers from all six countries. The mutation was estimated to be ~100 generations old in Iberia and to have been introduced to South America early in the European colonization period. Conclusions: Our results suggest that this mutation originated in Iberia and was later introduced to Colombia and South America at the time of Spanish colonization during the early 1500s. We also found that the Colombian mutation carriers had higher European ancestry than controls on chromosome 17, which harbors BRCA1, further supporting the European origin of the mutation. Understanding founder mutations in diverse populations has implications for implementing cost-effective, ancestry-informed screening.
Affiliation(s)
- Paul Lott
- Genome Center, University of California Davis, Davis, CA, USA
- Alejandro Velez
- Hospital Pablo Tobon Uribe, Medellín, Colombia; Dinamica IPS, Medellín, Colombia
- Justo Olaya
- Hospital Universitario Hernando Moncaleano Perdomo, Neiva, Colombia
- Elisha Garcia
- Genome Center, University of California Davis, Davis, CA, USA
- Jacob Stultz
- Genome Center, University of California Davis, Davis, CA, USA
- Teresa Tapia
- Pontificia Universidad Católica de Chile, Santiago, Chile
- Patricia Ashton-Prolla
- Department of Genetics, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil; Post-graduate Course in Genetics and Molecular Biology, UFRGS, Porto Alegre, Brazil; Medical Genetics Service, Hospital de Clinicas de Porto Alegre (HCPA), Porto Alegre, Brazil
- Ana Vega
- Fundación Pública Galega de Medicina Xenómica, Grupo de Medicina Xenómica-USC, CIBERER, IDIS, Santiago de Compostela, Spain
- Conxi Lazaro
- Hereditary Cancer Program, Catalan Institute of Oncology, Oncobell Program-IDIBELL, Hospitalet de Llobregat, Barcelona, Spain; Centro de Investigación Biomédica en Red de Cáncer (CIBERONC), Madrid, Spain
- Eva Tornero
- Hereditary Cancer Program, Catalan Institute of Oncology, Oncobell Program-IDIBELL, Hospitalet de Llobregat, Barcelona, Spain; Centro de Investigación Biomédica en Red de Cáncer (CIBERONC), Madrid, Spain
- Mar Infante
- Cancer Genetics Group, Institute of Genetics and Molecular Biology (UVa-CSIC), Valladolid, Spain
- Miguel De La Hoya
- Laboratorio de Oncología Molecular, Hospital Clínico San Carlos, IdISSC (Instituto de Investigación Sanitaria San Carlos), Madrid, Spain
- Orland Diez
- Grupo de Cáncer Hereditario, Instituto Oncológico Vall d'Hebron (VHIO), Madrid, Spain
- Brian L Browning
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Bruce Rannala
- Department of Evolution and Ecology, University of California Davis, Davis, CA, USA
- Manuel R Teixeira
- Portuguese Oncology Institute of Porto (IPO Porto) and Biomedical Sciences Institute (ICBAS), University of Porto, Porto, Portugal
- Pilar Carvallo
- Pontificia Universidad Católica de Chile, Santiago, Chile
- Luis G Carvajal-Carmona
- Genome Center, University of California Davis, Davis, CA, USA; Division de Investigaciones, Fundacion de Genética y Genómica, Ibague, Colombia; University of California Davis Comprehensive Cancer Center, Sacramento, CA, USA; Department of Biochemistry and Molecular Medicine, University of California Davis, Sacramento, CA, USA
78
Surbakti S, Parker HG, McIntyre JK, Maury HK, Cairns KM, Selvig M, Pangau-Adam M, Safonpo A, Numberi L, Runtuboi DYP, Davis BW, Ostrander EA. New Guinea highland wild dogs are the original New Guinea singing dogs. Proc Natl Acad Sci U S A 2020; 117:24369-24376. [PMID: 32868416 PMCID: PMC7533868 DOI: 10.1073/pnas.2007242117] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 11/18/2022]
Abstract
New Guinea singing dogs (NGSD) are identifiable by their namesake vocalizations, which are unlike those of any other canid population. Their novel behaviors and potential singular origin during dog domestication make them an attractive, but elusive, subject for evolutionary and conservation study. Although once plentiful on the island of New Guinea (NG), they were presumed to exist today only in captivity. This conclusion was based on the lack of sightings in the lowlands of the island and the concurrent expansion of European- and Asian-derived dogs. We have analyzed the first nuclear genomes from a canid population discovered during a recent expedition to the highlands of NG. The extreme altitude (>4,000 m) of the highland wild dogs' (HWD) observed range and their confirmed vocalizations indicate their potential to be a wild NGSD population. Comparison of single-nucleotide polymorphism genotypes shows strong similarity between HWD and the homogeneous captive NGSD, with the HWD showing significantly higher genetic diversity. Admixture analyses and estimation of shared haplotypes with phylogenetically diverse populations also indicate that the HWD is a novel population within the distinct evolutionary lineage of Oceanic canids. Taken together, these data indicate that the HWD has distinct potential to aid in the conservation of NGSD both in the wild and under human care.
Affiliation(s)
- Suriani Surbakti
- Department of Biology, Universitas Cenderawasih, Jayapura, Papua 99224, Indonesia
- Heidi G Parker
- National Human Genome Research Institute, National Institutes of Health, Bethesda MD 20892
- James K McIntyre
- New Guinea Highland Wild Dog Foundation, Fernandina Beach, FL 32034
- Hendra K Maury
- Department of Biology, Universitas Cenderawasih, Jayapura, Papua 99224, Indonesia
- Kylie M Cairns
- Centre for Ecosystem Science, School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Meagan Selvig
- Department of Conservation Biology, University of Göttingen, 37073 Göttingen Germany
- Margaretha Pangau-Adam
- Department of Biology, Universitas Cenderawasih, Jayapura, Papua 99224, Indonesia
- Department of Conservation Biology, University of Göttingen, 37073 Göttingen Germany
- Apolo Safonpo
- Department of Biology, Universitas Cenderawasih, Jayapura, Papua 99224, Indonesia
- Leonardo Numberi
- Department of Biology, Universitas Cenderawasih, Jayapura, Papua 99224, Indonesia
- Dirk Y P Runtuboi
- Department of Biology, Universitas Cenderawasih, Jayapura, Papua 99224, Indonesia
- Brian W Davis
- Department of Veterinary Integrative Biosciences, Texas A&M University College of Veterinary Medicine, College Station, TX 77843
- Elaine A Ostrander
- National Human Genome Research Institute, National Institutes of Health, Bethesda MD 20892
79
Identity by descent analysis identifies founder events and links SOD1 familial and sporadic ALS cases. NPJ Genom Med 2020; 5:32. [PMID: 32789025 PMCID: PMC7414871 DOI: 10.1038/s41525-020-00139-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/05/2020] [Accepted: 07/14/2020] [Indexed: 12/11/2022]
Abstract
Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disorder characterised by the loss of upper and lower motor neurons resulting in paralysis and eventual death. Approximately 10% of ALS cases have a family history of disease, while the remainder present as apparently sporadic cases. Heritability studies suggest a significant genetic component to sporadic ALS, and although most sporadic cases have an unknown genetic aetiology, some familial ALS mutations have also been found in sporadic cases. This suggests that some sporadic cases may be unrecognised familial cases with reduced disease penetrance in their ancestors. A powerful strategy to uncover a familial link is identity-by-descent (IBD) analysis, which detects genomic regions that have been inherited from a common ancestor. IBD analysis was performed on 83 Australian familial ALS cases from 25 families and three sporadic ALS cases, each of whom carried one of three SOD1 mutations (p.I114T, p.V149G and p.E101G). We defined five unique 350-SNP haplotypes that carry these mutations in our cohort, indicative of five founder events. This included two founder haplotypes that carry SOD1 p.I114T; linking familial and sporadic cases. We found that SOD1 p.E101G arose independently in each family that carries this mutation and linked two families that carry SOD1 p.V149G. The age of disease onset varied between cases that carried each SOD1 p.I114T haplotype. Linking families with identical ALS mutations allows for larger sample sizes and increased statistical power to identify putative phenotypic modifiers.
80
Kerdoncuff E, Lambert A, Achaz G. Testing for population decline using maximal linkage disequilibrium blocks. Theor Popul Biol 2020; 134:171-181. [DOI: 10.1016/j.tpb.2020.03.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/09/2019] [Revised: 03/26/2020] [Accepted: 03/29/2020] [Indexed: 02/02/2023]
81
Loh PR, Genovese G, McCarroll SA. Monogenic and polygenic inheritance become instruments for clonal selection. Nature 2020; 584:136-141. [PMID: 32581363 PMCID: PMC7415571 DOI: 10.1038/s41586-020-2430-6] [Citation(s) in RCA: 99] [Impact Index Per Article: 24.8] [Received: 06/01/2019] [Accepted: 04/23/2020] [Indexed: 12/30/2022]
Abstract
Clonally expanded blood cells that contain somatic mutations (clonal haematopoiesis) are commonly acquired with age and increase the risk of blood cancer [1-9]. The blood clones identified so far contain diverse large-scale mosaic chromosomal alterations (deletions, duplications and copy-neutral loss of heterozygosity (CN-LOH)) on all chromosomes [1,2,5,6,9], but the sources of selective advantage that drive the expansion of most clones remain unknown. Here, to identify genes, mutations and biological processes that give selective advantage to mutant clones, we analysed genotyping data from the blood-derived DNA of 482,789 participants from the UK Biobank [10]. We identified 19,632 autosomal mosaic chromosomal alterations and analysed these for relationships to inherited genetic variation. We found 52 inherited, rare, large-effect coding or splice variants in 7 genes that were associated with greatly increased vulnerability to clonal haematopoiesis with specific acquired CN-LOH mutations. Acquired mutations systematically replaced the inherited risk alleles (at MPL) or duplicated them to the homologous chromosome (at FH, NBN, MRE11, ATM, SH2B3 and TM2D3). Three of the genes (MRE11, NBN and ATM) encode components of the MRN-ATM pathway, which limits cell division after DNA damage and telomere attrition [11-13]; another two (MPL and SH2B3) encode proteins that regulate the self-renewal of stem cells [14-16]. In addition, we found that CN-LOH mutations across the genome tended to cause chromosomal segments with alleles that promote the expansion of haematopoietic cells to replace their homologous (allelic) counterparts, increasing polygenic drive for blood-cell proliferation traits. Readily acquired mutations that replace chromosomal segments with their homologous counterparts seem to interact with pervasive inherited variation to create a challenge for lifelong cytopoiesis.
Affiliation(s)
- Po-Ru Loh
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Giulio Genovese
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Genetics, Harvard Medical School, Boston, MA, USA.
- Steven A McCarroll
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Genetics, Harvard Medical School, Boston, MA, USA.
82
Ioannidis AG, Blanco-Portillo J, Sandoval K, Hagelberg E, Miquel-Poblete JF, Moreno-Mayar JV, Rodríguez-Rodríguez JE, Quinto-Cortés CD, Auckland K, Parks T, Robson K, Hill AVS, Avila-Arcos MC, Sockell A, Homburger JR, Wojcik GL, Barnes KC, Herrera L, Berríos S, Acuña M, Llop E, Eng C, Huntsman S, Burchard EG, Gignoux CR, Cifuentes L, Verdugo RA, Moraga M, Mentzer AJ, Bustamante CD, Moreno-Estrada A. Native American gene flow into Polynesia predating Easter Island settlement. Nature 2020; 583:572-577. [PMID: 32641827 PMCID: PMC8939867 DOI: 10.1038/s41586-020-2487-2] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 05/27/2019] [Accepted: 05/22/2020] [Indexed: 11/08/2022]
Abstract
The possibility of voyaging contact between prehistoric Polynesian and Native American populations has long intrigued researchers. Proponents have pointed to the existence of New World crops, such as the sweet potato and bottle gourd, in the Polynesian archaeological record, but nowhere else outside the pre-Columbian Americas [1-6], while critics have argued that these botanical dispersals need not have been human mediated [7]. The Norwegian explorer Thor Heyerdahl controversially suggested that prehistoric South American populations had an important role in the settlement of east Polynesia and particularly of Easter Island (Rapa Nui) [2]. Several limited molecular genetic studies have reached opposing conclusions, and the possibility continues to be as hotly contested today as it was when first suggested [8-12]. Here we analyse genome-wide variation in individuals from islands across Polynesia for signs of Native American admixture, analysing 807 individuals from 17 island populations and 15 Pacific coast Native American groups. We find conclusive evidence for prehistoric contact of Polynesian individuals with Native American individuals (around AD 1200) contemporaneous with the settlement of remote Oceania [13-15]. Our analyses suggest strongly that a single contact event occurred in eastern Polynesia, before the settlement of Rapa Nui, between Polynesian individuals and a Native American group most closely related to the indigenous inhabitants of present-day Colombia.
Affiliation(s)
- Alexander G Ioannidis
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, USA.
- National Laboratory of Genomics for Biodiversity (LANGEBIO), Unit of Advanced Genomics, CINVESTAV, Irapuato, Mexico.
- Javier Blanco-Portillo
- National Laboratory of Genomics for Biodiversity (LANGEBIO), Unit of Advanced Genomics, CINVESTAV, Irapuato, Mexico
- Karla Sandoval
- National Laboratory of Genomics for Biodiversity (LANGEBIO), Unit of Advanced Genomics, CINVESTAV, Irapuato, Mexico
- Erika Hagelberg
- Department of Biosciences, University of Oslo, Blindern, Oslo, Norway
- Consuelo D Quinto-Cortés
- National Laboratory of Genomics for Biodiversity (LANGEBIO), Unit of Advanced Genomics, CINVESTAV, Irapuato, Mexico
- Kathryn Auckland
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Tom Parks
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Kathryn Robson
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
- Adrian V S Hill
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- The Jenner Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- María C Avila-Arcos
- International Laboratory for Human Genome Research (LIIGH), UNAM Juriquilla, Queretaro, Mexico
- Alexandra Sockell
- Center for Computational, Evolutionary and Human Genomics (CEHG), Stanford University, Stanford, CA, USA
- Julian R Homburger
- Center for Computational, Evolutionary and Human Genomics (CEHG), Stanford University, Stanford, CA, USA
- Genevieve L Wojcik
- Center for Computational, Evolutionary and Human Genomics (CEHG), Stanford University, Stanford, CA, USA
- Kathleen C Barnes
- Division of Biomedical Informatics and Personalized Medicine, University of Colorado, Denver, CO, USA
- Luisa Herrera
- Human Genetics Program, Institute of Biomedical Sciences, Faculty of Medicine, University of Chile, Santiago, Chile
- Soledad Berríos
- Human Genetics Program, Institute of Biomedical Sciences, Faculty of Medicine, University of Chile, Santiago, Chile
- Mónica Acuña
- Human Genetics Program, Institute of Biomedical Sciences, Faculty of Medicine, University of Chile, Santiago, Chile
- Elena Llop
- Human Genetics Program, Institute of Biomedical Sciences, Faculty of Medicine, University of Chile, Santiago, Chile
- Celeste Eng
- Program in Pharmaceutical Sciences and Pharmacogenomics, Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Scott Huntsman
- Program in Pharmaceutical Sciences and Pharmacogenomics, Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Esteban G Burchard
- Program in Pharmaceutical Sciences and Pharmacogenomics, Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Christopher R Gignoux
- Division of Biomedical Informatics and Personalized Medicine, University of Colorado, Denver, CO, USA
- Lucía Cifuentes
- Human Genetics Program, Institute of Biomedical Sciences, Faculty of Medicine, University of Chile, Santiago, Chile
- Ricardo A Verdugo
- Human Genetics Program, Institute of Biomedical Sciences, Faculty of Medicine, University of Chile, Santiago, Chile
- Basic-Applied Oncology Department, Faculty of Medicine, University of Chile, Santiago, Chile
- Mauricio Moraga
- Human Genetics Program, Institute of Biomedical Sciences, Faculty of Medicine, University of Chile, Santiago, Chile
- Department of Anthropology, Faculty of Social Sciences, University of Chile, Santiago, Chile
- Alexander J Mentzer
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Carlos D Bustamante
- Center for Computational, Evolutionary and Human Genomics (CEHG), Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Andrés Moreno-Estrada
- National Laboratory of Genomics for Biodiversity (LANGEBIO), Unit of Advanced Genomics, CINVESTAV, Irapuato, Mexico.
83
Naseri A, Holzhauser E, Zhi D, Zhang S. Efficient haplotype matching between a query and a panel for genealogical search. Bioinformatics 2019; 35:i233-i241. [PMID: 31510689 PMCID: PMC6612857 DOI: 10.1093/bioinformatics/btz347] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/14/2022]
Abstract
Motivation With the wide availability of whole-genome genotype data, there is an increasing need for conducting genetic genealogical searches efficiently. Computationally, this task amounts to identifying shared DNA segments between a query individual and a very large panel containing millions of haplotypes. The celebrated Positional Burrows-Wheeler Transform (PBWT) data structure is a pre-computed index of the panel that enables constant-time matching at each position between one haplotype and an arbitrarily large panel. However, the existing algorithm (Durbin’s Algorithm 5) can only identify set-maximal matches, the longest matches ending at any location in a panel, while in real genealogical search scenarios, multiple ‘good enough’ matches are desired. Results In this work, we developed two algorithmic extensions of Durbin’s Algorithm 5 that can find all L-long matches, matches longer than or equal to a given length L, between a query and a panel. In the first algorithm, PBWT-Query, we introduce ‘virtual insertion’ of the query into the PBWT matrix of the panel and then scan up and down for the PBWT match blocks with length of at least L. In our second algorithm, L-PBWT-Query, we further speed up PBWT-Query by introducing additional data structures that allow us to avoid iterating through blocks of incomplete matches. The efficiency of PBWT-Query and L-PBWT-Query is demonstrated using simulated data and the UK Biobank data. Our results show that our proposed algorithms can detect related individuals for a given query efficiently in very large cohorts, which enables fast online query search. Availability and implementation genome.ucf.edu/pbwt-query Supplementary information Supplementary data are available at Bioinformatics online.
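For readers unfamiliar with the PBWT index referred to in this abstract, below is a minimal Python sketch of how it is built: the prefix and divergence arrays of Durbin's (2014) Algorithm 2, assuming a small in-memory panel of biallelic haplotypes coded 0/1. It is not PBWT-Query or L-PBWT-Query. The helper `long_match_blocks` (a hypothetical name) only groups panel haplotypes that share at least L sites ending at the last processed site, the within-panel analogue of the L-long matches that the paper extends to an external query.

```python
from typing import List, Tuple

Panel = List[List[int]]  # panel[i][k] = allele (0/1) of haplotype i at site k


def build_pbwt(panel: Panel) -> List[Tuple[List[int], List[int]]]:
    """Return [(a_0, d_0), ..., (a_N, d_N)] for a panel with N sites.

    a_k orders haplotypes by their reversed prefixes over sites 0..k-1.
    d_k[j] is the first site of the match between a_k[j] and a_k[j-1];
    the value k means "no shared suffix". Follows Durbin (2014), Algorithm 2.
    """
    m = len(panel)
    n_sites = len(panel[0]) if m else 0
    a, d = list(range(m)), [0] * m
    history = [(a[:], d[:])]
    for k in range(n_sites):
        a0: List[int] = []
        a1: List[int] = []
        d0: List[int] = []
        d1: List[int] = []
        p = q = k + 1  # running max divergence since the last 0 / last 1 haplotype
        for idx, start in zip(a, d):
            p, q = max(p, start), max(q, start)
            if panel[idx][k] == 0:
                a0.append(idx)
                d0.append(p)
                p = 0
            else:
                a1.append(idx)
                d1.append(q)
                q = 0
        a, d = a0 + a1, d0 + d1  # stable partition by the allele at site k
        history.append((a[:], d[:]))
    return history


def long_match_blocks(a: List[int], d: List[int], k: int, L: int) -> List[List[int]]:
    """Group haplotypes of a_k that pairwise match on >= L sites ending at site k-1.

    In a_k, haplotypes a_k[i] and a_k[j] (i < j) match from max(d_k[i+1..j]),
    so maximal runs of consecutive entries with d_k <= k - L form the blocks.
    """
    blocks: List[List[int]] = []
    current = a[:1]
    for j in range(1, len(a)):
        if d[j] <= k - L:
            current.append(a[j])
        else:
            if len(current) > 1:
                blocks.append(current)
            current = [a[j]]
    if len(current) > 1:
        blocks.append(current)
    return blocks


panel = [[0, 1, 0], [1, 1, 0], [0, 1, 1], [1, 0, 0]]
a3, d3 = build_pbwt(panel)[3]
print(a3, d3, long_match_blocks(a3, d3, 3, 2))
```

Each site costs one pass over the panel, which is what makes the index cheap to build; the paper's contribution lies in querying it with a haplotype that is not in the panel.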
Affiliation(s)
- Ardalan Naseri
- Department of Computer Science, University of Central Florida, Orlando, FL, USA
- Erwin Holzhauser
- Department of Computer Science, University of Central Florida, Orlando, FL, USA
- Degui Zhi
- School of Biomedical Informatics and School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL, USA
84
A positively selected FBN1 missense variant reduces height in Peruvian individuals. Nature 2020; 582:234-239. [PMID: 32499652 PMCID: PMC7410362 DOI: 10.1038/s41586-020-2302-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 02/28/2019] [Accepted: 03/10/2020] [Indexed: 01/21/2023]
Abstract
On average, Peruvian individuals are among the shortest in the world [1]. Here we show that Native American ancestry is associated with reduced height in an ethnically diverse group of Peruvian individuals, and identify a population-specific, missense variant in the FBN1 gene (E1297G) that is significantly associated with lower height. Each copy of the minor allele (frequency of 4.7%) reduces height by 2.2 cm (4.4 cm in homozygous individuals). To our knowledge, this is the largest effect size known for a common height-associated variant. FBN1 encodes the extracellular matrix protein fibrillin 1, which is a major structural component of microfibrils. We observed less densely packed fibrillin-1-rich microfibrils with irregular edges in the skin of individuals who were homozygous for G1297 compared with individuals who were homozygous for E1297. Moreover, we show that the E1297G locus is under positive selection in non-African populations, and that the E1297 variant shows subtle evidence of positive selection specifically within the Peruvian population. This variant is also significantly more frequent in coastal Peruvian populations than in populations from the Andes or the Amazon, which suggests that short stature might be the result of adaptation to factors that are associated with the coastal environment in Peru.
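The reported effect sizes follow an additive model: 2.2 cm per copy of the minor allele, so 4.4 cm for homozygotes. The small Python sketch below works through that arithmetic. It assumes Hardy-Weinberg genotype proportions at the reported 4.7% minor-allele frequency; that assumption is ours, not the abstract's. The function names are hypothetical.

```python
from typing import Dict


def additive_height_change(minor_copies: int, per_copy_cm: float = -2.2) -> float:
    """Expected height change (cm) for 0, 1 or 2 minor-allele copies under an additive model."""
    if minor_copies not in (0, 1, 2):
        raise ValueError("minor_copies must be 0, 1 or 2")
    return minor_copies * per_copy_cm


def hwe_genotype_freqs(q: float) -> Dict[int, float]:
    """Hardy-Weinberg genotype frequencies, keyed by minor-allele count, for frequency q."""
    p = 1.0 - q
    return {0: p * p, 1: 2.0 * p * q, 2: q * q}


def mean_height_shift(q: float, per_copy_cm: float = -2.2) -> float:
    """Population-mean shift relative to non-carriers: sum of genotype frequency x effect."""
    return sum(freq * additive_height_change(copies, per_copy_cm)
               for copies, freq in hwe_genotype_freqs(q).items())


print(hwe_genotype_freqs(0.047)[2], mean_height_shift(0.047))
```

Under these assumptions roughly 0.22% of individuals would be minor-allele homozygotes (0.047² ≈ 0.0022). The population mean would sit about 0.21 cm (4.4 × 0.047) below that of non-carriers, because the additive mean shift reduces to 2q times the per-copy effect.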
85
A framework for high-resolution phenotyping of candidate male infertility mutants: from human to mouse. Hum Genet 2020; 140:155-182. [PMID: 32248361 DOI: 10.1007/s00439-020-02159-x] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 11/02/2019] [Accepted: 03/27/2020] [Indexed: 12/18/2022]
Abstract
Male infertility is a heterogeneous condition of largely unknown etiology that affects at least 7% of men worldwide. Classical genetic approaches and emerging next-generation sequencing studies support genetic variants as a frequent cause of male infertility. Meanwhile, the barriers to transmission of this disease mean that most individual genetic cases will be rare, but because of the large percentage of the genome required for spermatogenesis, the number of distinct causal mutations is potentially large. Identifying bona fide causes of male infertility thus requires advanced filtering techniques to select for high-probability candidates, including the ability to test causality in animal models. The mouse remains the gold standard for defining the genotype-phenotype connection in male fertility. Here, we present a best-practice guide consisting of (a) major points to consider when interpreting next-generation sequencing data generated from infertile men and (b) a systematic strategy to categorize infertility types and how they relate to human male infertility. Phenotyping infertility in mice can involve investigating the function of multiple cell types across the testis and epididymis, as well as sperm function. These findings will feed into the diagnosis and treatment of male infertility as well as male health broadly.
86
Seidman DN, Shenoy SA, Kim M, Babu R, Woods IG, Dyer TD, Lehman DM, Curran JE, Duggirala R, Blangero J, Williams AL. Rapid, Phase-free Detection of Long Identity-by-Descent Segments Enables Effective Relationship Classification. Am J Hum Genet 2020; 106:453-466. [PMID: 32197076 PMCID: PMC7118564 DOI: 10.1016/j.ajhg.2020.02.012] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 05/29/2019] [Accepted: 02/18/2020] [Indexed: 01/29/2023]
Abstract
Identity-by-descent (IBD) segments are a useful tool for applications ranging from demographic inference to relationship classification, but most detection methods rely on phasing information and therefore require substantial computation time. As genetic datasets grow, methods for inferring IBD segments that scale well will be critical. We developed IBIS, an IBD detector that locates long regions of allele sharing between unphased individuals, and benchmarked it with Refined IBD, GERMLINE, and TRUFFLE on 3,000 simulated individuals. Phasing these with Beagle 5 takes 4.3 CPU days, followed by either Refined IBD or GERMLINE segment detection in 2.9 or 1.1 h, respectively. By comparison, IBIS finishes in 6.8 min or 7.8 min with IBD2 functionality enabled: speedups of 805-946× including phasing time. TRUFFLE takes 2.6 h, corresponding to IBIS speedups of 20.2-23.3×. IBIS is also accurate, inferring ≥7 cM IBD segments at quality comparable to Refined IBD and GERMLINE. With these segments, IBIS classifies first through third degree relatives in real Mexican American samples at rates meeting or exceeding other methods tested and identifies fourth through sixth degree pairs at rates within 0.0%-2.0% of the top method. While allele frequency-based approaches that do not detect segments can infer relationship degrees faster than IBIS, the fastest are biased in admixed samples, with KING inferring 30.8% fewer fifth degree Mexican American relatives correctly compared with IBIS. Finally, we ran IBIS on chromosome 2 of the UK Biobank dataset and estimate its runtime on the autosomes to be 3.3 days parallelized across 128 cores.
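A key constraint behind phase-free IBD detection is simple. Two individuals sharing one haplotype IBD (IBD1) carry at least one identical allele at every site in that region. A site where one is homozygous for each allele (0 vs 2, "opposite homozygotes") therefore rules out IBD there, genotyping error aside. The minimal Python sketch below shows only that constraint. It assumes unphased genotypes coded 0/1/2, no missing data and a genetic map in cM. It is not IBIS: the tool's error tolerance, windowing and IBD2 detection are not reproduced, and `ibd1_candidates` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def ibd1_candidates(g1: Sequence[int], g2: Sequence[int],
                    cm: Sequence[float], min_cm: float) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) site-index runs with no opposite homozygote
    that span at least min_cm centiMorgans.

    g1, g2: unphased alternate-allele counts (0, 1, 2) at the same sites.
    cm: genetic map position of each site, non-decreasing.
    """
    segments: List[Tuple[int, int]] = []
    start = 0
    n = len(g1)
    for i in range(n + 1):
        # A run ends at the end of the data or at a 0-vs-2 site.
        if i == n or abs(g1[i] - g2[i]) == 2:
            end = i - 1
            if end >= start and cm[end] - cm[start] >= min_cm:
                segments.append((start, end))
            start = i + 1
    return segments


g1 = [0, 1, 2, 2, 0, 1, 1]
g2 = [0, 2, 2, 0, 0, 1, 2]
cm = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(ibd1_candidates(g1, g2, cm, 2.0))
```

Because no phasing is needed, the scan is linear in the number of sites per pair, which is the intuition behind the large speedups over phase-then-detect pipelines reported above.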
Affiliation(s)
- Daniel N Seidman
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Sushila A Shenoy
- Department of Genetic Medicine, Weill Cornell Medicine, New York, NY 10065, USA
- Minsoo Kim
- Department of Genetic Medicine, Weill Cornell Medicine, New York, NY 10065, USA
- Ramya Babu
- Department of Computer Science, Cornell University, Ithaca, NY 14853, USA
- Ian G Woods
- Department of Biology, Ithaca College, Ithaca, NY 14850, USA
- Thomas D Dyer
- South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
- Donna M Lehman
- Department of Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Joanne E Curran
- South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
- Ravindranath Duggirala
- South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
- John Blangero
- South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
- Amy L Williams
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA.
87
Zhou Y, Browning SR, Browning BL. A Fast and Simple Method for Detecting Identity-by-Descent Segments in Large-Scale Data. Am J Hum Genet 2020; 106:426-437. [PMID: 32169169 PMCID: PMC7118582 DOI: 10.1016/j.ajhg.2020.02.010] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Received: 12/11/2019] [Accepted: 02/12/2020] [Indexed: 12/24/2022]
Abstract
Segments of identity by descent (IBD) are used in many genetic analyses. We present a method for detecting identical-by-descent haplotype segments in phased genotype data. Our method, called hap-IBD, combines a compressed representation of haplotype data, the positional Burrows-Wheeler transform, and multi-threaded execution to produce very fast analysis times. An attractive feature of hap-IBD is its simplicity: the input parameters clearly and precisely define the IBD segments that are reported, so that program correctness can be confirmed by users. We evaluate hap-IBD and four state-of-the-art IBD segment detection methods (GERMLINE, iLASH, RaPID, and TRUFFLE) using UK Biobank chromosome 20 data and simulated sequence data. We show that hap-IBD detects IBD segments faster and more accurately than competing methods, and that hap-IBD is the only method that can rapidly and accurately detect short 2-4 centiMorgan (cM) IBD segments in the full UK Biobank data. Analysis of 485,346 UK Biobank samples through the use of hap-IBD with 12 computational threads detects 231.5 billion autosomal IBD segments with length ≥2 cM in 24.4 h.
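hap-IBD works on phased haplotypes, and its length parameters directly define which segments are reported. The brute-force Python sketch below illustrates that output definition. It assumes biallelic 0/1 haplotypes and a genetic map in cM, and reports, for every pair of haplotypes, the maximal runs of identical alleles whose genetic length is at least `min_cm`. It is quadratic in the number of haplotypes. It also omits hap-IBD's PBWT-based search, its multi-threading and its handling of short discordant gaps, so it illustrates the definition rather than the program. Function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[int, int]  # inclusive (first_site, last_site)


def ibs_runs(h1: Sequence[int], h2: Sequence[int],
             cm: Sequence[float], min_cm: float) -> List[Segment]:
    """Maximal runs where two phased haplotypes carry identical alleles, >= min_cm long."""
    runs: List[Segment] = []
    start = 0
    for i in range(len(h1) + 1):
        # A run ends at the end of the data or at the first mismatching allele.
        if i == len(h1) or h1[i] != h2[i]:
            if i - 1 >= start and cm[i - 1] - cm[start] >= min_cm:
                runs.append((start, i - 1))
            start = i + 1
    return runs


def all_pair_segments(haps: Sequence[Sequence[int]], cm: Sequence[float],
                      min_cm: float) -> Dict[Tuple[int, int], List[Segment]]:
    """Brute-force scan of all haplotype pairs (O(n^2 x sites)); keeps pairs with hits."""
    found: Dict[Tuple[int, int], List[Segment]] = {}
    for i, j in combinations(range(len(haps)), 2):
        runs = ibs_runs(haps[i], haps[j], cm, min_cm)
        if runs:
            found[(i, j)] = runs
    return found


haps = [[0, 1, 1, 0, 1], [0, 1, 1, 1, 1], [1, 1, 1, 0, 1]]
cm = [0.0, 1.0, 2.0, 3.0, 4.0]
print(all_pair_segments(haps, cm, 2.0))
```

Replacing the pairwise loop with a PBWT sweep is what lets tools of this kind scale to hundreds of thousands of samples.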
Affiliation(s)
- Ying Zhou
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Brian L Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA.
88
Vai S, Amorim CEG, Lari M, Caramelli D. Kinship Determination in Archeological Contexts Through DNA Analysis. Front Ecol Evol 2020. [DOI: 10.3389/fevo.2020.00083] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022]
89
Beyond broad strokes: sociocultural insights from the study of ancient genomes. Nat Rev Genet 2020; 21:355-366. [DOI: 10.1038/s41576-020-0218-z] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Accepted: 02/04/2020] [Indexed: 01/01/2023]
90
Leitwein M, Duranton M, Rougemont Q, Gagnaire PA, Bernatchez L. Using Haplotype Information for Conservation Genomics. Trends Ecol Evol 2020; 35:245-258. [DOI: 10.1016/j.tree.2019.10.012] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 08/01/2019] [Revised: 10/18/2019] [Accepted: 10/28/2019] [Indexed: 12/19/2022]
91
Ghoreishifar SM, Moradi-Shahrbabak H, Fallahi MH, Jalil Sarghale A, Moradi-Shahrbabak M, Abdollahi-Arpanahi R, Khansefid M. Genomic measures of inbreeding coefficients and genome-wide scan for runs of homozygosity islands in Iranian river buffalo, Bubalus bubalis. BMC Genet 2020; 21:16. [PMID: 32041535 PMCID: PMC7011551 DOI: 10.1186/s12863-020-0824-y] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 10/27/2019] [Accepted: 02/04/2020] [Indexed: 01/06/2023]
Abstract
Background Consecutive homozygous fragments of a genome inherited by offspring from a common ancestor are known as runs of homozygosity (ROH). ROH can be used to calculate genomic inbreeding and to identify genomic regions that are potentially under historical selection pressure. The dataset of our study consisted of 254 Azeri (AZ) and 115 Khuzestani (KHZ) river buffalo genotyped for ~ 65,000 SNPs for the following two purposes: 1) to estimate and compare inbreeding calculated using ROH (FROH), excess of homozygosity (FHOM), correlation between uniting gametes (FUNI), and diagonal elements of the genomic relationship matrix (FGRM); 2) to identify frequently occurring ROH (i.e. ROH islands) for our selection signature and gene enrichment studies. Results In this study, 9102 ROH were identified, with an average number of 21.2 ± 13.1 and 33.2 ± 15.9 segments per animal in AZ and KHZ breeds, respectively. On average in AZ, 4.35% (108.8 ± 120.3 Mb), and in KHZ, 5.96% (149.1 ± 107.7 Mb) of the genome was autozygous. The estimated inbreeding values based on FHOM, FUNI and FGRM were higher in AZ than they were in KHZ, which was in contrast to the FROH estimates. We identified 11 ROH islands (four in AZ and seven in KHZ). In the KHZ breed, the genes located in ROH islands were enriched for multiple Gene Ontology (GO) terms (P ≤ 0.05). The genes located in ROH islands were associated with diverse biological functions and traits such as body size and muscle development (BMP2), immune response (CYP27B1), milk production and components (MARS, ADRA1A, and KCTD16), coat colour and pigmentation (PMEL and MYO1A), reproductive traits (INHBC, INHBE, STAT6 and PCNA), and bone development (SUOX). Conclusion The calculated FROH was in line with expected higher inbreeding in KHZ than in AZ because of the smaller effective population size of KHZ. Thus, we find that FROH can be used as a robust estimate of genomic inbreeding. 
Further, the majority of ROH peaks overlapped with, or were in close proximity to, previously reported genomic regions with signatures of selection. This suggests that the genes in the ROH islands have likely been subject to artificial or natural selection.
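Three of the four inbreeding estimators compared in this study have compact standard definitions, sketched below in Python. F_GRM is omitted because its value depends on how the genomic relationship matrix is scaled. Assumptions: genotypes are coded as 0/1/2 copies of a counted allele whose population frequency is p, monomorphic SNPs have already been removed, and ROH lengths and the covered autosomal length share the same units. F_HOM uses the familiar (O - E)/(M - E) excess-homozygosity form. F_UNI uses the correlation-between-uniting-gametes estimator (reported as Fhat3 by GCTA). The study's own software settings are not reproduced.

```python
from typing import Sequence


def f_roh(roh_lengths: Sequence[float], autosome_length: float) -> float:
    """F_ROH: fraction of the covered autosomal genome lying in runs of homozygosity."""
    return sum(roh_lengths) / autosome_length


def _check_polymorphic(freqs: Sequence[float]) -> None:
    # Both SNP-based estimators divide by terms that vanish at p = 0 or p = 1.
    if any(not (0.0 < p < 1.0) for p in freqs):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")


def f_hom(genotypes: Sequence[int], freqs: Sequence[float]) -> float:
    """F_HOM: (observed - expected homozygous SNPs) / (SNPs - expected homozygous SNPs)."""
    _check_polymorphic(freqs)
    observed = sum(1 for g in genotypes if g != 1)
    expected = sum(1.0 - 2.0 * p * (1.0 - p) for p in freqs)  # HWE homozygosity
    return (observed - expected) / (len(genotypes) - expected)


def f_uni(genotypes: Sequence[int], freqs: Sequence[float]) -> float:
    """F_UNI: mean over SNPs of (x^2 - (1 + 2p)x + 2p^2) / (2p(1 - p)).

    x is the count (0/1/2) of the allele whose frequency is p.
    """
    _check_polymorphic(freqs)
    total = sum((x * x - (1.0 + 2.0 * p) * x + 2.0 * p * p) / (2.0 * p * (1.0 - p))
                for x, p in zip(genotypes, freqs))
    return total / len(genotypes)


print(f_roh([10.0, 15.0], 250.0), f_hom([0, 2, 1, 0], [0.5] * 4), f_uni([0, 2], [0.5, 0.5]))
```

As a consistency check on the abstract's own numbers, 108.8 Mb at 4.35% and 149.1 Mb at 5.96% both imply that F_ROH was computed over roughly 2,500 Mb of autosomal sequence.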
Affiliation(s)
- Seyed Mohammad Ghoreishifar
- Department of Animal Science, University College of Agriculture and Natural Resources, University of Tehran, Karaj, 31587-11167, Iran
- Hossein Moradi-Shahrbabak
- Department of Animal Science, University College of Agriculture and Natural Resources, University of Tehran, Karaj, 31587-11167, Iran.
- Mohammad Hossein Fallahi
- Department of Animal Science, University College of Agriculture and Natural Resources, University of Tehran, Karaj, 31587-11167, Iran
- Ali Jalil Sarghale
- Department of Animal Science, University College of Agriculture and Natural Resources, University of Tehran, Karaj, 31587-11167, Iran
- Mohammad Moradi-Shahrbabak
- Department of Animal Science, University College of Agriculture and Natural Resources, University of Tehran, Karaj, 31587-11167, Iran
- Rostam Abdollahi-Arpanahi
- Departments of Animal and Poultry Science, College of Aburaihan, University of Tehran, Pakdasht, 33916-53755, Iran
- Majid Khansefid
- AgriBio Centre for AgriBioscience, Agriculture Victoria, Bundoora, VIC, 3083, Australia
92
Olsen HF, Tenhunen S, Dolvik NI, Våge DI, Klemetsdal G. Segment-based coancestry, additive relationship and genetic variance within and between the Norwegian and the Swedish Fjord horse populations. Acta Agric Scand A Anim Sci 2020. [DOI: 10.1080/09064702.2019.1711155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Affiliation(s)
- Hanne Fjerdingby Olsen
- Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Saija Tenhunen
- Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Viking Genetics, Hollola, Finland
- Nils Ivar Dolvik
- Department of Companion Animal Clinical Sciences, Faculty of Veterinary Medicine, Norwegian University of Life Sciences, Ås, Norway
- Dag Inge Våge
- Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Gunnar Klemetsdal
- Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
93
Yordy J, Kraus C, Hayward JJ, White ME, Shannon LM, Creevy KE, Promislow DEL, Boyko AR. Body size, inbreeding, and lifespan in domestic dogs. Conserv Genet 2020; 21:137-148. [PMID: 32607099 PMCID: PMC7326369 DOI: 10.1007/s10592-019-01240-x] [Received: 05/26/2019] [Accepted: 11/22/2019]
Abstract
Inbreeding poses a real or potential threat to nearly every species of conservation concern. Inbreeding leads to loss of diversity at the individual level, which can cause inbreeding depression, and at the population level, which can hinder the ability to respond to a changing environment. In closed populations such as endangered species and ex situ breeding programs, some degree of inbreeding is inevitable. It is therefore vital to understand how different patterns of breeding and inbreeding can affect fitness in real animals. Domestic dogs provide an excellent model, showing dramatic variation in degree of inbreeding and in lifespan, an important aspect of fitness that is known to be impacted by inbreeding in other species. There is a strong negative correlation between body size and lifespan in dogs, but it is unknown whether the higher rate of aging in large dogs is due to body size per se or some other factor associated with large size. We used dense genome-wide SNP array data to calculate average inbreeding for over 100 dog breeds based on autozygous segment length and found that large breeds tend to have higher coefficients of inbreeding than small breeds. We then used data from the Veterinary Medical Database and other published sources to estimate life expectancies for pure and mixed breed dogs. When controlling for size, variation in inbreeding was not associated with life expectancy across breeds. When comparing mixed versus purebred dogs, however, mixed breed dogs lived about 1.2 years longer on average than size-matched purebred dogs. Furthermore, individual pedigree coefficients of inbreeding and lifespans for over 9000 golden retrievers showed that inbreeding does negatively impact lifespan at the individual level. Registration data from the American Kennel Club suggest that the molecular inbreeding patterns observed in purebred dogs result from specific breeding practices and/or founder effects and not the current population size.
Our results suggest that recent inbreeding, as reflected in variation within a breed, is more likely to affect fitness than historic inbreeding, as reflected in variation among breeds. Our results also indicate that occasional outcrosses, as in mixed breed dogs, can have a substantial positive effect on fitness.
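Inbreeding "based on autozygous segment length" is commonly expressed as F_ROH: the summed length of runs of homozygosity divided by the length of the genome on which they were called. The sketch below assumes ROH are given as non-overlapping (start, end) base-pair intervals and that short runs are discarded; the function name, the 1 Mb minimum, and the 2.2 Gb genome length in the example are illustrative assumptions, not the authors' pipeline.

```python
def f_roh(segments, genome_length_bp, min_length_bp=1_000_000):
    """Estimate the genomic inbreeding coefficient F_ROH for one individual.

    segments: iterable of (start, end) base-pair ROH intervals, assumed
    non-overlapping; runs shorter than min_length_bp are ignored.
    genome_length_bp: autosomal length covered by the SNP panel.
    """
    total = 0
    for start, end in segments:
        length = end - start
        if length >= min_length_bp:
            total += length
    return total / genome_length_bp

# Hypothetical dog with three ROH; the 0.5 Mb run falls below the cutoff.
segs = [(10_000_000, 14_000_000), (50_000_000, 50_500_000), (80_000_000, 87_000_000)]
f = f_roh(segs, genome_length_bp=2_200_000_000)   # (4 Mb + 7 Mb) / 2.2 Gb
```

Breed-level averages like those described above would then be means of per-dog F_ROH values within each breed.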
Affiliation(s)
- Jennifer Yordy
  - Department of Biomedical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
  - Smithsonian Conservation Biology Institute, National Zoological Park, Front Royal, VA, USA
- Cornelia Kraus
  - Laboratory of Survival and Longevity, Max Planck Institute for Demographic Research, Rostock, Germany
  - Department of Sociobiology/Anthropology, University of Göttingen, Göttingen, Germany
- Jessica J. Hayward
  - Department of Biomedical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
- Michelle E. White
  - Department of Biomedical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
- Laura M. Shannon
  - Department of Horticultural Sciences, University of Minnesota, Minneapolis, USA
- Kate E. Creevy
  - Department of Small Animal Clinical Sciences, Texas A&M University, College Station, USA
- Adam R. Boyko
  - Department of Biomedical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
94
McClain L, Mansour H, Ibrahim I, Klei L, Fathi W, Wood J, Kodavali C, Maysterchuk A, Wood S, El-Chennawi F, Ibrahim N, Eissa A, El-Bahaei W, El Sayed H, Yassein A, Tobar S, El-Boraie H, El-Sheshtawy E, Salah H, Ali A, Erdin S, Devlin B, Talkowski M, Nimgaonkar V. Age dependent association of inbreeding with risk for schizophrenia in Egypt. Schizophr Res 2020; 216:450-459. [PMID: 31928911 PMCID: PMC8054776 DOI: 10.1016/j.schres.2019.10.039] [Received: 07/25/2019] [Revised: 10/13/2019] [Accepted: 10/14/2019]
Abstract
BACKGROUND: Self-reported consanguinity is associated with risk for schizophrenia (SZ) in several inbred populations, but estimates using DNA-based coefficients of inbreeding are unavailable. Further, it is not known whether recessively inherited risk mutations can be identified through homozygosity by descent (HBD) mapping. METHODS: We studied self-reported and DNA-based estimates of inbreeding among Egyptian patients with SZ (n = 421, DSM-IV criteria) and adult controls without psychosis (n = 301), who were evaluated using semi-structured diagnostic interview schedules and genotyped using the Illumina Infinium PsychArray. Following quality control checks, coefficients of inbreeding (F) and regions of homozygosity (ROH) were estimated using PLINK software for HBD analysis. Exome sequencing was conducted in selected cases. RESULTS: Inbreeding was associated with schizophrenia based on self-reported consanguinity (χ² = 4.506, 1 df, p = 0.034) and on DNA-based estimates of inbreeding (F); the latter showed a significant F × age interaction (β = 32.34, p = 0.0047). The association was most notable among patients older than 40 years. Eleven ROH on chromosomes 1, 3, 6, 11, and 14 were over-represented in cases; all but one of these regions are novel for schizophrenia risk. Exome sequencing identified six recessively acting genes in ROH with loss-of-function variants, one of which causes primary hereditary microcephaly. CONCLUSIONS: We propose consanguinity as an age-dependent risk factor for SZ in Egypt. HBD mapping is feasible for SZ in adequately powered samples.
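PLINK's per-sample F (reported by `--het`) is a method-of-moments estimate that compares observed homozygous genotype counts with those expected under Hardy-Weinberg proportions. The sketch below uses the textbook form F = (O − E) / (N − E), where O is the observed number of homozygous calls, N the number of non-missing SNPs, and E = Σ(1 − 2p(1 − p)). It assumes 0/1/2 alt-allele coding and known allele frequencies; the function name is hypothetical, and PLINK's own implementation may differ in detail (for example, in how it estimates p).

```python
import numpy as np

def moment_f(genotypes, allele_freqs):
    """Method-of-moments inbreeding coefficient for one individual.

    genotypes: alt-allele counts (0, 1, 2) per SNP; np.nan marks missing calls.
    allele_freqs: population alt-allele frequency p for each SNP.
    """
    g = np.asarray(genotypes, dtype=float)
    p = np.asarray(allele_freqs, dtype=float)
    called = ~np.isnan(g)                              # drop missing genotypes
    g, p = g[called], p[called]
    n = g.size                                         # N: non-missing SNPs
    observed_hom = np.sum(g != 1)                      # O: homozygous calls
    expected_hom = np.sum(1.0 - 2.0 * p * (1.0 - p))   # E under Hardy-Weinberg
    return (observed_hom - expected_hom) / (n - expected_hom)

# Four called SNPs at p = 0.5 give E = 2; three homozygous calls give F = (3 - 2) / (4 - 2).
f = moment_f([0, 2, 1, 2, np.nan], [0.5, 0.5, 0.5, 0.5, 0.3])
```

Positive F indicates excess homozygosity relative to expectation; values near zero are expected for offspring of unrelated parents.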
Affiliation(s)
- Lora McClain
  - Department of Psychiatry, University of Pittsburgh School of Medicine, Western Psychiatric Hospital, Pittsburgh, PA, USA
- Hader Mansour
  - Department of Psychiatry, University of Pittsburgh School of Medicine, Western Psychiatric Hospital, Pittsburgh, PA, USA
  - Department of Psychiatry, Mansoura University School of Medicine, Mansoura, Egypt
- Ibtihal Ibrahim
  - Department of Psychiatry, Mansoura University School of Medicine, Mansoura, Egypt
- Lambertus Klei
  - Department of Psychiatry, University of Pittsburgh School of Medicine, Western Psychiatric Hospital, Pittsburgh, PA, USA
- Warda Fathi
  - Department of Psychiatry, Mansoura University School of Medicine, Mansoura, Egypt
- Joel Wood
  - Department of Psychiatry, University of Pittsburgh School of Medicine, Western Psychiatric Hospital, Pittsburgh, PA, USA
- Chowdari Kodavali
  - Department of Psychiatry, University of Pittsburgh School of Medicine, Western Psychiatric Hospital, Pittsburgh, PA, USA
- Alina Maysterchuk
  - Department of Psychiatry, University of Pittsburgh School of Medicine, Western Psychiatric Hospital, Pittsburgh, PA, USA
- Shawn Wood
  - Department of Psychiatry, University of Pittsburgh School of Medicine, Western Psychiatric Hospital, Pittsburgh, PA, USA
- Farha El-Chennawi
  - Department of Clinical Pathology, Mansoura University School of Medicine, Mansoura, Egypt
- Nahed Ibrahim
  - Department of Psychiatry, University of Pittsburgh School of Medicine, Western Psychiatric Hospital, Pittsburgh, PA, USA
- Ahmed Eissa
  - Department of Psychiatry and Neuropsychiatry, Port Said University, Port Said, Egypt
- Wafaa El-Bahaei
  - Department of Psychiatry, Mansoura University School of Medicine, Mansoura, Egypt
- Hanan El Sayed
  - Department of Psychiatry, Mansoura University School of Medicine, Mansoura, Egypt
- Amal Yassein
  - Department of Psychiatry, Mansoura University School of Medicine, Mansoura, Egypt
- Salwa Tobar
  - Department of Psychiatry, Mansoura University School of Medicine, Mansoura, Egypt
- Hala El-Boraie
  - Department of Psychiatry, Mansoura University School of Medicine, Mansoura, Egypt
- Eman El-Sheshtawy
  - Department of Psychiatry, Mansoura University School of Medicine, Mansoura, Egypt
- Hala Salah
  - Department of Psychiatry, Mansoura University School of Medicine, Mansoura, Egypt
- Ahmed Ali
  - Department of Clinical Pathology, Mansoura University Student Hospital, Mansoura, Egypt
- Serkan Erdin
  - Center for Genomic Medicine, Department of Neurology, Massachusetts General Hospital Research Institute, Harvard Medical School, Boston, MA, USA
- Bernie Devlin
  - Department of Psychiatry, University of Pittsburgh School of Medicine, Western Psychiatric Hospital, Pittsburgh, PA, USA
- Michael Talkowski
  - Center for Genomic Medicine, Department of Neurology, Massachusetts General Hospital Research Institute, Harvard Medical School, Boston, MA, USA
- Vishwajit Nimgaonkar
  - Department of Psychiatry, University of Pittsburgh School of Medicine, Western Psychiatric Hospital, Pittsburgh, PA, USA
  - Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
95
Edge MD, Coop G. Attacks on genetic privacy via uploads to genealogical databases. eLife 2020; 9:e51810. [PMID: 31908268 PMCID: PMC6992384 DOI: 10.7554/elife.51810] [Received: 09/12/2019] [Accepted: 12/23/2019]
Abstract
Direct-to-consumer (DTC) genetics services are increasingly popular, with tens of millions of customers. Several DTC genealogy services allow users to upload genetic data to search for relatives, identified as people with genomes that share identical by state (IBS) regions. Here, we describe methods by which an adversary can learn database genotypes by uploading multiple datasets. For example, an adversary who uploads approximately 900 genomes could recover at least one allele at SNP sites across up to 82% of the genome of a median person of European ancestries. In databases that detect IBS segments using unphased genotypes, approximately 100 falsified uploads can reveal enough genetic information to allow genome-wide genetic imputation. We provide a proof-of-concept demonstration in the GEDmatch database, and we suggest countermeasures that will prevent the exploits we describe.
Affiliation(s)
- Michael D Edge
  - Center for Population Biology, University of California, Davis, Davis, United States
  - Department of Evolution and Ecology, University of California, Davis, Davis, United States
  - Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, United States
- Graham Coop
  - Center for Population Biology, University of California, Davis, Davis, United States
  - Department of Evolution and Ecology, University of California, Davis, Davis, United States
96
Waples RK, Albrechtsen A, Moltke I. Allele frequency-free inference of close familial relationships from genotypes or low-depth sequencing data. Mol Ecol 2019; 28:35-48. [PMID: 30462358 PMCID: PMC6850436 DOI: 10.1111/mec.14954] [Received: 03/21/2018] [Accepted: 10/12/2018]
Abstract
Knowledge of how individuals are related is important in many areas of research, and numerous methods for inferring pairwise relatedness from genetic data have been developed. However, most of these methods were not developed for situations where data are limited. Specifically, most methods rely on the availability of population allele frequencies, the relative genomic position of variants and accurate genotype data. But in studies of non‐model organisms or ancient samples, such data are not always available. Motivated by this, we present a new method for pairwise relatedness inference, which requires neither allele frequency information nor information on genomic position. Furthermore, it can be applied not only to accurate genotype data but also to low‐depth sequencing data from which genotypes cannot be accurately called. We evaluate it using data from a range of human populations and show that it can infer close familial relationships with accuracy similar to that of a widely used method that relies on population allele frequencies. Additionally, we show that our method is robust to SNP ascertainment and applicable to low‐depth sequencing data generated using different strategies, including resequencing and RADseq, which is important for application to a diverse range of populations and species.
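Relatedness statistics that need no allele frequencies can be built from counts of genotype combinations alone. As one illustration, not necessarily the exact estimator proposed in this paper, the KING-robust kinship coefficient (Manichaikul et al.) uses only the number of SNPs where both individuals are heterozygous, the number where they are opposite homozygotes, and each individual's heterozygote count. The sketch assumes hard 0/1/2 genotype calls with no missing data; it does not cover the genotype-likelihood input that low-depth sequencing requires, and the function name is hypothetical.

```python
import numpy as np

def king_robust_kinship(g1, g2):
    """KING-robust kinship between two individuals from 0/1/2 genotype calls.

    Needs only genotype-pair counts, so no population allele frequencies.
    Assumes equal-length arrays with no missing data.
    """
    g1 = np.asarray(g1, dtype=int)
    g2 = np.asarray(g2, dtype=int)
    het_het = np.sum((g1 == 1) & (g2 == 1))   # both heterozygous
    opp_hom = np.sum(np.abs(g1 - g2) == 2)    # AA vs aa (IBS0 sites)
    het_1 = np.sum(g1 == 1)                   # heterozygotes in individual 1
    het_2 = np.sum(g2 == 1)                   # heterozygotes in individual 2
    return (het_het - 2 * opp_hom) / (het_1 + het_2)

# An individual compared with itself has no IBS0 sites, so kinship is exactly 0.5.
g = [0, 1, 2, 1, 0, 1]
self_kin = king_robust_kinship(g, g)
```

On genome-wide data, values near 0.5 indicate duplicates or monozygotic twins, values near 0.25 first-degree relatives, and values near 0 unrelated pairs.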
Affiliation(s)
- Ryan K Waples
  - Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
- Anders Albrechtsen
  - Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
- Ida Moltke
  - Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
97
Tian X, Browning BL, Browning SR. Estimating the Genome-wide Mutation Rate with Three-Way Identity by Descent. Am J Hum Genet 2019; 105:883-893. [PMID: 31587867 DOI: 10.1016/j.ajhg.2019.09.012] [Received: 07/17/2019] [Accepted: 09/09/2019]
Abstract
The two primary methods for estimating the genome-wide mutation rate have been counting de novo mutations in parent-offspring trios and comparing sequence data between closely related species. With parent-offspring trio analysis it is difficult to control for genotype error, and resolution is limited because each trio provides information from only two meioses. Inter-species comparison is difficult to calibrate due to uncertainty in the number of meioses separating species, and it can be biased by selection and by changing mutation rates over time. An alternative class of approaches for estimating mutation rates that avoids these limitations is based on identity by descent (IBD) segments that arise from common ancestry within the past few thousand years. Existing IBD-based methods are limited to highly inbred samples, or lack robustness to genotype error and error in the estimated demographic history. We present an IBD-based method that uses sharing of IBD segments among sets of three individuals to estimate the mutation rate. Our method is applicable to accurately phased genotype data, such as parent-offspring trio data phased using Mendelian rules of inheritance. Unlike standard parent-offspring analysis, our method utilizes distant relationships and is robust to genotype error. We apply our method to data from 1,307 European-ancestry individuals in the Framingham Heart Study sequenced by the NHLBI TOPMed project. We obtain an estimate of 1.29 × 10⁻⁸ mutations per base pair per meiosis with a 95% confidence interval of [1.02 × 10⁻⁸, 1.56 × 10⁻⁸].
98
A Prospective Analysis of Genetic Variants Associated with Human Lifespan. G3 (Bethesda) 2019; 9:2863-2878. [PMID: 31484785 PMCID: PMC6723124 DOI: 10.1534/g3.119.400448]
Abstract
We present a massive investigation into the genetic basis of human lifespan. Beginning with a genome-wide association (GWA) study using a de-identified snapshot of the unique AncestryDNA database – more than 300,000 genotyped individuals linked to pedigrees of over 400,000,000 people – we mapped six genome-wide significant loci associated with parental lifespan. We compared these results to a GWA analysis of the traditional lifespan proxy trait, age, and found only one locus, APOE, to be associated with both age and lifespan. By combining the AncestryDNA results with those of an independent UK Biobank dataset, we conducted a meta-analysis of more than 650,000 individuals and identified fifteen parental lifespan-associated loci. Beyond just those significant loci, our genome-wide set of polymorphisms accounts for up to 8% of the variance in human lifespan; this value represents a large fraction of the heritability estimated from phenotypic correlations between relatives.
99
Naseri A, Liu X, Tang K, Zhang S, Zhi D. RaPID: ultra-fast, powerful, and accurate detection of segments identical by descent (IBD) in biobank-scale cohorts. Genome Biol 2019; 20:143. [PMID: 31345249 PMCID: PMC6659282 DOI: 10.1186/s13059-019-1754-8] [Received: 03/26/2019] [Accepted: 07/03/2019]
Abstract
While genetic relatedness, usually manifested as segments identical by descent (IBD), is ubiquitous in modern large biobanks, current IBD detection methods are not efficient at such a scale. Here, we describe an efficient method, RaPID, for detecting IBD segments in a panel with phased haplotypes. RaPID achieves time and space complexity linear in the input size and the number of reported IBD segments. In simulations, RaPID was orders of magnitude faster than existing methods while offering competitive power and accuracy. In UK Biobank, RaPID identified 3,335,807 IBD segments with a length ≥ 10 cM among 223,507 male X chromosomes in 11 min.
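For intuition about what an IBD caller reports, the sketch below is a naive pairwise baseline, not the RaPID algorithm: it scans two phased haplotypes for maximal runs of identical alleles and keeps those spanning at least a centimorgan threshold. Identical runs are only IBD candidates, and genotype or phasing errors split true segments, which this sketch does not tolerate. The function name and input layout are hypothetical.

```python
def identical_runs(hap_a, hap_b, cm_positions, min_cm=10.0):
    """Return (start_index, end_index, length_cM) for maximal identical runs
    between two phased haplotypes that span at least min_cm centimorgans.

    hap_a, hap_b: alleles (e.g. 0/1) at the same ordered sites.
    cm_positions: genetic-map position of each site in cM, non-decreasing.
    """
    segments = []
    start = None
    n = len(hap_a)
    for i in range(n + 1):                      # i == n acts as a sentinel mismatch
        match = i < n and hap_a[i] == hap_b[i]
        if match and start is None:
            start = i                           # a new identical run begins
        elif not match and start is not None:
            end = i - 1                         # run ended at the previous site
            length = cm_positions[end] - cm_positions[start]
            if length >= min_cm:
                segments.append((start, end, length))
            start = None
    return segments

# Sites 0-4 match (12 cM), site 5 differs, sites 6-7 match (3 cM, below threshold).
a = [0, 1, 1, 0, 1, 0, 1, 1]
b = [0, 1, 1, 0, 1, 1, 1, 1]
pos = [0.0, 3.0, 6.0, 9.0, 12.0, 13.0, 14.0, 17.0]
segs = identical_runs(a, b, pos, min_cm=10.0)
```

Running this over every haplotype pair costs time proportional to the square of the sample size times the number of sites, which is the kind of all-pairs scan that becomes impractical at biobank scale.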
Affiliation(s)
- Ardalan Naseri
  - Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
- Xiaoming Liu
  - USF Genomics, College of Public Health, University of South Florida, Tampa, FL, 33612, USA
- Kecong Tang
  - Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
- Shaojie Zhang
  - Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
- Degui Zhi
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
  - Department of Epidemiology, Human Genetics & Environmental Sciences, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
100
Zhu SJ, Hendry JA, Almagro-Garcia J, Pearson RD, Amato R, Miles A, Weiss DJ, Lucas TC, Nguyen M, Gething PW, Kwiatkowski D, McVean G. The origins and relatedness structure of mixed infections vary with local prevalence of P. falciparum malaria. eLife 2019; 8:e40845. [PMID: 31298657 PMCID: PMC6684230 DOI: 10.7554/elife.40845] [Received: 08/09/2018] [Accepted: 07/10/2019]
Abstract
Individual malaria infections can carry multiple strains of Plasmodium falciparum with varying levels of relatedness. Yet, how local epidemiology affects the properties of such mixed infections remains unclear. Here, we develop an enhanced method for strain deconvolution from genome sequencing data, which estimates the number of strains, their proportions, identity-by-descent (IBD) profiles and individual haplotypes. Applying it to the Pf3k data set, we find that the rate of mixed infection varies from 29% to 63% across countries and that 51% of mixed infections involve more than two strains. Furthermore, we estimate that 47% of symptomatic dual infections contain sibling strains likely to have been co-transmitted from a single mosquito, and find evidence of mixed infections propagated over successive infection cycles. Finally, leveraging data from the Malaria Atlas Project, we find that prevalence correlates within Africa, but not Asia, with both the rate of mixed infection and the level of IBD.
Affiliation(s)
- Sha Joe Zhu
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Jason A Hendry
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Jacob Almagro-Garcia
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
  - Medical Research Council Centre for Genomics and Global Health, University of Oxford, Oxford, United Kingdom
  - Wellcome Sanger Institute, Hinxton, United Kingdom
- Richard D Pearson
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
  - Medical Research Council Centre for Genomics and Global Health, University of Oxford, Oxford, United Kingdom
  - Wellcome Sanger Institute, Hinxton, United Kingdom
- Roberto Amato
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
  - Medical Research Council Centre for Genomics and Global Health, University of Oxford, Oxford, United Kingdom
  - Wellcome Sanger Institute, Hinxton, United Kingdom
- Alistair Miles
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
  - Medical Research Council Centre for Genomics and Global Health, University of Oxford, Oxford, United Kingdom
  - Wellcome Sanger Institute, Hinxton, United Kingdom
- Daniel J Weiss
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Tim Cd Lucas
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Michele Nguyen
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Peter W Gething
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Dominic Kwiatkowski
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
  - Medical Research Council Centre for Genomics and Global Health, University of Oxford, Oxford, United Kingdom
  - Wellcome Sanger Institute, Hinxton, United Kingdom
- Gil McVean
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
  - Medical Research Council Centre for Genomics and Global Health, University of Oxford, Oxford, United Kingdom