1
Caliebe A, Tekola‐Ayele F, Darst BF, Wang X, Song YE, Gui J, Sebro RA, Balding DJ, Saad M, Dubé M. Including diverse and admixed populations in genetic epidemiology research. Genet Epidemiol 2022; 46:347-371. [PMID: 35842778 PMCID: PMC9452464 DOI: 10.1002/gepi.22492] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/08/2022] [Revised: 05/31/2022] [Accepted: 06/06/2022] [Indexed: 11/25/2022]
Abstract
The inclusion of ancestrally diverse participants in genetic studies can lead to new discoveries and is important to ensure equitable health care benefit from research advances. Here, members of the Ethical, Legal, and Social Implications (ELSI) committee of the International Genetic Epidemiology Society (IGES) offer perspectives on methods and analysis tools for the conduct of inclusive genetic epidemiology research, with a focus on admixed and ancestrally diverse populations, in support of reproducible research practices. We emphasize the importance of distinguishing socially defined population categorizations from genetic ancestry in the design, analysis, reporting, and interpretation of genetic epidemiology research findings. Finally, we discuss the current state of genomic resources used in genetic association studies, functional interpretation, and clinical and public health translation of genomic findings with respect to diverse populations.
Affiliation(s)
- Amke Caliebe
  - Institute of Medical Informatics and Statistics, Kiel University and University Hospital Schleswig‐Holstein, Kiel, Germany
- Fasil Tekola‐Ayele
  - Epidemiology Branch, Division of Population Health Research, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, USA
- Burcu F. Darst
  - Center for Genetic Epidemiology, University of Southern California, Los Angeles, California, USA
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Xuexia Wang
  - Department of Mathematics, University of North Texas, Denton, Texas, USA
- Yeunjoo E. Song
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
- Jiang Gui
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, One Medical Center Dr., Lebanon, New Hampshire, USA
- David J. Balding
  - Melbourne Integrative Genomics, Schools of BioSciences and of Mathematics & Statistics, University of Melbourne, Melbourne, Australia
- Mohamad Saad
  - Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
  - Neuroscience Research Center, Faculty of Medical Sciences, Lebanese University, Beirut, Lebanon
- Marie‐Pierre Dubé
  - Department of Medicine, and Social and Preventive Medicine, Université de Montréal, Montréal, Québec, Canada
  - Beaulieu‐Saucier Pharmacogenomics Centre, Montreal Heart Institute, Montreal, Canada
2
Genetic Ancestry Inference and Its Application for the Genetic Mapping of Human Diseases. Int J Mol Sci 2021; 22:ijms22136962. [PMID: 34203440 PMCID: PMC8269095 DOI: 10.3390/ijms22136962] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/10/2021] [Revised: 06/24/2021] [Accepted: 06/25/2021] [Indexed: 12/21/2022]
Abstract
Admixed populations arise when two or more ancestral populations interbreed. As a result of this admixture, the genome of admixed populations is defined by tracts of variable size inherited from these parental groups and has particular genetic features that provide valuable information about their demographic history. Diverse methods can be used to derive the ancestry apportionment of admixed individuals, and such inferences can be leveraged for the discovery of genetic loci associated with diseases and traits, therefore having important biomedical implications. In this review article, we summarize the most common methods of global and local genetic ancestry estimation and discuss the use of admixture mapping studies in human diseases.
3
Wu J, Liu Y, Zhao Y. Systematic Review on Local Ancestor Inference From a Mathematical and Algorithmic Perspective. Front Genet 2021; 12:639877. [PMID: 34108987 PMCID: PMC8181461 DOI: 10.3389/fgene.2021.639877] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/10/2020] [Accepted: 04/12/2021] [Indexed: 11/20/2022]
Abstract
Genotypic data provide deep insights into population history and medical genetics. The local ancestry inference (LAI) method, also termed local ancestry deconvolution, uses the hidden Markov model (HMM) to solve the mathematical problem of ancestry reconstruction from genomic data. In a series of computational tools, the HMM is combined with other statistical models and machine learning techniques for particular genetic tasks. In this article, we survey the mathematical structure, application characteristics, historical development, and benchmark analysis of LAI methods in detail, which will help researchers better understand and further develop them. First, we explore the mathematical structure of each model and its characteristic applications. Next, we use bibliometrics to show the fields in which the models are applied and list articles to trace their historical development. LAI publications peaked during 2006-2016 and have continued to appear in the following years. The efficiency, accuracy, and stability of the existing models were evaluated in a benchmark. We find that phased data yielded higher accuracy than unphased data. We summarize these models with their distinct advantages and disadvantages. The Loter model uses dynamic programming to obtain a globally optimal solution and has the advantage of being parameter-free. Aligned bases can be used directly in the Seqmix model if the genotype is hard to call. This research may help model developers recognize current challenges and develop more advanced models, and may enable scholars to select appropriate models for given populations and datasets.
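As a rough illustration of the HMM formulation this abstract describes, the sketch below decodes the most likely ancestry label at each marker of one phased haplotype with the Viterbi algorithm. It is not taken from any of the surveyed tools: the function name, the fixed per-marker `switch_prob`, and the use of reference allele frequencies as emission probabilities are all simplifying assumptions made here for illustration.

```python
import numpy as np

def viterbi_local_ancestry(haplotype, freqs, switch_prob=0.01, prior=None):
    """Most likely ancestry label per marker for one phased haplotype.

    haplotype   : (M,) array of 0/1 alleles
    freqs       : (K, M) frequency of allele 1 in each of K >= 2 reference populations
    switch_prob : assumed probability of an ancestry switch between adjacent markers
    """
    haplotype = np.asarray(haplotype)
    freqs = np.clip(np.asarray(freqs, dtype=float), 1e-6, 1 - 1e-6)
    K, M = freqs.shape
    prior = np.full(K, 1.0 / K) if prior is None else np.asarray(prior, dtype=float)

    # Emission log-probabilities P(allele | ancestry) from reference frequencies.
    log_emit = np.where(haplotype == 1, np.log(freqs), np.log(1 - freqs))  # (K, M)

    # Transition matrix: stay with prob 1 - switch_prob, otherwise switch uniformly.
    trans = np.full((K, K), switch_prob / (K - 1))
    np.fill_diagonal(trans, 1 - switch_prob)
    log_trans = np.log(trans)

    score = np.log(prior) + log_emit[:, 0]
    back = np.zeros((M, K), dtype=int)
    for m in range(1, M):
        cand = score[:, None] + log_trans       # cand[i, j]: move from state i to j
        back[m] = np.argmax(cand, axis=0)
        score = cand.max(axis=0) + log_emit[:, m]

    # Trace back the best path from the last marker.
    path = np.empty(M, dtype=int)
    path[-1] = int(np.argmax(score))
    for m in range(M - 1, 0, -1):
        path[m - 1] = back[m, path[m]]
    return path

# Toy example: two hypothetical reference populations with opposite allele frequencies.
toy_freqs = np.array([[0.9] * 10, [0.1] * 10])
toy_hap = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
print(viterbi_local_ancestry(toy_hap, toy_freqs))
```

Real LAI tools model recombination distance, admixture time, and phasing error; a constant switch probability is only the simplest stand-in for those components.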
Affiliation(s)
- Jie Wu
  - State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing, China
  - Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Yangxiu Liu
  - State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing, China
- Yiqiang Zhao
  - State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing, China
4
Geza E, Mugo J, Mulder NJ, Wonkam A, Chimusa ER, Mazandu GK. A comprehensive survey of models for dissecting local ancestry deconvolution in human genome. Brief Bioinform 2020; 20:1709-1724. [PMID: 30010715 DOI: 10.1093/bib/bby044] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 03/11/2018] [Revised: 04/16/2018] [Indexed: 11/14/2022]
Abstract
Over the past decade, studies of admixed populations have increasingly gained interest in both medical and population genetics. These studies have so far shed light on the patterns of genetic variation throughout modern human evolution and have improved our understanding of the demographics and adaptive processes of human populations. To date, there exist about 20 methods or tools to deconvolve local ancestry. These methods have merits and drawbacks in estimating local ancestry in multiway admixed populations. In this article, we survey existing ancestry deconvolution methods, with special emphasis on multiway admixture, and compare these methods based on simulation results reported by different studies, computational approaches used, including mathematical and statistical models, and biological challenges related to each method. This should orient users on the choice of an appropriate method or tool for given population admixture characteristics and update researchers on current advances, challenges and opportunities behind existing ancestry deconvolution methods.
Affiliation(s)
- Ephifania Geza
  - African Institute for Mathematical Sciences, Muizenberg, Cape Town 7945, South Africa; Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, IDM, University of Cape Town, Cape Town 7925, South Africa
- Jacquiline Mugo
  - African Institute for Mathematical Sciences, Muizenberg, Cape Town 7945, South Africa
- Nicola J Mulder
  - Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, IDM, University of Cape Town, Cape Town 7925, South Africa
- Ambroise Wonkam
  - Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town 7925, South Africa
- Emile R Chimusa
  - Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town 7925, South Africa
- Gaston K Mazandu
  - African Institute for Mathematical Sciences, Muizenberg, Cape Town 7945, South Africa; Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, IDM, University of Cape Town, Cape Town 7925, South Africa; Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town 7925, South Africa
5
Liu Z, Shriner D, Hansen NF, Rotimi CN, Mullikin JC. Admixture mapping identifies genetic regions associated with blood pressure phenotypes in African Americans. PLoS One 2020; 15:e0232048. [PMID: 32315356 PMCID: PMC7173845 DOI: 10.1371/journal.pone.0232048] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/31/2019] [Accepted: 04/06/2020] [Indexed: 01/11/2023]
Abstract
Hypertension occurs at a higher rate in African Americans than in European Americans. Based on the assumption that causal variants are more frequently found on DNA segments inherited from the ancestral population with higher disease risk, we employed admixture mapping to identify genetic loci with excess local African ancestry associated with blood pressure. Chromosomal regions 1q21.2–21.3, 4p15.1, 19q12 and 20p13 were significantly associated with diastolic blood pressure (β = 5.28, -7.94, -6.82 and 5.89, P-value = 6.39E-04, 2.07E-04, 6.56E-05 and 5.04E-04, respectively); 1q21.2–21.3 and 19q12 were also significantly associated with mean arterial pressure (β = 5.86 and -6.40, P-value = 5.32E-04 and 6.37E-04, respectively). We further selected SNPs that had large allele frequency differences within these regions and tested their association with blood pressure. SNP rs4815428 was significantly associated with diastolic blood pressure after Bonferroni correction (β = -2.42, P-value = 9.57E-04), and it partially explained the admixture mapping signal at 20p13. SNPs rs771205 (β = -1.99, P-value = 3.37E-03), rs3126067, rs2184953 and rs58001094 (the latter three exhibit strong linkage disequilibrium, β = -2.3, P-value = 1.4E-03) were identified to be significantly associated with mean arterial pressure, and together they fully explained the admixture signal at 1q21.2–21.3. Although no SNP at 4p15.1 showed large ancestral allele frequency differences in our dataset, we detected association at low-frequency African-specific variants that mapped predominantly to the gene PCDH7, which is most highly expressed in aorta. Our results suggest that these regions may harbor genetic variants that contribute to the different prevalence of hypertension.
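The admixture mapping idea in this abstract (testing whether local ancestry at a locus is associated with a quantitative trait) can be sketched as a per-locus linear regression. This is a minimal sketch under assumptions made here, not the authors' pipeline: the function name, the choice to adjust only for global ancestry, and the simulated data are all hypothetical.

```python
import numpy as np

def admixture_mapping_scan(phenotype, local_ancestry, global_ancestry):
    """Per-locus OLS: phenotype ~ intercept + local ancestry dosage + global ancestry.

    phenotype       : (N,) quantitative trait, e.g. blood pressure
    local_ancestry  : (N, L) copies (0, 1, 2) of the ancestry of interest at each locus
    global_ancestry : (N,) genome-wide proportion of that ancestry
    Returns (beta, z) arrays of length L for the local-ancestry term.
    """
    y = np.asarray(phenotype, dtype=float)
    la = np.asarray(local_ancestry, dtype=float)
    ga = np.asarray(global_ancestry, dtype=float)
    n, n_loci = la.shape
    beta = np.empty(n_loci)
    z = np.empty(n_loci)
    for j in range(n_loci):
        X = np.column_stack([np.ones(n), la[:, j], ga])
        coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        sigma2 = resid @ resid / (n - X.shape[1])      # residual variance
        cov = sigma2 * np.linalg.inv(X.T @ X)          # covariance of the estimates
        beta[j] = coef[1]
        z[j] = coef[1] / np.sqrt(cov[1, 1])
    return beta, z

# Simulated example: locus 3 carries a true effect of 5 units per ancestry copy.
rng = np.random.default_rng(0)
ga = rng.uniform(0.6, 0.95, 500)
la = rng.binomial(2, ga[:, None], size=(500, 20))
y = 80 + 5 * la[:, 3] + rng.normal(0, 5, 500)
beta, z = admixture_mapping_scan(y, la, ga)
print(int(np.argmax(np.abs(z))), round(float(beta[3]), 2))
```

Adjusting for global ancestry is the usual guard against confounding by overall admixture proportion; a real analysis would also include covariates such as age and sex and would correct for the number of effectively independent ancestry blocks.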
Affiliation(s)
- Zhi Liu
  - Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Daniel Shriner
  - Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Nancy F. Hansen
  - Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Charles N. Rotimi
  - Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- James C. Mullikin
  - Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
  - NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
6
Wang LJ, Zhang CW, Su SC, Chen HIH, Chiu YC, Lai Z, Bouamar H, Ramirez AG, Cigarroa FG, Sun LZ, Chen Y. An ancestry informative marker panel design for individual ancestry estimation of Hispanic population using whole exome sequencing data. BMC Genomics 2019; 20:1007. [PMID: 31888480 PMCID: PMC6936141 DOI: 10.1186/s12864-019-6333-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/05/2023]
Abstract
Background: Europeans and American Indians are the major genetic ancestries of Hispanics in the U.S. These ancestral groups have markedly different incidence rates and outcomes in many types of cancer. Genetic admixture may therefore bias genetic association studies of cancer susceptibility variants, specifically in Hispanics. For example, the incidence rate of liver cancer shows substantial disparity between Hispanic, Asian and non-Hispanic white populations. Currently, ancestry informative marker (AIM) panels with up to a few hundred ancestry-informative single nucleotide polymorphisms (SNPs) are widely used to infer ancestry admixture. Notably, currently available AIMs are predominantly located in intronic and intergenic regions, while the whole exome sequencing (WES) protocols commonly used in translational research and clinical practice do not cover these markers. Thus, it remains challenging to accurately determine a patient’s admixture proportion without additional DNA testing.
Results: In this study we designed a unique AIM panel that infers 3-way genetic admixture from three distinct, selected continental populations (African (AFR), European (EUR), and East Asian (EAS)) within evolutionarily conserved exonic regions. Initially, about 1 million exonic SNPs from the three selected populations in the 1000 Genomes Project were pruned by linkage disequilibrium (LD) and restricted to biallelic variants; finally, using their ancestral informativeness statistics, we optimized them into an AIM panel with 250 SNP markers, the UT-AIM250 panel. Compared with published AIM panels, UT-AIM250 achieved better accuracy when tested with the three ancestral populations (accuracy: 0.995 ± 0.012 for AFR, 0.997 ± 0.007 for EUR, and 0.994 ± 0.012 for EAS). We further applied the UT-AIM250 panel to admixed American (AMR) samples of the 1000 Genomes Project and obtained results (AFR, 0.085 ± 0.098; EUR, 0.665 ± 0.182; and EAS, 0.250 ± 0.205) similar to those of previously published AIM panels (Phillips-AIM34: AFR, 0.096 ± 0.127, EUR, 0.575 ± 0.290, and EAS, 0.330 ± 0.315; Wei-AIM278: AFR, 0.070 ± 0.096, EUR, 0.537 ± 0.267, and EAS, 0.393 ± 0.300). Subsequently, we applied the UT-AIM250 panel to a clinical dataset of 26 self-reported Hispanic patients in South Texas with hepatocellular carcinoma (HCC). We estimated the admixture proportions using WES data of adjacent non-cancer liver tissues (AFR, 0.065 ± 0.043; EUR, 0.594 ± 0.150; and EAS, 0.341 ± 0.160). Similar admixture proportions were identified from the corresponding tumor tissues. In addition, we estimated admixture proportions of The Cancer Genome Atlas (TCGA) collection of hepatocellular carcinoma (TCGA-LIHC) samples (376 patients) using the UT-AIM250 panel. The panel obtained consistent admixture proportions from tumor and matched normal tissues, identified 3 possibly misreported race/ethnicity labels, and/or provided race/ethnicity determination if necessary.
Conclusions: Here we demonstrated the feasibility of using evolutionarily conserved exonic regions to infer admixture proportions and provided a robust and reliable control for sample collection or patient stratification for genetic analysis. An R implementation of UT-AIM250 is available at https://github.com/chenlabgccri/UT-AIM250.
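The abstract mentions ranking SNPs by "ancestral informativeness statistics" without naming the statistic. One widely used choice is Rosenberg's informativeness for assignment (I_n); the sketch below computes it for biallelic SNPs and picks the top-ranked markers. Whether UT-AIM250 used this exact statistic is an assumption made here, and the function names and toy frequencies are hypothetical.

```python
import numpy as np

def informativeness_for_assignment(freqs):
    """Rosenberg's I_n for biallelic SNPs.

    freqs : (K, M) array, frequency of one allele in each of K populations.
    Returns an (M,) array; 0 means uninformative, larger means more informative.
    """
    p = np.asarray(freqs, dtype=float)
    alleles = np.stack([p, 1.0 - p])        # (2, K, M): both alleles of each SNP
    mean = alleles.mean(axis=1)             # (2, M): average over populations

    def xlogx(x):
        # Uses the convention 0 * log(0) = 0.
        return np.where(x > 0, x * np.log(np.where(x > 0, x, 1.0)), 0.0)

    # I_n = sum over alleles of [ -p_j log p_j + (1/K) sum_i p_ij log p_ij ]
    return (-xlogx(mean) + xlogx(alleles).mean(axis=1)).sum(axis=0)

def top_markers(freqs, n):
    """Indices of the n most informative SNPs, best first."""
    return np.argsort(-informativeness_for_assignment(freqs))[:n]

# Toy frequencies for two populations and three SNPs:
# a fixed difference, an uninformative SNP, and a moderately informative SNP.
toy = np.array([[1.0, 0.3, 0.8],
                [0.0, 0.3, 0.2]])
print(informativeness_for_assignment(toy))
print(top_markers(toy, 2))
```

A panel design would normally apply LD pruning before ranking, as the abstract describes, so that the selected markers are not redundant.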
Affiliation(s)
- Li-Ju Wang
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Catherine W Zhang
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Sophia C Su
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Hung-I H Chen
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Yu-Chiao Chiu
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Zhao Lai
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA; Department of Molecular Medicine, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Hakim Bouamar
  - Department of Cell Systems and Anatomy, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Amelie G Ramirez
  - Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX, 78229, USA; Institute for Health Promotion Research, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Francisco G Cigarroa
  - Department of Surgery, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Lu-Zhe Sun
  - Department of Cell Systems and Anatomy, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Yidong Chen
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA; Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
7
Guerrero S, López-Cortés A, Indacochea A, García-Cárdenas JM, Zambrano AK, Cabrera-Andrade A, Guevara-Ramírez P, González DA, Leone PE, Paz-Y-Miño C. Analysis of Racial/Ethnic Representation in Select Basic and Applied Cancer Research Studies. Sci Rep 2018; 8:13978. [PMID: 30228363 PMCID: PMC6143551 DOI: 10.1038/s41598-018-32264-x] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Received: 02/23/2018] [Accepted: 07/26/2018] [Indexed: 12/15/2022]
Abstract
Over the past decades, studies have consistently shown that race/ethnicity has a great impact on cancer incidence, survival, drug response, molecular pathways and epigenetics. Despite the influence of race/ethnicity on cancer outcomes and its impact on health care quality, racial/ethnic inclusion in oncological research has never been comprehensively assessed. We therefore explored the racial/ethnic composition of samples/individuals included in fundamental (patient-derived oncological models, biobanks and genomics) and applied cancer research studies (clinical trials). Regarding patient-derived oncological models (n = 794), 48.3% have no records on their donor's race/ethnicity; the rest were isolated from White (37.5%), Asian (10%), African American (3.8%) and Hispanic (0.4%) donors. Biobanks (n = 8,293) hold specimens from unknown (24.56%), White (59.03%), African American (11.05%), Asian (4.12%) and other individuals (1.24%). Genomic projects (n = 6,765,447) include samples from unknown (0.6%), White (91.1%), Asian (5.6%), African American (1.7%), Hispanic (0.5%) and other populations (0.5%). Concerning clinical trials (n = 89,212), no racial/ethnic registries were found for 66.95% of participants, and records were mainly obtained from Whites (25.94%), Asians (4.97%), African Americans (1.08%), Hispanics (0.16%) and other minorities (0.9%). Thus, two tendencies were observed across oncological studies: a lack of racial/ethnic information and overrepresentation of Caucasian/White samples/individuals. These results clearly indicate a need to diversify oncological studies to include other populations, along with novel strategies to enhance race/ethnicity data recording and reporting.
Affiliation(s)
- Santiago Guerrero
  - Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Av. Mariscal Sucre and Mariana de Jesús, Block I, 2nd floor, 170129, Quito, Ecuador
- Andrés López-Cortés
  - Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Av. Mariscal Sucre and Mariana de Jesús, Block I, 2nd floor, 170129, Quito, Ecuador
- Alberto Indacochea
  - Gene Regulation, Stem Cells and Cancer Programme, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Oncology and Molecular Pathology Research Group-VHIR- Vall d' Hebron Institut de Recerca-Vall d' Hebron Hospital, P/de la Vall d'Hebron, Barcelona, Spain
- Jennyfer M García-Cárdenas
  - Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Av. Mariscal Sucre and Mariana de Jesús, Block I, 2nd floor, 170129, Quito, Ecuador
- Ana Karina Zambrano
  - Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Av. Mariscal Sucre and Mariana de Jesús, Block I, 2nd floor, 170129, Quito, Ecuador
- Alejandro Cabrera-Andrade
  - Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Av. Mariscal Sucre and Mariana de Jesús, Block I, 2nd floor, 170129, Quito, Ecuador
  - Carrera de Enfermería, Facultad de Ciencias de la Salud, Universidad de las Américas, Avenue de los Granados, Quito, 170125, Ecuador
  - Grupo de Bio-Quimioinformática, Universidad de las Américas, Avenue de los Granados, Quito, 170125, Ecuador
- Patricia Guevara-Ramírez
  - Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Av. Mariscal Sucre and Mariana de Jesús, Block I, 2nd floor, 170129, Quito, Ecuador
- Diana Abigail González
  - Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Av. Mariscal Sucre and Mariana de Jesús, Block I, 2nd floor, 170129, Quito, Ecuador
- Paola E Leone
  - Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Av. Mariscal Sucre and Mariana de Jesús, Block I, 2nd floor, 170129, Quito, Ecuador
- César Paz-Y-Miño
  - Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Av. Mariscal Sucre and Mariana de Jesús, Block I, 2nd floor, 170129, Quito, Ecuador
8
Khayatzadeh N, Mészáros G, Utsunomiya YT, Garcia JF, Schnyder U, Gredler B, Curik I, Sölkner J. Locus-specific ancestry to detect recent response to selection in admixed Swiss Fleckvieh cattle. Anim Genet 2016; 47:637-646. [PMID: 27435758 DOI: 10.1111/age.12470] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Accepted: 05/30/2016] [Indexed: 01/08/2023]
Abstract
Identification of selection signatures is one of the current endeavors of evolutionary genetics. Admixed populations may be used to infer post-admixture selection. We calculated local ancestry for Swiss Fleckvieh, a composite of Simmental (SI) and Red Holstein Friesian (RHF), to infer such signals. Illumina Bovine SNP50 BeadChip data for 300 admixed, 88 SI and 97 RHF bulls were used. The average RHF ancestry across the whole genome was 0.70. To identify regions with high deviation from the average, we considered two significance thresholds, based on a permutation test and on extreme deviation from a normal distribution. Regions on chromosomes 13 (46.3-47.3 Mb) and 18 (18.7-25.9 Mb) passed both thresholds in the direction of increased SI. Extended haplotype homozygosity within (iHS) and between (Rsb) populations was calculated to explore additional patterns of pre- and post-admixture selection signals. The Rsb score of admixed and SI was significant in a wide region of chromosome 18 (6.6-24.6 Mb) that overlapped with one area of strong local ancestry deviation. FTO, with pleiotropic effects on milk and fertility, NOD2 (dairy traits), and NKD1 and SALL1 (fertility traits) are located there. Genetic differentiation of RHF and SI (Fst), an alternative indicator of pre-admixture selection in pure populations, was calculated. No considerable overlap between peaks of local ancestry deviation and Fst was observed. We found two regions with significant signatures of post-admixture selection in this very young composite, applying comparatively stringent significance thresholds. The signals cover relatively large genomic areas and did not allow pinpointing of the gene(s) responsible for the apparent shift in ancestry proportions.
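The "extreme deviation from normal distribution" threshold described above can be illustrated by standardizing each locus's mean ancestry against the genome-wide distribution and flagging outliers. This is only a sketch under assumptions made here: the function name, the z-score cutoff of 4, and the simulated data are hypothetical, and the permutation-based threshold from the study is not reproduced.

```python
import numpy as np

def ancestry_deviation(local_ancestry, z_threshold=4.0):
    """Flag loci whose mean ancestry deviates strongly from the genome-wide mean.

    local_ancestry : (N, L) per-individual ancestry proportion (0, 0.5, 1) at each locus
    Returns (locus_mean, z, flagged_indices).
    """
    la = np.asarray(local_ancestry, dtype=float)
    locus_mean = la.mean(axis=0)                       # mean ancestry at each locus
    z = (locus_mean - locus_mean.mean()) / locus_mean.std(ddof=1)
    return locus_mean, z, np.flatnonzero(np.abs(z) > z_threshold)

# Simulated example: genome-wide ancestry 0.70, with locus 100 shifted down to 0.40.
rng = np.random.default_rng(0)
p = np.full(500, 0.70)
p[100] = 0.40
la = rng.binomial(2, p, size=(300, 500)) / 2.0
locus_mean, z, flagged = ancestry_deviation(la)
print(round(float(locus_mean.mean()), 3), flagged)
```

Neighboring loci are strongly correlated in recently admixed populations, so real analyses need thresholds that account for that dependence; the simulation above treats loci as independent purely for brevity.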
Affiliation(s)
- N Khayatzadeh
  - Division of Livestock Science, Department of Sustainable Agricultural Systems, University of Natural Resources and Life Sciences, Vienna, Gregor-Mendel-Straße 33, A-1180, Vienna, Austria
- G Mészáros
  - Division of Livestock Science, Department of Sustainable Agricultural Systems, University of Natural Resources and Life Sciences, Vienna, Gregor-Mendel-Straße 33, A-1180, Vienna, Austria
- Y T Utsunomiya
  - Departamento de Medicina Veterinária Preventiva e Reprodução Animal, Faculdade de Ciências Agrárias e Veterinárias, UNESP - Univ Estadual Paulista, Jaboticabal, São Paulo, Brazil
- J F Garcia
  - Departamento de Medicina Veterinária Preventiva e Reprodução Animal, Faculdade de Ciências Agrárias e Veterinárias, UNESP - Univ Estadual Paulista, Jaboticabal, São Paulo, Brazil; Departamento de Apoio, Saúde e Produção Animal, Faculdade de Medicina Veterinária de Araçatuba, UNESP - Univ Estadual Paulista, Araçatuba, São Paulo, Brazil
- U Schnyder
  - Qualitas AG, Chamerstrasse 56, CH-6300, Zug, Switzerland
- B Gredler
  - Qualitas AG, Chamerstrasse 56, CH-6300, Zug, Switzerland
- I Curik
  - Department of Animal Science, Faculty of Agriculture, University of Zagreb, Svetošimunska cesta 25, 10000, Zagreb, Croatia
- J Sölkner
  - Division of Livestock Science, Department of Sustainable Agricultural Systems, University of Natural Resources and Life Sciences, Vienna, Gregor-Mendel-Straße 33, A-1180, Vienna, Austria
9
Mathias PC, Turner EH, Scroggins SM, Salipante SJ, Hoffman NG, Pritchard CC, Shirts BH. Applying Ancestry and Sex Computation as a Quality Control Tool in Targeted Next-Generation Sequencing. Am J Clin Pathol 2016; 145:308-15. [PMID: 27124912 DOI: 10.1093/ajcp/aqv098] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
Abstract
Objectives: To apply techniques for ancestry and sex computation from next-generation sequencing (NGS) data as an approach to confirm sample identity and detect sample processing errors.
Methods: We combined a principal component analysis method with k-nearest neighbors classification to compute the ancestry of patients undergoing NGS testing. By combining this calculation with X chromosome copy number data, we determined the sex and ancestry of patients for comparison with self-report. We also modeled the sensitivity of this technique in detecting sample processing errors.
Results: We applied this technique to 859 patient samples with reliable self-report data. Our k-nearest neighbors ancestry screen had an accuracy of 98.7% for patients reporting a single ancestry. Visual inspection of principal component plots was consistent with self-report in 99.6% of single-ancestry and mixed-ancestry patients. Our model demonstrates that approximately two-thirds of potential sample swaps could be detected in our patient population using this technique.
Conclusions: Patient ancestry can be estimated from NGS data incidentally sequenced in targeted panels, enabling an inexpensive quality control method when coupled with patient self-report.
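The combination of principal component analysis with k-nearest neighbors classification described in this abstract can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names, the choice of two components and k = 5, and the simulated reference panel are assumptions made here for illustration.

```python
import numpy as np
from collections import Counter

def fit_pca(genotypes, n_components=2):
    """Return (mean, components) learned from a reference genotype matrix (N, M)."""
    g = np.asarray(genotypes, dtype=float)
    mean = g.mean(axis=0)
    _, _, vt = np.linalg.svd(g - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(genotypes, mean, components):
    """Project genotypes onto the principal components of the reference panel."""
    return (np.asarray(genotypes, dtype=float) - mean) @ components.T

def knn_predict(ref_pcs, ref_labels, query_pcs, k=5):
    """Majority vote among the k nearest reference samples in PC space."""
    preds = []
    for q in np.atleast_2d(query_pcs):
        dist = np.linalg.norm(ref_pcs - q, axis=1)
        nearest = np.argsort(dist)[:k]
        preds.append(Counter(ref_labels[i] for i in nearest).most_common(1)[0][0])
    return preds

# Simulated reference panel: two hypothetical populations with different allele frequencies.
rng = np.random.default_rng(1)
freq_a = rng.uniform(0.05, 0.95, 200)
freq_b = rng.uniform(0.05, 0.95, 200)
ref = np.vstack([rng.binomial(2, freq_a, size=(50, 200)),
                 rng.binomial(2, freq_b, size=(50, 200))])
labels = ["A"] * 50 + ["B"] * 50
query = rng.binomial(2, freq_b, size=(3, 200))

mean, comps = fit_pca(ref)
print(knn_predict(project(ref, mean, comps), labels, project(query, mean, comps)))
```

Used as a quality-control check in the spirit of the abstract, the predicted label would be compared with the patient's self-reported ancestry, and mismatches would be reviewed as possible sample swaps.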
Affiliation(s)
- Patrick C Mathias
  - From the Department of Laboratory Medicine, University of Washington, Seattle
- Emily H Turner
  - From the Department of Laboratory Medicine, University of Washington, Seattle
- Sheena M Scroggins
  - From the Department of Laboratory Medicine, University of Washington, Seattle
- Stephen J Salipante
  - From the Department of Laboratory Medicine, University of Washington, Seattle
- Noah G Hoffman
  - From the Department of Laboratory Medicine, University of Washington, Seattle
- Colin C Pritchard
  - From the Department of Laboratory Medicine, University of Washington, Seattle
- Brian H Shirts
  - From the Department of Laboratory Medicine, University of Washington, Seattle
10
Mersha TB. Mapping asthma-associated variants in admixed populations. Front Genet 2015; 6:292. [PMID: 26483834 PMCID: PMC4586512 DOI: 10.3389/fgene.2015.00292] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 03/02/2015] [Accepted: 09/03/2015] [Indexed: 12/19/2022]
Abstract
Admixed populations arise when two or more previously isolated populations interbreed. Mapping asthma susceptibility loci in an admixed population using admixture mapping (AM) involves screening the genomes of individuals of mixed ancestry for chromosomal regions that have a higher frequency of alleles from the parental population with higher asthma risk than from the parental population with lower asthma risk. AM takes advantage of the admixture created in populations of mixed ancestry to identify genomic regions where an association exists between genetic ancestry and asthma (rather than between the marker genotype and asthma). The theory behind AM is that chromosomal segments of affected individuals contain a significantly higher-than-average proportion of alleles from the high-risk parental population and thus are more likely to harbor disease-associated loci. Criteria to evaluate the applicability of AM as a gene mapping approach include: (1) a difference in disease prevalence between the ancestral populations from which the admixed population was formed; (2) a measurable difference in disease-causing alleles between the parental populations; (3) reduced linkage disequilibrium (LD) between unlinked loci across chromosomes and strong LD between neighboring loci; (4) a set of markers with noticeable allele-frequency differences between the parental populations that contribute to the admixed population (single nucleotide polymorphisms (SNPs) are the markers of choice because they are abundant, stable, relatively cheap to genotype, and informative with regard to the LD structure of chromosomal segments); and (5) an understanding of the extent of segmental chromosomal admixture and its interactions with environmental factors.
Although genome-wide association studies have contributed greatly to our understanding of the genetic components of asthma, the large and increasing degree of admixture in populations across the world creates many challenges for further efforts to map disease-causing genes. This review summarizes the historical context of admixed populations and AM and considers current opportunities to use AM to map asthma genes. In addition, we provide an overview of the potential limitations and future directions of AM in biomedical research, including joint admixture and association mapping for asthma and asthma-related disorders.
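The case-only idea described in this abstract (affected individuals carry more high-risk-population ancestry at a disease locus than they do genome-wide) can be illustrated with a small sketch. This is not the method of the review or of any AM software; the function name `case_only_z` and the use of a simple paired z-statistic are assumptions chosen for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def case_only_z(local_ancestry, global_ancestry):
    """Illustrative case-only admixture-mapping statistic (hypothetical helper).

    local_ancestry[i]  : fraction (0, 0.5 or 1) of case i's two chromosome
                         copies at the tested locus that derive from the
                         high-risk parental population.
    global_ancestry[i] : case i's genome-wide ancestry fraction from the
                         same parental population.

    Returns a z-like score; large positive values suggest excess
    high-risk ancestry at the locus among cases.
    """
    # Per-individual deviation of local from genome-wide ancestry.
    diffs = [loc - glob for loc, glob in zip(local_ancestry, global_ancestry)]
    n = len(diffs)
    # Mean deviation divided by its standard error.
    return mean(diffs) / (stdev(diffs) / sqrt(n))
```

In practice local ancestry is itself inferred with uncertainty and many loci are tested, so a real analysis would need a proper likelihood model and multiple-testing control; the sketch only shows the direction of the comparison.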
Affiliation(s)
- Tesfaye B Mersha
- Division of Asthma Research, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
|
11
|
Leveraging Identity-by-Descent for Accurate Genotype Inference in Family Sequencing Data. PLoS Genet 2015; 11:e1005271. [PMID: 26043085 PMCID: PMC4456389 DOI: 10.1371/journal.pgen.1005271] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/28/2014] [Accepted: 05/12/2015] [Indexed: 12/23/2022] Open
Abstract
Sequencing family DNA samples provides an attractive alternative to population-based designs for identifying rare variants associated with human disease, owing to the enrichment of causal variants in pedigrees. Previous studies showed that genotype calling accuracy can be improved by modeling family relatedness compared to standard calling algorithms. Current family-based variant calling methods use sequencing data on single variants and ignore the identity-by-descent (IBD) sharing along the genome. In this study we describe a new computational framework to accurately estimate the IBD sharing from the sequencing data, and to utilize the inferred IBD among family members to jointly call genotypes in pedigrees. Through simulations and application to real data, we showed that IBD can be reliably estimated across the genome, even at very low coverage (e.g. 2X), and genotype accuracy can be dramatically improved. Moreover, the improvement is more pronounced for variants with low frequencies, especially at low to intermediate coverage (e.g. 10X to 20X), making our approach effective in studying rare variants in cost-effective whole genome sequencing in pedigrees. We hope that our tool is useful to the research community for identifying rare variants for human disease through family-based sequencing.
To identify disease variants that occur less frequently in the population, sequencing families in which multiple individuals are affected is more powerful due to the enrichment of causal variants. An important step in such studies is to infer individual genotypes from sequencing data. Existing methods do not utilize full familial transmission information and therefore result in reduced accuracy of inferred genotypes. In this study we describe a new method that infers shared genetic material among family members and then incorporates the shared genomic information in a novel algorithm that can accurately infer genotypes. Our method is particularly advantageous when inferring low-frequency variants from less sequence data, making it effective in analyzing genome-wide sequence data. We implemented the algorithm in a computationally efficient tool to facilitate cost-effective sequencing in families for identifying disease genetic variants.
|
12
|
Wang X, Zhang S, Li Y, Li M, Sha Q. A powerful approach to test an optimally weighted combination of rare variants in admixed populations. Genet Epidemiol 2015; 39:294-305. [PMID: 25758547 DOI: 10.1002/gepi.21894] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/04/2014] [Revised: 01/09/2015] [Accepted: 01/26/2015] [Indexed: 11/09/2022]
Abstract
Population stratification has long been recognized as an issue in genetic association studies because unrecognized population stratification can lead to both false-positive and false-negative findings and can obscure true association signals if not appropriately corrected. This issue can be even worse in rare variant association analyses because rare variants often demonstrate stronger and potentially different patterns of stratification than common variants. To correct for population stratification in genetic association studies, we proposed a novel method to Test the effect of an Optimally Weighted combination of variants in Admixed populations (TOWA), in which the analytically derived optimal weights can be calculated from existing phenotype and genotype data. TOWA upweights rare variants and variants that have strong associations with the phenotype. Additionally, it can adjust for the direction of the association and allows for local ancestry differences among study subjects. Extensive simulations show that the type I error rate of TOWA is under control in the presence of population stratification and that it is more powerful than existing methods. We have also applied TOWA to a real sequencing dataset. Our simulation studies as well as real data analysis results indicate that TOWA is a useful tool for rare variant association analyses in admixed populations.
Affiliation(s)
- Xuexia Wang
- Joseph J. Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, United States of America
|
13
|
Johnson RC, Nelson GW, Zagury JF, Winkler CA. ALDsuite: Dense marker MALD using principal components of ancestral linkage disequilibrium. BMC Genet 2015; 16:23. [PMID: 25886794 PMCID: PMC4408589 DOI: 10.1186/s12863-015-0179-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/26/2014] [Accepted: 02/06/2015] [Indexed: 01/04/2023] Open
Abstract
Background: Mapping by admixture linkage disequilibrium (MALD) is a whole genome gene mapping method that uses LD from extended blocks of ancestry inherited from parental populations among admixed individuals to map associations for diseases that vary in prevalence among human populations. The extended LD queried for marker association with ancestry results in a greatly reduced number of comparisons compared to standard genome wide association studies. As ancestral population LD tends to confound the analysis of admixture LD, the earliest algorithms for MALD required marker sets sufficiently sparse to lack significant ancestral LD between markers. However, current genotyping technologies routinely provide dense SNP data, which convey more information than sparse sets if this information can be efficiently used. There are currently no software solutions that offer both local ancestry inference using dense marker data and disease association statistics. Results: We present here an R package, ALDsuite, which accounts for local LD using principal components of haplotypes from surrogate ancestral population data, and includes tools for quality control of data, MALD, downstream analysis of results and visualization graphics. Conclusions: ALDsuite offers a fast, accurate estimation of global and local ancestry and comes bundled with the tools needed for MALD, from data quality control through mapping and visualization of disease genes.
Affiliation(s)
- Randall C Johnson
- BSP CCR Genetics Core, Leidos Biomedical Research, Inc, Frederick National Laboratory, Frederick, MD, 21702, USA; Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, Paris, 75003, France.
| | - George W Nelson
- BSP CCR Genetics Core, Leidos Biomedical Research, Inc, Frederick National Laboratory, Frederick, MD, 21702, USA.
| | - Jean-Francois Zagury
- Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, Paris, 75003, France.
| | - Cheryl A Winkler
- Basic Research Laboratory, Leidos Biomedical Research, Inc, Frederick National Laboratory, Frederick, MD, 21702, USA.
|
14
|
Bansal V, Libiger O. Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations. BMC Bioinformatics 2015; 16:4. [PMID: 25592880 PMCID: PMC4301802 DOI: 10.1186/s12859-014-0418-7] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 10/08/2014] [Accepted: 12/10/2014] [Indexed: 01/18/2023] Open
Abstract
Background: Estimation of individual ancestry from genetic data is useful for the analysis of disease association studies, understanding human population history and interpreting personal genomic variation. New, computationally efficient methods are needed for ancestry inference that can effectively utilize existing information about allele frequencies associated with different human populations and can work directly with DNA sequence reads. Results: We describe a fast method for estimating the relative contribution of known reference populations to an individual’s genetic ancestry. Our method utilizes allele frequencies from the reference populations and individual genotype or sequence data to obtain a maximum likelihood estimate of the global admixture proportions using the BFGS optimization algorithm. It accounts for the uncertainty in genotypes present in sequence data by using genotype likelihoods and does not require individual genotype data from external reference panels. Simulation studies and application of the method to real datasets demonstrate that our method is significantly faster than previous methods and has comparable accuracy. Using data from the 1000 Genomes project, we show that estimates of the genome-wide average ancestry for admixed individuals are consistent between exome sequence data and whole-genome low-coverage sequence data. Finally, we demonstrate that our method can be used to estimate admixture proportions using pooled sequence data, making it a valuable tool for controlling for population stratification in sequencing-based association studies that utilize DNA pooling. Conclusions: Our method is an efficient and versatile tool for estimating ancestry from DNA sequence data and is available from https://sites.google.com/site/vibansal/software/iAdmix.
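As a rough illustration of the likelihood this abstract describes, the sketch below estimates one individual's global admixture proportions from known reference allele frequencies. It is not the iAdmix implementation: it assumes hard genotype calls rather than genotype likelihoods, and it swaps the BFGS optimizer for a plain EM update, which maximizes the same binomial mixture likelihood. All function names are hypothetical.

```python
import random


def clamp(p, eps=1e-6):
    """Keep a frequency strictly inside (0, 1) to avoid division by zero."""
    return min(max(p, eps), 1.0 - eps)


def estimate_admixture(genotypes, ref_freqs, n_iter=200):
    """EM estimate of global admixture proportions (illustrative only).

    genotypes : alternate-allele counts (0, 1 or 2), one per SNP.
    ref_freqs : ref_freqs[k][j] is the alternate-allele frequency of
                SNP j in reference population k.
    """
    k_pops = len(ref_freqs)
    freqs = [[clamp(f) for f in pop] for pop in ref_freqs]
    q = [1.0 / k_pops] * k_pops          # start from equal proportions
    n_alleles = 2 * len(genotypes)
    for _ in range(n_iter):
        expected = [0.0] * k_pops        # expected allele copies per population
        for j, g in enumerate(genotypes):
            p_alt = sum(q[k] * freqs[k][j] for k in range(k_pops))
            p_ref = 1.0 - p_alt
            for k in range(k_pops):
                # Posterior share of population k for alt and ref copies.
                expected[k] += g * q[k] * freqs[k][j] / p_alt
                expected[k] += (2 - g) * q[k] * (1.0 - freqs[k][j]) / p_ref
        q = [e / n_alleles for e in expected]
    return q


def simulate_genotypes(q, ref_freqs, rng):
    """Draw genotypes for an individual with admixture proportions q."""
    genotypes = []
    for j in range(len(ref_freqs[0])):
        p_alt = sum(qk * pop[j] for qk, pop in zip(q, ref_freqs))
        genotypes.append(sum(rng.random() < p_alt for _ in range(2)))
    return genotypes
```

A quasi-Newton optimizer such as BFGS would typically need far fewer iterations than EM on this problem; EM is used here only because it is short to write and keeps the proportions on the simplex without extra constraints.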
Affiliation(s)
- Vikas Bansal
- Department of Pediatrics, University of California San Diego, 9500 Gilman Drive, La Jolla, 92093, CA, USA; Scripps Translational Science Institute, 3344 N Torrey Pines Court, La Jolla, 92037, CA, USA.
| | - Ondrej Libiger
- Scripps Translational Science Institute, 3344 N Torrey Pines Court, La Jolla, 92037, CA, USA. Current address: MD Revolution, San Diego, CA, USA.
|
15
|
Accurate inference of local phased ancestry of modern admixed populations. Sci Rep 2014; 4:5800. [PMID: 25052506 PMCID: PMC4107375 DOI: 10.1038/srep05800] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 03/25/2014] [Accepted: 07/07/2014] [Indexed: 01/10/2023] Open
Abstract
Population stratification is a growing concern in genetic-association studies. Averaged ancestry at the genome level (global ancestry) is insufficient for detecting population substructure and correcting for population stratification in association studies. Local and phase stratification are needed for human genetic studies, but current technologies cannot be applied to whole-genome data due to various technical caveats. Here we developed a novel approach (aMAP, ancestry of Modern Admixed Populations) for inferring local phased ancestry. It took about 3 seconds on a desktop computer to finish a local ancestry analysis for each human genome with 1.4 million SNPs. The method also scales to larger datasets with respect to the number of SNPs, the number of samples, and the size of reference panels. It can detect the lack of a proxy in the reference panels. The accuracy was 99.4%. The aMAP software can analyze 6-way admixed individuals. As the biomedical community continues to expand its efforts to increase the representation of diverse populations, and as the number of large whole-genome sequence datasets continues to grow rapidly, there is an increasing demand for rapid and accurate local ancestry analysis in genetics, pharmacogenomics, population genetics, and clinical diagnosis.
|
16
|
Lee S, Abecasis G, Boehnke M, Lin X. Rare-variant association analysis: study designs and statistical tests. Am J Hum Genet 2014; 95:5-23. [PMID: 24995866 DOI: 10.1016/j.ajhg.2014.06.009] [Citation(s) in RCA: 658] [Impact Index Per Article: 65.8] [Received: 01/02/2014] [Indexed: 12/30/2022] Open
Abstract
Despite the extensive discovery of trait- and disease-associated common variants, much of the genetic contribution to complex traits remains unexplained. Rare variants can explain additional disease risk or trait variability. An increasing number of studies are underway to identify trait- and disease-associated rare variants. In this review, we provide an overview of statistical issues in rare-variant association studies with a focus on study designs and statistical tests. We present the design and analysis pipeline of rare-variant studies and review cost-effective sequencing designs and genotyping platforms. We compare various gene- or region-based association tests, including burden tests, variance-component tests, and combined omnibus tests, in terms of their assumptions and performance. Also discussed are the related topics of meta-analysis, population-stratification adjustment, genotype imputation, follow-up studies, and heritability due to rare variants. We provide guidelines for analysis and discuss some of the challenges inherent in these studies and future research directions.
|