1
Magalhães Borges V, Horimoto ARVR, Wijsman EM, Kimura L, Nunes K, Nato AQ, Mingroni-Netto RC. Genomic Exploration of Essential Hypertension in African-Brazilian Quilombo Populations: A Comprehensive Approach with Pedigree Analysis and Family-Based Association Studies. medRxiv 2024:2024.06.26.24309531. [PMID: 38978678 PMCID: PMC11230341 DOI: 10.1101/2024.06.26.24309531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Essential hypertension (EH) is a major global health concern, causing about 9.4 million deaths annually. Its prevalence varies across regions, affecting 17% of the population in the Americas, 19.2% in the Western Pacific, 23.2% in Europe, 25.1% in Southeast Asia, 26.3% in the Eastern Mediterranean, and 27.2% in Africa. EH is a multifactorial disease influenced by both genetic and environmental factors. While genetic factors contribute 30-60% to blood pressure variation, the genetic complexity of EH remains largely unexplained due to limited knowledge of candidate genes and population-specific differences. Various methods, including candidate gene studies, genome-wide linkage analysis (GWLA), and genome-wide association studies (GWAS), have been employed to identify genetic factors, yet much of the heritability of EH is still unknown. This study aimed to investigate the genetic basis of EH by mapping regions of interest (ROIs) and identifying candidate genes and variants influencing EH in African-derived individuals from partially isolated populations of quilombo remnants in Vale do Ribeira, São Paulo, Brazil. Samples from 431 individuals (167 affected, 261 unaffected, 3 with unknown phenotype) from eight quilombo remnant populations were genotyped using a 650k SNP array. The global ancestry proportions were estimated at 47% African, 36% European, and 16% Native American. Genealogical information from 673 individuals was used to construct six pedigrees comprising 1104 individuals. The mapping strategy consisted of a multi-level computational approach. We constructed pedigrees based on interviews and kinship coefficients, pruned the dataset to obtain three non-overlapping marker subpanels, phased haplotypes, and inferred local ancestry to account for admixture.
We performed GWLA and dense linkage analyses using the marker subpanels, then performed fine-mapping using family-based association studies (FBAS) based on population- and pedigree-imputed data, investigating EH-related genes and variants. The linkage analysis identified 22 ROIs with LOD scores of 1.45-3.03, containing markers co-segregating with the phenotype. These ROIs encompassed 2363 genes. Fine-mapping identified 60 EH-related candidate genes and 118 suggestive or significant variants (FBAS). Among these, 14 genes, including PHGDH, S100A10, MFN2, and RYR2, were highlighted with strong evidence of association with hypertension. These genes, harboring 29 SNPs, were implicated in regulating blood pressure, sodium and potassium levels, and the aldosterone pathway. Through a complementary approach that combined admixture-adjusted genome-wide linkage analysis based on Markov chain Monte Carlo (MCMC) methods, association studies on imputed data, and in silico investigations, this study identified genetic regions, variants, and candidate genes that shed light on the genetic basis of essential hypertension, with significant potential to explain its genetic etiology in quilombo remnant populations.
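For readers unfamiliar with the LOD scores reported above, the following is a minimal sketch of the classic two-point LOD score for phase-known meioses. It is not the MCMC-based multipoint method the abstract describes; it is a deliberately simplified stand-in, and the function names `two_point_lod` and `max_lod` are hypothetical. A LOD score is the base-10 logarithm of the likelihood at recombination fraction theta relative to the likelihood under no linkage (theta = 0.5).

```python
import math
from typing import Tuple


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score at recombination fraction theta for phase-known meioses.

    Simplifying assumption: every meiosis can be scored unambiguously as
    recombinant or non-recombinant, which real pedigree data rarely allow.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    non_recombinants = meioses - recombinants
    # log10 likelihood under linkage at the given theta
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    # log10 likelihood under the null of free recombination (theta = 0.5)
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(recombinants: int, meioses: int, grid_size: int = 50) -> Tuple[float, float]:
    """Return (maximum LOD, theta at the maximum) over an even grid on (0, 0.5]."""
    thetas = [0.5 * (i + 1) / grid_size for i in range(grid_size)]
    return max((two_point_lod(recombinants, meioses, t), t) for t in thetas)


if __name__ == "__main__":
    lod, theta = max_lod(recombinants=1, meioses=10)
    print(f"max LOD = {lod:.2f} at theta = {theta:.2f}")
```

Because the score is on a log10 scale, a LOD of 3 corresponds to a likelihood ratio of 1000:1 in favor of linkage at the tested recombination fraction.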
Affiliation(s)
- Vinícius Magalhães Borges
  - Centro de Estudos sobre o Genoma Humano e Células Tronco, Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo 05508-090, Brazil
  - Department of Biomedical Sciences, Joan C. Edwards School of Medicine, Marshall University, Huntington, WV 25755, USA
- Andrea R V R Horimoto
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, 98105 USA
- Ellen Marie Wijsman
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, 98105 USA
- Lilian Kimura
  - Centro de Estudos sobre o Genoma Humano e Células Tronco, Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo 05508-090, Brazil
- Kelly Nunes
  - Centro de Estudos sobre o Genoma Humano e Células Tronco, Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo 05508-090, Brazil
- Alejandro Q Nato
  - Department of Biomedical Sciences, Joan C. Edwards School of Medicine, Marshall University, Huntington, WV 25755, USA
- Regina Célia Mingroni-Netto
  - Centro de Estudos sobre o Genoma Humano e Células Tronco, Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo 05508-090, Brazil
2
Naj AC, Lin H, Vardarajan BN, White S, Lancour D, Ma Y, Schmidt M, Sun F, Butkiewicz M, Bush WS, Kunkle BW, Malamon J, Amin N, Choi SH, Hamilton-Nelson KL, van der Lee SJ, Gupta N, Koboldt DC, Saad M, Wang B, Nato AQ, Sohi HK, Kuzma A, Wang LS, Cupples LA, van Duijn C, Seshadri S, Schellenberg GD, Boerwinkle E, Bis JC, Dupuis J, Salerno WJ, Wijsman EM, Martin ER, DeStefano AL. Quality control and integration of genotypes from two calling pipelines for whole genome sequence data in the Alzheimer's disease sequencing project. Genomics 2019; 111:808-818. [PMID: 29857119 PMCID: PMC6397097 DOI: 10.1016/j.ygeno.2018.05.004] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 11/29/2017] [Revised: 04/03/2018] [Accepted: 05/06/2018] [Indexed: 12/30/2022]
Abstract
The Alzheimer's Disease Sequencing Project (ADSP) performed whole genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format files (VCFs) from each pipeline, and developed and implemented a novel protocol, termed "consensus calling," to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels, and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted. Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and 14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant; 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively; the remaining ~0.01% of discordant genotypes were excluded. For indels, variant-level QC excluded ~36.8% of GATK and 35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant, while 10.3% and 28.3% were exclusive to Atlas or GATK, respectively, and ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available.
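The merge rule implied by the figures above (concordant genotypes kept, genotypes exclusive to one pipeline kept, discordant genotypes excluded) can be sketched as follows. This is an illustrative simplification under stated assumptions, not the ADSP protocol itself: the function names, the `(sample, variant)` dictionary keys, and the string genotype encoding are all hypothetical, and the per-pipeline QC filters the abstract mentions are assumed to have been applied beforehand.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

Genotype = Optional[str]      # e.g. "0/1"; None means no call passed QC
Key = Tuple[str, str]         # hypothetical (sample_id, variant_id) key


def consensus_call(gatk: Genotype, atlas: Genotype) -> Tuple[Genotype, str]:
    """Combine one genotype from each pipeline and label the outcome."""
    if gatk is None and atlas is None:
        return None, "missing"
    if gatk is None:
        return atlas, "atlas_only"    # exclusive to Atlas: kept
    if atlas is None:
        return gatk, "gatk_only"      # exclusive to GATK: kept
    if gatk == atlas:
        return gatk, "concordant"     # both pipelines agree: kept
    return None, "discordant"         # pipelines disagree: excluded


def merge_calls(gatk_calls: Dict[Key, Genotype],
                atlas_calls: Dict[Key, Genotype]) -> Tuple[Dict[Key, Genotype], Counter]:
    """Merge two call sets and tally how each genotype was resolved."""
    merged: Dict[Key, Genotype] = {}
    summary: Counter = Counter()
    for key in sorted(set(gatk_calls) | set(atlas_calls)):
        call, status = consensus_call(gatk_calls.get(key), atlas_calls.get(key))
        merged[key] = call
        summary[status] += 1
    return merged, summary
```

The sketch compares genotype strings literally, so allele order (for example "0/1" versus "1/0") would need to be normalized before merging real data.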
Affiliation(s)
- Adam C Naj
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Honghuang Lin
  - Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Badri N Vardarajan
  - Department of Neurology, Columbia University Medical Center, New York, NY, USA
- Simon White
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Daniel Lancour
  - Department of Biomedical Genetics, Boston University School of Medicine, Boston, MA, USA
- Yiyi Ma
  - Department of Biomedical Genetics, Boston University School of Medicine, Boston, MA, USA
- Michael Schmidt
  - John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Fangui Sun
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Mariusz Butkiewicz
  - Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
- William S Bush
  - Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
- Brian W Kunkle
  - John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- John Malamon
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Najaf Amin
  - Department of Epidemiology, Erasmus Medical Center, Rotterdam, the Netherlands
- Seung Hoan Choi
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Kara L Hamilton-Nelson
  - John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Sven J van der Lee
  - Department of Epidemiology, Erasmus Medical Center, Rotterdam, the Netherlands
- Namrata Gupta
  - Medical and Population Genetics Program, Broad Institute, Cambridge, MA, USA
- Daniel C Koboldt
  - Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
- Mohamad Saad
  - Department of Biostatistics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Bowen Wang
  - Department of Statistics, University of Washington, Seattle, WA, USA
- Alejandro Q Nato
  - Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Harkirat K Sohi
  - Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Amanda Kuzma
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Li-San Wang
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- L Adrienne Cupples
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; The Framingham Heart Study, Framingham, MA, USA
- Cornelia van Duijn
  - Department of Epidemiology, Erasmus Medical Center, Rotterdam, the Netherlands
- Sudha Seshadri
  - The Framingham Heart Study, Framingham, MA, USA; Department of Neurology, Boston University School of Medicine, Boston, MA, USA
- Gerard D Schellenberg
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Eric Boerwinkle
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA; Human Genetics Center, University of Texas Health Science Center, Houston, TX, USA
- Joshua C Bis
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Josée Dupuis
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; The Framingham Heart Study, Framingham, MA, USA
- William J Salerno
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Ellen M Wijsman
  - Department of Biostatistics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Eden R Martin
  - John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Anita L DeStefano
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; The Framingham Heart Study, Framingham, MA, USA; Department of Neurology, Boston University School of Medicine, Boston, MA, USA
3
Nafikov RA, Nato AQ, Sohi H, Wang B, Brown L, Horimoto AR, Vardarajan BN, Barral SM, Tosto G, Mayeux RP, Thornton TA, Blue E, Wijsman EM. Analysis of pedigree data in populations with multiple ancestries: Strategies for dealing with admixture in Caribbean Hispanic families from the ADSP. Genet Epidemiol 2018; 42:500-515. [PMID: 29862559 PMCID: PMC6160322 DOI: 10.1002/gepi.22133] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/29/2017] [Revised: 05/04/2018] [Accepted: 05/14/2018] [Indexed: 11/12/2022]
Abstract
Multipoint linkage analysis is an important approach for localizing disease-associated loci in pedigrees. Linkage analysis, however, is sensitive to misspecification of marker allele frequencies. Pedigrees from recently admixed populations are particularly susceptible to this problem because of the challenge of accurately accounting for population structure. Therefore, increasing emphasis on use of multiethnic samples in genetic studies requires reevaluation of best practices, given data currently available. Typical strategies have been to compute allele frequencies from the sample, or to use marker allele frequencies determined by admixture proportions averaged over the entire sample. However, admixture proportions vary among pedigrees and throughout the genome in a family-specific manner. Here, we evaluate several approaches to model admixture in linkage analysis, providing different levels of detail about ancestral origin. To perform our evaluations, for specification of marker allele frequencies, we used data on 67 Caribbean Hispanic admixed families from the Alzheimer's Disease Sequencing Project. Our results show that choice of admixture model has an effect on the linkage analysis results. Variant-specific admixture proportions, computed for individual families, provide the most detailed regional admixture estimates, and, as such, are the most appropriate allele frequencies for linkage analysis. This likely decreases the number of false-positive results, and is straightforward to implement.
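One common way to turn admixture proportions into marker allele frequencies is a weighted mixture of ancestral-population frequencies. The abstract does not give a formula, so the sketch below assumes that standard mixture model; the function name, the ancestry labels, and the numeric values in the usage example are hypothetical. Passing proportions estimated for one family at one variant corresponds to the "variant-specific, family-specific" level of detail the abstract favors, while passing sample-wide averages corresponds to the coarser strategy it criticizes.

```python
import math
from typing import Dict


def admixed_allele_frequency(proportions: Dict[str, float],
                             ancestral_freqs: Dict[str, float]) -> float:
    """Allele frequency as an admixture-weighted mixture of ancestral frequencies.

    proportions:     ancestry label -> admixture proportion (must sum to 1)
    ancestral_freqs: ancestry label -> allele frequency in that ancestry
    """
    if set(proportions) != set(ancestral_freqs):
        raise ValueError("ancestry labels must match in both inputs")
    if not math.isclose(sum(proportions.values()), 1.0, abs_tol=1e-6):
        raise ValueError("admixture proportions must sum to 1")
    for label, freq in ancestral_freqs.items():
        if not 0.0 <= freq <= 1.0:
            raise ValueError(f"allele frequency for {label} must lie in [0, 1]")
    return sum(proportions[k] * ancestral_freqs[k] for k in proportions)


if __name__ == "__main__":
    # Hypothetical values for a single variant in a single family.
    family_local = {"AFR": 0.6, "EUR": 0.3, "NAT": 0.1}
    freqs = {"AFR": 0.10, "EUR": 0.40, "NAT": 0.25}
    print(admixed_allele_frequency(family_local, freqs))
```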
Affiliation(s)
- Rafael A Nafikov
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington
- Alejandro Q Nato
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington
- Harkirat Sohi
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington
- Bowen Wang
  - Department of Statistics, University of Washington, Seattle, Washington
- Lisa Brown
  - Department of Biostatistics, University of Washington, Seattle, Washington
- Andrea R Horimoto
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington
- Sandra M Barral
  - Department of Neurology, Columbia University, New York, New York
- Giuseppe Tosto
  - Department of Neurology, Columbia University, New York, New York
- Richard P Mayeux
  - Department of Neurology, Columbia University, New York, New York
- Timothy A Thornton
  - Department of Biostatistics, University of Washington, Seattle, Washington
- Elizabeth Blue
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington
- Ellen M Wijsman
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington; Department of Biostatistics, University of Washington, Seattle, Washington
4
Truong DT, Shriberg LD, Smith SD, Chapman KL, Scheer-Cohen AR, DeMille MMC, Adams AK, Nato AQ, Wijsman EM, Eicher JD, Gruen JR. Multipoint genome-wide linkage scan for nonword repetition in a multigenerational family further supports chromosome 13q as a locus for verbal trait disorders. Hum Genet 2016; 135:1329-1341. [PMID: 27535846 PMCID: PMC5065602 DOI: 10.1007/s00439-016-1717-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 05/11/2016] [Accepted: 07/22/2016] [Indexed: 12/19/2022]
Abstract
Verbal trait disorders encompass a wide range of conditions and are marked by deficits in five domains that impair a person's ability to communicate: speech, language, reading, spelling, and writing. Nonword repetition is a robust endophenotype for verbal trait disorders that is sensitive to cognitive processes critical to verbal development, including auditory processing, phonological working memory, and motor planning and programming. In the present study, we present a six-generation extended pedigree with a history of verbal trait disorders. Using genome-wide multipoint variance component linkage analysis of nonword repetition, we identified a region spanning chromosome 13q14-q21 with LOD = 4.45 between 52 and 55 cM, spanning approximately 5.5 Mb on chromosome 13. This region overlaps with SLI3, a locus implicated in reading disability in families with a history of specific language impairment. Our study of a large multigenerational family with verbal trait disorders further implicates the SLI3 region in verbal trait disorders. Future studies will further refine the specific causal genetic factors in this locus on chromosome 13q that contribute to language traits.
Affiliation(s)
- D T Truong
  - Department of Pediatrics, Yale School of Medicine, New Haven, CT, 06510, USA
- L D Shriberg
  - Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- S D Smith
  - Department of Pediatrics, University of Nebraska at Omaha, Omaha, NE, 68182, USA
- K L Chapman
  - Department of Communication Sciences and Disorders, University of Utah, Salt Lake City, UT, 84112, USA
- A R Scheer-Cohen
  - Department of Speech-Language Pathology, California State University, San Marcos, CA, 92096, USA
- M M C DeMille
  - Department of Pediatrics, Yale School of Medicine, New Haven, CT, 06510, USA
- A K Adams
  - Department of Genetics, Yale School of Medicine, New Haven, CT, 06510, USA
- A Q Nato
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, 98195, USA
- E M Wijsman
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, 98195, USA
  - Department of Biostatistics and Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA
- J D Eicher
  - Department of Genetics, Yale School of Medicine, New Haven, CT, 06510, USA
- J R Gruen
  - Department of Pediatrics, Yale School of Medicine, New Haven, CT, 06510, USA
  - Department of Genetics, Yale School of Medicine, New Haven, CT, 06510, USA
  - Investigative Medicine Program, Yale School of Medicine, New Haven, CT, 06510, USA
5
Chung RH, Tsai WY, Kang CY, Yao PJ, Tsai HJ, Chen CH. FamPipe: An Automatic Analysis Pipeline for Analyzing Sequencing Data in Families for Disease Studies. PLoS Comput Biol 2016; 12:e1004980. [PMID: 27272119 PMCID: PMC4894624 DOI: 10.1371/journal.pcbi.1004980] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/21/2015] [Accepted: 05/12/2016] [Indexed: 11/18/2022]
Abstract
In disease studies, family-based designs have become an attractive approach to analyzing next-generation sequencing (NGS) data for the identification of rare mutations enriched in families. Substantial research effort has been devoted to developing pipelines for automating sequence alignment, variant calling, and annotation. However, fewer pipelines have been designed specifically for disease studies. Most of the current analysis pipelines for family-based disease studies using NGS data focus on a specific function, such as identifying variants with Mendelian inheritance or identifying shared chromosomal regions among affected family members. Consequently, some other useful family-based analysis tools, such as imputation, linkage, and association tools, have yet to be integrated and automated. We developed FamPipe, a comprehensive analysis pipeline, which includes several family-specific analysis modules, including the identification of shared chromosomal regions among affected family members, prioritizing variants assuming a disease model, imputation of untyped variants, and linkage and association tests. We used simulation studies to compare properties of some modules implemented in FamPipe, and based on the results, we provided suggestions for the selection of modules to achieve an optimal analysis strategy. The pipeline is under the GNU GPL License and can be downloaded for free at http://fampipe.sourceforge.net.
Affiliation(s)
- Ren-Hua Chung
  - Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Wei-Yun Tsai
  - Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Chen-Yu Kang
  - Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Po-Ju Yao
  - Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Hui-Ju Tsai
  - Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
  - Department of Public Health, China Medical University, Taichung, Taiwan
  - Department of Pediatrics, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, United States of America
- Chia-Hsiang Chen
  - Department of Psychiatry, Chang Gung Memorial Hospital-Linkou, Gueishan, Taoyuan, Taiwan
  - Department and Graduate Institute of Biomedical Sciences, Chang Gung University, Taoyuan, Taiwan
6
Genetic Candidate Variants in Two Multigenerational Families with Childhood Apraxia of Speech. PLoS One 2016; 11:e0153864. [PMID: 27120335 PMCID: PMC4847873 DOI: 10.1371/journal.pone.0153864] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 11/19/2015] [Accepted: 04/05/2016] [Indexed: 12/31/2022]
Abstract
Childhood apraxia of speech (CAS) is a severe and socially debilitating form of speech sound disorder with suspected genetic involvement, but the genetic etiology is not yet well understood. Very few known or putative causal genes have been identified to date, e.g., FOXP2 and BCL11A. Building a knowledge base of the genetic etiology of CAS will make it possible to identify infants at genetic risk and motivate the development of effective very early intervention programs. We investigated the genetic etiology of CAS in two large multigenerational families with familial CAS. Complementary genomic methods included Markov chain Monte Carlo linkage analysis, copy-number analysis, identity-by-descent sharing, and exome sequencing with variant filtering. No overlaps in regions with positive evidence of linkage between the two families were found. In one family, linkage analysis detected two chromosomal regions of interest, 5p15.1-p14.1, and 17p13.1-q11.1, inherited separately from the two founders. Single-point linkage analysis of selected variants identified CDH18 as a primary gene of interest and additionally, MYO10, NIPBL, GLP2R, NCOR1, FLCN, SMCR8, NEK8, and ANKRD12, possibly with additive effects. Linkage analysis in the second family detected five regions with LOD scores approaching the highest values possible in the family. A gene of interest was C4orf21 (ZGRF1) on 4q25-q28.2. Evidence for previously described causal copy-number variations and validated or suspected genes was not found. Results are consistent with a heterogeneous CAS etiology, as is expected in many neurogenic disorders. Future studies will investigate genome variants in these and other families with CAS.