1
Dolzhenko E, English A, Dashnow H, De Sena Brandine G, Mokveld T, Rowell WJ, Karniski C, Kronenberg Z, Danzi MC, Cheung WA, Bi C, Farrow E, Wenger A, Chua KP, Martínez-Cerdeño V, Bartley TD, Jin P, Nelson DL, Zuchner S, Pastinen T, Quinlan AR, Sedlazeck FJ, Eberle MA. Characterization and visualization of tandem repeats at genome scale. Nat Biotechnol 2024; 42:1606-1614. [PMID: 38168995 DOI: 10.1038/s41587-023-02057-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 11/06/2023] [Indexed: 01/05/2024]
Abstract
Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there is still a need for computational methods to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 98.38%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.
Affiliation(s)
- Adam English
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Harriet Dashnow
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Tom Mokveld
  - Pacific Biosciences of California, Menlo Park, CA, USA
- Matt C Danzi
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Warren A Cheung
  - Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Chengpeng Bi
  - Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Emily Farrow
  - Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Aaron Wenger
  - Pacific Biosciences of California, Menlo Park, CA, USA
- Khi Pin Chua
  - Pacific Biosciences of California, Menlo Park, CA, USA
- Verónica Martínez-Cerdeño
  - Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
  - Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
  - MIND Institute, UC Davis School of Medicine, Sacramento, CA, USA
- Trevor D Bartley
  - Institute for Pediatric Regenerative Medicine, Shriner's Hospital for Children and UC Davis School of Medicine, Sacramento, CA, USA
  - Department of Pathology & Laboratory Medicine, UC Davis School of Medicine, Sacramento, CA, USA
- Peng Jin
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- David L Nelson
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Stephan Zuchner
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Tomi Pastinen
  - Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Aaron R Quinlan
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
  - Department of Computer Science, Rice University, Houston, TX, USA
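The Mendelian concordance figure quoted in the abstract above (98.38%, "allowing a single repeat unit difference") can be illustrated with a small sketch. This is not TRGT's implementation: the function names, the representation of each genotype as a pair of repeat-unit counts, and the `tolerance` parameter are all assumptions made for illustration only.

```python
def is_mendelian_consistent(child, father, mother, tolerance=1):
    """Check one trio at one tandem-repeat locus.

    Each genotype is a pair of repeat-unit counts (hypothetical encoding).
    The child is consistent if one allele can be matched to the father and
    the other to the mother, each within `tolerance` repeat units.
    """
    def matches(allele, parent):
        # True if either parental allele is within the allowed difference.
        return any(abs(allele - p) <= tolerance for p in parent)

    c1, c2 = child
    return (matches(c1, father) and matches(c2, mother)) or (
        matches(c2, father) and matches(c1, mother)
    )


def concordance(trios, tolerance=1):
    """Fraction of (child, father, mother) trios that are consistent."""
    consistent = sum(
        is_mendelian_consistent(child, father, mother, tolerance)
        for child, father, mother in trios
    )
    return consistent / len(trios)
```

With `tolerance=0` the check requires exact allele matches; raising it to 1 forgives an off-by-one repeat unit, which is the relaxation the abstract describes.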
2
Song M, Zhou Y, Zhao C, Song F, Hou Y. YHP: Y-chromosome Haplogroup Predictor for predicting male lineages based on Y-STRs. Forensic Sci Int 2024; 361:112113. [PMID: 38936202 DOI: 10.1016/j.forsciint.2024.112113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 05/24/2024] [Accepted: 06/16/2024] [Indexed: 06/29/2024]
Abstract
The human Y chromosome reflects the evolutionary process of males. Male lineage tracing by the Y chromosome is of great use in evolutionary, forensic, and anthropological studies. Identifying the male lineage based on the specific distribution of Y haplogroups narrows down the investigation scope, which has been used in forensic scenarios. Although existing software aids familial searching by using Y-STRs (Y-chromosome short tandem repeats) to predict Y-SNP (Y-chromosome single nucleotide polymorphism) haplogroups, it often lacks resolution. In this study, we developed YHP (Y Haplogroup Predictor), a novel software offering high-resolution haplogroup inference without requiring extensive Y-SNP sequencing. Leveraging existing datasets (219 haplogroups, 4064 samples in total), YHP predicts haplogroups with 0.923 accuracy under the highest haplogroup resolution, employing a random forest algorithm. YHP, available on GitHub (https://github.com/cissy123/YHP-Y-Haplogroup-Predictor-), facilitates high-resolution haplogroup prediction, haplotype mismatch analysis, and haplotype similarity comparison. Notably, it demonstrates efficacy in East Asian populations, benefiting from training data from eight distinct East Asian ethnic populations. Moreover, it enables seamless integration of additional training sets, extending its utility to diverse populations.
Affiliation(s)
- Mengyuan Song
  - Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China
  - Department of Laboratory Medicine, West China Hospital, Sichuan University, Chengdu, China
- Yuxiang Zhou
  - Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China
- Chenxi Zhao
  - College of Computer Science, Sichuan University, Chengdu, China
- Feng Song
  - Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China
- Yiping Hou
  - Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China
3
Link V, Zavaleta YJA, Reyes RJ, Ding L, Wang J, Rohlfs RV, Edge MD. Microsatellites used in forensics are in regions enriched for trait-associated variants. iScience 2023; 26:107992. [PMID: 37841589 PMCID: PMC10570123 DOI: 10.1016/j.isci.2023.107992] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/08/2023] [Revised: 08/10/2023] [Accepted: 09/18/2023] [Indexed: 10/17/2023]
Abstract
The 20 short tandem repeat (STR) loci of the combined DNA index system (CODIS) are the basis of the vast majority of forensic genetics in the United States. One argument for permissive rules about the collection of CODIS genotypes is that the CODIS loci are thought to contain little information about ancestry or traits. However, in the past 20 years, a growing field has identified hundreds of thousands of genotype-trait associations. Here, we conduct a survey of the landscape of such associations surrounding the CODIS loci as compared with non-CODIS STRs. Although this study cannot establish or quantify associations between CODIS genotypes and phenotypes, we find that the regions around the CODIS loci are enriched for both known pathogenic variants (> 90th percentile) and for trait-associated SNPs identified in genome-wide association studies (GWAS) (≥ 95th percentile in 10kb and 100kb flanking regions), compared with other random sets of autosomal tetranucleotide-repeat STRs.
Affiliation(s)
- Vivian Link
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Rochelle-Jan Reyes
  - Department of Biology, San Francisco State University, San Francisco, CA, USA
- Linda Ding
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Judy Wang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Rori V. Rohlfs
  - Department of Biology, San Francisco State University, San Francisco, CA, USA
  - Department of Data Science and Institute of Ecology and Evolution, University of Oregon, Eugene, OR, USA
- Michael D. Edge
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
4
Link V, Zavaleta YJA, Reyes RJ, Ding L, Wang J, Rohlfs RV, Edge MD. Microsatellites used in forensics are located in regions unusually rich in trait-associated variants. bioRxiv 2023:2023.03.07.531629. [PMID: 36945578 PMCID: PMC10028909 DOI: 10.1101/2023.03.07.531629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2023]
Abstract
The 20 short tandem repeat (STR) markers of the combined DNA index system (CODIS) are the basis of the vast majority of forensic genetics in the United States. One argument for permissive rules about the collection of CODIS genotypes is that the CODIS markers are thought to contain information relevant to identification only (such as a human fingerprint would), with little information about ancestry or traits. However, in the past 20 years, a quickly growing field has identified hundreds of thousands of genotype-trait associations. Here we conduct a survey of the landscape of such associations surrounding the CODIS loci as compared with non-CODIS STRs. We find that the regions around the CODIS markers are enriched for both known pathogenic variants (>90th percentile) and for SNPs identified as trait-associated in genome-wide association studies (GWAS) (≥95th percentile in 10kb and 100kb flanking regions), compared with other random sets of autosomal tetranucleotide-repeat STRs. Although it is not obvious how much phenotypic information CODIS would need to convey to strain the "DNA fingerprint" analogy, the CODIS markers, considered as a set, are in regions unusually dense with variants with known phenotypic associations.
Affiliation(s)
- Vivian Link
  - Department of Quantitative and Computational Biology, University of Southern California
- Linda Ding
  - Department of Quantitative and Computational Biology, University of Southern California
- Judy Wang
  - Department of Quantitative and Computational Biology, University of Southern California
- Rori V. Rohlfs
  - Department of Biology, San Francisco State University
  - Department of Computer Science and Institute of Ecology and Evolution, University of Oregon
- Michael D. Edge
  - Department of Quantitative and Computational Biology, University of Southern California
5
Bustos BI, Billingsley K, Blauwendraat C, Gibbs JR, Gan-Or Z, Krainc D, Singleton AB, Lubbe SJ. Genome-wide contribution of common short-tandem repeats to Parkinson's disease genetic risk. Brain 2023; 146:65-74. [PMID: 36347471 DOI: 10.1093/brain/awac301] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 08/05/2021] [Revised: 08/01/2022] [Accepted: 08/06/2022] [Indexed: 11/11/2022]
Abstract
Parkinson's disease is a complex neurodegenerative disorder with a strong genetic component, for which most known disease-associated variants are single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels). DNA repetitive elements account for >50% of the human genome; however, little is known of their contribution to Parkinson's disease aetiology. While select short tandem repeats (STRs) within candidate genes have been studied in Parkinson's disease, their genome-wide contribution remains unknown. Here we present the first genome-wide association study of STRs in Parkinson's disease. Through a meta-analysis of 16 imputed genome-wide association study cohorts from the International Parkinson's Disease Genomic Consortium (IPDGC), totalling 39 087 individuals (16 642 cases and 22 445 controls of European ancestry), we identified 34 genome-wide significant STR loci (P < 5.34 × 10⁻⁶), with the strongest signal located in KANSL1 [chr17:44 205 351:[T]11, P = 3 × 10⁻³⁹, odds ratio = 1.31 (95% confidence interval = 1.26-1.36)]. Conditional-joint analyses suggested that four significant STRs mapping near NDUFAF2, TRIML2, MIRNA-129-1 and NCOR1 were independent from known risk SNPs. Including STRs in heritability estimates increased the variance explained by SNPs alone. Gene expression analysis of STRs (eSTRs) in RNA sequencing data from 13 brain regions identified significant associations of STRs influencing the expression of multiple genes, including known Parkinson's disease genes. Further functional annotation of candidate STRs revealed that significant eSTRs within NDUFAF2 and ZSWIM7 overlap with regulatory features and are associated with change in the expression levels of nearby genes.
Here, we show that STRs at known and novel candidate loci contribute to Parkinson's disease risk and have functional effects in disease-relevant tissues and pathways, supporting previously reported disease-associated genes and giving further evidence for their functional prioritization. These data represent a valuable resource for researchers currently dissecting Parkinson's disease risk loci.
Affiliation(s)
- Bernabe I Bustos
  - Ken and Ruth Davee Department of Neurology and Simpson Querrey Center for Neurogenetics, Northwestern University, Feinberg School of Medicine, Chicago, IL 60611, USA
- Kimberley Billingsley
  - Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
- Cornelis Blauwendraat
  - Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
- J Raphael Gibbs
  - Computational Biology Group, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
- Ziv Gan-Or
  - The Neuro (Montreal Neurological Institute-Hospital), McGill University, Montréal, QC, Canada
  - Department of Human Genetics, McGill University, Montréal, QC, Canada
  - Department of Neurology and Neurosurgery, McGill University, Montréal, QC, Canada
- Dimitri Krainc
  - Ken and Ruth Davee Department of Neurology and Simpson Querrey Center for Neurogenetics, Northwestern University, Feinberg School of Medicine, Chicago, IL 60611, USA
- Andrew B Singleton
  - Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
- Steven J Lubbe
  - Ken and Ruth Davee Department of Neurology and Simpson Querrey Center for Neurogenetics, Northwestern University, Feinberg School of Medicine, Chicago, IL 60611, USA
6
Xiao X, Zhang CY, Zhang Z, Hu Z, Li M, Li T. Revisiting tandem repeats in psychiatric disorders from perspectives of genetics, physiology, and brain evolution. Mol Psychiatry 2022; 27:466-475. [PMID: 34650204 DOI: 10.1038/s41380-021-01329-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/22/2021] [Revised: 09/16/2021] [Accepted: 09/28/2021] [Indexed: 01/28/2023]
Abstract
Genome-wide association studies (GWASs) have revealed substantial genetic components comprised of single nucleotide polymorphisms (SNPs) in the heritable risk of psychiatric disorders. However, genetic risk factors not covered by GWAS also play pivotal roles in these illnesses. Tandem repeats, which are likely functional but frequently overlooked by GWAS, may account for an important proportion of the "missing heritability" of psychiatric disorders. Despite difficulties in characterizing and quantifying tandem repeats in the genome, studies have been carried out in an attempt to describe the impact of tandem repeats on gene regulation and human phenotypes. In this review, we have introduced recent research progress regarding the genomic distribution and regulatory mechanisms of tandem repeats. We have also summarized the current knowledge of the genetic architecture and biological underpinnings of psychiatric disorders brought by studies of tandem repeats. These findings suggest that tandem repeats, in candidate psychiatric risk genes or in different levels of linkage disequilibrium (LD) with psychiatric GWAS SNPs and haplotypes, may modulate biological phenotypes related to psychiatric disorders (e.g., cognitive function and brain physiology) through regulating alternative splicing, promoter activity, enhancer activity and so on. In addition, many tandem repeats undergo tight natural selection in the human lineage, and likely play crucial roles in human brain evolution. Taken together, the putative roles of tandem repeats in the pathogenesis of psychiatric disorders are strongly implicated, and using examples from the previous literature, we wish to call for further attention to tandem repeats in the post-GWAS era of psychiatric disorders.
Affiliation(s)
- Xiao Xiao
  - Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Chu-Yi Zhang
  - Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
  - Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan, China
- Zhuohua Zhang
  - Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China
  - Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Zhonghua Hu
  - Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China
  - Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
  - Department of Critical Care Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China
  - National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, China
  - Hunan Key Laboratory of Animal Models for Human Diseases, School of Life Sciences, Central South University, Changsha, Hunan, China
  - Eye Center of Xiangya Hospital and Hunan Key Laboratory of Ophthalmology, Central South University, Changsha, Hunan, China
  - National Clinical Research Center on Mental Disorders, Changsha, Hunan, China
- Ming Li
  - Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
  - CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
  - KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Tao Li
  - Affiliated Mental Health Center & Hangzhou Seventh People's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
  - Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Guangzhou, China
7
Further insight into the global variability of the OCA2-HERC2 locus for human pigmentation from multiallelic markers. Sci Rep 2021; 11:22530. [PMID: 34795370 PMCID: PMC8602267 DOI: 10.1038/s41598-021-01940-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/15/2021] [Accepted: 11/02/2021] [Indexed: 11/20/2022]
Abstract
The OCA2-HERC2 locus is responsible for the greatest proportion of eye color variation in humans. Numerous studies have extensively described both functional SNPs and associated patterns of variation over this region. The goal of our study is to examine how these haplotype structures and allelic associations vary when highly variable markers such as microsatellites are used. Eleven microsatellites spanning 357 kb of the OCA2-HERC2 genes are analyzed in 3029 individuals from worldwide populations. We found that several markers display large differences in allele frequency (10% to 35% difference) among Europeans, East Asians and Africans. In Europe, the alleles showing increased frequency can also discriminate individuals with (IrisPlex) predicted blue and brown eyes. Distinct haplotypes are identified around the variants C and T of the functional SNP rs12913832 (associated with blue eyes), with linkage disequilibrium r² values significant up to 237 kb. The haplotype carrying the allele rs12913832 C has high frequency (76%) in blue eye predicted individuals (30% in brown eye predicted individuals), while the haplotype associated with the allele rs12913832 T is restricted to brown eye predicted individuals. Finally, homozygosity values reach levels of 91% near rs12913832. Odds ratios show values of 4.2, 7.4 and 10.4 for four markers around rs12913832 and 7.1 for their core haplotype. Hence, this study provides an example of the informativeness of multiallelic markers that, despite their current limited potential contribution to forensic eye color prediction, supports the use of microsatellites for identifying causal variants showing similar genetic features and history.
8
Li R, Budowle B, Sun H, Ge J. Linkage and linkage disequilibrium among the markers in the forensic MPS panels. J Forensic Sci 2021; 66:1637-1646. [PMID: 33885147 DOI: 10.1111/1556-4029.14724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2021] [Revised: 03/11/2021] [Accepted: 03/22/2021] [Indexed: 11/28/2022]
Abstract
For the past two to three decades, forensic DNA evidence has been analyzed with a limited number of short tandem repeats (STRs), and these STRs are usually assumed to be independent for statistical calculations. With the development and implementation of the MPS technologies, more autosomal markers, both single nucleotide polymorphisms (SNPs) and STRs, can be analyzed. A number of these markers are physically very close to each other, and it may not be appropriate to assume all these markers are genetically unlinked or in linkage equilibrium. In this study, publicly accessible genomic data from five representative populations were used to evaluate the genetic linkage and linkage disequilibrium (LD) between autosomal markers represented in six major commercial panels (in total, 362 markers). Among the 3041 syntenic marker pairs, 1524 pairs had sex-average genetic distances <50 cM, and thus, these marker pairs can be considered as genetically linked. Among the 143 marker pairs with physical distances <1 Mb, 19 LD haplotype blocks (comprising 39 SNPs in total) were detected for at least one of the tested populations. Statistical methods for interpreting linked markers and/or markers in LD were suggested for various case scenarios.
Affiliation(s)
- Ran Li
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
  - Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
- Bruce Budowle
  - Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, USA
  - Department of Microbiology, Immunology, and Genetics, University of North Texas Health Science Center, Fort Worth, TX, USA
- Hongyu Sun
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
  - Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
- Jianye Ge
  - Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, USA
  - Department of Microbiology, Immunology, and Genetics, University of North Texas Health Science Center, Fort Worth, TX, USA
9
Zhang C, Xiao X, Li T, Li M. Translational genomics and beyond in bipolar disorder. Mol Psychiatry 2021; 26:186-202. [PMID: 32424235 DOI: 10.1038/s41380-020-0782-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 12/03/2019] [Revised: 05/05/2020] [Accepted: 05/07/2020] [Indexed: 02/08/2023]
Abstract
Genome-wide association studies (GWAS) have revealed multiple genomic loci conferring risk of bipolar disorder (BD), providing hints for its underlying pathobiology. However, questions remain. For example, discordance exists between BD heritability estimated with earlier epidemiological evidence and that calculated based on common GWAS variations. Where is the "missing heritability"? How can we explain the biology of the disease based on genetic findings? In this review, we summarize the accomplishments and limitations of current BD GWAS, and discuss potential reasons for the "missing heritability." In addition, research progress on the biological mechanisms underlying BD genetic risk using brain tissues, reprogrammed cells, and model animals is reviewed. While our knowledge of the genetic basis of BD has been significantly advanced by these efforts, the complexities of gene regulation in the genome, the spatial-temporal heterogeneity during brain development, and the limitations of different experimental models should always be considered. Notably, several genes have been widely studied given their relatively well-characterized involvement in BD (e.g., CACNA1C and ANK3), and findings on these genes are summarized to both outline possible biological mechanisms of BD and describe examples of translating GWAS discoveries into the pathophysiology.
Affiliation(s)
- Chen Zhang
  - Division of Mood Disorders, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  - Shanghai Key Laboratory of Psychotic Disorders, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Xiao Xiao
  - Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Tao Li
  - Mental Health Center and Psychiatric Laboratory, State Key Laboratory of Biotherapy, West China Hospital of Sichuan University, Chengdu, Sichuan, China
  - West China Brain Research Center, West China Hospital of Sichuan University, Chengdu, Sichuan, China
- Ming Li
  - Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
10
Kang JTL, Rosenberg NA. Mathematical Properties of Linkage Disequilibrium Statistics Defined by Normalization of the Coefficient D = p_AB - p_A p_B. Hum Hered 2020; 84:127-143. [PMID: 32045910 DOI: 10.1159/000504171] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/16/2019] [Accepted: 10/10/2019] [Indexed: 11/19/2022]
Abstract
BACKGROUND: Many statistics for measuring linkage disequilibrium (LD) take the form of a normalization of the LD coefficient D. Different normalizations produce statistics with different ranges, interpretations, and arguments favoring their use. METHODS: Here, to compare the mathematical properties of these normalizations, we consider 5 of these normalized statistics, describing their upper bounds, the mean values of their maxima over the set of possible allele frequency pairs, and the size of the allele frequency regions accessible given specified values of the statistics. RESULTS: We produce detailed characterizations of these properties for the statistics d and ρ, analogous to computations previously performed for r². We examine the relationships among the statistics, uncovering conditions under which some of them have close connections. CONCLUSION: The results contribute insight into LD measurement, particularly the understanding of differences in the features of different LD measures when computed on the same data.
Affiliation(s)
- Jonathan T L Kang
  - Department of Biology, Stanford University, Stanford, California, USA
- Noah A Rosenberg
  - Department of Biology, Stanford University, Stanford, California, USA
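As a worked illustration of what "normalization of the coefficient D" means in the entry above, the sketch below computes D together with two widely used normalizations, Lewontin's D′ and r². These two are chosen here only because their definitions are standard; the statistics d and ρ mentioned in the abstract are not defined in this listing and are therefore not implemented. The function name and argument names are hypothetical.

```python
def ld_statistics(p_ab, p_a, p_b):
    """Return (D, D_prime, r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    # Raw LD coefficient: D = p_AB - p_A * p_B
    d = p_ab - p_a * p_b

    # D' divides D by the largest magnitude it can reach given the
    # allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0  # signed; take abs() for |D'|

    # r^2 divides D^2 by the product of the four allele frequencies.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0

    return d, d_prime, r2
```

Both normalizations start from the same D but divide by different quantities, which is why they can rank the same pair of loci differently.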
11
Lyra DH, Galli G, Alves FC, Granato ÍSC, Vidotti MS, Bandeira E Sousa M, Morosini JS, Crossa J, Fritsche-Neto R. Modeling copy number variation in the genomic prediction of maize hybrids. Theor Appl Genet 2019; 132:273-288. [PMID: 30382311 DOI: 10.1007/s00122-018-3215-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/04/2018] [Accepted: 10/20/2018] [Indexed: 06/08/2023]
Abstract
Our study indicates that copy variants may play an essential role in the phenotypic variation of complex traits in maize hybrids. Moreover, predicting hybrid phenotypes by combining additive-dominance effects with copy variants has the potential to be a viable predictive model. Non-additive effects resulting from the actions of multiple loci may influence trait variation in single-cross hybrids. In addition, complementation of allelic variation could be a valuable contributor to hybrid genetic variation, especially when crossing inbred lines with higher contents of copy gains. With this in mind, we aimed (1) to study the association between copy number variation (CNV) and hybrid phenotype, and (2) to compare the predictive ability (PA) of additive and additive-dominance genomic best linear unbiased prediction model when combined with the effects of CNV in two datasets of maize hybrids (USP and HELIX). In the USP dataset, we observed a significant negative phenotypic correlation of low magnitude between copy number loss and plant height, revealing a tendency that more copy losses lead to lower plants. In the same set, when CNV was combined with the additive plus dominance effects, the PA significantly increased only for plant height under low nitrogen. In this case, CNV effects explicitly capture relatedness between individuals and add extra information to the model. In the HELIX dataset, we observed a pronounced difference in PA between additive (0.50) and additive-dominance (0.71) models for predicting grain yield, suggesting a significant contribution of dominance. We conclude that copy variants may play an essential role in the phenotypic variation of complex traits in maize hybrids, although the inclusion of CNVs into datasets does not return significant gains concerning PA.
Affiliation(s)
- Danilo Hottis Lyra
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo (ESALQ/USP), Piracicaba, São Paulo, Brazil
  - Department of Computational and Analytical Sciences, Rothamsted Research, West Common, Harpenden, AL5 2JQ, UK
- Giovanni Galli
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo (ESALQ/USP), Piracicaba, São Paulo, Brazil
- Filipe Couto Alves
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo (ESALQ/USP), Piracicaba, São Paulo, Brazil
- Ítalo Stefanine Correia Granato
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo (ESALQ/USP), Piracicaba, São Paulo, Brazil
- Miriam Suzane Vidotti
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo (ESALQ/USP), Piracicaba, São Paulo, Brazil
- Massaine Bandeira E Sousa
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo (ESALQ/USP), Piracicaba, São Paulo, Brazil
- Júlia Silva Morosini
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo (ESALQ/USP), Piracicaba, São Paulo, Brazil
- José Crossa
  - Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), 06600, Texcoco, D.F, Mexico
- Roberto Fritsche-Neto
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo (ESALQ/USP), Piracicaba, São Paulo, Brazil
12
Saini S, Mitra I, Mousavi N, Fotsing SF, Gymrek M. A reference haplotype panel for genome-wide imputation of short tandem repeats. Nat Commun 2018; 9:4397. [PMID: 30353011 PMCID: PMC6199332 DOI: 10.1038/s41467-018-06694-0] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 03/09/2018] [Accepted: 09/18/2018] [Indexed: 12/14/2022]
Abstract
Short tandem repeats (STRs) are involved in dozens of Mendelian disorders and have been implicated in complex traits. However, genotyping arrays used in genome-wide association studies focus on single nucleotide polymorphisms (SNPs) and do not readily allow identification of STR associations. We leverage next-generation sequencing (NGS) from 479 families to create a SNP + STR reference haplotype panel. Our panel enables imputing STR genotypes into SNP array data when NGS is not available for directly genotyping STRs. Imputed genotypes achieve mean concordance of 97% with observed genotypes in an external dataset compared to 71% expected under a naive model. Performance varies widely across STRs, with near perfect concordance at bi-allelic STRs vs. 70% at highly polymorphic repeats. Imputation increases power over individual SNPs to detect STR associations with gene expression. Imputing STRs into existing SNP datasets will enable the first large-scale STR association studies across a range of complex traits.
Affiliation(s)
- Shubham Saini
  - Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Ileena Mitra
  - Bioinformatics and Systems Biology Program, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Nima Mousavi
  - Department of Electrical and Computer Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Stephanie Feupe Fotsing
  - Bioinformatics and Systems Biology Program, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
  - Department of Biomedical Informatics, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Melissa Gymrek
  - Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
  - Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
13
Press MO, McCoy RC, Hall AN, Akey JM, Queitsch C. Massive variation of short tandem repeats with functional consequences across strains of Arabidopsis thaliana. Genome Res 2018; 28:1169-1178. [PMID: 29970452 PMCID: PMC6071631 DOI: 10.1101/gr.231753.117] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 11/01/2017] [Accepted: 06/26/2018] [Indexed: 11/24/2022]
Abstract
Short tandem repeat (STR) mutations may comprise more than half of the mutations in eukaryotic coding DNA, yet STR variation is rarely examined as a contributor to complex traits. We assessed this contribution across a collection of 96 strains of Arabidopsis thaliana, genotyping 2046 STR loci each, using highly parallel STR sequencing with molecular inversion probes. We found that 95% of examined STRs are polymorphic, with a median of six alleles per STR across these strains. STR expansions (large copy number increases) are found in most strains, several of which have evident functional effects. These include three of six intronic STR expansions we found to be associated with intron retention. Coding STRs were depleted of variation relative to noncoding STRs, and we detected a total of 56 coding STRs (11%) showing low variation consistent with the action of purifying selection. In contrast, some STRs show hypervariable patterns consistent with diversifying selection. Finally, we detected 133 novel STR-phenotype associations under stringent criteria, most of which could not be detected with SNPs alone, and validated some with follow-up experiments. Our results support the conclusion that STRs constitute a large, unascertained reservoir of functionally relevant genomic variation.
Affiliation(s)
- Maximilian O Press
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Rajiv C McCoy
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Ashley N Hall
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
  - Molecular and Cellular Biology Program, University of Washington, Seattle, Washington 98195, USA
- Joshua M Akey
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Christine Queitsch
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
14
Linkage disequilibrium matches forensic genetic records to disjoint genomic marker sets. Proc Natl Acad Sci U S A 2017; 114:5671-5676. [PMID: 28507140 PMCID: PMC5465933 DOI: 10.1073/pnas.1619944114] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 01/02/2023]
Abstract
Combining genotypes across datasets is central in facilitating advances in genetics. Data aggregation efforts often face the challenge of record matching: the identification of dataset entries that represent the same individual. We show that records can be matched across genotype datasets that have no shared markers based on linkage disequilibrium between loci appearing in different datasets. Using two datasets for the same 872 people (one with 642,563 genome-wide SNPs and the other with 13 short tandem repeats (STRs) used in forensic applications), we find that 90-98% of forensic STR records can be connected to corresponding SNP records and vice versa. Accuracy increases to 99-100% when ~30 STRs are used. Our method expands the potential of data aggregation, but it also suggests privacy risks intrinsic in maintenance of databases containing even small numbers of markers, including databases of forensic significance.
15
Zhang Z, Zheng Y, Zhang X, Liu C, Joyce BT, Kibbe WA, Hou L, Zhang W. Linking short tandem repeat polymorphisms with cytosine modifications in human lymphoblastoid cell lines. Hum Genet 2016; 135:223-32. [PMID: 26714498 PMCID: PMC4715638 DOI: 10.1007/s00439-015-1628-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/25/2015] [Accepted: 12/17/2015] [Indexed: 01/26/2023]
Abstract
Inter-individual variation in cytosine modifications has been linked to complex traits in humans. Cytosine modification variation is partially controlled by single nucleotide polymorphisms (SNPs), known as modified cytosine quantitative trait loci (mQTL). However, little is known about the role of short tandem repeat polymorphisms (STRPs), a class of structural genetic variants, in regulating cytosine modifications. Utilizing the published data on the International HapMap Project lymphoblastoid cell lines (LCLs), we assessed the relationships between 721 STRPs and the modification levels of 283,540 autosomal CpG sites. Our findings suggest that, in contrast to the predominant cis-acting mode for SNP-based mQTL, STRPs are associated with cytosine modification levels in both cis-acting (local) and trans-acting (distant) modes. In local scans within the ±1 Mb windows of target CpGs, 21, 9, and 21 cis-acting STRP-based mQTL were detected in CEU (Caucasian residents from Utah, USA), YRI (Yoruba people from Ibadan, Nigeria), and the combined samples, respectively. In contrast, 139,420, 76,817, and 121,866 trans-acting STRP-based mQTL were identified in CEU, YRI, and the combined samples, respectively. A substantial proportion of CpG sites detected with local STRP-based mQTL were not associated with SNP-based mQTL, suggesting that STRPs represent an independent class of mQTL. Functionally, genetic variants neighboring CpG-associated STRPs are enriched with genome-wide association study (GWAS) loci for a variety of complex traits and diseases, including cancers, based on the National Human Genome Research Institute (NHGRI) GWAS Catalog. Therefore, elucidating these STRP-based mQTL in addition to SNP-based mQTL can provide novel insights into the genetic architectures of complex traits.
Affiliation(s)
- Zhou Zhang
  - Driskill Graduate Program in Life Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, 680 N. Lake Shore Dr., Suite 1400, Chicago, IL, 60611, USA
- Yinan Zheng
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, 680 N. Lake Shore Dr., Suite 1400, Chicago, IL, 60611, USA
  - Institute for Public Health and Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, 60611, USA
- Xu Zhang
  - Section of Hematology/Oncology, Department of Medicine, University of Illinois at Chicago, Chicago, IL, 60612, USA
- Cong Liu
  - Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, 60612, USA
- Brian Thomas Joyce
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, 680 N. Lake Shore Dr., Suite 1400, Chicago, IL, 60611, USA
  - Division of Epidemiology and Biostatistics, School of Public Health, University of Illinois at Chicago, Chicago, IL, 60612, USA
- Warren A Kibbe
  - Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD, 20850, USA
- Lifang Hou
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, 680 N. Lake Shore Dr., Suite 1400, Chicago, IL, 60611, USA
  - The Robert H. Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Wei Zhang
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, 680 N. Lake Shore Dr., Suite 1400, Chicago, IL, 60611, USA
  - The Robert H. Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
  - Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
16
Abundant contribution of short tandem repeats to gene expression variation in humans. Nat Genet 2015; 48:22-9. [PMID: 26642241 DOI: 10.1038/ng.3461] [Citation(s) in RCA: 238] [Impact Index Per Article: 26.4] [Received: 08/17/2015] [Accepted: 11/12/2015] [Indexed: 12/16/2022]
Abstract
The contribution of repetitive elements to quantitative human traits is largely unknown. Here we report a genome-wide survey of the contribution of short tandem repeats (STRs), which constitute one of the most polymorphic and abundant repeat classes, to gene expression in humans. Our survey identified 2,060 significant expression STRs (eSTRs). These eSTRs were replicable in orthogonal populations and expression assays. We used variance partitioning to disentangle the contribution of eSTRs from that of linked SNPs and indels and found that eSTRs contribute 10-15% of the cis heritability mediated by all common variants. Further functional genomic analyses showed that eSTRs are enriched in conserved regions, colocalize with regulatory elements and may modulate certain histone modifications. By analyzing known genome-wide association study (GWAS) signals and searching for new associations in 1,685 whole genomes from deeply phenotyped individuals, we found that eSTRs are enriched in various clinically relevant conditions. These results highlight the contribution of STRs to the genetic architecture of quantitative human traits.
17
Willems T, Gymrek M, Highnam G, Mittelman D, Erlich Y. The landscape of human STR variation. Genome Res 2014; 24:1894-904. [PMID: 25135957 PMCID: PMC4216929 DOI: 10.1101/gr.177774.114] [Citation(s) in RCA: 176] [Impact Index Per Article: 17.6] [Received: 04/30/2014] [Accepted: 08/15/2014] [Indexed: 02/06/2023]
Abstract
Short tandem repeats are among the most polymorphic loci in the human genome. These loci play a role in the etiology of a range of genetic diseases and have been frequently utilized in forensics, population genetics, and genetic genealogy. Despite this plethora of applications, little is known about the variation of most STRs in the human population. Here, we report the largest-scale analysis of human STR variation to date. We collected information for nearly 700,000 STR loci across more than 1000 individuals in Phase 1 of the 1000 Genomes Project. Extensive quality controls show that reliable allelic spectra can be obtained for close to 90% of the STR loci in the genome. We utilize this call set to analyze determinants of STR variation, assess the human reference genome's representation of STR alleles, find STR loci with common loss-of-function alleles, and obtain initial estimates of the linkage disequilibrium between STRs and common SNPs. Overall, these analyses further elucidate the scale of genetic variation beyond classical point mutations.
Affiliation(s)
- Thomas Willems
  - Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA
  - Computational and Systems Biology Program, MIT, Cambridge, Massachusetts 02139, USA
- Melissa Gymrek
  - Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA
  - Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, Massachusetts 02139, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
  - Department of Molecular Biology and Diabetes Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Gareth Highnam
  - Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA
- David Mittelman
  - Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA
  - Gene by Gene, Ltd., Houston, Texas 77008, USA
- Yaniv Erlich
  - Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA
18
Davis RC, van Nas A, Bennett B, Orozco L, Pan C, Rau CD, Eskin E, Lusis AJ. Genome-wide association mapping of blood cell traits in mice. Mamm Genome 2013; 24:105-18. [PMID: 23417284 DOI: 10.1007/s00335-013-9448-0] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 08/01/2012] [Accepted: 01/11/2013] [Indexed: 12/13/2022]
Abstract
Genetic variations in blood cell parameters can impact clinical traits. We report here the mapping of blood cell traits in a panel of 100 inbred strains of mice of the Hybrid Mouse Diversity Panel (HMDP) using genome-wide association (GWA). We replicated a locus previously identified using linkage analysis in several genetic crosses for mean corpuscular volume (MCV) and a number of other red blood cell traits on distal chromosome 7. Our peak for SNP association to MCV occurred in a linkage disequilibrium (LD) block spanning from 109.38 to 111.75 Mb that includes Hbb-b1, the likely causal gene. Altogether, we identified five loci controlling red blood cell traits (on chromosomes 1, 7, 11, 12, and 16), and four of these correspond to loci for red blood cell traits reported in a recent human GWA study. For white blood cells, including granulocytes, monocytes, and lymphocytes, a total of six significant loci were identified on chromosomes 1, 6, 8, 11, 12, and 15. An average of ten candidate genes were found at each locus and those were prioritized by examining functional variants in the HMDP such as missense and expression variants. These results provide intermediate phenotypes and candidate loci for genetic studies of atherosclerosis and cancer as well as inflammatory and immune disorders in mice.
Affiliation(s)
- Richard C Davis
  - Department of Medicine/Division of Cardiology, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
19
Karjalainen MK, Huusko JM, Ulvila J, Sotkasiira J, Luukkonen A, Teramo K, Plunkett J, Anttila V, Palotie A, Haataja R, Muglia LJ, Hallman M. A potential novel spontaneous preterm birth gene, AR, identified by linkage and association analysis of X chromosomal markers. PLoS One 2012; 7:e51378. [PMID: 23227263 PMCID: PMC3515491 DOI: 10.1371/journal.pone.0051378] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 08/08/2012] [Accepted: 11/07/2012] [Indexed: 11/20/2022]
Abstract
Preterm birth is the major cause of neonatal mortality and morbidity. In many cases, it has severe life-long consequences for the health and neurological development of the newborn child. More than 50% of all preterm births are spontaneous, and currently there is no effective prevention. Several studies suggest that genetic factors play a role in spontaneous preterm birth (SPTB). However, its genetic background is insufficiently characterized. The aim of the present study was to perform a linkage analysis of X chromosomal markers in SPTB in large northern Finnish families with recurrent SPTBs. We found a significant linkage signal (HLOD = 3.72) on chromosome locus Xq13.1 when the studied phenotype was being born preterm. There were no significant linkage signals when the studied phenotype was giving preterm deliveries. Two functional candidate genes, those encoding the androgen receptor (AR) and the interleukin-2 receptor gamma subunit (IL2RG), located near this locus were analyzed as candidates for SPTB in subsequent case-control association analyses. Nine single-nucleotide polymorphisms (SNPs) within these genes and an AR exon-1 CAG repeat, which was previously demonstrated to be functionally significant, were analyzed in mothers with preterm delivery (n = 272) and their offspring (n = 269), and in mothers with exclusively term deliveries (n = 201) and their offspring (n = 199), all originating from northern Finland. A replication study population consisting of individuals born preterm (n = 111) and term (n = 197) from southern Finland was also analyzed. Long AR CAG repeats (≥26) were overrepresented and short repeats (≤19) underrepresented in individuals born preterm compared to those born at term. Thus, our linkage and association results emphasize the role of the fetal genome in genetic predisposition to SPTB and implicate AR as a potential novel fetal susceptibility gene for SPTB.
Affiliation(s)
- Minna K Karjalainen
  - Department of Pediatrics, Institute of Clinical Medicine, University of Oulu, Oulu, Finland
20
Tiercy JM. Immunogenetics of hematopoietic stem cell transplantation: the contribution of microsatellite polymorphism studies. Int J Immunogenet 2011; 38:365-72. [PMID: 21816003 DOI: 10.1111/j.1744-313x.2011.01026.x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
Abstract
Polymorphisms of short tandem repeats of <10 nucleotides, or microsatellites (Msat), are largely used for post-transplant chimerism analyses in clinical hematopoietic stem cell transplantation (HSCT). Compared to single nucleotide polymorphisms (SNP), they have the advantage of a higher degree of allelic polymorphism and thus a potentially larger degree of informativity. Msat markers account for approximately 3% of the human genome and have been highly informative in disease association studies, population genetics, forensic medicine and organ and HSC transplantation. They have allowed us to expand our knowledge of the haplotypic structure of the HLA complex, including the noncoding sequences in the MHC, and to reach a better characterization of immunological phenotypes. Among the different immunogenetic studies in HSCT patients reviewed here, four Msat loci linked to cytokine genes have been analysed by a number of laboratories as potential candidate markers for HSCT outcome: IFNG, TNFd, IL-10(-1064) and IL-1RN. The low patient numbers and high diversity of clinical parameters account for some heterogeneity of the results. Among the trends starting to emerge from these studies, specific TNFd Msat alleles seem to be associated with acute graft-versus-host disease and mortality. Patient/donor Msat incompatibilities have also been used as surrogate markers to map biologically relevant polymorphisms, with a main focus on MHC-resident genetic variation. High throughput SNP typing and next-generation sequencing technologies will allow acquisition of large-scale genomic data and should allow refined analyses of clinically relevant genotypes in the transplantation setting, although the heterogeneity of the study cohorts will remain an issue. The analysis of Msat polymorphisms may still have a place in functional studies on the impact of Msat diversity in the control of immune response gene expression.
Affiliation(s)
- J-M Tiercy
  - National Reference Laboratory for Histocompatibility, Department of Internal Medicine, University Hospital Geneva, Geneva, Switzerland
21
Including copy number variation in association studies to predict genotypic values. Genet Res (Camb) 2010; 92:115-25. [PMID: 20515515 DOI: 10.1017/s0016672310000091] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/08/2023]
Abstract
The objective of this study was to investigate, both empirically and deterministically, the ability to explain genetic variation resulting from a copy number polymorphism (CNP) by including the CNP, either by its genotype or by a continuous derivation thereof, alone or together with a nearby single nucleotide polymorphism (SNP) in the model. This continuous measure of a CNP genotype could be a raw hybridization measurement, or a predicted CNP genotype. Results from simulations showed that the linkage disequilibrium (LD) between an SNP and CNP was lower than LD between two SNPs, due to the higher mutation rate at the CNP loci. The model R² values from analysing the simulated data were very similar to the R² values predicted with the deterministic formulae. Under the assumption that x copies at a CNP locus lead to the effect of x times the effect of 1 copy, including a continuous measure of a CNP locus in the model together with the genotype of a nearby SNP increased power to explain variation at the CNP locus, even when the continuous measure explained only 15% of the variation at the CNP locus.
22
Medina-Acosta E. Interlocus non-random association of multiallelic polymorphisms spanning the coagulation factor VIII gene on human chromosome distalmost Xq28. Haemophilia 2010; 16:525-37. [PMID: 20050928 DOI: 10.1111/j.1365-2516.2009.02161.x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
Abstract
The most common severe hereditary bleeding disorder phenotype in humans, the coagulation factor VIII (F8) deficiency haemophilia A (HEMA), maps on Xq28 band, a region that comprises 11.7% of genes and 14.2% of phenotypes on the X chromosome. Information about the distribution and extent of gametic disequilibrium (GD) covering the F8 gene is scarce, despite its relevance for linkage and association studies. The aim of this study was to determine the patterns, by frequency and strength, of non-random multiallelic interallelic associations between two-locus combinations of seven microsatellite loci (REN90833, F8Int25.2, F8Int22, F8Int13.2, HEMA154311.3, TMLHEInt5 and HEMA154507.3, in that physical order) spanning 0.813 Mb on distalmost Xq28. We measured sign-based interallelic D' coefficients in 106 men and in 100 women drawn from a single unrelated Brazilian population. Significance and patterns of GD using haploid and phased diploid sample probabilities were close to conformity. Only 9.18% of the variance of D' could be accounted for by changes in length, indicating that GD is not a monotonically decreasing function of length. We defined two regions of overlapping long-range GD extending 698 735 base pairs (bp) (REN90833/TMLHEInt5 block) and 689 900 bp (F8Int13.2/HEMA154507.3 block). The extent of GD overlap is 575 637 bp (F8Int13.2/TMLHEInt5 interstice). Extended haplotype homozygosity analysis centred at the F8 intronic loci revealed that the most frequent core haplotypes decay the least in the flanking GD. The F8 intronic loci attend distinct non-random association forces; F8Int13.2 serves at maintenance of the long-range overlapping pattern of GD, whereas F8Int25.2 and F8Int22 serve at lessening it in force or effect.
Affiliation(s)
- E Medina-Acosta
  - Núcleo de Diagnóstico e Investigação Molecular, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Rio de Janeiro, Brazil
23
Xiong S, Hao Y, Rao S, Huang W, Hu B, Labu, Pubuzhuoma, Gesangzhuogab, Wang Y. Effects of cutoff thresholds for minor allele frequencies on HapMap resolution: A real dataset-based evaluation of the Chinese Han and Tibetan populations. Sci Bull (Beijing) 2009. [DOI: 10.1007/s11434-009-0302-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
24
Wong AK, Neff MW. DOGSET: pre-designed primer sets for fine-scale mapping and DNA sequence interrogation in the dog. Anim Genet 2009; 40:569-71. [PMID: 19392818 DOI: 10.1111/j.1365-2052.2009.01875.x] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/30/2022]
Abstract
DOGSET is an online resource that provides access to primer sequences that have been computationally mined from the reference genome using heuristic algorithms. The electronic repository includes PCR primers corresponding to 32,135 markers for genetic mapping and 334,657 sequence-tagged gene elements for targeted re-sequencing and mutation discovery. A customized report that tailors primer design to wet bench protocols can be exported for a region of interest by specifying genome coordinates in a graphical user interface.
Affiliation(s)
- A K Wong
  - Veterinary Genetics Laboratory, School of Veterinary Medicine, University of California, Davis, CA 95616, USA
25
Payseur BA, Jing P. A genomewide comparison of population structure at STRPs and nearby SNPs in humans. Mol Biol Evol 2009; 26:1369-77. [PMID: 19289600 DOI: 10.1093/molbev/msp052] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 12/11/2022]
Abstract
Patterns of population structure provide insights into evolutionary processes and help identify groups of individuals for genotype-phenotype association studies. With increasing availability of polymorphic molecular markers across genomes, the examination of population structure using large numbers of unlinked loci has become a common practice in evolutionary biology and human genetics. The two classes of molecular variation most widely used for this purpose, short tandem repeat polymorphisms (STRPs) and single-nucleotide polymorphisms (SNPs), differ in mutational properties expected to affect population structure. To measure the relative ability of these loci to describe population structure, we compared diversity at neighboring STRPs and SNPs from 720 genomic regions in the four populations that comprise the Human HapMap. Comparing loci from the same genomic regions allowed us to focus on the contribution of mutational differences (rather than variation in genealogical history) to disparities in population structure between STRPs and SNPs. Relative to average values for SNPs from the same regions, STRPs had lower F_ST, but higher G'_ST and I_n values. STRP-SNP correlations in population structure across genomic regions were statistically significant but weak in magnitude. Separate analyses by repeat type showed that these correlations were driven primarily by tetranucleotide and trinucleotide STRPs; measures of population structure at dinucleotides and SNPs were not significantly correlated. Pairwise comparisons among populations revealed effects of divergence time on differences in population structure between STRPs and SNPs. Collectively, these results confirm that individual STRPs can provide more information about population structure than individual SNPs, but suggest that the difference in structure at STRPs and SNPs depends on local genealogical history. Our study motivates theoretical comparisons of population structure at loci with different mutational properties.
Affiliation(s)
- Bret A Payseur
  - Laboratory of Genetics, University of Wisconsin, WI, USA
26
A mutation in the signal sequence of LRP5 in a family with an osteoporosis-pseudoglioma syndrome (OPPG)-like phenotype indicates a novel disease mechanism for trinucleotide repeats. Hum Mutat 2009; 30:641-8. [DOI: 10.1002/humu.20916] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 11/07/2022]
|
27
|
New genetic evidence for involvement of the dopamine system in migraine with aura. Hum Genet 2009; 125:265-79. [PMID: 19152006 DOI: 10.1007/s00439-009-0623-z] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Received: 10/13/2008] [Accepted: 01/06/2009] [Indexed: 12/12/2022]
Abstract
In order to systematically test the hypothesis that genetic variation in the dopamine system contributes to the susceptibility to migraine with aura (MA), we performed a comprehensive genetic association study of altogether ten genes from the dopaminergic system in a large German migraine with aura case-control sample. Based on the genotyping results of 53 variants across the ten genes in 270 MA cases and 272 controls, three genes-DBH, DRD2 and SLC6A3-were chosen to proceed to additional genotyping of 380 MA cases and 378 controls. Four of the 26 genotyped polymorphisms in these three genes displayed nominally significant allelic P-values in the sample of 650 MA patients and 650 controls. Three of these SNPs [rs2097629 in DBH (uncorrected allelic P value = 0.0012, OR = 0.77), rs7131056 in DRD2 (uncorrected allelic P value = 0.0018, OR = 1.28) and rs40184 in SLC6A3 (uncorrected allelic P value = 0.0082, OR = 0.81)] remained significant after gene-wide correction for multiple testing by permutation analysis. Further consideration of imputed genotype data from 2,937 British control individuals did not affirm the association with DRD2, but supported the associations with DBH and SLC6A3. Our data provide new evidence for an involvement of components of the dopaminergic system-in particular the dopamine-beta hydroxylase and dopamine transporter genes-to the pathogenesis of migraine with aura.
|
28
|
Simpson CL, Lemmens R, Miskiewicz K, Broom WJ, Hansen VK, van Vught PWJ, Landers JE, Sapp P, Van Den Bosch L, Knight J, Neale BM, Turner MR, Veldink JH, Ophoff RA, Tripathi VB, Beleza A, Shah MN, Proitsi P, Van Hoecke A, Carmeliet P, Horvitz HR, Leigh PN, Shaw CE, van den Berg LH, Sham PC, Powell JF, Verstreken P, Brown RH, Robberecht W, Al-Chalabi A. Variants of the elongator protein 3 (ELP3) gene are associated with motor neuron degeneration. Hum Mol Genet 2008; 18:472-81. [PMID: 18996918 PMCID: PMC2638803 DOI: 10.1093/hmg/ddn375] [Citation(s) in RCA: 205] [Impact Index Per Article: 12.8] [Indexed: 12/12/2022]
Abstract
Amyotrophic lateral sclerosis (ALS) is a spontaneous, relentlessly progressive motor neuron disease, usually resulting in death from respiratory failure within 3 years. Variation in the genes SOD1 and TARDBP accounts for a small percentage of cases, and other genes have shown association in both candidate gene and genome-wide studies, but the genetic causes remain largely unknown. We have performed two independent parallel studies, both implicating the RNA polymerase II component, ELP3, in axonal biology and neuronal degeneration. In the first, an association study of 1884 microsatellite markers, allelic variants of ELP3 were associated with ALS in three human populations comprising 1483 people (P = 1.96 × 10^−9). In the second, an independent mutagenesis screen in Drosophila for genes important in neuronal communication and survival identified two different loss of function mutations, both in ELP3 (R475K and R456K). Furthermore, knock down of ELP3 protein levels using antisense morpholinos in zebrafish embryos resulted in dose-dependent motor axonal abnormalities [Pearson correlation: −0.49, P = 1.83 × 10^−12 (start codon morpholino) and −0.46, P = 4.05 × 10^−9 (splice-site morpholino)], and in humans, risk-associated ELP3 genotypes correlated with reduced brain ELP3 expression (P = 0.01). These findings add to the growing body of evidence implicating the RNA processing pathway in neurodegeneration and suggest a critical role for ELP3 in neuron biology and of ELP3 variants in ALS.
Affiliation(s)
- Claire L Simpson
- Department of Neurology, King's College London, London SE5 8AF, UK
|
29
|
Zhang W, Huang RS, Dolan ME. Cell-based Models for Discovery of Pharmacogenomic Markers of Anticancer Agent Toxicity. Trends in Cancer Research 2008; 4:1-13. [PMID: 21499559 PMCID: PMC3076057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
The field of pharmacogenomics is challenging because of the multigenic nature of drug response and toxicity. The candidate gene approach has been traditionally utilized to determine the contribution of genetic variation to a particular phenotype; however, the sequencing of the human genome and the genetic resource provided by the International HapMap Project has allowed researchers to perform genome-wide studies without a priori knowledge. Recent work has demonstrated the usefulness of cell-based models for pharmacogenomic discovery using the HapMap samples, which are a panel of well-genotyped, human lymphoblastoid cell lines (LCLs) derived from 90 Utah residents with ancestry from northern and western Europe (CEU), 90 Yoruba in Ibadan, Nigeria (YRI), 45 Japanese in Tokyo, Japan (JPT) and 45 Han Chinese in Beijing, China (CHB). Using these cell-based models, investigators are able to study not only individual variation in drug response, but also population differences in drug response. Finally, besides single nucleotide polymorphisms (SNPs) and gene expression, these cell-based models can also be used to investigate other genetic (e.g. copy number variants, CNVs), epigenetic or environmental factors responsible for drug response.
Affiliation(s)
- Wei Zhang
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, IL 60637, USA
- R. Stephanie Huang
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, IL 60637, USA
- M. Eileen Dolan
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, IL 60637, USA
- Committee on Clinical Pharmacology and Pharmacogenomics, The University of Chicago, Chicago, IL 60637, USA
- Cancer Research Center, The University of Chicago, Chicago, IL 60637, USA
|