1
Genome-wide association study reveals WRKY42 as a novel plant transcription factor that influences oviposition preference of Pieris butterflies. JOURNAL OF EXPERIMENTAL BOTANY 2023; 74:1690-1704. [PMID: 36560910 PMCID: PMC10010613 DOI: 10.1093/jxb/erac501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Accepted: 12/22/2022] [Indexed: 06/17/2023]
Abstract
Insect herbivores are amongst the most destructive plant pests, damaging both naturally occurring and domesticated plants. As sessile organisms, plants make use of structural and chemical barriers to counteract herbivores. However, over 75% of herbivorous insect species are well adapted to their host's defenses and these specialists are generally difficult to ward off. By actively antagonizing the number of insect eggs deposited on plants, future damage by the herbivore's offspring can be limited. Therefore, it is important to understand which plant traits influence attractiveness for oviposition, especially for specialist insects that are well adapted to their host plants. In this study, we investigated the oviposition preference of Pieris butterflies (Lepidoptera: Pieridae) by offering them the choice between 350 different naturally occurring Arabidopsis accessions. Using a genome-wide association study of the oviposition data and subsequent fine mapping with full genome sequences of 164 accessions, we identified WRKY42 and AOC1 as candidate genes that are associated with the oviposition preference observed for Pieris butterflies. Host plant choice assays with Arabidopsis genotypes impaired in WRKY42 or AOC1 function confirmed a clear role for WRKY42 in oviposition preference of female Pieris butterflies, while for AOC1 the effect was mild. In contrast, WRKY42-impaired plants, which were preferred for oviposition by butterflies, negatively impacted offspring performance. These findings exemplify that plant genotype can have opposite effects on oviposition preference and caterpillar performance. This knowledge can be used for breeding trap crops or crops that are unattractive for oviposition by pest insects.
2
SR4R: An Integrative SNP Resource for Genomic Breeding and Population Research in Rice. GENOMICS, PROTEOMICS & BIOINFORMATICS 2020; 18:173-185. [PMID: 32619768 PMCID: PMC7646087 DOI: 10.1016/j.gpb.2020.03.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/05/2019] [Revised: 03/24/2020] [Accepted: 03/25/2020] [Indexed: 12/16/2022]
Abstract
The information commons for rice (IC4R) database is a collection of 18 million single nucleotide polymorphisms (SNPs) identified by resequencing of 5152 rice accessions. Although IC4R offers an ultra-high-density rice variation map, these raw SNPs are not readily usable by the public. To satisfy the different research uses of SNPs for population genetics, evolutionary analysis, association studies, and genomic breeding in rice, raw genotypic data of these 18 million SNPs were processed by unified bioinformatics pipelines. The outcomes were used to develop a daughter database of IC4R - SnpReady for Rice (SR4R). SR4R presents four reference SNP panels, including 2,097,405 hapmapSNPs after data filtration and genotype imputation, 156,502 tagSNPs selected from linkage disequilibrium-based redundancy removal, 1180 fixedSNPs selected from genes exhibiting selective sweep signatures, and 38 barcodeSNPs selected from DNA fingerprinting simulation. SR4R thus offers a highly efficient rice variation map that combines reduced SNP redundancy with extensive data describing the genetic diversity of rice populations. In addition, SR4R provides rice researchers with a web interface that enables them to browse all four SNP panels, use online toolkits, and retrieve the original data and scripts for a variety of population genetics analyses on local computers. SR4R is freely available to academic users at http://sr4r.ic4r.org/.
3
Gigwa v2-Extended and improved genotype investigator. Gigascience 2019; 8:5488103. [PMID: 31077313 PMCID: PMC6511067 DOI: 10.1093/gigascience/giz051] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 11/30/2018] [Revised: 02/19/2019] [Accepted: 04/08/2019] [Indexed: 11/19/2022]
Abstract
Background The study of genetic variations is the basis of many research domains in biology. From genome structure to population dynamics, many applications involve the use of genetic variants. The advent of next-generation sequencing technologies led to such a flood of data that the daily work of scientists is often more focused on data management than data analysis. This mass of genotyping data poses several computational challenges in terms of storage, search, sharing, analysis, and visualization. While existing tools try to solve these challenges, few of them offer a comprehensive and scalable solution. Results Gigwa v2 is an easy-to-use, species-agnostic web application for managing and exploring high-density genotyping data. It can handle multiple databases and may be installed on a local computer or deployed as an online data portal. It supports various standard import and export formats, provides advanced filtering options, and offers means to visualize density charts or push selected data into various stand-alone or online tools. It implements 2 standard RESTful application programming interfaces, GA4GH, which is health-oriented, and BrAPI, which is breeding-oriented, thus offering wide possibilities of interaction with third-party applications. The project home page provides a list of live instances allowing users to test the system on public data (or reasonably sized user-provided data). Conclusions This new version of Gigwa provides a more intuitive and more powerful way to explore large amounts of genotyping data by offering a scalable solution to search for genotype patterns, functional annotations, or more complex filtering. Furthermore, its user-friendliness and interoperability make it widely accessible to the life science community.
4
Comparison of variation in frequency for SNPs associated with asthma or liver disease between Estonia, HapMap populations and the 1000 genome project populations. Int J Immunogenet 2019; 46:49-58. [PMID: 30659741 DOI: 10.1111/iji.12413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2018] [Accepted: 12/04/2018] [Indexed: 11/28/2022]
Abstract
Allele-specific analyses to understand frequency differences across populations, particularly populations not well studied, are important to help identify variants that may have a functional effect on disease mechanisms and phenotypic predisposition, facilitating new Genome-Wide Association Studies (GWAS). We aimed to compare the allele frequency of 11 asthma-associated and 16 liver disease-associated single nucleotide polymorphisms (SNPs) between the Estonian, HapMap and 1000 genome project populations. When comparing EGCUT with HapMap populations, the largest difference in allele frequencies was observed with the Maasai population in Kinyawa, Kenya, with 12 SNP variants reporting statistical significance. Similarly, when comparing EGCUT with 1000 genomes project populations, the largest difference in allele frequencies was observed with pooled African populations with 22 SNP variants reporting statistical significance. For 11 asthma-associated and 16 liver disease-associated SNPs, Estonians are genetically similar to other European populations but significantly different from African populations. Understanding differences in genetic architecture between ethnic populations is important to facilitate new GWAS targeted at underserved ethnic groups to enable novel genetic findings to aid the development of new therapies to reduce morbidity and mortality.
5
Using variant databases for variant prioritization and to detect erroneous genotype-phenotype associations. BMC Bioinformatics 2017; 18:535. [PMID: 29191167 PMCID: PMC5710091 DOI: 10.1186/s12859-017-1951-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 08/14/2017] [Accepted: 11/22/2017] [Indexed: 01/08/2023]
Abstract
Background In the search for novel causal mutations, public and/or private variant databases are nearly always used to facilitate the search as they result in a massive reduction of putative variants in one step. Practically, variant filtering is often done by either using all variants from the variant database (called the absence-approach, i.e. it is assumed that disease-causing variants do not reside in variant databases) or by using the subset of variants with an allelic frequency > 1% (called the 1%-approach). We investigate the validity of these two approaches in terms of false negatives (the true disease-causing variant does not pass all filters) and false positives (a harmless mutation passes all filters and is erroneously retained in the list of putative disease-causing variants) and compare them with a novel approach which we named the quantile-based approach. This approach applies variable instead of static frequency thresholds and the calculation of these thresholds is based on prior knowledge of disease prevalence, inheritance models, database size and database characteristics. Results Based on real-life data, we demonstrate that the quantile-based approach outperforms the absence-approach in terms of false negatives. At the same time, this quantile-based approach deals more appropriately with the variable allele frequencies of disease-causing alleles in variant databases relative to the 1%-approach and as such allows a better control of the number of false positives. We also introduce an alternative application for variant database usage and the quantile-based approach. If disease-causing variants in variant databases deviate substantially from theoretical expectancies calculated with the quantile-based approach, their association between genotype and phenotype had to be reconsidered in 12 out of 13 cases.
Conclusions We developed a novel method and demonstrated that this so-called quantile-based approach is a highly suitable method for variant filtering. In addition, the quantile-based approach can also be used for variant flagging. For user friendliness, lookup tables and easy-to-use R calculators are provided. Electronic supplementary material The online version of this article (doi: 10.1186/s12859-017-1951-y) contains supplementary material, which is available to authorized users.
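The two static filters described in this abstract can be illustrated with a minimal Python sketch. The variant identifiers, allele frequencies, and function names below are hypothetical and chosen only for illustration; the sketch does not reproduce the authors' quantile-based approach or their R calculators.

```python
def absence_filter(candidates, database_freqs):
    """Absence-approach: keep only candidates that are not in the database at all."""
    return [v for v in candidates if v not in database_freqs]


def one_percent_filter(candidates, database_freqs, threshold=0.01):
    """1%-approach: drop candidates whose database allele frequency exceeds the threshold."""
    return [v for v in candidates if database_freqs.get(v, 0.0) <= threshold]


# Hypothetical database allele frequencies (not taken from the paper).
database_freqs = {"var_common": 0.12, "var_rare": 0.002}
# Hypothetical candidate variants found in a patient.
candidates = ["var_common", "var_rare", "var_novel"]

absence_kept = absence_filter(candidates, database_freqs)
one_percent_kept = one_percent_filter(candidates, database_freqs)

print(absence_kept)      # ['var_novel']
print(one_percent_kept)  # ['var_rare', 'var_novel']
```

If "var_rare" were the true disease-causing variant, the absence-approach would discard it (a false negative), while the 1%-approach would retain it at the cost of keeping more candidates overall.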
6
Methodology for single nucleotide polymorphism selection in promoter regions for clinical use. An example of its applicability. INTERNATIONAL JOURNAL OF MOLECULAR EPIDEMIOLOGY AND GENETICS 2016; 7:126-136. [PMID: 27766139 PMCID: PMC5069276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2016] [Accepted: 09/01/2016] [Indexed: 06/06/2023]
Abstract
Genetic variability in humans can explain many differences in disease risk factors. Polymorphism-related studies focus mainly on the single nucleotide polymorphisms (SNPs) of coding regions of the genes. SNPs in DNA binding motifs of the promoter region have been less explored. In a recent study of SNPs in patients with non-Hodgkin lymphomas, we faced the problem of SNP selection from promoter regions and developed a practical methodology for clinical studies. The process consists of identifying SNPs in the coding and promoter regions of the antigen-processing system using the 'dbSNP' database. With the 'HapMap' program, we select SNPs with frequencies >20% in Caucasian populations. For coding regions, we sought biologically and clinically relevant SNPs described in the literature. For the promoter regions, we determined their chromosomal location in the 'QiagenSABioscience' site database. The nucleotide sequence of ancestral and variant alleles is available in 'dbSNP'. These sequences were used in 'Promoter TESS' to determine binding differences of transcription factors (TFs). Each sequence may have affinity to different TFs. Thus, SNP selection in the promoter regions was based on the differences in TF binding pattern between the old and the new allele. The potential clinical relevance of the new TFs was also evaluated before the final selection. With this approach, we found that almost half of the relevant SNPs fall within the promoter region. In conclusion, we were able to develop a methodology for the targeted selection of SNPs in promoter regions of human genes, comparing the TFs with affinity to the ancestral allele with the TFs with affinity to a variant allele. We selected those SNPs that change the TF affinity to a pattern with functional significance.
7
Abstract
Background Accurate genotype calling for high-throughput Illumina data is an important step in extracting more genetic information for large-scale genome-wide association studies. Many popular calling algorithms use mixture models to infer genotypes of a large number of single nucleotide polymorphisms in a fast and efficient way. In practice, mixture models are mostly restricted to inferring genotypes for common SNPs whose minor allele frequencies are quite large. However, it is still challenging to accurately genotype rare variants, especially rare variants for which the boundaries of their genotypes are not clearly defined. Results To further improve the call accuracy and the quality of genotypes on rare variants, a new model calling procedure, named M-D, is proposed to infer genotypes for the Illumina BeadArray data. In this calling procedure, a Gaussian Mixture Model and a Dirichlet Process Gaussian Mixture Model are integrated to infer genotypes. Conclusions Applications to Illumina data illustrate that this new approach can improve calling performance compared to other popular genotyping algorithms.
8
Transcriptional similarity in couples reveals the impact of shared environment and lifestyle on gene regulation through modified cytosines. PeerJ 2016; 4:e2123. [PMID: 27326381 PMCID: PMC4911945 DOI: 10.7717/peerj.2123] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/29/2015] [Accepted: 05/20/2016] [Indexed: 12/25/2022]
Abstract
Gene expression is a complex and quantitative trait that is influenced by both genetic and non-genetic regulators including environmental factors. Evaluating the contribution of environment to gene expression regulation and identifying which genes are more likely to be influenced by environmental factors are important for understanding human complex traits. We hypothesize that by living together as couples, there can be commonly co-regulated genes that may reflect the shared living environment (e.g., diet, indoor air pollutants, behavioral lifestyle). The lymphoblastoid cell lines (LCLs) derived from unrelated couples of African ancestry (YRI, Yoruba people from Ibadan, Nigeria) from the International HapMap Project provided a unique model for us to characterize gene expression patterns in couples by comparing gene expression levels between husbands and wives. Strikingly, 778 genes were found to show much smaller variances in couples than in random pairs of individuals at a false discovery rate (FDR) of 5%. Since genetic variation between unrelated family members in a general population is expected to be the same assuming a random-mating society, non-genetic factors (e.g., epigenetic systems) are more likely to be the mediators for the observed transcriptional similarity in couples. We thus evaluated the contribution of modified cytosines to those genes showing transcriptional similarity in couples as well as the relationships of these CpG sites with other gene regulatory elements, such as transcription factor binding sites (TFBS). Our findings suggested that transcriptional similarity in couples likely reflected shared common environment partially mediated through cytosine modifications.
9
Abstract
BACKGROUND Exploring the structure of genomes and analyzing their evolution is essential to understanding the ecological adaptation of organisms. However, with the large amounts of data being produced by next-generation sequencing, computational challenges arise in terms of storage, search, sharing, analysis and visualization. This is particularly true with regards to studies of genomic variation, which are currently lacking scalable and user-friendly data exploration solutions. DESCRIPTION Here we present Gigwa, a web-based tool that provides an easy and intuitive way to explore large amounts of genotyping data by filtering it not only on the basis of variant features, including functional annotations, but also on genotype patterns. The data storage relies on MongoDB, which offers good scalability properties. Gigwa can handle multiple databases and may be deployed in either single- or multi-user mode. In addition, it provides a wide range of popular export formats. CONCLUSIONS The Gigwa application is suitable for managing large amounts of genomic variation data. Its user-friendly web interface makes such processing widely accessible. It can either be simply deployed on a workstation or be used to provide a shared data portal for a given community of researchers.
10
Inference of kinship using spatial distributions of SNPs for genome-wide association studies. BMC Genomics 2016; 17:372. [PMID: 27206321 PMCID: PMC4873983 DOI: 10.1186/s12864-016-2696-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 04/07/2015] [Accepted: 05/06/2016] [Indexed: 01/22/2023]
Abstract
Background Genome-wide association studies (GWASs) are powerful in identifying genetic loci which cause complex traits of common diseases. However, it is well known that inappropriately accounting for pedigree or population structure leads to spurious associations. GWASs have often encountered increased type I error rates due to the correlated genotypes of cryptically related individuals or subgroups. Therefore, accurate pedigree information is crucial for successful GWASs. Results We propose a distance-based method KIND to estimate kinship coefficients among individuals. Our method utilizes the spatial distribution of SNPs in the genome that represents how far each minor-allele variant is located from its neighboring minor-allele variants. The SNP distribution of each individual was presented in a feature vector in Euclidean space, and then the kinship coefficient was inferred from the two vectors of each individual pair. We demonstrate that the distance information can measure the similarity of genetic variants of individuals accurately and efficiently. We applied our method to a synthetic data set and two real data sets (i.e. the HapMap phase III and the 1000 genomes data). We investigated the estimation accuracy of kinship coefficients not only within homogeneous populations but also for a population with extreme stratification. Conclusions Our method KIND usually produces more accurate and more robust kinship coefficient estimates than existing methods especially for populations with extreme stratification. It can serve as an important and very efficient tool for GWASs. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-2696-0) contains supplementary material, which is available to authorized users.
11
Addressing population-specific multiple testing burdens in genetic association studies. Ann Hum Genet 2015; 79:136-47. [PMID: 25644736 DOI: 10.1111/ahg.12095] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 07/25/2014] [Accepted: 10/06/2014] [Indexed: 01/06/2023]
Abstract
The number of effectively independent tests performed in genome-wide association studies (GWAS) varies by population, making a universal P-value threshold inappropriate. We estimated the number of independent SNPs in Phase 3 HapMap samples by: (1) the LD-pruning function in PLINK, and (2) an autocorrelation-based approach. Autocorrelation was also used to estimate the number of independent SNPs in whole genome sequences from 1000 Genomes. Both approaches yielded consistent estimates of numbers of independent SNPs, which were used to calculate new population-specific thresholds for genome-wide significance. African populations had the most stringent thresholds (1.49 × 10⁻⁷ for YRI at r² = 0.3), East Asian populations the least (3.75 × 10⁻⁷ for JPT at r² = 0.3). We also assessed how using population-specific significance thresholds compared to using a single multiple testing threshold at the conventional 5 × 10⁻⁸ cutoff. Applied to a previously published GWAS of melanoma in Caucasians, our approach identified two additional genes, both previously associated with the phenotype. In a Chinese breast cancer GWAS, our approach identified 48 additional genes, 19 of which were in or near genes previously associated with the phenotype. We conclude that the conventional genome-wide significance threshold generates an excess of Type 2 errors, particularly in GWAS performed on more recently founded populations.
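The link between a count of independent SNPs and a significance threshold can be sketched in Python as follows. The sketch assumes a Bonferroni-style correction at a family-wise alpha of 0.05, which the abstract does not state explicitly; the function names are illustrative only.

```python
def genome_wide_threshold(n_independent_tests, alpha=0.05):
    """Bonferroni-style per-test threshold for a given number of independent tests."""
    if n_independent_tests <= 0:
        raise ValueError("n_independent_tests must be positive")
    return alpha / n_independent_tests


def implied_independent_tests(threshold, alpha=0.05):
    """Number of independent tests implied by a per-test threshold (assumed alpha)."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return alpha / threshold


# The conventional cutoff corresponds to one million independent tests.
conventional = genome_wide_threshold(1_000_000)

# Test counts implied by the thresholds quoted in the abstract,
# under the assumed alpha of 0.05.
yri_tests = implied_independent_tests(1.49e-7)
jpt_tests = implied_independent_tests(3.75e-7)

print(conventional)
print(round(yri_tests), round(jpt_tests))
```

Under this assumption, the stricter YRI threshold implies more independent tests than the JPT threshold, in line with the abstract's statement that African populations had the most stringent thresholds.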
12
Tailored selection of study individuals to be sequenced in order to improve the accuracy of genotype imputation. Genet Epidemiol 2014; 39:114-21. [PMID: 25537753 DOI: 10.1002/gepi.21873] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/16/2014] [Revised: 08/28/2014] [Accepted: 11/05/2014] [Indexed: 01/21/2023]
Abstract
The addition of sequence data from own-study individuals to genotypes from external data repositories, for example, the HapMap, has been shown to improve the accuracy of imputed genotypes. Early approaches for reference panel selection favored individuals who best reflect recombination patterns in the study population. By contrast, a maximization of genetic diversity in the reference panel has been recently proposed. We investigate here a novel strategy to select individuals for sequencing that relies on the characterization of the ancestral kernel of the study population. The simulated study scenarios consisted of several combinations of subpopulations from HapMap. HapMap individuals who did not belong to the study population constituted an external reference panel which was complemented with the sequences of study individuals selected according to different strategies. In addition to a random choice, individuals with the largest statistical depth according to the first genetic principal components were selected. In all simulated scenarios the integration of sequences from own-study individuals increased imputation accuracy. The selection of individuals based on the statistical depth resulted in the highest imputation accuracy for European and Asian study scenarios, whereas random selection performed best for an African-study scenario. Present findings indicate that there is no universal 'best strategy' to select individuals for sequencing. We propose to use the methodology described in the manuscript to assess the advantage of focusing on the ancestral kernel under own study characteristics (study size, genetic diversity, availability and properties of external reference panels, frequency of imputed variants…).
13
Harmonization of study and reference data by PhaseLift: saving time when imputing study data. Genet Epidemiol 2014; 38:381-8. [PMID: 24962562 DOI: 10.1002/gepi.21812] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/06/2014] [Revised: 04/09/2014] [Accepted: 04/10/2014] [Indexed: 11/11/2022]
Abstract
Genome-wide association studies are usually accompanied by imputation techniques to complement genome-wide SNP chip genotypes. Current imputation approaches separate the phasing of study data from imputing, which makes the phasing independent from the reference data. The two-step approach allows for updating the imputation for a new reference panel without repeating the tedious phasing step. This advantage, however, no longer holds when the build of the study data differs from the build of the reference data. In this case, the current approach is to harmonize the study data annotation with the reference data (prephasing lift-over), requiring rephasing and re-imputing. As a novel approach, we propose to harmonize study haplotypes with reference haplotypes (postphasing lift-over). This allows for updating imputed study data for new reference panels without requiring rephasing. With continuously updated reference panels, our approach can save considerable computing time of up to 1 month per re-imputation. We evaluated the rephasing and postphasing lift-over approaches by using data from 1,644 unrelated individuals imputed by both approaches and comparing them with directly typed genotypes. On average, both approaches perform equally well, with mean concordances of 93% between imputed and typed genotypes for both approaches. Also, imputation qualities are similar (mean difference in RSQ < 0.1%). We demonstrate that our novel postphasing lift-over approach is a practical and time-saving alternative to the prephasing lift-over. This might encourage study partners to accommodate updated reference builds and ultimately improve the information content of study data. Our novel approach is implemented in the software PhaseLift.
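The concordance measure mentioned in this abstract can be sketched in Python as the fraction of genotypes on which imputed and directly typed calls agree. The 0/1/2 allele-count coding, the use of None for missing calls, and the toy data are assumptions made for illustration; this is not PhaseLift code.

```python
def genotype_concordance(imputed, typed):
    """Fraction of non-missing genotype pairs on which imputed and typed calls agree."""
    if len(imputed) != len(typed):
        raise ValueError("genotype lists must have the same length")
    # Skip positions where either call is missing (None).
    pairs = [(i, t) for i, t in zip(imputed, typed)
             if i is not None and t is not None]
    if not pairs:
        raise ValueError("no comparable genotypes")
    matches = sum(1 for i, t in pairs if i == t)
    return matches / len(pairs)


# Hypothetical genotypes coded as minor-allele counts (0, 1, 2).
imputed = [0, 1, 2, 1, 0, None, 2, 1]
typed = [0, 1, 2, 0, 0, 1, 2, 1]

concordance = genotype_concordance(imputed, typed)
print(f"{concordance:.3f}")  # 6 of 7 comparable genotypes agree
```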
14
Abstract
The burdens of type 2 diabetes (T2D) and cardiovascular diseases (CVD) are increasing in Africa. T2D and CVD are the result of the complex interaction between inherited characteristics, lifestyle, and environmental factors. The epidemic of obesity is largely behind the exploding global incidence of T2D. However, not all obese individuals develop diabetes and positive family history is a powerful risk factor for diabetes and CVD. Recent implementations of high throughput genotyping and sequencing approaches have advanced our understanding of the genetic basis of diabetes and CVD by identifying several genomic loci that were not previously linked to the pathobiology of these diseases. However, African populations have not been adequately represented in these global genomic efforts. Here, we summarize the state of knowledge of the genetic epidemiology of T2D and CVD in Africa and highlight new genomic initiatives that promise to inform disease etiology, public health and clinical medicine in Africa.
15
A genetic model of differential susceptibility to human respiratory syncytial virus (RSV) infection. FASEB J 2014; 28:1947-56. [PMID: 24421397 DOI: 10.1096/fj.13-239855] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 11/11/2022]
Abstract
Respiratory syncytial virus (RSV) is the primary cause of lower respiratory tract infection during childhood and causes severe symptoms in some patients, which may cause hospitalization and death. Mechanisms for differential responses to RSV are unknown. Our objective was to develop an in vitro model of RSV infection to evaluate interindividual variation in response to RSV and identify susceptibility genes. Populations of human-derived HapMap lymphoblastoid cell lines (LCLs) were infected with RSV. Compared with controls, RSV-G mRNA expression varied from ~1- to 400-fold between LCLs. Basal expression of a number of gene transcripts, including myxovirus (influenza virus) resistance 1 (MX1), significantly correlated with RSV-G expression in HapMap LCLs. Individuals in a case-control population of RSV-infected children who were homozygous (n=94) or heterozygous (n=172) for the predicted deleterious A allele in a missense G/A SNP in MX1 had significantly greater risk for developing severe RSV disease relative to those with the major allele (n=108) (χ²=5.305, P=0.021; OR: 1.750, 95% CI: 1.110, 2.758, P=0.021). We conclude that genetically diverse human LCLs enable identification of susceptibility genes (e.g., MX1) for RSV disease severity in children, providing insight for disease risk.
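An odds ratio and confidence interval of the kind reported in this abstract can be sketched in Python from a 2x2 table. The Wald (log-scale) interval is an assumption, since the abstract does not say how its interval was computed, and the table counts are hypothetical because the abstract does not give the case/control breakdown.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald interval from a 2x2 table.

    a: carriers with severe disease, b: carriers without,
    c: non-carriers with severe disease, d: non-carriers without.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
odds_ratio, lower, upper = odds_ratio_wald_ci(40, 20, 30, 30)
print(f"OR={odds_ratio:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")

# An interval that is symmetric on the log scale has the OR as the
# geometric mean of its bounds; the reported bounds can be checked this way.
reported_center = math.sqrt(1.110 * 2.758)
print(f"{reported_center:.3f}")
```

The geometric mean of the reported bounds lies close to the reported odds ratio of 1.750, which is what a log-scale interval would give.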
16
Detecting novel SNPs and breed-specific haplotypes at calpastatin gene in Iranian fat- and thin-tailed sheep breeds and their effects on protein structure. Gene 2014; 537:132-9. [PMID: 24401538 DOI: 10.1016/j.gene.2013.12.023] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 08/06/2013] [Revised: 11/28/2013] [Accepted: 12/11/2013] [Indexed: 10/25/2022]
Abstract
Calpastatin has been introduced as a potential candidate gene for growth and meat quality traits. In this study, genetic variability was investigated in the exon 6 and its intron boundaries of ovine CAST gene by PCR-SSCP analysis and DNA sequencing. Also a protein sequence and structural analysis were performed to predict the possible impact of amino acid substitutions on physicochemical properties and structure of the CAST protein. A total of 487 animals belonging to four ancient Iranian sheep breeds with different fat metabolisms, Lori-Bakhtiari and Chall (fat-tailed), Zel-Atabay cross-bred (medium fat-tailed) and Zel (thin-tailed), were analyzed. Eight unique SSCP patterns, representing eight different sequences or haplotypes, CAST-1, CAST-2 and CAST-6 to CAST-11, were identified. Haplotypes CAST-1 and CAST-2 were most common with frequency of 0.365 and 0.295. The novel haplotype CAST-8 had considerable frequency in Iranian sheep breeds (0.129). All the consensus sequences showed 98-99%, 94-98%, 92-93% and 82-83% similarity to the published ovine, caprine, bovine and porcine CAST locus sequences, respectively. Sequence analysis revealed four SNPs in intron 5 (C24T, G62A, G65T and T69-) and three SNPs in exon 6 (c.197A>T, c.282G>T and c.296C>G). All three SNPs in exon 6 were missense mutations which would result in p.Gln 66 Leu, p.Glu 94 Asp and p.Pro 99 Arg substitutions, respectively, in CAST protein. All three amino acid substitutions affected the physicochemical properties of ovine CAST protein including hydrophobicity, amphiphilicity and net charge and subsequently might influence its structure and effect on the activity of Ca2+ channels; hence, they might regulate calpain activity and afterwards meat tenderness and growth rate. The Lori-Bakhtiari population showed the highest heterozygosity in the ovine CAST locus (0.802). 
Frequency difference of haplotypes CAST-10 and CAST-8 between Lori-Bakhtiari (fat-tailed) and Zel (thin-tailed) breeds was highly significant (P<0.001), indicating that these two haplotypes might be breed-specific haplotypes that distinguish between fat-tailed and thin-tailed sheep breeds.
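The correspondence between the coding-sequence positions and the amino acid positions quoted in this abstract can be checked with a short Python sketch. It assumes the usual convention that coding positions are counted from the first base of the start codon; the function names are illustrative.

```python
def codon_number(cdna_position):
    """1-based codon (amino acid) index for a 1-based coding-sequence position."""
    if cdna_position < 1:
        raise ValueError("cdna_position must be at least 1")
    return (cdna_position - 1) // 3 + 1


def position_in_codon(cdna_position):
    """Which base of its codon (1, 2 or 3) a coding-sequence position falls on."""
    if cdna_position < 1:
        raise ValueError("cdna_position must be at least 1")
    return (cdna_position - 1) % 3 + 1


# Coding positions of the three exon 6 SNPs named in the abstract.
for position in (197, 282, 296):
    print(position, codon_number(position), position_in_codon(position))
```

Under this convention the three positions fall in codons 66, 94 and 99, matching the reported p.Gln 66 Leu, p.Glu 94 Asp and p.Pro 99 Arg substitutions.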
|
17
|
Association analysis of ERBB2 amplicon genetic polymorphisms and STARD3 expression with risk of gastric cancer in the Chinese population. Gene 2013; 535:225-32. [PMID: 24291029 DOI: 10.1016/j.gene.2013.11.030] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 05/05/2013] [Revised: 11/12/2013] [Accepted: 11/14/2013] [Indexed: 11/24/2022]
Abstract
The purpose of this study was to investigate whether the risk of gastric cancer (GC) was associated with single nucleotide polymorphisms (SNPs) in a gene cluster on chromosome 17q12-q21 (ERBB2 amplicon) in the Chinese Han population. We detected twenty-six SNPs in this gene cluster containing the steroidogenic acute regulatory-related lipid transfer domain containing 3 (STARD3), protein phosphatase 1 regulatory subunit 1B (PPP1R1B/DARPP32), titin-cap (TCAP), per1-like domain containing 1 (PERLD1/CAB2), human epidermal growth factor receptor-2 (ERBB2/HER2), zinc-finger protein subfamily 1A 3 (ZNFN1A3/IKZF3) and DNA topoisomerase 2-alpha (TOP2A) genes in 311 patients with GC and in 425 controls by Sequenom. We found no associations between genetic variations and GC risk. However, haplotype analysis implied that the haplotype CCCT of STARD3 (rs9972882, rs881844, rs11869286 and rs1877031) conferred a protective effect against GC (P=0.043, odds ratio [OR]=0.805, 95% confidence interval [95% CI]=0.643-0.992). The STARD3 rs1877031 TC genotype was associated with histogenesis of gastric mucinous adenocarcinoma and signet-ring cell carcinoma (P=0.021, OR=2.882, 95% CI=1.173-7.084). We examined the expression of STARD3 in 243 tumor tissues out of the 311 GC patients and 20 adjacent normal gastric tissues using immunohistochemical (IHC) analysis and tissue microarrays (TMA). The expression of STARD3 was observed in the gastric parietal cells and in gastric tumor tissues and significantly correlated with gender (P=0.004), alcohol drinking (P<0.001), tumor location (P=0.007), histological type (P=0.005) and differentiation (P=0.023) in GC. We concluded that the combined effect of haplotype CCCT of STARD3 might affect GC susceptibility. STARD3 expression might be related to the tumorigenesis of GC in the Chinese population.
|
18
|
Is CD36 gene polymorphism in region encoding lipid-binding domain associated with early onset CAD? Gene 2013; 530:134-7. [PMID: 23856131 DOI: 10.1016/j.gene.2013.06.061] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 03/11/2013] [Revised: 05/22/2013] [Accepted: 06/16/2013] [Indexed: 12/21/2022]
Abstract
CD36 is a fatty acid translocase in striated muscle cells and cardiomyocytes. Some studies have suggested that alterations in the CD36 gene may be associated with coronary artery disease (CAD) risk. The aim of the current study was to compare the frequency of CD36 variants in the region encoding the lipid-binding domain in Caucasian patients with early-onset CAD, no-CAD adult controls and neonates. The study group comprised 100 patients with early-onset CAD. The genetic control groups were 306 infants and 40 no-CAD adults aged over 70 years. Exons 4, 5 and 6, including fragments of flanking introns, were studied using the denaturing high-performance liquid chromatography technique and direct sequencing. The changes detected in the analyzed fragment of CD36 were: IVS3-6 T/C (rs3173798), IVS4-10 G/A (rs3211892), C311T (Thr104Ile, not described so far) in exon 5, and G550A (Asp184Asn, rs138897347), C572T (Pro191Leu, rs143150225), G573A (Pro191Pro, rs5956) and A591T (Thr197Thr, rs141680676) in exon 6. No significant differences in the CD36 genotype, allele and haplotype frequencies were found between the three groups. Only borderline differences (p=0.066) were found between early-onset CAD patients and newborns in the frequencies of the 591T allele (2.00% vs 0.50%) and the CGCGCGT haplotype (2.00% vs 0.50%) with both IVS3-6C and 591T variant alleles. In conclusion, the CD36 variants rs3173798, rs3211892, rs138897347, rs5956, rs143150225, rs141680676 and C311T do not seem to be involved in the risk of early-onset CAD in the Caucasian population.
|
19
|
Genome-wide discovery of genetic variants affecting tamoxifen sensitivity and their clinical and functional validation. Ann Oncol 2013; 24:1867-1873. [PMID: 23508821 PMCID: PMC3690911 DOI: 10.1093/annonc/mdt125] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 01/18/2013] [Revised: 02/12/2013] [Accepted: 02/14/2013] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND Beyond estrogen receptor (ER), there are no validated predictors for tamoxifen (TAM) efficacy and toxicity. We utilized a genome-wide cell-based model to comprehensively evaluate genetic variants for their contribution to cellular sensitivity to TAM. DESIGN Our discovery model incorporates multidimensional datasets, including genome-wide genotype, gene expression, and endoxifen-induced cellular growth inhibition in the International HapMap lymphoblastoid cell lines (LCLs). Genome-wide findings were further evaluated in NCI60 cancer cell lines. Gene knock-down experiments were performed in four breast cancer cell lines. Genetic variants identified in the cell-based model were examined in 245 Caucasian breast cancer patients who underwent TAM treatment. RESULTS We identified seven novel single-nucleotide polymorphisms (SNPs) associated with endoxifen sensitivity through the expression of 10 genes using the genome-wide integrative analysis. All 10 genes identified in LCLs were associated with TAM sensitivity in NCI60 cancer cell lines, including USP7. USP7 knock-down resulted in increasing resistance to TAM in four breast cancer cell lines tested, which is consistent with the finding in LCLs and in the NCI60 cells. Furthermore, we identified SNPs that were associated with TAM-induced toxicities in breast cancer patients, after adjusting for other clinical factors. CONCLUSION Our work demonstrates the utility of a cell-based model in genome-wide identification of pharmacogenomic markers.
|
20
|
Population analysis of vitamin D receptor polymorphisms and the role of genetic ancestry in an admixed population. Genet Mol Biol 2011; 34:377-85. [PMID: 21931507 PMCID: PMC3168175 DOI: 10.1590/s1415-47572011000300003] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 02/07/2011] [Accepted: 04/27/2011] [Indexed: 12/04/2022] Open
Abstract
The vitamin D receptor (VDR) is an essential protein related to bone metabolism. Some VDR alleles are differentially distributed among ethnic populations and display variable patterns of linkage disequilibrium (LD). In this study, 200 unrelated Brazilians were genotyped using 21 VDR single nucleotide polymorphisms (SNPs) and 28 ancestry informative markers. The patterns of LD and haplotype distribution were compared among Brazilians and the HapMap populations of African (YRI), European (CEU) and Asian (JPT+CHB) origins. Conditional regression and haplotype-specific analyses were performed using estimates of individual genetic ancestry in Brazilians as a quantitative trait. Similar patterns of LD were observed in the 5′ and 3′ gene regions. However, the frequency distribution of haplotype blocks varied among populations. Conditional regression analysis identified haplotypes associated with European and Amerindian ancestry, but not with the proportion of African ancestry. Individual ancestry estimates were associated with VDR haplotypes. These findings reinforce the need to correct for population stratification when performing genetic association studies in admixed populations.
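As a rough illustration of the kind of pairwise LD statistic such a comparison relies on, the sketch below computes D, |D'| and r² for two biallelic SNPs from haplotype and allele frequencies. It is not taken from the study: the function name `pairwise_ld` and its inputs are assumptions, and it presumes that phased haplotype frequencies are already available.

```python
from typing import Tuple


def pairwise_ld(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float, float]:
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    All inputs are hypothetical, pre-computed population frequencies.
    """
    # Raw disequilibrium: observed haplotype frequency minus the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b

    # The largest |D| attainable given the allele frequencies depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # Squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = (d * d) / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

Computing these values for every SNP pair in each population would give one matrix per population, which could then be compared region by region.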
|
21
|
Integrating mechanistic and polymorphism data to characterize human genetic susceptibility for environmental chemical risk assessment in the 21st century. Toxicol Appl Pharmacol 2011; 271:395-404. [PMID: 21291902 DOI: 10.1016/j.taap.2011.01.015] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 10/05/2010] [Revised: 12/28/2010] [Accepted: 01/24/2011] [Indexed: 12/27/2022]
Abstract
Response to environmental chemicals can vary widely among individuals and between population groups. In human health risk assessment, data on susceptibility can be utilized by deriving risk levels based on a study of a susceptible population and/or an uncertainty factor may be applied to account for the lack of information about susceptibility. Defining genetic susceptibility in response to environmental chemicals across human populations is an area of interest in the NAS' new paradigm of toxicity pathway-based risk assessment. Data from high-throughput/high content (HT/HC), including -omics (e.g., genomics, transcriptomics, proteomics, metabolomics) technologies, have been integral to the identification and characterization of drug target and disease loci, and have been successfully utilized to inform the mechanism of action for numerous environmental chemicals. Large-scale population genotyping studies may help to characterize levels of variability across human populations at identified target loci implicated in response to environmental chemicals. By combining mechanistic data for a given environmental chemical with next generation sequencing data that provides human population variation information, one can begin to characterize differential susceptibility due to genetic variability to environmental chemicals within and across genetically heterogeneous human populations. The integration of such data sources will be informative to human health risk assessment.
|
22
|
HapMap filter 1.0: a tool to preprocess the HapMap genotypic data for association studies. Bioinformation 2008; 2:322-4. [PMID: 18685717 PMCID: PMC2478729 DOI: 10.6026/97320630002322] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 04/29/2008] [Accepted: 05/06/2008] [Indexed: 11/23/2022] Open
Abstract
The International HapMap Project provides a resource of genotypic data on single nucleotide polymorphisms (SNPs), which can be used in various association studies to identify the genetic determinants of phenotypic variation. Prior to the association studies, the HapMap dataset should be preprocessed in order to reduce the computation time and control the multiple testing problem. Less informative SNPs, including those with a very low genotyping rate and those with low minor allele frequencies in one or more populations, are removed. Some research designs only use SNPs in a subset of HapMap cell lines. Although the HapMap website and other association software packages have provided some basic tools for optimizing these datasets, a fast and user-friendly program to generate the output for filtered genotypic data would be beneficial for association studies. Here, we present a flexible, straightforward bioinformatics program that can be useful in preparing the HapMap genotypic data for association studies by specifying cell lines and two common filtering criteria: minor allele frequencies and genotyping rate. The software was developed for Microsoft Windows and written in C++. AVAILABILITY The Windows executable and source code in Microsoft Visual C++ are available at Google Code (http://hapmap-filter-v1.googlecode.com/) or upon request. Their distribution is subject to GNU General Public License v3.
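The two filtering criteria described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's own C++ code: the function names, the in-memory data layout (SNP ID mapped to a list of two-letter genotype calls, with "NN" for a missing call), the sample IDs and the default thresholds are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Set

MISSING = "NN"  # assumed placeholder for a failed genotype call


def genotyping_rate(calls: List[str]) -> float:
    """Fraction of samples with a non-missing genotype call."""
    if not calls:
        return 0.0
    return sum(c != MISSING for c in calls) / len(calls)


def minor_allele_frequency(calls: List[str]) -> float:
    """Frequency of the least common allele among non-missing calls."""
    counts: Dict[str, int] = {}
    for call in calls:
        if call == MISSING:
            continue
        for allele in call:
            counts[allele] = counts.get(allele, 0) + 1
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no data, or monomorphic in this sample subset
    return min(counts.values()) / total


def filter_snps(
    genotypes: Dict[str, List[str]],
    samples: List[str],
    keep_samples: Optional[Set[str]] = None,
    min_maf: float = 0.05,   # hypothetical default threshold
    min_rate: float = 0.95,  # hypothetical default threshold
) -> Dict[str, List[str]]:
    """Keep SNPs that pass both thresholds within the selected cell lines."""
    if keep_samples is None:
        idx = list(range(len(samples)))
    else:
        idx = [i for i, s in enumerate(samples) if s in keep_samples]
    kept: Dict[str, List[str]] = {}
    for snp_id, calls in genotypes.items():
        subset = [calls[i] for i in idx]
        if genotyping_rate(subset) < min_rate:
            continue
        if minor_allele_frequency(subset) < min_maf:
            continue
        kept[snp_id] = subset
    return kept


# Toy data with made-up SNP and sample IDs.
samples = ["S1", "S2", "S3", "S4"]
genotypes = {
    "snp_a": ["AA", "AG", "GG", "AG"],
    "snp_b": ["CC", "CC", "CC", "CC"],
    "snp_c": ["TT", "NN", "NN", "TC"],
}
all_lines = filter_snps(genotypes, samples, min_maf=0.1, min_rate=0.75)
two_lines = filter_snps(genotypes, samples, keep_samples={"S1", "S4"},
                        min_maf=0.1, min_rate=0.75)
```

Restricting the sample set before computing the statistics matters: in the toy data, `snp_c` fails the genotyping-rate threshold across all four samples but passes when only `S1` and `S4` are selected.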
|
23
|
The HapMap Resource is Providing New Insights into Ourselves and its Application to Pharmacogenomics. Bioinform Biol Insights 2008; 2:15-23. [PMID: 18392109 PMCID: PMC2288550 DOI: 10.4137/bbi.s455] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Indexed: 12/16/2022] Open
Abstract
The exploration of quantitative variation in complex traits such as gene expression and drug response in human populations has become one of the major priorities for medical genetics. The International HapMap Project provides a key resource of genotypic data on human lymphoblastoid cell lines derived from four major world populations of European, African, Chinese and Japanese ancestry for researchers to associate with various phenotypic data to find genes affecting health, disease and response to drugs. Recent progress in dissecting genetic contribution to natural variation in gene expression within and among human populations and variation in drug response are two examples in which researchers have utilized the HapMap resource. The HapMap Project provides new insights into the human genome and has applicability to pharmacogenomics studies leading to personalized medicine.
|
24
|
Abstract
The International HapMap Project provides a key resource of genotypic data on human lymphoblastoid cell lines derived from four major world populations of European, African, Chinese and Japanese ancestry for researchers to associate with various phenotypic data to find genes affecting health, disease and response to drugs. Recently, the HapMap resource has significantly benefited research areas such as gene expression variation studies. Besides some intrinsic limitations, there are a few challenges that should be considered in the next wave of research using this tremendous resource. We suggest that overcoming these challenges or considering the confounding variables in the interpretation of results can provide more insights into the current views of the human genome as well as complex traits such as drug response variation and susceptibility to common diseases.
|