51
Ding K, Kullo IJ. Methods for the selection of tagging SNPs: a comparison of tagging efficiency and performance. Eur J Hum Genet 2006; 15:228-36. [PMID: 17164795 DOI: 10.1038/sj.ejhg.5201755] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/08/2022]
Abstract
There is great interest in the use of tagging single nucleotide polymorphisms (tSNPs) to facilitate association studies of complex diseases. This is based on the premise that a minimum set of tSNPs may be sufficient to capture most of the variation in certain regions of the human genome. Several methods have been described to select tSNPs, based on either haplotype-block structure or independent of the underlying block structure. In this paper, we compare eight methods for choosing tSNPs in 10 representative resequenced candidate genes (a total of 194.2 kb) with different levels of linkage disequilibrium (LD) in a sample of European-Americans. We compared tagging efficiency (TE) and prediction accuracy of tSNPs identified by these methods, as a function of several factors, including LD level, minor allele frequency, and tagging criteria. We also assessed tagging consistency between each method. We found that tSNPs selected based on the methods Haplotype Diversity and Haplotype r2 provided the highest TE, whereas the prediction accuracy was comparable among different methods. Tagging consistency between different methods of tSNPs selection was poor. This work demonstrates that when tSNPs-based association studies are undertaken, the choice of method for selecting tSNPs requires careful consideration.
Affiliation(s)
- Keyue Ding
- Division of Cardiovascular Diseases, Mayo Clinic and Foundation, Rochester, MN 55905, USA
52
Paschou P, Mahoney MW, Javed A, Kidd JR, Pakstis AJ, Gu S, Kidd KK, Drineas P. Intra- and interpopulation genotype reconstruction from tagging SNPs. Genome Res 2006; 17:96-107. [PMID: 17151345 PMCID: PMC1716273 DOI: 10.1101/gr.5741407] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/24/2022]
Abstract
The optimal method to be used for tSNP selection, the applicability of a reference LD map to unassayed populations, and the scalability of these methods to genome-wide analysis, all remain subjects of debate. We propose novel, scalable matrix algorithms that address these issues and we evaluate them on genotypic data from 38 populations and four genomic regions (248 SNPs typed for approximately 2000 individuals). We also evaluate these algorithms on a second data set consisting of genotypes available from the HapMap database (1336 SNPs for four populations) over the same genomic regions. Furthermore, we test these methods in the setting of a real association study using a publicly available family data set. The algorithms we use for tSNP selection and unassayed SNP reconstruction do not require haplotype inference and they are, in principle, scalable even to genome-wide analysis. Moreover, they are greedy variants of recently developed matrix algorithms with provable performance guarantees. Using a small set of carefully selected tSNPs, we achieve very good reconstruction accuracy of "untyped" genotypes for most of the populations studied. Additionally, we demonstrate in a quantitative manner that the chosen tSNPs exhibit substantial transferability, both within and across different geographic regions. Finally, we show that reconstruction can be applied to retrieve significant SNP associations with disease, with important genotyping savings.
Affiliation(s)
- Peristera Paschou
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06511, USA.
53
iHAP--integrated haplotype analysis pipeline for characterizing the haplotype structure of genes. BMC Bioinformatics 2006; 7:525. [PMID: 17137522 PMCID: PMC1698582 DOI: 10.1186/1471-2105-7-525] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 06/13/2006] [Accepted: 12/01/2006] [Indexed: 11/10/2022]
Abstract
Background: The advent of genotype data from large-scale efforts that catalog the genetic variants of different populations has given rise to new avenues for multifactorial disease association studies. Recent work shows that genotype data from the International HapMap Project have a high degree of transferability to the wider population. This implies that the design of genotyping studies on local populations may be facilitated through inferences drawn from information contained in HapMap populations. Results: To facilitate analysis of HapMap data for characterizing the haplotype structure of genes or any chromosomal regions, we have developed an integrated web-based resource, iHAP. In addition to incorporating genotype and haplotype data from the International HapMap Project and gene information from the UCSC Genome Browser Database, iHAP also provides capabilities for inferring haplotype blocks and selecting tag SNPs that are representative of haplotype patterns. These include block partitioning algorithms, block definitions, tag SNP definitions, as well as SNPs to be "force included" as tags. Based on the parameters defined at the input stage, iHAP performs on-the-fly analysis and displays the result graphically as a webpage. To facilitate analysis, intermediate and final result files can be downloaded. Conclusion: The iHAP resource, available at , provides a convenient yet flexible approach for the user community to analyze HapMap data and identify candidate targets for genotyping studies.
54
Sham PC, Ao SI, Kwan JSH, Kao P, Cheung F, Fong PY, Ng MK. Combining functional and linkage disequilibrium information in the selection of tag SNPs. Bioinformatics 2006; 23:129-31. [PMID: 17060359 DOI: 10.1093/bioinformatics/btl532] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022]
Abstract
We have developed an online program, WCLUSTAG, for tag SNP selection that allows the user to specify variable tagging thresholds for different SNPs. Tag SNPs are selected such that a SNP with user-specified tagging threshold C will have a minimum R2 of C with at least one tag SNP. This flexible feature is useful for researchers who wish to prioritize genomic regions or SNPs in an association study. Availability: The online WCLUSTAG program is available at http://bioinfo.hku.hk/wclustag/
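The selection rule in this abstract (each SNP must reach its own threshold C in R2 with at least one tag) can be illustrated with a small greedy sketch. This is not the WCLUSTAG implementation: the greedy strategy, the function names `r_squared` and `select_tags`, and the toy haplotypes are all assumptions made for illustration, and alleles are assumed to be phased and coded 0/1.

```python
def r_squared(a, b):
    """Pairwise r^2 between two biallelic SNPs coded as 0/1 alleles per chromosome."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:  # a monomorphic SNP carries no LD information
        return 0.0
    return (pab - pa * pb) ** 2 / denom


def select_tags(haplotypes, thresholds):
    """Greedy sketch: pick tags until every SNP s has r^2 >= thresholds[s] with a tag.

    haplotypes: dict mapping SNP name -> list of 0/1 alleles (hypothetical input layout)
    thresholds: dict mapping SNP name -> per-SNP tagging threshold C
    """
    snps = list(haplotypes)
    r2 = {(s, t): r_squared(haplotypes[s], haplotypes[t]) for s in snps for t in snps}
    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the candidate that satisfies the threshold of the most untagged SNPs.
        best = max(snps, key=lambda t: sum(r2[(s, t)] >= thresholds[s] for s in untagged))
        covered = {s for s in untagged if r2[(s, best)] >= thresholds[s]}
        if not covered:  # remaining SNPs (e.g. monomorphic ones) cannot be tagged
            break
        tags.append(best)
        untagged -= covered
    return tags


# Toy data: A and B are identical, C is moderately correlated with A, D is nearly independent.
haps = {
    "A": [0, 0, 1, 1, 0, 1],
    "B": [0, 0, 1, 1, 0, 1],
    "C": [0, 0, 1, 1, 1, 1],
    "D": [1, 0, 1, 0, 1, 0],
}
relaxed = select_tags(haps, {"A": 0.8, "B": 0.8, "C": 0.4, "D": 0.8})  # -> ['A', 'D']
strict = select_tags(haps, {"A": 0.8, "B": 0.8, "C": 0.8, "D": 0.8})   # -> ['A', 'C', 'D']
```

Lowering the threshold for SNP C from 0.8 to 0.4 lets tag A stand in for it (their r^2 is 0.5), so one fewer tag is needed. That is the kind of prioritization the variable thresholds are meant to allow.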
Affiliation(s)
- P C Sham
- Department of Psychiatry, Institute of Psychiatry, King's College London, UK
55
Tantoso E, Yang Y, Li KB. How well do HapMap SNPs capture the untyped SNPs? BMC Genomics 2006; 7:238. [PMID: 16982009 PMCID: PMC1586200 DOI: 10.1186/1471-2164-7-238] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 07/07/2006] [Accepted: 09/19/2006] [Indexed: 01/17/2023]
Abstract
Background: Recent advances in human genome sequencing and genotyping have revealed millions of single nucleotide polymorphisms (SNPs), which determine the variation among human beings. One particularly important project is the International HapMap Project, which provides a catalogue of human genetic variation for disease association studies. In this paper, we analyzed the genotype data in the HapMap project by using National Institute of Environmental Health Sciences Environmental Genome Project (NIEHS EGP) SNPs. We first determine whether the HapMap data are transferable to the NIEHS data. Then, we study how well the HapMap SNPs capture the untyped SNPs in the region. Finally, we provide general guidelines for determining whether the SNPs chosen from HapMap may be able to capture most of the untyped SNPs. Results: Our analysis shows that HapMap data are not robust enough to capture the untyped variants for most human genes. The performance of SNPs for European and Asian samples is marginal in capturing the untyped variants, i.e. approximately 55%. As expected, the SNPs from the HapMap YRI panel capture only approximately 30% of the variants. Although the overall performance is low, the SNPs for some genes perform very well and are able to capture most of the variants along the gene. This is observed in the European and Asian panels, but not in the African panel. Through observation, we concluded that, to obtain a well-covered SNP reference panel, the SNP density and the association among reference SNPs are important for estimating the robustness of the chosen SNPs. Conclusion: We have analyzed the coverage of HapMap SNPs using NIEHS EGP data. The results show that HapMap SNPs are transferable to the NIEHS SNPs. However, HapMap SNPs cannot capture some of the untyped SNPs, and therefore resequencing may be needed to uncover more SNPs in the missing regions.
Affiliation(s)
- Erwin Tantoso
- Bioinformatics Institute, 30 Biopolis Street, #07-01 Matrix, 138671, Singapore
- Yuchen Yang
- Bioinformatics Institute, 30 Biopolis Street, #07-01 Matrix, 138671, Singapore
- Kuo-Bin Li
- Bioinformatics Center, National Yang-Ming University, Taipei, 112, Taiwan
56
Lim YP, Lim TT, Chan YL, Song ACM, Yeo BH, Vojtesek B, Coomber D, Rajagopal G, Lane D. The p53 knowledgebase: an integrated information resource for p53 research. Oncogene 2006; 26:1517-21. [PMID: 16953220 DOI: 10.1038/sj.onc.1209952] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Indexed: 11/09/2022]
Abstract
The p53 tumor suppressor protein plays a central role in maintaining genomic integrity by occupying a nodal point in the DNA damage control pathway. Here it integrates a wide variety of signals, responding in one of several ways, that is, cell cycle arrest, senescence or programmed cell death (apoptosis). Mutations in the tumor suppressor gene tp53, which affects the key transcriptional regulatory processes in cell growth and death, occur frequently in cancer, which helps explain why p53 has been called the guardian of the genome. There is a vast body of published knowledge on all aspects of p53's role in cancer. To facilitate research, it would be helpful if this information could be collected, curated and updated in a format that is easily accessible to the user community. To this end, we initiated the p53 knowledgebase project (http://p53.bii.a-star.edu.sg). The p53 knowledgebase is a user-friendly web portal incorporating visualization and analysis tools that integrates information from the published literature with other manually curated information to facilitate knowledge discovery. This includes curated information on sequence, structure, mutations, polymorphisms, protein-protein interactions, transcription factors, transcriptional targets, antibodies and post-translational modifications that involve p53. The goal is to collect and maintain all relevant data on p53 and present it in an easily accessible format that will be useful to researchers in the field.
Affiliation(s)
- Y P Lim
- Bioinformatics Institute, Matrix, Singapore, Singapore
57
Abstract
This paper reviews the theoretical basis for single nucleotide polymorphism (SNP) tagging and considers the use of current software made freely available for this task. A distinction between haplotype block-based and non-block-based approaches yields two classes of procedures. Analysis of two different sets of SNP genotype data from the HapMap is used to judge the practical aspects of using each of the programs considered, as well as to make some general observations about the performance of the programs in finding optimal sets of tagging SNPs. Pairwise R2 methods, while the simplest of those considered, do tend to pick more tagging SNPs than are strictly needed to predict unmeasured (non-tagging) SNPs, since a combination of two or more tagging SNPs can form a prediction of SNPs that have no direct (pairwise) surrogate. Block-based methods that exploit the linkage disequilibrium structure within haplotype blocks exploit this sort of redundancy, but run a risk of over-fitting if used without some care. A compromise approach which eliminates the need first to analyse block structure, but which still exploits simple relationships between SNPs, appears promising.
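The point about pairwise R2 methods over-selecting can be made concrete with a toy case in which a SNP has no good pairwise surrogate yet is fully determined by two tags together. The sketch below is only an illustration under assumed data: the haplotypes and the helper names `r_squared` and `multimarker_accuracy` are hypothetical, alleles are assumed phased and coded 0/1, and prediction is a simple in-sample majority vote rather than any method from the reviewed software.

```python
from collections import Counter, defaultdict


def r_squared(a, b):
    """Pairwise r^2 between two biallelic SNPs coded as 0/1 alleles per chromosome."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0
    return (pab - pa * pb) ** 2 / denom


def multimarker_accuracy(tags, target):
    """Fraction of chromosomes whose target allele is predicted correctly when the
    prediction is the most common target allele within each tag-allele combination."""
    groups = defaultdict(Counter)
    for *tag_alleles, t in zip(*tags, target):
        groups[tuple(tag_alleles)][t] += 1
    correct = sum(c.most_common(1)[0][1] for c in groups.values())
    return correct / len(target)


# Four haplotypes; the target's minor allele occurs only on the A=1, B=1 background.
A = [0, 0, 1, 1]
B = [0, 1, 0, 1]
T = [0, 0, 0, 1]

pairwise = r_squared(A, T)                  # 1/3, well below a typical 0.8 cut-off
one_tag = multimarker_accuracy([A], T)      # 0.75
two_tags = multimarker_accuracy([A, B], T)  # 1.0
```

A pairwise method with a 0.8 threshold would have to genotype T itself, whereas the A and B combination already predicts it. Because the accuracy here is measured on the same chromosomes used to build the predictor, it also hints at the over-fitting risk the abstract mentions.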
Affiliation(s)
- Daniel O Stram
- Division of Biostatistics and Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA.
58
Nicolas P, Sun F, Li LM. A model-based approach to selection of tag SNPs. BMC Bioinformatics 2006; 7:303. [PMID: 16776821 PMCID: PMC1525207 DOI: 10.1186/1471-2105-7-303] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 01/13/2006] [Accepted: 06/15/2006] [Indexed: 11/23/2022]
Abstract
Background: Single Nucleotide Polymorphisms (SNPs) are the most common type of polymorphisms found in the human genome. Effective genetic association studies require the identification of sets of tag SNPs that capture as much haplotype information as possible. Tag SNP selection is analogous to the problem of data compression in information theory. According to Shannon's framework, the optimal tag set maximizes the entropy of the tag SNPs subject to constraints on the number of SNPs. This approach requires an appropriate probabilistic model. Compared to simple measures of Linkage Disequilibrium (LD), a good model of haplotype sequences can more accurately account for LD structure. It also provides a machinery for the prediction of tagged SNPs and thereby to assess the performances of tag sets through their ability to predict larger SNP sets. Results: Here, we compute the description code-lengths of SNP data for an array of models and we develop tag SNP selection methods based on these models and the strategy of entropy maximization. Using data sets from the HapMap and ENCODE projects, we show that the hidden Markov model introduced by Li and Stephens outperforms the other models in several aspects: description code-length of SNP data, information content of tag sets, and prediction of tagged SNPs. This is the first use of this model in the context of tag SNP selection. Conclusion: Our study provides strong evidence that the tag sets selected by our best method, based on Li and Stephens model, outperform those chosen by several existing methods. The results also suggest that information content evaluated with a good model is more sensitive for assessing the quality of a tagging set than the correct prediction rate of tagged SNPs. Besides, we show that haplotype phase uncertainty has an almost negligible impact on the ability of good tag sets to predict tagged SNPs. This justifies the selection of tag SNPs on the basis of haplotype informativeness, although genotyping studies do not directly assess haplotypes. A software that implements our approach is available.
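The entropy-maximization idea can be sketched with plain empirical haplotype frequencies. Note the substitution: the paper evaluates entropy under probabilistic models such as the Li and Stephens hidden Markov model, whereas the sketch below simply counts observed haplotype patterns and adds SNPs greedily. The function names and the toy haplotypes are hypothetical.

```python
from collections import Counter
from math import log2


def joint_entropy(haplotypes, snp_idx):
    """Shannon entropy (bits) of the haplotypes restricted to the SNP columns in snp_idx,
    using empirical pattern frequencies (an assumption; the paper uses fitted models)."""
    counts = Counter(tuple(h[i] for i in snp_idx) for h in haplotypes)
    n = len(haplotypes)
    return -sum(c / n * log2(c / n) for c in counts.values())


def greedy_entropy_tags(haplotypes, k):
    """Greedily pick k SNP columns, each time adding the one that raises joint entropy most."""
    n_snps = len(haplotypes[0])
    chosen = []
    for _ in range(k):
        remaining = [i for i in range(n_snps) if i not in chosen]
        best = max(remaining, key=lambda i: joint_entropy(haplotypes, chosen + [i]))
        chosen.append(best)
    return chosen


# Toy phased haplotypes (rows) over four SNPs (columns).
# Columns 0 and 1 are identical, column 2 is independent of them, column 3 is rare.
haps = [
    (0, 0, 0, 0),
    (0, 0, 1, 0),
    (1, 1, 0, 0),
    (1, 1, 1, 1),
]
tags = greedy_entropy_tags(haps, 2)          # -> [0, 2]
tag_bits = joint_entropy(haps, tags)         # 2.0 bits
all_bits = joint_entropy(haps, [0, 1, 2, 3])  # 2.0 bits
```

In this toy set the two chosen SNPs already carry the full 2 bits of haplotype information, so the redundant column 1 and the rare column 3 add nothing. With real data, a fitted haplotype model would replace the raw counts, since empirical pattern frequencies become unreliable as the number of SNPs grows.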
Affiliation(s)
- Pierre Nicolas
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, USA
- Mathématique, Informatique et Génome, INRA, Jouy-en-Josas, France
- Fengzhu Sun
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, USA
- Lei M Li
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, USA
- Department of Mathematics, University of Southern California, Los Angeles, USA
59
Gunderson KL, Kuhn KM, Steemers FJ, Ng P, Murray SS, Shen R. Whole-genome genotyping of haplotype tag single nucleotide polymorphisms. Pharmacogenomics 2006; 7:641-8. [PMID: 16768648 DOI: 10.2217/14622416.7.4.641] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.9] [Indexed: 12/19/2022]
Abstract
The International HapMap Consortium recently completed genotyping over 3.8 million single nucleotide polymorphisms (SNPs) in three major populations, and the results of studying patterns of linkage disequilibrium indicate that characterization of 300,000–500,000 tag SNPs is sufficient to provide good genomic coverage for linkage-disequilibrium-based association studies in many populations. These whole-genome association studies will be used to dissect the genetics of complex diseases and pharmacogenomic drug responses. As such, the development of a cost-effective genotyping platform that can assay hundreds of thousands of SNPs across thousands of samples is essential. In this review, we describe the development of a whole-genome genotyping (WGG) assay that enables unconstrained SNP selection and effectively unlimited multiplexing from a single sample preparation. The development of WGG in concert with high-density BeadChips™ has enabled the creation of three different high-density SNP genotyping BeadChips: the Sentrix™ Human-1 Genotyping BeadChip containing over 109,000 exon-centric SNPs; the HumanHap300 BeadChip containing over 317,000 tag SNPs; and the HumanHap550 BeadChip containing over 550,000 tag SNPs.
60
Kim S, Zhao K, Jiang R, Molitor J, Borevitz JO, Nordborg M, Marjoram P. Association mapping with single-feature polymorphisms. Genetics 2006; 173:1125-33. [PMID: 16510789 PMCID: PMC1526505 DOI: 10.1534/genetics.105.052720] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.7] [Received: 10/22/2005] [Accepted: 02/21/2006] [Indexed: 11/18/2022]
Abstract
We develop methods for exploiting "single-feature polymorphism" data, generated by hybridizing genomic DNA to oligonucleotide expression arrays. Our methods enable the use of such data, which can be regarded as very high density, but imperfect, polymorphism data, for genomewide association or linkage disequilibrium mapping. We use a simulation-based power study to conclude that our methods should have good power for organisms like Arabidopsis thaliana, in which linkage disequilibrium is extensive, the reason being that the noisiness of single-feature polymorphism data is more than compensated for by their great number. Finally, we show how power depends on the accuracy with which single-feature polymorphisms are called.
Affiliation(s)
- Sung Kim
- Department of Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089-2910, USA
61
Kukita Y, Miyatake K, Stokowski R, Hinds D, Higasa K, Wake N, Hirakawa T, Kato H, Matsuda T, Pant K, Cox D, Tahira T, Hayashi K. Genome-wide definitive haplotypes determined using a collection of complete hydatidiform moles. Genome Res 2006; 15:1511-8. [PMID: 16251461 PMCID: PMC1310639 DOI: 10.1101/gr.4371105] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 12/22/2022]
Abstract
We present genome-wide definitive haplotypes, determined using a collection of 74 Japanese complete hydatidiform moles, each carrying a genome derived from a single sperm. The haplotypes incorporate 281,439 common SNPs, genotyped with a high throughput array-based oligonucleotide hybridization technique. Comparison of haplotypes inferred from pseudoindividuals (constructed from randomized mole pairs) with those of moles showed some switch errors in resolution of phases by the computational inference method. The effects of these errors on local haplotype structure and selection of tag SNPs are discussed. We also show that definitive haplotypes of moles may be useful for elucidation of long-range haplotype structure, and should be more effective for detecting extended haplotype homozygosity indicative of positive selection.
Affiliation(s)
- Yoji Kukita
- Division of Genome Analysis, Research Center for Genetic Information, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Fukuoka 812-8582, Japan
62
Zhou XX, Jia WH, Shen GP, Qin HD, Yu XJ, Chen LZ, Feng QS, Shugart YY, Zeng YX. Sequence Variants in Toll-Like Receptor 10 Are Associated with Nasopharyngeal Carcinoma Risk. Cancer Epidemiol Biomarkers Prev 2006; 15:862-6. [PMID: 16702361 DOI: 10.1158/1055-9965.epi-05-0874] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.5] [Indexed: 11/16/2022]
Abstract
Nasopharyngeal carcinoma (NPC) is a common malignancy in southern China and Southeast Asia. Genetic susceptibility is a major factor in determining the individual risk of NPC in these areas. To test the association between NPC and variants in Toll-like receptor 10 (TLR10), we conducted a hospital-based case-control study in a Cantonese-speaking population in Guangdong province. Seven single nucleotide polymorphisms in TLR10, selected with a tagging algorithm, were genotyped. When assessing each unique haplotype compared with the most common haplotype, "GAGTGAA," with the expectation-maximization algorithm in Haplo.stats, the risk of developing NPC was significantly elevated among men who carried the haplotype "GCGTGGC" (P = 0.005). After adjusting for age, gender, and VCA-IgA antibody titers, this association was more significant (P = 0.0007). To further assess the overall differences of haplotype frequency profiles between cases and healthy controls, the global score test, considering all haplotypes and adjusting for age, gender, and VCA-IgA antibody titers, gave a haplo score of 27.52 with P = 0.002. The haplotype specific odds ratio was 2.66 (confidence interval, 1.34-3.82) for GCGTGGC. We concluded that in this Cantonese population-based study, haplotype GCGTGGC with frequency of 11.4% in TLR10 was found to be associated with NPC and this association was statistically significant after adjusting for age, gender, and VCA-IgA antibody titers. It is possible that this is not a causal haplotype for NPC; rather, it is in strong linkage disequilibrium with a causal single nucleotide polymorphism in close proximity.
Affiliation(s)
- Xin-Xi Zhou
- Sun Yat-sen University, Cancer Center, 651 Dong-Feng Road East, 510060 Guangzhou, China
63
Montpetit A, Nelis M, Laflamme P, Magi R, Ke X, Remm M, Cardon L, Hudson TJ, Metspalu A. An evaluation of the performance of tag SNPs derived from HapMap in a Caucasian population. PLoS Genet 2006; 2:e27. [PMID: 16532062 PMCID: PMC1391920 DOI: 10.1371/journal.pgen.0020027] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.2] [Received: 10/06/2005] [Accepted: 01/23/2006] [Indexed: 11/18/2022]
Abstract
The Haplotype Map (HapMap) project recently generated genotype data for more than 1 million single-nucleotide polymorphisms (SNPs) in four population samples. The main application of the data is in the selection of tag single-nucleotide polymorphisms (tSNPs) to use in association studies. The usefulness of this selection process needs to be verified in populations outside those used for the HapMap project. In addition, it is not known how well the data represent the general population, as only 90–120 chromosomes were used for each population and since the genotyped SNPs were selected so as to have high frequencies. In this study, we analyzed more than 1,000 individuals from Estonia. The population of this northern European country has been influenced by many different waves of migrations from Europe and Russia. We genotyped 1,536 randomly selected SNPs from two 500-kbp ENCODE regions on Chromosome 2. We observed that the tSNPs selected from the CEPH (Centre d'Etude du Polymorphisme Humain) from Utah (CEU) HapMap samples (derived from US residents with northern and western European ancestry) captured most of the variation in the Estonia sample. (Between 90% and 95% of the SNPs with a minor allele frequency of more than 5% have an r2 of at least 0.8 with one of the CEU tSNPs.) Using the reverse approach, tags selected from the Estonia sample could almost equally well describe the CEU sample. Finally, we observed that the sample size, the allelic frequency, and the SNP density in the dataset used to select the tags each have important effects on the tagging performance. Overall, our study supports the use of HapMap data in other Caucasian populations, but the SNP density and the bias towards high-frequency SNPs have to be taken into account when designing association studies. The recent completion of the Haplotype Map (HapMap) project of the human genome provides considerable information on the patterns of variation in the genome of four populations. 
One of the applications is a description of a set of tags that act as proxies for many other surrounding variants. This will greatly help researchers in their quest to find complex disease genes by reducing the number of genetic variants to test in association studies. To evaluate its usefulness, several aspects of the map, including its transferability to other populations, still needed to be verified experimentally. Using genomic regions where variants had been thoroughly documented in Caucasian samples from Estonia, the researchers found that the transferability of tags is extremely good. The researchers also found that variants with low frequency in the general population (i.e., less than 5%) could not be accurately captured with tags, and that the regional density of variants in the HapMap project had a major impact on the performance of the tags. This research indicates that the HapMap project will be useful, but that careful consideration of hypotheses and study design will be essential for the success of association studies.
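The transferability measure quoted in this abstract (the share of SNPs with minor allele frequency above 5% that have an r2 of at least 0.8 with one of the tags) can be computed as sketched below. The details are assumptions: the function names `maf` and `coverage`, the 0/1 phased-allele input layout, the choice to score only non-tag SNPs, and the toy second-population sample are all hypothetical and not taken from the paper.

```python
def r_squared(a, b):
    """Pairwise r^2 between two biallelic SNPs coded as 0/1 alleles per chromosome."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0
    return (pab - pa * pb) ** 2 / denom


def maf(alleles):
    """Minor allele frequency of a 0/1 coded SNP."""
    p = sum(alleles) / len(alleles)
    return min(p, 1 - p)


def coverage(haplotypes, tags, r2_min=0.8, maf_min=0.05):
    """Fraction of non-tag SNPs with MAF > maf_min that reach r^2 >= r2_min with some tag.

    haplotypes holds the population being evaluated; tags were chosen elsewhere
    (for example in a reference panel) and are looked up by name in this sample.
    """
    targets = [s for s in haplotypes if s not in tags and maf(haplotypes[s]) > maf_min]
    if not targets:  # nothing to evaluate
        return 0.0
    captured = sum(
        any(r_squared(haplotypes[s], haplotypes[t]) >= r2_min for t in tags)
        for s in targets
    )
    return captured / len(targets)


# Toy evaluation sample of 10 chromosomes; S3 is monomorphic and is therefore excluded.
sample = {
    "T1": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "T2": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "S1": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "S2": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "S3": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
one_tag = coverage(sample, ["T1"])         # 1/3: only S1 is captured
two_tags = coverage(sample, ["T1", "T2"])  # 1.0: S1 and S2 are both captured
```

Running the same function with tags chosen in one sample and haplotypes from another gives the kind of cross-population comparison the study describes.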
Affiliation(s)
- Alexandre Montpetit
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
- Mari Nelis
- Institute of Molecular and Cell Biology of the University of Tartu, Tartu, Estonia
- Estonian Biocentre, Tartu, Estonia
- Philippe Laflamme
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
- Reedik Magi
- Institute of Molecular and Cell Biology of the University of Tartu, Tartu, Estonia
- Xiayi Ke
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Maido Remm
- Institute of Molecular and Cell Biology of the University of Tartu, Tartu, Estonia
- Lon Cardon
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Thomas J Hudson
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
- Andres Metspalu
- Institute of Molecular and Cell Biology of the University of Tartu, Tartu, Estonia
- Estonian Biocentre, Tartu, Estonia
- The Estonian Genome Project Foundation, Tartu, Estonia
64
Abstract
The genome era provides two sources of knowledge to investigators whose goal is to discover new cancer therapies: first, information on the 20,000 to 40,000 genes that comprise the human genome, the proteins they encode, and the variation in these genes and proteins in human populations that place individuals at risk or that occur in disease; second, genome-wide analysis of cancer cells and tissues leads to the identification of new drug targets and the design of new therapeutic interventions. Using genome resources requires the storage and analysis of large amounts of diverse information on genetic variation, gene and protein functions, and interactions in regulatory processes and biochemical pathways. Cancer bioinformatics deals with organizing and analyzing the data so that important trends and patterns can be identified. Specific gene and protein targets on which cancer cells depend can be identified. Therapeutic agents directed against these targets can then be developed and evaluated. Finally, molecular and genetic variation within a population may become the basis of individualized treatment.
Affiliation(s)
- David W Mount
- Arizona Cancer Center, University of Arizona, 1515 North Campbell Avenue, P.O. Box 245024, Tucson, AZ 85724-5024, USA.
65
Green RF, Moore C. Incorporating genetic analyses into birth defects cluster investigations: Strategies for identifying candidate genes. Birth Defects Res A Clin Mol Teratol 2006; 76:798-810. [PMID: 17036308 DOI: 10.1002/bdra.20280] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 01/01/2023]
Abstract
Background: Incorporating genetic analyses into birth defect cluster investigations may increase understanding of both genetic and environmental risk factors for the defect. Current constraints of most birth defect cluster investigations make candidate gene selection the most feasible approach. Here, we describe strategies for choosing candidate genes for such investigations, which will also be applicable to more general gene-environment studies. Methods: We reviewed publicly available web-based resources for selection of candidate genes and identification of risk factors, as well as publications on different strategies for candidate gene selection. Results: Candidate gene selection requires consideration of available gene-disease databases, previous epidemiological studies, animal model research, linkage and expression studies, and other resources. We describe general considerations for utilizing available resources, as well as provide an example of a search for candidate genes related to gastroschisis. Conclusions: Available web resources could facilitate selection of candidate genes, but selection of optimal candidates will still require a strong understanding of genetics and the pathogenesis of the defect, as well as careful consideration of previous epidemiological studies.
Affiliation(s)
- Ridgely Fisk Green
- National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia 30333, USA.
66
Indap AR, Marth GT, Struble CA, Tonellato P, Olivier M. Analysis of concordance of different haplotype block partitioning algorithms. BMC Bioinformatics 2005; 6:303. [PMID: 16356172 PMCID: PMC1343594 DOI: 10.1186/1471-2105-6-303] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 06/24/2005] [Accepted: 12/15/2005] [Indexed: 12/04/2022]
Abstract
BACKGROUND Different classes of haplotype block algorithms exist and the ideal dataset to assess their performance would be to comprehensively re-sequence a large genomic region in a large population. Such data sets are expensive to collect. Alternatively, we performed coalescent simulations to generate haplotypes with a high marker density and compared block partitioning results from diversity based, LD based, and information theoretic algorithms under different values of SNP density and allele frequency. RESULTS We simulated 1000 haplotypes using the standard coalescent for three world populations--European, African American, and East Asian--and applied three classes of block partitioning algorithms--diversity based, LD based, and information theoretic. We assessed algorithm differences in number, size, and coverage of blocks inferred under different conditions of SNP density, allele frequency, and sample size. Each algorithm inferred blocks differing in number, size, and coverage under different density and allele frequency conditions. Different partitions had few if any matching block boundaries. However they still overlapped and a high percentage of total chromosomal region was common to all methods. This percentage was generally higher with a higher density of SNPs and when rarer markers were included. CONCLUSION A gold standard definition of a haplotype block is difficult to achieve, but collecting haplotypes covered with a high density of SNPs, partitioning them with a variety of block algorithms, and identifying regions common to all methods may be the best way to identify genomic regions that harbor SNP variants that cause disease.
Affiliation(s)
- Amit R Indap
- Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, USA
- Gabor T Marth
- Department of Biology, Boston College, Chestnut Hill, USA
- Craig A Struble
- Department of Mathematics, Statistics, and Computer Science, Marquette University, Milwaukee, USA
- Peter Tonellato
- Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, USA
- Michael Olivier
- Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, USA
|
67
|
de Bakker PIW, Yelensky R, Pe'er I, Gabriel SB, Daly MJ, Altshuler D. Efficiency and power in genetic association studies. Nat Genet 2005; 37:1217-23. [PMID: 16244653 DOI: 10.1038/ng1669] [Citation(s) in RCA: 1376] [Impact Index Per Article: 72.4] [Received: 07/14/2005] [Accepted: 09/27/2005] [Indexed: 02/06/2023]
Abstract
We investigated selection and analysis of tag SNPs for genome-wide association studies by specifically examining the relationship between investment in genotyping and statistical power. Do pairwise or multimarker methods maximize efficiency and power? To what extent is power compromised when tags are selected from an incomplete resource such as HapMap? We addressed these questions using genotype data from the HapMap ENCODE project, association studies simulated under a realistic disease model, and empirical correction for multiple hypothesis testing. We demonstrate a haplotype-based tagging method that uniformly outperforms single-marker tests and methods for prioritization that markedly increase tagging efficiency. Examining all observed haplotypes for association, rather than just those that are proxies for known SNPs, increases power to detect rare causal alleles, at the cost of reduced power to detect common causal alleles. Power is robust to the completeness of the reference panel from which tags are selected. These findings have implications for prioritizing tag SNPs and interpreting association studies.
Affiliation(s)
- Paul I W de Bakker
- Center for Human Genetic Research, Massachusetts General Hospital, 185 Cambridge Street, CPZN-6818, Boston, Massachusetts 02114-2790, USA
|
68
|
Karsak M, Cohen-Solal M, Freudenberg J, Ostertag A, Morieux C, Kornak U, Essig J, Erxlebe E, Bab I, Kubisch C, de Vernejoul MC, Zimmer A. Cannabinoid receptor type 2 gene is associated with human osteoporosis. Hum Mol Genet 2005; 14:3389-96. [PMID: 16204352 DOI: 10.1093/hmg/ddi370] [Citation(s) in RCA: 143] [Impact Index Per Article: 7.5] [Indexed: 11/12/2022]
Abstract
Osteoporosis is one of the most common degenerative diseases. It is characterized by reduced bone mineral density (BMD) with an increased risk for bone fractures. There is a substantial genetic contribution to BMD, although the genetic factors involved in the pathogenesis of human osteoporosis are largely unknown. Mice with a targeted deletion of either the cannabinoid receptor type 1 (Cnr1) or type 2 (Cnr2) gene show an alteration of bone mass, and pharmacological modification of both receptors can regulate osteoclast activity and BMD. We therefore analyzed both genes in a systematic genetic association study in a human sample of postmenopausal osteoporosis patients and matched female controls. We found a significant association of single polymorphisms (P = 0.0014) and haplotypes (P = 0.0001) encompassing the CNR2 gene on human chromosome 1p36, whereas we found no convincing association for CNR1. These results demonstrate a role for the peripherally expressed CB2 receptor in the etiology of osteoporosis and provide an interesting novel therapeutical target for this severe and common disease.
Affiliation(s)
- Meliha Karsak
- Department of Psychiatry, Life and Brain Center, University of Bonn, Germany
|
69
|
Gu S, Pakstis AJ, Kidd KK. HAPLOT: a graphical comparison of haplotype blocks, tagSNP sets and SNP variation for multiple populations. Bioinformatics 2005; 21:3938-9. [PMID: 16131520 DOI: 10.1093/bioinformatics/bti649] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022]
Abstract
Understanding of human variation relevant to association studies can benefit from population comparison, especially comparing populations in the same geographical region. Variations in linkage disequilibrium patterns, in tagSNP sets, and in SNP heterozygosities among populations can be used to infer the evolutionary pattern. We present here a win32 system based Perl/Tk application for visual comparisons of these variations in different populations. AVAILABILITY The application package is available at http://info.med.yale.edu/genetics/kkidd/programs.html CONTACT sheng.gu@yale.edu.
Affiliation(s)
- Sheng Gu
- Department of Genetics, Yale University School of Medicine, New Haven, USA.
|
70
|
Sklar P. Principles of haplotype mapping and potential applications to attention-deficit/hyperactivity disorder. Biol Psychiatry 2005; 57:1357-66. [PMID: 15950008 DOI: 10.1016/j.biopsych.2005.01.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 06/18/2004] [Revised: 12/07/2004] [Accepted: 01/03/2005] [Indexed: 10/25/2022]
Abstract
Approaches to the study of common, complex genetic disorders like attention-deficit/hyperactivity disorder (ADHD) are evolving rapidly. Traditional linkage and association mapping each have distinct roles to play. Rapid advances in genomic information and technologies make association studies more attractive, including the possibility in the near future of whole genome association scans. This review covers the following broad topics: 1) the principles of linkage and association analyses as they apply to ADHD, and 2) the implications of genome architecture for association studies of complex diseases like ADHD. The structure of linkage disequilibrium is approached through review of the statistical measures of allelic associations and their relationship to observed haplotypes. The patterns of haplotypes across the human genome are discussed, as well as the implications of linkage disequilibrium mapping for association studies in general and ADHD specifically. Finally, the extent to which the allelic architecture of a candidate ADHD gene is publicly available and the web resources to access this information are covered. Today, the wealth of polymorphism data available on the worldwide web enables researchers to focus powerful methodologic tools on candidate genes and regions of interest. Coupling this with larger patient collections and more refined phenotyping will move forward the identification of disease-associated polymorphisms and ultimately the development of genetically based pharmaceuticals and diagnostic tests.
Affiliation(s)
- Pamela Sklar
- Harvard Medical School, Department of Psychiatry, and the Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Charlestown, Massachusetts 02129, USA.
|
71
|
Luo HR, Hou ZF, Wu J, Zhang YP, Wan YJY. Evolution of the DRD2 gene haplotype and its association with alcoholism in Mexican Americans. Alcohol 2005; 36:117-25. [PMID: 16396745 DOI: 10.1016/j.alcohol.2005.09.003] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 04/05/2005] [Revised: 09/01/2005] [Accepted: 09/08/2005] [Indexed: 01/21/2023]
Abstract
The human D2 dopamine receptor gene (DRD2) plays a central role in the neuromodulation of appetitive behaviors and is implicated in having a possible role in susceptibility to alcoholism. We genotyped an SNP in DRD2 Exon 8 in 251 nonalcoholic, unrelated, healthy controls and 200 alcoholic Mexican Americans. The DRD2 haplotypes were analyzed using the Exon 8 genotype in combination with five other SNP genotypes, which were obtained from our previous study. The ancestral origins of the DRD2 polymorphisms have been determined by sequencing the homologous region in other higher primates. Twenty DRD2 haplotypes, defined as H1 to H20 based on their frequency from high to low, were obtained in this major minority population. The ancestral haplotype "I-B2-G-C-G-A1" and two one-step mutation haplotypes were absent in our study population. The haplotype H1, "I-B1-T-C-A-A1", with the highest frequency in the population, is a three-step mutation from the ancestral form. The first five or eight major haplotypes make up 87% or 95% of the entire population, respectively. The prevalence of the haplotype H1+ (H1/H1 and H1/Hn genotypes) is significantly higher in alcoholics and alcoholic subgroups, including early onset drinkers and benders, than in their respective control groups. The Promoter -141C allele is in linkage disequilibrium (LD) with five other loci in the nonalcoholic group, but not in the alcoholic group. All of the other five loci are in LD in both the alcoholic and control groups. The DRD2 TaqI B allele is in complete LD with the allele located in intron 6. Five SNPs, Promoter -141C, TaqI B (or Intron 6), Exon 7, Exon 8, and TaqI A, are sufficient to define the DRD2 haplotypes in Mexican Americans. Our data indicate that the DRD2 haplotypes are associated with alcoholism in Mexican Americans.
Affiliation(s)
- Huai-Rong Luo
- Department of Pharmacology, Toxicology & Therapeutics, Breidenthal Building, Mail Stop 1018, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160-7417, USA
|
72
|
Kanková K, Stejskalová A, Hertlová M, Znojil V. Haplotype analysis of the RAGE gene: identification of a haplotype marker for diabetic nephropathy in type 2 diabetes mellitus. Nephrol Dial Transplant 2005; 20:1093-102. [PMID: 15790669 DOI: 10.1093/ndt/gfh711] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022]
Abstract
BACKGROUND Diabetic nephropathy (DN) represents a devastating complication of diabetes. Family clustering, heterogeneity in onset and progression, and results of segregation studies indicate that susceptibility to DN is a complex trait. METHODS Common single nucleotide polymorphisms in the RAGE (receptor of advanced glycation end-products) gene (-429T/C, -374T/A, G82S, 1704G/T, 2184A/G and 2245G/A) were studied in an association study comprising 605 Caucasian subjects by means of haplotype analysis in order to identify a possible haplotype marker for DN in type 2 diabetes. Haplotypes were constructed computationally; frequencies were compared among groups of subjects with type 2 diabetes (DM) and DN, diabetics without DN and non-diabetics. Survival analysis was carried out to ascertain whether certain RAGE haplotypes influence onset of DN in type 2 diabetics. RESULTS Significant differences in haplotype frequencies among DM + DN vs DM non-DN and non-DM groups were found (P = 0.0007 and 0.0013, respectively; permutation test). Frequency of the RAGE(2) haplotype containing minor alleles in positions -429 and 2184 (CTGGGG) in the DN group was significantly higher than in the two control groups (21.7% vs 12.8% and 13.8%, both P(corr)<0.003; two-tail Fisher exact test); odds ratios 1.65 [95% confidence interval (CI): 1.08-2.50; P = 0.020] and 1.79 (95% CI: 1.22-2.62; P = 0.003), respectively. In survival analysis, duration of diabetes until the onset of DN (e.g. appearance of persistent proteinuria) was significantly different among RAGE(2) diplotype groups (P<0.05); median DN-free interval was 9.6 years in RAGE(2) +/+ homozygotes, 15.2 years in +/- heterozygotes and 17.0 years in the -/- combination. CONCLUSIONS The RAGE(2) haplotype is associated with DN in type 2 diabetics and with earlier DN onset and, thus, can be regarded as a marker for DN.
Affiliation(s)
- Katerina Kanková
- Masaryk University Brno, Faculty of Medicine, Department of Pathophysiology, Komenského nám. 2, 66243 Brno, Czech Republic.
|
73
|
Zhang F, Zhao Z. SNPNB: analyzing neighboring-nucleotide biases on single nucleotide polymorphisms (SNPs). Bioinformatics 2005; 21:2517-9. [PMID: 15769840 DOI: 10.1093/bioinformatics/bti377] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
SNPNB is a user-friendly and platform-independent application for analyzing Single Nucleotide Polymorphism NeighBoring sequence context and nucleotide bias patterns, and subsequently evaluating the effective SNP size for the bias patterns observed from the whole data. It was implemented in Java and Perl. SNPNB can efficiently handle genome-wide or chromosome-wide SNP data analysis on a PC or a workstation. It provides visualizations of the bias patterns for all SNPs or for each type of SNP. AVAILABILITY SNPNB and its full description are freely available at http://bioinfo.vipbg.vcu.edu/SNPNB/
Affiliation(s)
- Fengkai Zhang
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23298, USA
|
74
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2005. [PMCID: PMC2447509 DOI: 10.1002/cfg.490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|