1. Montasser ME, Van Hout CV, Miloscio L, Howard AD, Rosenberg A, Callaway M, Shen B, Li N, Locke AE, Verweij N, De T, Ferreira MA, Lotta LA, Baras A, Daly TJ, Hartford SA, Lin W, Mao Y, Ye B, White D, Gong G, Perry JA, Ryan KA, Fang Q, Tzoneva G, Pefanis E, Hunt C, Tang Y, Lee L, Sztalryd-Woodle C, Mitchell BD, Healy M, Streeten EA, Taylor SI, O'Connell JR, Economides AN, Della Gatta G, Shuldiner AR. Genetic and functional evidence links a missense variant in B4GALT1 to lower LDL and fibrinogen. Science 2021; 374:1221-1227. [PMID: 34855475 DOI: 10.1126/science.abe0348] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 01/14/2023]
Affiliation(s)
- May E Montasser
  - Division of Endocrinology, Diabetes and Nutrition and Program for Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Cristopher V Van Hout
  - Regeneron Genetics Center, LLC, Tarrytown, NY 10591, USA
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Campus Juriquilla de la Universidad Nacional Autónoma de México, Querétaro, Querétaro 76230, México
- Alicia D Howard
  - Division of Endocrinology, Diabetes and Nutrition and Program for Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA
  - Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD 20993, USA
- Biao Shen
  - Regeneron Pharmaceuticals, Inc., Tarrytown, NY 10591, USA
- Ning Li
  - Regeneron Pharmaceuticals, Inc., Tarrytown, NY 10591, USA
- Adam E Locke
  - Regeneron Genetics Center, LLC, Tarrytown, NY 10591, USA
- Niek Verweij
  - Regeneron Genetics Center, LLC, Tarrytown, NY 10591, USA
- Tanima De
  - Regeneron Genetics Center, LLC, Tarrytown, NY 10591, USA
- Luca A Lotta
  - Regeneron Genetics Center, LLC, Tarrytown, NY 10591, USA
- Aris Baras
  - Regeneron Genetics Center, LLC, Tarrytown, NY 10591, USA
- Thomas J Daly
  - Regeneron Pharmaceuticals, Inc., Tarrytown, NY 10591, USA
- Wei Lin
  - Regeneron Pharmaceuticals, Inc., Tarrytown, NY 10591, USA
- Yuan Mao
  - Regeneron Pharmaceuticals, Inc., Tarrytown, NY 10591, USA
- Bin Ye
  - Regeneron Genetics Center, LLC, Tarrytown, NY 10591, USA
- Derek White
  - Regeneron Pharmaceuticals, Inc., Tarrytown, NY 10591, USA
- Guochun Gong
  - Regeneron Pharmaceuticals, Inc., Tarrytown, NY 10591, USA
- James A Perry
  - Division of Endocrinology, Diabetes and Nutrition and Program for Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Kathleen A Ryan
  - Division of Endocrinology, Diabetes and Nutrition and Program for Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Qing Fang
  - Regeneron Pharmaceuticals, Inc., Tarrytown, NY 10591, USA
- Gannie Tzoneva
  - Regeneron Genetics Center, LLC, Tarrytown, NY 10591, USA
- Charleen Hunt
  - Regeneron Pharmaceuticals, Inc., Tarrytown, NY 10591, USA
- Yajun Tang
  - Regeneron Pharmaceuticals, Inc., Tarrytown, NY 10591, USA
- Lynn Lee
  - Regeneron Pharmaceuticals, Inc., Tarrytown, NY 10591, USA
- Carole Sztalryd-Woodle
  - Division of Endocrinology, Diabetes and Nutrition and Program for Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA
  - US Department of Veterans Affairs, Washington, DC 20420, USA
- Braxton D Mitchell
  - Division of Endocrinology, Diabetes and Nutrition and Program for Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA
  - Geriatrics Research and Education Clinical Center, VA Medical Center, Baltimore, MD 21201, USA
- Elizabeth A Streeten
  - Division of Endocrinology, Diabetes and Nutrition and Program for Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA
  - Division of Genetics, Department of Pediatrics, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Simeon I Taylor
  - Division of Endocrinology, Diabetes and Nutrition and Program for Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Jeffrey R O'Connell
  - Division of Endocrinology, Diabetes and Nutrition and Program for Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Aris N Economides
  - Regeneron Genetics Center, LLC, Tarrytown, NY 10591, USA
  - Regeneron Pharmaceuticals, Inc., Tarrytown, NY 10591, USA

2. Rare variants regulate expression of nearby individual genes in multiple tissues. PLoS Genet 2021; 17:e1009596. [PMID: 34061836 PMCID: PMC8195400 DOI: 10.1371/journal.pgen.1009596] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/31/2021] [Revised: 06/11/2021] [Accepted: 05/11/2021] [Indexed: 12/30/2022]
Abstract
The rapid decrease in sequencing cost has enabled genetic studies to discover rare variants associated with complex diseases and traits. Once such an association is identified, the next step is to understand the mechanism by which the rare variants influence disease. As hypothesized for common variants, rare variants may affect diseases by regulating gene expression, and several recent studies have identified effects of rare variants on gene expression using heritability and expression outlier analyses. However, identifying individual genes whose expression is regulated by rare variants has been challenging because expression quantitative trait loci studies have relatively small sample sizes and existing statistical approaches are not optimized to detect the effects of rare variants. In this study, we analyze whole-genome sequencing and RNA-seq data of 681 European individuals collected for the Genotype-Tissue Expression (GTEx) project (v8) to identify individual genes in 49 human tissues whose expression is regulated by rare variants. To improve statistical power, we develop an approach based on a likelihood ratio test that combines the effects of multiple rare variants in a nonlinear manner and has higher power than previous approaches. Using GTEx data, we identify many genes regulated by rare variants, some of which are regulated only by rare variants and not by common variants. We also find that genes regulated by rare variants are enriched for expression outliers and disease-causing genes. These results suggest that rare variants have regulatory effects, which would be important in interpreting associations of rare variants with complex traits.
With the advent of next-generation sequencing technology, it has been shown that rare variants may affect many diseases, both rare and common. An important question is how rare variants affect diseases or traits, especially whether and how they regulate gene expression, since they may affect diseases through gene regulation. However, it is challenging to identify the regulatory effects of rare variants because doing so often requires large sample sizes and the existing statistical approaches are not optimized for it. Here, we develop a novel method, LRT-q, based on a likelihood ratio test that aggregates the effects of multiple rare variants nonlinearly to achieve higher statistical power than previous rare variant association methods. We apply LRT-q to the latest GTEx v8 dataset and identify regulatory effects of rare variants on individual genes. We also observe that genes regulated by rare variants are likely to be disease-causing genes. These results demonstrate the functional effects of rare variants, especially on gene expression, and provide important biological insights into the genetic mechanism of rare variants in complex traits and diseases.

3. Hormozdiari F, Jung J, Eskin E, Joo JWJ. MARS: leveraging allelic heterogeneity to increase power of association testing. Genome Biol 2021; 22:128. [PMID: 33931127 PMCID: PMC8086090 DOI: 10.1186/s13059-021-02353-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/07/2020] [Accepted: 04/15/2021] [Indexed: 11/10/2022]
Abstract
In genome-wide association studies (GWAS), the standard association test is underpowered to detect associations at loci that harbor multiple causal variants with small effect sizes. We propose a statistical method, Model-based Association test Reflecting causal Status (MARS), that finds associations between variants in risk loci and a phenotype while considering the causal status of the variants. MARS requires only existing summary statistics to detect associated risk loci. Using extensive simulated data and real data, we show that MARS increases the power to detect truly associated risk loci compared to previous approaches that consider multiple variants, while controlling the type I error.
Affiliation(s)
- Farhad Hormozdiari
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Junghyun Jung
  - Department of Life Science, Dongguk University-Seoul, Seoul 04620, South Korea
- Eleazar Eskin
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Jong Wha J. Joo
  - Department of Computer Science and Engineering, Dongguk University-Seoul, Seoul 04620, South Korea

4. [An improved association analysis pipeline for tumor susceptibility variant in haplotype amplification area]. Nan Fang Yi Ke Da Xue Xue Bao = Journal of Southern Medical University 2020; 40:1493-1499. [PMID: 33118521 PMCID: PMC7606235 DOI: 10.12122/j.issn.1673-4254.2020.10.16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/23/2023]
Abstract
OBJECTIVE Haplotype amplification of germline variants is thought to indicate potential selective advantages and susceptibility to clonal expansion, and it has become an important signature in the search for cancer susceptibility genes. Here we propose an improved association method that fully considers the haplotype amplification status. METHODS The haplotype amplification status was estimated from the variant allelic frequencies. We adopted a permutation test on variant allelic frequencies to divide the candidate variants into multiple groups. A likelihood clustering method was then applied to establish the neighborhood system of the hidden Markov random field framework. A filtering pipeline, including a Wilson's interval filter and a false discovery rate controller, was introduced to further refine the candidate variants. The final candidate set, along with the haplotype amplification status, was collapsed into weighted virtual sites for association tests. RESULTS In simulated tests on a series of datasets, we compared the type I error rates at different minor allele frequencies; these stably fell within 2%, suggesting good robustness of the algorithm. In addition, we compared the proposed method with 5 other published association approaches in terms of type I and type II error rates, which all fell within 2%, demonstrating significant advantages and good statistical performance of the proposed method. CONCLUSIONS The proposed method can accurately identify tumor susceptibility variants in haplotype amplification areas with good robustness and stability.
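The abstract names a Wilson's interval filter but does not say how it is applied. As a minimal sketch only, the code below computes the standard Wilson score interval for a binomial proportion and uses it in one plausible way: flagging a variant whose allelic-frequency interval excludes an expected value. The function names, the 0.5 expected frequency, and the z value of 1.96 are assumptions for illustration, not details taken from the paper.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials ** 2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


def vaf_deviates(alt_reads: int, total_reads: int, expected_vaf: float = 0.5) -> bool:
    """Hypothetical filter: True when the interval around the observed
    variant allelic frequency does not contain the expected frequency."""
    low, high = wilson_interval(alt_reads, total_reads)
    return not (low <= expected_vaf <= high)
```

For example, `vaf_deviates(90, 100)` reports a deviation from 0.5, whereas `vaf_deviates(5, 10)` does not, because ten reads leave the interval too wide to exclude 0.5.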

5. Geng Y, Zhao Z, Zhang X, Wang W, Cui X, Ye K, Xiao X, Wang J. An improved burden-test pipeline for identifying associations from rare germline and somatic variants. BMC Genomics 2017. [PMID: 29513197 PMCID: PMC5657102 DOI: 10.1186/s12864-017-4133-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Abstract
BACKGROUND Identifying rare germline and somatic variants associated with cancer progression is an important research topic in cancer genomics. Although many approaches have been proposed for rare variant association studies, they are not suited to cancer sequencing data because of multiple issues, such as relying too heavily on pre-selection and overlooking interacting hotspots. RESULTS In this article, we propose an improved pipeline to identify germline variant and somatic mutation interactions influencing cancer susceptibility from pair-wise cancer sequencing data. The proposed pipeline, RareProb-C, performs an algorithmic selection on the given variants by incorporating the variant allelic frequencies. The interactions among the variants are considered within regions delimited by a four-gamete test. It then filters singular cases according to the posterior probability at each site. Finally, it outputs the selected candidates that pass a collapse test. CONCLUSIONS We apply RareProb-C to a series of carefully constructed simulation cases, where it outperforms six existing genetic model-free approaches. We also test RareProb-C on 429 TCGA ovarian cancer cases, and RareProb-C successfully identifies the known highlighted variants that are considered to increase disease susceptibility.
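The four-gamete test mentioned above can be sketched as follows. This is the textbook form of the test for two biallelic sites on phased haplotypes (alleles coded 0 and 1); how RareProb-C uses it to delimit regions is not described in the abstract, so the function name and input layout here are illustrative assumptions.

```python
from typing import Sequence


def four_gametes_present(site_a: Sequence[int], site_b: Sequence[int]) -> bool:
    """Return True when all four haplotypes (00, 01, 10, 11) are observed
    across the two sites, which is the classic signal of past recombination
    (or recurrent mutation) between them."""
    if len(site_a) != len(site_b):
        raise ValueError("both sites must cover the same haplotypes")
    if any(allele not in (0, 1) for allele in list(site_a) + list(site_b)):
        raise ValueError("alleles must be coded as 0 or 1")
    # Each haplotype contributes one (allele at A, allele at B) pair.
    return len(set(zip(site_a, site_b))) == 4
```

A region boundary could then be placed between the first pair of adjacent sites for which the function returns True, although that rule is an assumption rather than the published procedure.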
Affiliation(s)
- Yu Geng
  - School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
  - Jinzhou Medical University, Jinzhou, Liaoning, 121001, China
- Zhongmeng Zhao
  - School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
  - Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
- Xuanping Zhang
  - School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
  - Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
- Wenke Wang
  - School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
- Xingjian Cui
  - School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
  - Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
- Kai Ye
  - School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
  - Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
- Xiao Xiao
  - Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
  - State Key Laboratory of Cancer Biology, Xijing Hospital of Digestive Diseases, Xi'an, 710032, Shaanxi, China
- Jiayin Wang
  - School of Management, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
  - Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China

6. Incorporating Non-Coding Annotations into Rare Variant Analysis. PLoS One 2016; 11:e0154181. [PMID: 27128317 PMCID: PMC4851421 DOI: 10.1371/journal.pone.0154181] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 12/15/2015] [Accepted: 04/11/2016] [Indexed: 11/25/2022]
Abstract
BACKGROUND The success of collapsing methods which investigate the combined effect of rare variants on complex traits has so far been limited. The manner in which variants within a gene are selected prior to analysis has a crucial impact on this success, which has resulted in analyses conventionally filtering variants according to their consequence. This study investigates whether an alternative approach to filtering, using annotations from recently developed bioinformatics tools, can aid these types of analyses in comparison to conventional approaches. METHODS & RESULTS We conducted a candidate gene analysis using the UK10K sequence and lipids data, filtering according to functional annotations using the resource CADD (Combined Annotation-Dependent Depletion) and contrasting results with 'nonsynonymous' and 'loss of function' consequence analyses. Using CADD allowed the inclusion of potentially deleterious intronic variants, which was not possible when filtering by consequence. Overall, different filtering approaches provided similar evidence of association, although filtering according to CADD identified evidence of association between ANGPTL4 and High Density Lipoproteins (P = 0.02, N = 3,210) which was not observed in the other analyses. We also undertook genome-wide analyses to determine how filtering in this manner compared to conventional approaches for gene regions. Results suggested that filtering by annotations according to CADD, as well as other tools known as FATHMM-MKL and DANN, identified association signals not detected when filtering by variant consequence and vice versa. CONCLUSION Incorporating variant annotations from non-coding bioinformatics tools should prove to be a valuable asset for rare variant analyses in the future. 
Filtering by variant consequence is only possible in coding regions of the genome, whereas utilising non-coding bioinformatics annotations provides an opportunity to discover unknown causal variants in non-coding regions as well. This should allow studies to uncover a greater number of causal variants for complex traits and help elucidate their functional role in disease.
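The annotation-based filtering followed by collapsing described in this abstract can be illustrated with a small sketch. It assumes each variant already carries a numeric annotation score (such as a CADD score obtained elsewhere) and that genotypes are coded as alternate-allele counts; the function names, the data layout, and the threshold of 15 are hypothetical choices, not values taken from the study.

```python
from typing import Dict, List


def filter_by_score(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep variants whose annotation score meets the threshold,
    regardless of whether they are coding or non-coding."""
    return sorted(variant for variant, score in scores.items() if score >= threshold)


def burden_scores(genotypes: Dict[str, Dict[str, int]], variants: List[str]) -> Dict[str, int]:
    """Collapse the retained variants into one count per sample:
    the number of alternate alleles carried across those variants."""
    return {
        sample: sum(calls.get(variant, 0) for variant in variants)
        for sample, calls in genotypes.items()
    }


# Made-up example data for one gene region.
example_scores = {"var1": 22.1, "var2": 3.4, "var3": 15.0}
example_genotypes = {
    "sample1": {"var1": 1, "var2": 1},
    "sample2": {"var3": 2},
    "sample3": {},
}
kept = filter_by_score(example_scores, threshold=15.0)
burden = burden_scores(example_genotypes, kept)
```

The per-sample burden would then be regressed against the trait; that association step is omitted here because the abstract does not specify which collapsing test was used.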

7. Hormozdiari F, Kichaev G, Yang WY, Pasaniuc B, Eskin E. Identification of causal genes for complex traits. Bioinformatics 2015; 31:i206-13. [PMID: 26072484 PMCID: PMC4542778 DOI: 10.1093/bioinformatics/btv240] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Indexed: 11/24/2022]
Abstract
Motivation: Although genome-wide association studies (GWAS) have identified thousands of variants associated with common diseases and complex traits, only a handful of these variants are validated to be causal. We consider ‘causal variants’ as variants which are responsible for the association signal at a locus. As opposed to association studies that benefit from linkage disequilibrium (LD), the main challenge in identifying causal variants at associated loci lies in distinguishing among the many closely correlated variants due to LD. This is particularly important for model organisms such as inbred mice, where LD extends much further than in human populations, resulting in large stretches of the genome with significantly associated variants. Furthermore, these model organisms are highly structured and require correction for population structure to remove potential spurious associations. Results: In this work, we propose CAVIAR-Gene (CAusal Variants Identification in Associated Regions), a novel method that is able to operate across large LD regions of the genome while also correcting for population structure. A key feature of our approach is that it provides as output a minimally sized set of genes that captures the genes which harbor causal variants with probability ρ. Through extensive simulations, we demonstrate that our method not only speeds up computation, but also has, on average, a 10% higher recall rate compared with the existing approaches. We validate our method using real mouse high-density lipoprotein (HDL) data and show that CAVIAR-Gene is able to identify Apoa2 (a gene known to harbor causal variants for HDL), while reducing the number of genes that need to be tested for functionality by a factor of 2. Availability and implementation: Software is freely available for download at genetics.cs.ucla.edu/caviar. Contact: eeskin@cs.ucla.edu
Affiliation(s)
- Farhad Hormozdiari
  - Department of Computer Science, Inter-Departmental Program in Bioinformatics, Department of Human Genetics and Department of Pathology and Laboratory Medicine, University of California, Los Angeles, CA 90095, USA
- Gleb Kichaev
  - Department of Computer Science, Inter-Departmental Program in Bioinformatics, Department of Human Genetics and Department of Pathology and Laboratory Medicine, University of California, Los Angeles, CA 90095, USA
- Wen-Yun Yang
  - Department of Computer Science, Inter-Departmental Program in Bioinformatics, Department of Human Genetics and Department of Pathology and Laboratory Medicine, University of California, Los Angeles, CA 90095, USA
- Bogdan Pasaniuc
  - Department of Computer Science, Inter-Departmental Program in Bioinformatics, Department of Human Genetics and Department of Pathology and Laboratory Medicine, University of California, Los Angeles, CA 90095, USA
- Eleazar Eskin
  - Department of Computer Science, Inter-Departmental Program in Bioinformatics, Department of Human Genetics and Department of Pathology and Laboratory Medicine, University of California, Los Angeles, CA 90095, USA

8. Hormozdiari F, Kostem E, Kang EY, Pasaniuc B, Eskin E. Identifying causal variants at loci with multiple signals of association. Genetics 2014; 198:497-508. [PMID: 25104515 PMCID: PMC4196608 DOI: 10.1534/genetics.114.167908] [Citation(s) in RCA: 299] [Impact Index Per Article: 27.2] [Received: 05/29/2014] [Accepted: 07/18/2014] [Indexed: 12/22/2022]
Abstract
Although genome-wide association studies have successfully identified thousands of risk loci for complex traits, only a handful of the biologically causal variants, responsible for association at these loci, have been successfully identified. Current statistical methods for identifying causal variants at risk loci either use the strength of the association signal in an iterative conditioning framework or estimate probabilities for variants to be causal. A main drawback of existing methods is that they rely on the simplifying assumption of a single causal variant at each risk locus, which is typically invalid at many risk loci. In this work, we propose a new statistical framework that allows for the possibility of an arbitrary number of causal variants when estimating the posterior probability of a variant being causal. A direct benefit of our approach is that we predict a set of variants for each locus that under reasonable assumptions will contain all of the true causal variants with a high confidence level (e.g., 95%) even when the locus contains multiple causal variants. We use simulations to show that our approach provides 20-50% improvement in our ability to identify the causal variants compared to the existing methods at loci harboring multiple causal variants. We validate our approach using empirical data from an expression QTL study of CHI3L2 to identify new causal variants that affect gene expression at this locus. CAVIAR is publicly available online at http://genetics.cs.ucla.edu/caviar/.
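The idea of a variant set that contains the causal variant with a chosen confidence level can be made concrete with a deliberately simplified sketch. It assumes each variant already has a posterior probability of being causal and that exactly one variant is causal, which is the very simplification the abstract argues against; it is therefore not the CAVIAR algorithm, only an illustration of what a confidence set is. The function name and variant labels are made up.

```python
from typing import Dict, List


def credible_set(posteriors: Dict[str, float], rho: float = 0.95) -> List[str]:
    """Greedy confidence set under a single-causal-variant assumption:
    add variants in decreasing posterior order until their normalized
    cumulative posterior reaches rho."""
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must be in (0, 1]")
    total = sum(posteriors.values())
    if total <= 0.0:
        raise ValueError("posteriors must sum to a positive value")
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen: List[str] = []
    cumulative = 0.0
    for variant, posterior in ranked:
        chosen.append(variant)
        cumulative += posterior / total
        if cumulative >= rho:
            break
    return chosen


# Made-up posteriors for four variants at one locus.
example_posteriors = {"snpA": 0.60, "snpB": 0.30, "snpC": 0.06, "snpD": 0.04}
selected = credible_set(example_posteriors, rho=0.95)
```

With multiple causal variants, the set has to be chosen over causal configurations rather than single variants, which is the extension the abstract describes.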
Affiliation(s)
- Farhad Hormozdiari
  - Department of Computer Science, University of California, Los Angeles, California 90095
- Emrah Kostem
  - Department of Computer Science, University of California, Los Angeles, California 90095
- Eun Yong Kang
  - Department of Computer Science, University of California, Los Angeles, California 90095
- Bogdan Pasaniuc
  - Department of Human Genetics, University of California, Los Angeles, California 90095
  - Department of Pathology and Laboratory Medicine, University of California, Los Angeles, California 90095
- Eleazar Eskin
  - Department of Computer Science, University of California, Los Angeles, California 90095
  - Department of Human Genetics, University of California, Los Angeles, California 90095

9. Derkach A, Lawless JF, Sun L. Pooled Association Tests for Rare Genetic Variants: A Review and Some New Results. Stat Sci 2014. [DOI: 10.1214/13-sts456] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Indexed: 11/19/2022]

10. Wang J, Zhao Z, Cao Z, Yang A, Zhang J. A probabilistic method for identifying rare variants underlying complex traits. BMC Genomics 2013; 14 Suppl 1:S11. [PMID: 23369113 PMCID: PMC3549819 DOI: 10.1186/1471-2164-14-s1-s11] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
BACKGROUND Identifying the genetic variants that contribute to disease susceptibilities is important both for developing methodologies and for studying complex diseases in molecular biology. It has been demonstrated that the spectrum of minor allelic frequencies (MAFs) of risk genetic variants ranges from common to rare. Although association studies are shifting to incorporate rare variants (RVs) affecting complex traits, existing approaches do not show a high degree of success, and more efforts should be considered. RESULTS In this article, we focus on detecting associations between multiple rare variants and traits. Similar to RareCover, a widely used approach, we assume that variants located close to each other tend to have similar impacts on traits. Therefore, we introduce elevated regions and background regions, where the elevated regions are considered to have a higher chance of harboring causal variants. We propose a hidden Markov random field (HMRF) model to select a set of rare variants that potentially underlie the phenotype, and then, a statistical test is applied. Thus, the association analysis can be achieved without pre-selection by experts. In our model, each variant has two hidden states that represent the causal/non-causal status and the region status. In addition, two Bayesian processes are used to compare and estimate the genotype, phenotype and model parameters. We compare our approach to the three current methods using different types of datasets, and though these are simulation experiments, our approach has higher statistical power than the other methods. The software package, RareProb and the simulation datasets are available at: http://www.engr.uconn.edu/~jiw09003.
Affiliation(s)
- Jiayin Wang
  - Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, PR China

12. Stitziel NO, Kiezun A, Sunyaev S. Computational and statistical approaches to analyzing variants identified by exome sequencing. Genome Biol 2011; 12:227. [PMID: 21920052 PMCID: PMC3308043 DOI: 10.1186/gb-2011-12-9-227] [Citation(s) in RCA: 99] [Impact Index Per Article: 7.1] [Indexed: 01/30/2023]
Abstract
New sequencing technology has enabled the identification of thousands of single nucleotide polymorphisms in the exome, and many computational and statistical approaches to identify disease-association signals have emerged.
Affiliation(s)
- Nathan O Stitziel
  - Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Harvard Medical School, 75 Francis Street, Boston, MA 02115, USA