1
Chen H, Shu J, Maley CC, Liu L. A Mouse-Specific Model to Detect Genes under Selection in Tumors. Cancers (Basel) 2023; 15:5156. [PMID: 37958330 PMCID: PMC10647215 DOI: 10.3390/cancers15215156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/16/2023] [Accepted: 10/18/2023] [Indexed: 11/15/2023]
Abstract
The mouse is a widely used model organism in cancer research. However, no computational methods exist to identify cancer driver genes in mice, owing to a lack of labeled training data. To address this knowledge gap, we adapted the GUST (Genes Under Selection in Tumors) model, originally trained on human exomes, to mouse exomes via transfer learning. The resulting tool, called GUST-mouse, can estimate long-term and short-term evolutionary selection in mouse tumors and distinguish between oncogenes, tumor suppressor genes, and passenger genes using high-throughput sequencing data. We applied GUST-mouse to analyze 65 exomes of mouse primary breast cancer models and 17 exomes of mouse leukemia models. Comparing the predictions between cancer types and between human and mouse tumors revealed common and unique driver genes. The GUST-mouse method is available as an open-source R package on GitHub.
Affiliation(s)
- Hai Chen
  - College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA
  - Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
- Jingmin Shu
  - College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA
  - Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
- Carlo C. Maley
  - Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
  - Arizona Cancer Evolution Center, Arizona State University, Tempe, AZ 85281, USA
- Li Liu
  - College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA
  - Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
  - Arizona Cancer Evolution Center, Arizona State University, Tempe, AZ 85281, USA
2
Chandrashekar P, Ahmadinejad N, Wang J, Sekulic A, Egan JB, Asmann YW, Kumar S, Maley C, Liu L. Somatic selection distinguishes oncogenes and tumor suppressor genes. Bioinformatics 2020; 36:1712-1717. [PMID: 32176769 PMCID: PMC7703750 DOI: 10.1093/bioinformatics/btz851] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 08/30/2019] [Revised: 10/22/2019] [Accepted: 11/12/2019] [Indexed: 02/06/2023]
Abstract
Motivation: The functions of cancer driver genes vary substantially across tissues and organs. Distinguishing passenger genes, oncogenes (OGs) and tumor-suppressor genes (TSGs) for each cancer type is critical for understanding tumor biology and identifying clinically actionable targets. Although many computational tools are available to predict putative cancer driver genes, resources for context-aware classification of OGs and TSGs are limited.
Results: We show that the direction and magnitude of somatic selection on protein-coding mutations differ significantly among passenger genes, OGs and TSGs. Based on these patterns, we developed a new method, GUST (genes under selection in tumors), to discover OGs and TSGs in a cancer-type-specific manner. GUST shows high accuracy (92%) when evaluated via strict cross-validation. Its application to 10,172 tumor exomes found known and novel cancer drivers with high tissue specificity. In 11 of the 13 OGs shared among multiple cancer types, we found functional domains selectively engaged in different cancers, suggesting differences in disease mechanisms.
Availability and implementation: An R implementation of the GUST algorithm is available at https://github.com/liliulab/gust. A database with pre-computed results is available at https://liliulab.shinyapps.io/gust.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pramod Chandrashekar
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
  - Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ, 85281, USA
- Navid Ahmadinejad
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
  - Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ, 85281, USA
- Junwen Wang
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
  - Department of Health Sciences Research & Center for Individualized Medicine, Mayo Clinic Arizona, Scottsdale, AZ, 85259, USA
- Aleksandar Sekulic
  - Department of Health Sciences Research & Center for Individualized Medicine, Mayo Clinic Arizona, Scottsdale, AZ, 85259, USA
- Jan B Egan
  - Department of Health Sciences Research & Center for Individualized Medicine, Mayo Clinic Arizona, Scottsdale, AZ, 85259, USA
- Yan W Asmann
  - Department of Health Sciences Research, Mayo Clinic Florida, Jacksonville, FL, 32224, USA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Carlo Maley
  - Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ, 85281, USA
- Li Liu
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
  - Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ, 85281, USA
  - Department of Health Sciences Research & Center for Individualized Medicine, Mayo Clinic Arizona, Scottsdale, AZ, 85259, USA
3
Guan X, Runger G, Liu L. Dynamic incorporation of prior knowledge from multiple domains in biomarker discovery. BMC Bioinformatics 2020; 21:77. [PMID: 32164534 PMCID: PMC7068914 DOI: 10.1186/s12859-020-3344-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/22/2022]
Abstract
Background: In biomarker discovery, applying domain knowledge is an effective approach to eliminating false-positive features, prioritizing functionally impactful markers and facilitating the interpretation of predictive signatures. Several computational methods have been developed that formulate knowledge-based biomarker discovery as a feature selection problem guided by prior information. These methods often require that prior information be encoded as a single score, and their algorithms are optimized for biological knowledge of a specific type. In practice, however, domain knowledge from diverse resources can provide complementary information, and no current method can integrate such heterogeneous prior information for biomarker discovery. To address this problem, we developed the Know-GRRF (know-guided regularized random forest) method, which enables dynamic incorporation of domain knowledge from multiple disciplines to guide feature selection.
Results: Know-GRRF embeds domain knowledge in a regularized random forest framework. It combines prior information from multiple domains in a linear model to derive a composite score, which, together with other tuning parameters, controls the regularization of the random forest model. Know-GRRF concurrently optimizes the weight given to each type of domain knowledge and the other tuning parameters to minimize the AIC of out-of-bag predictions. The objective is to select a compact feature subset that has high discriminative power and strong functional relevance to the biological phenotype. Via rigorous simulations, we show that Know-GRRF guided by multiple-domain prior information outperforms feature selection methods guided by single-domain prior information or by no prior information. We then applied Know-GRRF to a real-world study to identify prognostic biomarkers of prostate cancer. We evaluated the combination of cancer-related gene annotations, evolutionary conservation and pre-computed statistical scores as the prior knowledge used to assemble a panel of biomarkers, and discovered a compact set of biomarkers with significantly improved prediction accuracy.
Conclusions: Know-GRRF is a powerful novel method for incorporating knowledge from multiple domains into feature selection, with a broad range of applications in biomarker discovery. We implemented this method and released the KnowGRRF package in the R/CRAN archive.
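The composite-score idea described for Know-GRRF can be illustrated with a minimal Python sketch. It assumes that each knowledge source supplies one numeric score per feature, that the scores are min-max rescaled before being combined linearly, and that the composite score sets per-feature penalty coefficients in the style of the guided regularized random forest (GRRF), lambda_i = (1 - gamma) * lambda_0 + gamma * score_i. The function names, weights and gamma value are hypothetical; the exact Know-GRRF parameterization may differ, and the AIC-based tuning of the weights is not shown.

```python
import numpy as np

def minmax(x):
    """Rescale a vector of prior scores to [0, 1]."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

def composite_prior(score_matrix, weights):
    """Linear combination of per-domain prior scores (hypothetical scheme).

    score_matrix: (n_features, n_domains), one column per knowledge source.
    weights: (n_domains,) non-negative weights; Know-GRRF tunes such weights.
    """
    columns = np.asarray(score_matrix, dtype=float).T
    normed = np.column_stack([minmax(col) for col in columns])
    return minmax(normed @ np.asarray(weights, dtype=float))

def penalty_coefficients(prior, gamma, lam0=1.0):
    """GRRF-style coefficients: lambda_i = (1 - gamma) * lam0 + gamma * prior_i."""
    return (1.0 - gamma) * lam0 + gamma * np.asarray(prior, dtype=float)

def regularized_gain(gain, lam, already_selected):
    """RRF rule: the split gain of a not-yet-selected feature is scaled by lambda_i."""
    return gain if already_selected else lam * gain

# Hypothetical priors: column 0 from gene annotations, column 1 from conservation.
priors = np.array([[0.9, 10.0],   # feature 0: strong support in both domains
                   [0.1,  2.0],   # feature 1: weak support in both domains
                   [0.5,  6.0]])  # feature 2: intermediate support
lam = penalty_coefficients(composite_prior(priors, [0.5, 0.5]), gamma=0.8)
print(lam)
print(regularized_gain(0.5, lam[1], already_selected=False))
```

In this toy case the feature with no prior support receives the smallest coefficient (about 0.2), so its split gain is shrunk the most when it competes to enter the selected feature set.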
Affiliation(s)
- Xin Guan
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
  - Intel Corporation, Chandler, AZ, 85226, USA
- George Runger
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
- Li Liu
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
  - Biodesign Institute, Arizona State University, Tempe, AZ, 85287, USA
  - Department of Neurology, Mayo Clinic, Scottsdale, AZ, 85259, USA
4
Kim D, Han SK, Lee K, Kim I, Kong J, Kim S. Evolutionary coupling analysis identifies the impact of disease-associated variants at less-conserved sites. Nucleic Acids Res 2019; 47:e94. [PMID: 31199866 PMCID: PMC6895274 DOI: 10.1093/nar/gkz536] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/15/2018] [Revised: 05/03/2019] [Accepted: 06/05/2019] [Indexed: 12/20/2022]
Abstract
Genome-wide association studies have discovered a large number of genetic variants in patients with human diseases. Predicting the impact of these variants is therefore important for sorting disease-associated variants (DVs) from neutral variants. Current methods for predicting mutational impact depend on evolutionary conservation at the mutation site, which is determined from homologous sequences under the assumption that variants at well-conserved sites have high impact. However, current methods cannot predict many DVs at less-conserved but functionally important sites. Here, we present a method that finds DVs at less-conserved sites by predicting mutational impact with evolutionary coupling analysis. Functionally important, evolutionarily coupled sites often carry compensatory variants at cooperating sites that prevent loss of function. Our method identified known intolerant variants in a diverse group of proteins. Furthermore, at less-conserved sites, it identified DVs that conservation-based methods did not. These newly identified DVs were frequently located at protein interaction interfaces, where species-specific mutations often alter interaction specificity. This work presents a means of identifying less-conserved DVs and provides insight into the relationship between evolutionarily coupled sites and human DVs.
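Evolutionary coupling means that substitutions at two sites co-vary across homologs even when neither site is conserved on its own. As a rough illustration only, the Python sketch below scores conservation with Shannon entropy and coupling with mutual information (MI) between two alignment columns. The toy alignment is hypothetical, and MI is a generic covariation measure, not necessarily the coupling statistic used in the study.

```python
import math
from collections import Counter

def entropy(column):
    """Shannon entropy (nats) of one alignment column; 0 means fully conserved."""
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in Counter(column).values())

def mutual_information(col_i, col_j):
    """Mutual information (nats) between two columns of the same alignment."""
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    p_i, p_j = Counter(col_i), Counter(col_j)
    mi = 0.0
    for (a, b), count in Counter(zip(col_i, col_j)).items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi

# Hypothetical columns from a 4-sequence alignment (one residue per homolog).
site_a = "KKEE"  # variable, not conserved
site_b = "EEKK"  # compensatory charge swap relative to site_a
site_c = "KEKE"  # variable but independent of site_a

print(entropy(site_a), entropy(site_b))     # both ln 2: neither site is conserved
print(mutual_information(site_a, site_b))   # ln 2: perfectly coupled
print(mutual_information(site_a, site_c))   # 0.0: no coupling
```

A conservation-only score would treat site_a and site_b as equally unremarkable, whereas the pairwise score flags them as coupled, which is the kind of signal the abstract exploits at less-conserved sites.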
Affiliation(s)
- Donghyo Kim
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Seong Kyu Han
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Kwanghwan Lee
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Inhae Kim
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- JungHo Kong
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Sanguk Kim
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
5
Liu L, Sanderford MD, Patel R, Chandrashekar P, Gibson G, Kumar S. Biological relevance of computationally predicted pathogenicity of noncoding variants. Nat Commun 2019; 10:330. [PMID: 30659175 PMCID: PMC6338804 DOI: 10.1038/s41467-018-08270-y] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 04/19/2018] [Accepted: 12/19/2018] [Indexed: 11/15/2022]
Abstract
Computational prediction of the phenotypic propensities of noncoding single nucleotide variants typically combines annotations of genomic, functional and evolutionary attributes into a single score. Here, we evaluate whether the claimed excellent accuracies of these predictions translate into high rates of success in addressing questions important in biological research, such as fine-mapping causal variants, distinguishing pathogenic allele(s) at a given position, and prioritizing variants for genetic risk assessment. We find a significant disconnect between the statistical modelling performance and the biological performance of predictive approaches. We discuss the fundamental reasons underlying these deficiencies and suggest that future improvements in computational prediction need to address the confounding of allelic, positional and regional effects, as well as the imbalance in the proportion of true positive variants in candidate lists.
Researchers can make use of a variety of computational tools to prioritize genetic variants and predict their pathogenicity. Here, the authors evaluate the performance of six of these tools in three typical biological tasks and find generally low concordance between predictions and experimental confirmation.
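The concern about the proportion of true positive variants in candidate lists is a base-rate effect: a predictor's positive predictive value (PPV) falls as true positives become rarer, even when its sensitivity and specificity stay fixed. A minimal Python calculation follows; the 90% sensitivity and specificity are hypothetical figures, not results for any tool evaluated in the study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of variants flagged as pathogenic that truly are (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical predictor with 90% sensitivity and 90% specificity.
for prevalence in (0.5, 0.1, 0.01):
    ppv = positive_predictive_value(0.9, 0.9, prevalence)
    print(f"true positives in candidate list: {prevalence:.0%} -> PPV {ppv:.1%}")
```

With the same 90%/90% accuracy, PPV drops from 90% when half of the candidates are true positives to 50% at 10% and to about 8% at 1%.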
Affiliation(s)
- Li Liu
  - College of Health Solutions, Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Maxwell D Sanderford
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Ravi Patel
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Pramod Chandrashekar
  - College of Health Solutions, Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Greg Gibson
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
6
Patel R, Scheinfeldt LB, Sanderford MD, Lanham TR, Tamura K, Platt A, Glicksberg BS, Xu K, Dudley JT, Kumar S. Adaptive Landscape of Protein Variation in Human Exomes. Mol Biol Evol 2018; 35:2015-2025. [PMID: 29846678 PMCID: PMC6063297 DOI: 10.1093/molbev/msy107] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 12/26/2022]
Abstract
The human genome contains hundreds of thousands of missense mutations. However, only a handful of these variants are known to be adaptive, which implies either that adaptation through protein sequence change is an extremely rare phenomenon in human evolution or that existing methods lack the power to pinpoint adaptive variation. We have developed and applied an Evolutionary Probability Approach (EPA) to discover candidate adaptive polymorphisms (CAPs) through the discordance between the evolutionary probabilities of alleles and their observed frequencies in human populations. EPA reveals thousands of missense CAPs, suggesting that a large number of previously optimal alleles experienced a reversal of fortune in the human lineage. We explored nonadaptive mechanisms that could explain CAPs, including the effects of demography, mutation rate variability, and negative and positive selective pressures in modern humans. Many nonadaptive hypotheses were tested but failed to explain the data, which suggests that a large proportion of CAP alleles have increased in frequency due to beneficial selection. This suggestion is supported by the facts that the vast majority of adaptive missense variants discovered previously in humans are CAPs and that hundreds of CAP alleles are protective in genotype-phenotype association data. Our integrated phylogenomic and population genetic EPA approach predicts the existence of thousands of nonneutral candidate variants in the human proteome. We expect this collection to be enriched in beneficial variation. The EPA approach can be applied to discover candidate adaptive variation in any protein, population, or species for which allele frequency data and reliable multispecies alignments are available.
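The CAP criterion rests on a discordance: an allele that is improbable given the multispecies evolutionary history (low evolutionary probability, EP) yet common in present-day humans. A minimal Python sketch of that filter follows. The EP values are assumed to be precomputed, and the thresholds (EP < 0.05, allele frequency >= 0.05) and the variant records are hypothetical placeholders rather than values taken from the study.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Allele:
    variant_id: str  # hypothetical identifier
    ep: float        # evolutionary probability of the allele (assumed precomputed)
    freq: float      # observed allele frequency in a human population

def candidate_adaptive(alleles, max_ep=0.05, min_freq=0.05):
    """Return IDs of alleles that are evolutionarily unexpected yet common."""
    return [a.variant_id for a in alleles if a.ep < max_ep and a.freq >= min_freq]

alleles = [
    Allele("var1", ep=0.01, freq=0.30),   # low EP, common: discordant, a CAP
    Allele("var2", ep=0.01, freq=0.001),  # low EP, rare: expected for a deleterious allele
    Allele("var3", ep=0.80, freq=0.40),   # high EP, common: concordant, not a CAP
]
print(candidate_adaptive(alleles))  # ['var1']
```

Only the discordant allele passes; the study's further step of testing nonadaptive explanations for each CAP is not modeled here.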
Affiliation(s)
- Ravi Patel
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
- Laura B Scheinfeldt
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
  - Coriell Institute for Medical Research, Camden, NJ
- Maxwell D Sanderford
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Tamera R Lanham
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Koichiro Tamura
  - Department of Biology, Tokyo Metropolitan University, Tokyo, Japan
- Alexander Platt
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
  - Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA
- Benjamin S Glicksberg
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY
- Ke Xu
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY
- Joel T Dudley
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
7
Variants in KCNJ11 and BAD do not predict response to ketogenic dietary therapies for epilepsy. Epilepsy Res 2015; 118:22-8. [PMID: 26590798 PMCID: PMC4819482 DOI: 10.1016/j.eplepsyres.2015.10.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 06/06/2015] [Revised: 08/24/2015] [Accepted: 10/20/2015] [Indexed: 01/01/2023]
Abstract
- Common KCNJ11 and BAD variants were not associated with response to ketogenic dietary therapy (KDT).
- There was no consistent effect of rare variants on KDT response.
- Larger cohorts may reveal associations with variants that have an effect size < 3 or MAF < 0.05.
- Variants with small effect sizes are unlikely to be clinically relevant.
- Variants in other genes may influence response to KDT.
In the absence of specific metabolic disorders, predictors of response to ketogenic dietary therapies (KDT) are unknown. We aimed to determine whether variants in the established candidate genes KCNJ11 and BAD influence response to KDT. We sequenced KCNJ11 and BAD in individuals without previously known glucose transporter type 1 deficiency syndrome or other metabolic disorders who received KDT for epilepsy. Hospital records were used to obtain demographic and clinical data. Two response phenotypes were used: ≥50% seizure reduction and seizure freedom at 3-month follow-up. Case/control association tests were conducted with PLINK for KCNJ11 and BAD variants with minor allele frequency (MAF) > 0.01. Response to KDT was also evaluated in individuals carrying variants with MAF < 0.01. Sequencing and diet-response data were available for 303 individuals for KCNJ11 and 246 individuals for BAD. Six SNPs in KCNJ11 and two in BAD had MAF > 0.01. Eight variants in KCNJ11 and seven in BAD (three of which were previously unreported) had MAF < 0.01. Association analyses yielded no significant results with either KDT response phenotype. P-values were similar when accounting for ethnicity using a stratified Cochran–Mantel–Haenszel test. There did not seem to be a consistent effect of rare variants on response to KDT, although the cohort was too small to assess significance. Common variants in KCNJ11 and BAD do not predict response to KDT for epilepsy. We can exclude, with 80% power, associations with variants that have a MAF > 0.05 and an effect size > 3. A larger sample is needed to detect associations with rare variants or with variants of smaller effect size.
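The stratified Cochran–Mantel–Haenszel (CMH) test pools 2x2 tables across strata such as ethnic groups, so that allele-frequency differences between groups do not confound the association. The study ran its tests in PLINK; the Python sketch below is only the textbook CMH computation (pooled Mantel-Haenszel odds ratio and continuity-corrected 1-df statistic) on hypothetical allele counts, not PLINK's implementation, and the allele-based 2x2 layout is an assumption.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Stratum:
    """2x2 allele-count table for one stratum (e.g. one ethnic group)."""
    a: int  # minor alleles in responders
    b: int  # major alleles in responders
    c: int  # minor alleles in non-responders
    d: int  # major alleles in non-responders

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio pooled across strata."""
    num = sum(s.a * s.d / s.n for s in strata)
    den = sum(s.b * s.c / s.n for s in strata)
    return num / den

def cmh_test(strata):
    """Continuity-corrected CMH chi-square (1 df) and its upper-tail p-value."""
    observed = sum(s.a for s in strata)
    expected = sum((s.a + s.b) * (s.a + s.c) / s.n for s in strata)
    variance = sum(
        (s.a + s.b) * (s.c + s.d) * (s.a + s.c) * (s.b + s.d) / (s.n ** 2 * (s.n - 1))
        for s in strata
    )
    stat = max(0.0, abs(observed - expected) - 0.5) ** 2 / variance
    p_value = math.erfc(math.sqrt(stat / 2.0))  # P(chi-square_1 > stat)
    return stat, p_value

# Hypothetical counts for two ethnic strata; not data from the study.
strata = [Stratum(20, 10, 10, 20), Stratum(20, 10, 10, 20)]
print(mantel_haenszel_or(strata))
print(cmh_test(strata))
```

For these made-up counts the pooled odds ratio is 4 and the CMH statistic is about 11.8, whereas a stratum with identical allele counts in both groups gives an odds ratio of 1 and a p-value of 1.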
8
McNally EM, George AL. New approaches to establish genetic causality. Trends Cardiovasc Med 2015; 25:646-52. [PMID: 25864169 DOI: 10.1016/j.tcm.2015.02.013] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 01/21/2015] [Revised: 02/23/2015] [Accepted: 02/24/2015] [Indexed: 01/06/2023]
Abstract
Cardiovascular medicine has evolved rapidly in the era of genomics, with many diseases of primarily genetic origin becoming the subject of intense investigation. The resulting avalanche of information on the molecular causes of these disorders has prompted a revolution in our understanding of disease mechanisms and provided new avenues for diagnosis. At the heart of this revolution is the need to correctly classify genetic variants discovered during research or reported from clinical genetic testing. This review addresses current concepts related to establishing the cause-and-effect relationship between genomic variants and heart diseases. A survey of general approaches used for functional annotation of variants is also presented.
Affiliation(s)
- Elizabeth M McNally
  - Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
  - Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Alfred L George
  - Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL
  - Center for Pharmacogenomics, Northwestern University Feinberg School of Medicine, Chicago, IL
9
A commentary on identification of the rare compound heterozygous variants in the NEB gene in a Korean family with intellectual disability, epilepsy and early-childhood-onset generalized muscle weakness. J Hum Genet 2015; 60:161-2. [DOI: 10.1038/jhg.2014.120] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/12/2022]
10
Affiliation(s)
- Alfred L George
  - Department of Pharmacology and Center for Pharmacogenomics, Northwestern University Feinberg School of Medicine, Chicago, IL
11
Gerek NZ, Liu L, Gerold K, Biparva P, Thomas ED, Kumar S. Evolutionary Diagnosis of non-synonymous variants involved in differential drug response. BMC Med Genomics 2015; 8 Suppl 1:S6. [PMID: 25952014 PMCID: PMC4315320 DOI: 10.1186/1755-8794-8-s1-s6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Abstract
Background: Many pharmaceutical drugs are known to be ineffective or to have negative side effects in a substantial proportion of patients. Genomic advances are revealing that some non-synonymous single nucleotide variants (nsSNVs) may cause differences in drug efficacy and side effects. It is therefore desirable to evaluate the ability of nsSNVs of interest to modulate drug response.
Results: We found that the available data linking drug response to nsSNVs are rather modest: the entire Pharmacogenomics Knowledge Base (PharmGKB) contained only 31 distinct drug response-altering (DR-altering) and 43 distinct drug response-neutral (DR-neutral) nsSNVs. Even with this modest dataset, however, it was clear that existing bioinformatics tools have difficulty correctly predicting the known DR-altering and DR-neutral nsSNVs. They exhibited an overall accuracy of less than 50%, which was no better than random diagnosis. We found that the underlying problem is that positions harboring nsSNVs linked to drug responses have evolutionary properties markedly different from those observed for inherited diseases. To solve this problem, we developed a new diagnosis method, Drug-EvoD, which was trained on the evolutionary properties of nsSNVs associated with drug responses in a sparse learning framework. Drug-EvoD achieves a true positive rate (TPR) of 84% and a true negative rate (TNR) of 53%, for a balanced accuracy of 69%, a significant improvement over other methods.
Conclusions: The new tool will enable researchers to computationally identify nsSNVs that may affect drug responses. However, much larger training and testing datasets are needed to develop more reliable and accurate tools.
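Balanced accuracy is the mean of the true positive and true negative rates, so it is not inflated when one class (here, the DR-neutral nsSNVs) outnumbers the other. A quick Python check using the rates quoted above confirms that 84% and 53% average to 68.5%, which rounds to the reported 69%; the confusion-matrix counts in the last call are hypothetical.

```python
def balanced_accuracy(tpr: float, tnr: float) -> float:
    """Mean of sensitivity (TPR) and specificity (TNR); 0.5 for random guessing."""
    return (tpr + tnr) / 2.0

def rates_from_counts(tp: int, fn: int, tn: int, fp: int):
    """TPR = TP / (TP + FN) and TNR = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

print(round(balanced_accuracy(0.84, 0.53), 3))  # 0.685, reported as 69%

# Hypothetical counts that reproduce the quoted rates.
tpr, tnr = rates_from_counts(tp=42, fn=8, tn=53, fp=47)
print(tpr, tnr)
```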
12
Wu L, Schaid DJ, Sicotte H, Wieben ED, Li H, Petersen GM. Case-only exome sequencing and complex disease susceptibility gene discovery: study design considerations. J Med Genet 2014; 52:10-6. [PMID: 25371537 DOI: 10.1136/jmedgenet-2014-102697] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 12/13/2022]
Abstract
Whole exome sequencing (WES) provides an unprecedented opportunity to identify the potential aetiological role of rare functional variants in human complex diseases. Large-scale collaborations have generated germline WES data on patients with a number of diseases, especially cancer, but less often on healthy controls sequenced under the same procedures. These data can be a valuable resource for identifying new disease susceptibility loci if appropriate study designs are applied. This review describes suggested strategies and technical considerations for case-only study designs that use WES data in complex disease scenarios. These include variant filtering based on frequency and functionality, gene prioritisation, interrogation of different data types and targeted sequencing validation. We propose that if case-only WES designs are applied appropriately, new susceptibility genes containing rare variants for human complex diseases can be detected.
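The first two strategies listed, frequency- and function-based variant filtering followed by gene prioritisation, can be sketched in a few lines of Python. Everything here is an assumption for illustration: the variant records and gene names are hypothetical, and the 1% population-frequency cutoff and the set of "damaging" consequence labels are placeholders rather than recommendations from the review.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical consequence labels treated as functional.
DAMAGING = {"missense", "stop_gained", "frameshift", "splice_donor", "splice_acceptor"}

@dataclass(frozen=True)
class Variant:
    gene: str
    population_maf: float  # frequency in an external reference population
    consequence: str

def qualifying(variants, max_maf=0.01):
    """Keep rare variants with a putatively damaging consequence."""
    return [v for v in variants
            if v.population_maf < max_maf and v.consequence in DAMAGING]

def rank_genes(variants, max_maf=0.01):
    """Rank genes by the number of qualifying variants observed in cases."""
    return Counter(v.gene for v in qualifying(variants, max_maf)).most_common()

cases = [
    Variant("GENE_A", 0.0001, "stop_gained"),
    Variant("GENE_A", 0.002, "missense"),
    Variant("GENE_B", 0.20, "missense"),      # too common in the population
    Variant("GENE_C", 0.0005, "synonymous"),  # not functional
    Variant("GENE_C", 0.0003, "frameshift"),
]
print(rank_genes(cases))  # [('GENE_A', 2), ('GENE_C', 1)]
```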
Affiliation(s)
- Lang Wu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
  - Center for Clinical and Translational Science, Mayo Clinic, Rochester, Minnesota, USA
- Daniel J Schaid
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Hugues Sicotte
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Eric D Wieben
  - Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, Minnesota, USA
- Hu Li
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota, USA
- Gloria M Petersen
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
13
14
Stecher G, Liu L, Sanderford M, Peterson D, Tamura K, Kumar S. MEGA-MD: molecular evolutionary genetics analysis software with mutational diagnosis of amino acid variation. Bioinformatics 2014; 30:1305-7. [PMID: 24413669 DOI: 10.1093/bioinformatics/btu018] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 11/12/2022]
Abstract
Computational diagnosis of amino acid variants in the human exome is the first step in assessing the disruptive impact of non-synonymous single nucleotide variants (nsSNVs) on human health and disease. The Molecular Evolutionary Genetics Analysis software with mutational diagnosis (MEGA-MD) is a suite of tools developed to forecast the deleteriousness of nsSNVs using multiple methods and to explore nsSNVs in the context of the variability permitted in the long-term evolution of the affected position. Its graphical desktop interface enables interactive computational diagnosis and evolutionary exploration of nsSNVs. As a web service, MEGA-MD is suitable for diagnosing variants on an exome scale. The MEGA-MD suite is intended to serve the needs of low- and high-throughput analysis of nsSNVs in diverse applications.
Affiliation(s)
- Glen Stecher
  - Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University (ASU), Tempe, AZ 85287, USA
  - Research Center for Genomics and Bioinformatics, Tokyo Metropolitan University (TMU), Hachioji, Tokyo, Japan
  - Department of Biological Sciences, TMU, Tokyo, Japan
  - School of Life Sciences, ASU, Tempe, AZ 85287, USA
  - Center for Excellence in Genomic Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
15
Goswami DB, Ogawa LM, Ward JM, Miller GM, Vallender EJ. Large-scale polymorphism discovery in macaque G-protein coupled receptors. BMC Genomics 2013; 14:703. [PMID: 24119066 PMCID: PMC3907043 DOI: 10.1186/1471-2164-14-703] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/15/2013] [Accepted: 10/04/2013] [Indexed: 01/01/2023]
Abstract
BACKGROUND: G-protein coupled receptors (GPCRs) play an inordinately large role in human health. Variation in the genes that encode these receptors is associated with numerous disorders across the entire spectrum of disease. GPCRs also represent the single largest class of drug targets, and associated pharmacogenetic effects are modulated, in part, by polymorphisms. Recently, non-human primate models have been developed that focus on naturally occurring, functionally parallel polymorphisms in candidate genes. This work aims to extend those studies broadly across the roughly 377 non-olfactory GPCRs. Initial efforts include resequencing 44 Indian-origin rhesus macaques (Macaca mulatta), 20 Chinese-origin rhesus macaques, and 32 cynomolgus macaques (M. fascicularis).
RESULTS: Using the Agilent target enrichment system, capture baits for GPCRs were designed from human and rhesus exonic sequences. Using next-generation sequencing technologies, nearly 25,000 SNPs were identified in coding sequences, including over 14,000 non-synonymous and more than 9,500 synonymous protein-coding SNPs. As expected, regions showing the least evolutionary constraint show higher rates of polymorphism and greater numbers of higher-frequency polymorphisms. While the vast majority of these SNPs are singletons, roughly 1,750 non-synonymous and 2,900 synonymous SNPs were found in multiple individuals.
CONCLUSIONS: In all three populations, polymorphism and divergence are highly concentrated in the N-terminal and C-terminal domains and the third intracellular loop region of GPCRs, regions critical to ligand binding and signaling. SNP frequencies in macaques follow a similar pattern of divergence from humans, and new polymorphisms have been identified in primates that may parallel those seen in humans, helping to establish better non-human primate models of disease.
Affiliation(s)
- Dharmendra B Goswami
  - New England Primate Research Center, Harvard Medical School, One Pine Hill Drive, Southborough, MA 01772, USA