1
Bosart K, Petreaca RC, Bouley RA. In silico analysis of several frequent SLX4 mutations appearing in human cancers. microPublication Biology 2024; 2024:10.17912/micropub.biology.001216. [PMID: 38828439 PMCID: PMC11143449 DOI: 10.17912/micropub.biology.001216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 05/16/2024] [Accepted: 05/16/2024] [Indexed: 06/05/2024]
Abstract
SLX4 is an interactor and activator of structure-specific exonuclease that helps resolve tangled recombination intermediates arising at stalled replication forks. It is one of the many factors that assist with homologous recombination, the major mechanism for restarting replication. SLX4 mutations have been reported in many cancers, but a pan-cancer map of all the mutations has not been compiled. Here, using data from the Catalogue of Somatic Mutations in Cancer (COSMIC), we show that mutations occur in almost every cancer and that many of them truncate the protein, which should severely alter its function. We identified a frequent R1779W point mutation that occurs in the SLX4 domain required for heterodimerization with its partner, SLX1. In silico protein structure analysis of this mutation shows that it significantly alters the protein structure and is likely to destabilize the interaction with SLX1. Although this brief communication is limited to in silico analysis, it identifies certain high-frequency SLX4 mutations in human cancers that warrant further in vivo studies. Additionally, these mutations may be actionable for drug therapies.
Affiliation(s)
- Korey Bosart
  - James Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio, United States
- Ruben C Petreaca
  - James Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio, United States
  - Molecular Genetics, The Ohio State University at Marion, Marion, Ohio, United States
- Renee A Bouley
  - Chemistry and Biochemistry, The Ohio State University at Marion, Marion, Ohio, United States
2
Weber SE, Roscher-Ehrig L, Kox T, Abbadi A, Stahl A, Snowdon RJ. Genomic prediction in Brassica napus: evaluating the benefit of imputed whole-genome sequencing data. Genome 2024. [PMID: 38708850 DOI: 10.1139/gen-2023-0126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/07/2024]
Abstract
Advances in sequencing technology allow whole plant genomes to be sequenced with high quality. Combining genotypic and phenotypic data in genomic prediction helps breeders to select crossing partners in partially phenotyped populations. In plant breeding programs, the cost of sequencing entire breeding populations still exceeds available genotyping budgets. Hence, the method for genotyping is still mainly single nucleotide polymorphism (SNP) arrays; however, arrays are unable to assess the entire genome- and population-wide diversity. A compromise involves genotyping the entire population using an SNP array and a subset of the population with whole-genome sequencing. Both datasets can then be used to impute markers from whole-genome sequencing onto the entire population. Here, we evaluate whether imputation of whole-genome sequencing data enhances genomic predictions, using data from a nested association mapping population of rapeseed (Brassica napus). Employing two cross-validation schemes that mimic scenarios for the prediction of close and distant relatives, we show that imputed marker data do not significantly improve prediction accuracy, likely due to redundancy in relationship estimates and imputation errors. In simulation studies, only small improvements were observed, further corroborating the findings. We conclude that SNP arrays are already equipped with the information that is added by imputation through relationship and linkage disequilibrium.
Affiliation(s)
- Sven E Weber
  - Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University, Giessen, Germany
- Lennard Roscher-Ehrig
  - Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University, Giessen, Germany
- Andreas Stahl
  - Julius Kuehn Institute (JKI), Federal Research Centre for Cultivated Plants, Institute for Resistance Research and Stress Tolerance, Quedlinburg, Germany
- Rod J Snowdon
  - Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University, Giessen, Germany
3
Weber SE, Chawla HS, Ehrig L, Hickey LT, Frisch M, Snowdon RJ. Accurate prediction of quantitative traits with failed SNP calls in canola and maize. Frontiers in Plant Science 2023; 14:1221750. [PMID: 37936929 PMCID: PMC10627008 DOI: 10.3389/fpls.2023.1221750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Accepted: 10/05/2023] [Indexed: 11/09/2023]
Abstract
In modern plant breeding, genomic selection is becoming the gold standard to select superior genotypes in large breeding populations that are only partially phenotyped. Many breeding programs commonly rely on single-nucleotide polymorphism (SNP) markers to capture genome-wide data for selection candidates. For this purpose, SNP arrays with moderate to high marker density represent a robust and cost-effective tool to generate reproducible, easy-to-handle, high-throughput genotype data from large-scale breeding populations. However, SNP arrays are prone to technical errors that lead to failed allele calls. To overcome this problem, failed calls are often imputed, based on the assumption that failed SNP calls are purely technical. However, this ignores the biological causes for failed calls (for example, deletions), and there is increasing evidence that gene presence-absence and other kinds of genome structural variants can play a role in phenotypic expression. Because deletions are frequently not in linkage disequilibrium with their flanking SNPs, imputation of missing SNP calls can potentially obscure valuable marker-trait associations. In this study, we analyze published datasets for canola and maize using four parametric and two machine learning models and demonstrate that failed allele calls in genomic prediction are highly predictive for important agronomic traits. We present two statistical pipelines, based on population structure and linkage disequilibrium, that enable the filtering of failed SNP calls that are likely caused by biological reasons. For the population and trait examined, prediction accuracy based on these filtered failed allele calls was competitive with standard SNP-based prediction, underlining the potential value of missing data in genomic prediction approaches. The combination of SNPs with all failed allele calls or the filtered allele calls did not outperform SNP-only prediction, due to redundancy in genomic relationship estimates.
Affiliation(s)
- Sven E. Weber
  - Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Lennard Ehrig
  - Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Lee T. Hickey
  - Centre for Crop Science, Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD, Australia
- Matthias Frisch
  - Department of Biometry and Population Genetics, Justus Liebig University, Giessen, Germany
- Rod J. Snowdon
  - Department of Plant Breeding, Justus Liebig University, Giessen, Germany
4
Hamid A, Petreaca B, Petreaca R. Frequent homozygous deletions of the CDKN2A locus in somatic cancer tissues. Mutat Res 2019; 815:30-40. [PMID: 31096160 DOI: 10.1016/j.mrfmmm.2019.04.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/22/2019] [Revised: 04/09/2019] [Accepted: 04/10/2019] [Indexed: 02/07/2023]
Abstract
Here we present and describe data on homozygous deletions (HD) of human CDKN2A and neighboring regions on the p arm of Chromosome 9 from cancer genome sequences deposited in the online Catalogue of Somatic Mutations in Cancer (COSMIC) database. Although CDKN2A HDs have been previously described in many cancers, this is a pan-cancer report of these aberrations with the aim of mapping the distribution of the breakpoints. We find that HDs of this locus have a median range of 1,255,650 bp. When the deletion breakpoints were mapped on both the telomere and centromere proximal sides of CDKN2A, most of the telomere proximal breakpoints concentrate in a narrow region of the chromosome that includes the gene MTAP. The centromere proximal breakpoints of the deletions are distributed over a wider chromosomal region. Furthermore, gene expression analysis shows that the deletions that include the CDKN2A region also include the MTAP region and this observation is tissue independent. We propose a model that may explain the origin of the telomere proximal CDKN2A breakpoints. Finally, we find that HD distributions for at least three other loci, RB1, SMAD4 and PTEN, are also not random.
Affiliation(s)
- Abdulaziz Hamid
  - The Ohio State University, MSE110A, 1464 Mount Vernon Ave, Marion, OH 43302, United States
- Beniamin Petreaca
  - The Ohio State University, MSE110A, 1464 Mount Vernon Ave, Marion, OH 43302, United States
- Ruben Petreaca
  - The Ohio State University, MSE110A, 1464 Mount Vernon Ave, Marion, OH 43302, United States
5
Kidder BL, He R, Wangsa D, Padilla-Nash HM, Bernardo MM, Sheng S, Ried T, Zhao K. SMYD5 Controls Heterochromatin and Chromosome Integrity during Embryonic Stem Cell Differentiation. Cancer Res 2017; 77:6729-6745. [PMID: 28951459 DOI: 10.1158/0008-5472.can-17-0828] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 03/24/2017] [Revised: 08/10/2017] [Accepted: 09/21/2017] [Indexed: 12/18/2022]
Abstract
Epigenetic regulation of chromatin states is thought to control gene expression programs during lineage specification. However, the roles of repressive histone modifications, such as trimethylated histone H4 lysine 20 (H4K20me3), in development and genome stability are largely unknown. Here, we show that depletion of SET and MYND domain-containing protein 5 (SMYD5), which mediates H4K20me3, leads to genome-wide decreases in H4K20me3 and H3K9me3 levels and derepression of endogenous LTR- and LINE-repetitive DNA elements during differentiation of mouse embryonic stem cells. SMYD5 depletion resulted in chromosomal aberrations and the formation of transformed cells that exhibited decreased H4K20me3 and H3K9me3 levels and an expression signature consistent with multiple human cancers. Moreover, dysregulated gene expression in SMYD5 cancer cells was associated with LTR and endogenous retrovirus elements and decreased H4K20me3. In addition, depletion of SMYD5 in human colon and lung cancer cells results in increased tumor growth and upregulation of genes overexpressed in colon and lung cancers, respectively. These findings implicate an important role for SMYD5 in maintaining chromosome integrity by regulating heterochromatin and repressing endogenous repetitive DNA elements during differentiation. Cancer Res; 77(23); 6729-45. ©2017 AACR.
Affiliation(s)
- Benjamin L Kidder
  - Department of Oncology, Wayne State University School of Medicine, Detroit, Michigan; Barbara Ann Karmanos Cancer Institute, Wayne State University School of Medicine, Detroit, Michigan
- Runsheng He
  - Department of Oncology, Wayne State University School of Medicine, Detroit, Michigan; Barbara Ann Karmanos Cancer Institute, Wayne State University School of Medicine, Detroit, Michigan
- Darawalee Wangsa
  - Cancer Genomics Section, National Cancer Institute, NIH, Bethesda, Maryland
- M Margarida Bernardo
  - Barbara Ann Karmanos Cancer Institute, Wayne State University School of Medicine, Detroit, Michigan; Department of Pathology, Wayne State University School of Medicine, Detroit, Michigan
- Shijie Sheng
  - Barbara Ann Karmanos Cancer Institute, Wayne State University School of Medicine, Detroit, Michigan; Department of Pathology, Wayne State University School of Medicine, Detroit, Michigan
- Thomas Ried
  - Cancer Genomics Section, National Cancer Institute, NIH, Bethesda, Maryland
- Keji Zhao
  - Systems Biology Center, National Heart, Lung and Blood Institute, NIH, Bethesda, Maryland
6
Cui H, Dhroso A, Johnson N, Korkin D. The variation game: Cracking complex genetic disorders with NGS and omics data. Methods 2015; 79-80:18-31. [PMID: 25944472 DOI: 10.1016/j.ymeth.2015.04.018] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 12/13/2014] [Revised: 03/27/2015] [Accepted: 04/17/2015] [Indexed: 12/14/2022]
Abstract
Tremendous advances in Next Generation Sequencing (NGS) and high-throughput omics methods have brought us one step closer to a mechanistic understanding of complex disease at the molecular level. In this review, we discuss four basic regulatory mechanisms implicated in complex genetic diseases, such as cancer, neurological disorders, heart disease, diabetes, and many others. The mechanisms, including genetic variations, copy-number variations, posttranscriptional variations, and epigenetic variations, can be detected using a variety of NGS methods. We propose that malfunctions detected in these mechanisms are not necessarily independent, since these malfunctions are often found associated with the same disease and targeting the same gene, group of genes, or functional pathway. As an example, we discuss possible rewiring effects of the cancer-associated genetic, structural, and posttranscriptional variations on the protein-protein interaction (PPI) network centered on the P53 protein. The review highlights the multi-layered complexity of common genetic disorders and suggests that integration of NGS and omics data is a critical step in developing new computational methods capable of deciphering this complexity.
Affiliation(s)
- Hongzhu Cui
  - Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
- Andi Dhroso
  - Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
- Nathan Johnson
  - Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
- Dmitry Korkin
  - Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States; Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
7
Zhang H, Du ZQ, Dong JQ, Wang HX, Shi HY, Wang N, Wang SZ, Li H. Detection of genome-wide copy number variations in two chicken lines divergently selected for abdominal fat content. BMC Genomics 2014; 15:517. [PMID: 24962627 PMCID: PMC4092215 DOI: 10.1186/1471-2164-15-517] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 12/11/2013] [Accepted: 06/19/2014] [Indexed: 12/13/2022]
Abstract
Background: The chicken (Gallus gallus) is an important model organism that bridges the evolutionary gap between mammals and other vertebrates. Copy number variations (CNVs) are a form of genomic structural variation widely distributed in the genome. CNV analysis has recently gained greater attention and momentum, as the identification of CNVs can contribute to a better understanding of traits important to both humans and other animals. To detect chicken CNVs, we genotyped 475 animals derived from two broiler chicken lines divergently selected for abdominal fat content using the chicken 60K SNP array, a high-throughput method widely used in chicken genomics studies.
Results: Using the PennCNV algorithm, we detected 438 and 291 CNVs in the lean and fat lines, respectively, corresponding to 271 and 188 CNV regions (CNVRs), which were obtained by merging overlapping CNVs. Of these CNVRs, 99% were also confirmed by the CNVPartition program. These CNVRs covered 40.26 and 30.60 Mb of the chicken genome in the lean and fat lines, respectively. Moreover, CNVRs included 176 loss, 68 gain and 27 both (i.e. loss and gain within the same region) events in the lean line, and 143 loss, 25 gain and 20 both events in the fat line. Ten CNVRs were chosen for validation by qPCR, and all of them were confirmed in at least one qPCR assay. We found a total of 886 genes located within these CNVRs, and Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses showed they could play various roles in a number of biological processes. Integrating the results of CNVRs, known quantitative trait loci (QTL) and selective sweeps for abdominal fat content suggested that some genes (including SLC9A3, GNAL, SPOCK3, ANXA10, HELIOS, MYLK, CCDC14, SPAG9, SOX5, VSNL1, SMC6, GEN1, MSGN1 and ZPAX) may be important for abdominal fat deposition in the chicken.
Conclusions: Our study provided a genome-wide CNVR map of the chicken genome, thereby contributing to our understanding of genomic structural variations and their potential roles in abdominal fat content in the chicken.
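The abstract above says that CNV regions (CNVRs) were obtained by merging overlapping CNVs. A minimal sketch of that merging step follows, assuming closed coordinates and any-overlap merging; the function name `merge_cnvs` and the example coordinates are hypothetical and are not taken from the study.

```python
from typing import List, Tuple

# (chromosome, start, end) with closed coordinates; this layout is an assumption.
Interval = Tuple[str, int, int]


def merge_cnvs(cnvs: List[Interval]) -> List[Interval]:
    """Merge overlapping CNV calls on the same chromosome into CNV regions."""
    regions: List[Interval] = []
    for chrom, start, end in sorted(cnvs):
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            # The call overlaps the current region: extend the region's end if needed.
            prev_chrom, prev_start, prev_end = regions[-1]
            regions[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            # No overlap (or a new chromosome): open a new region.
            regions.append((chrom, start, end))
    return regions


# Hypothetical calls, for illustration only.
calls = [("chr1", 100, 500), ("chr1", 400, 900), ("chr1", 1200, 1500), ("chr2", 50, 80)]
print(merge_cnvs(calls))
```

Published pipelines may instead require a minimum reciprocal overlap before merging; that rule is not stated in the abstract, so it is left out here.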
Affiliation(s)
- Hui Li
  - Key Laboratory of Chicken Genetics and Breeding, Ministry of Agriculture, Harbin 150030, P.R. China
8
Li C, Yang C, Gelernter J, Zhao H. Improving genetic risk prediction by leveraging pleiotropy. Hum Genet 2014; 133:639-50. [PMID: 24337655 PMCID: PMC3988249 DOI: 10.1007/s00439-013-1401-5] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Received: 08/09/2013] [Accepted: 11/23/2013] [Indexed: 11/28/2022]
Abstract
An important task of human genetics studies is to accurately predict disease risk in individuals based on genetic markers, which allows individuals at high risk to be identified and facilitates their treatment and prevention. Although hundreds of genome-wide association studies (GWAS) have been conducted on many complex human traits in recent years, there has been only limited success in translating these GWAS data into clinically useful risk prediction models. The predictive capability of GWAS data is largely bottlenecked by the available training sample size due to the presence of numerous variants carrying only small to modest effects. Recent studies have shown that different human traits may share common genetic bases. Therefore, an attractive strategy to increase the training sample size and hence improve the prediction accuracy is to integrate data from genetically correlated phenotypes. Yet, the utility of genetic correlation in risk prediction has not been explored in the literature. In this paper, we analyzed GWAS data for bipolar and related disorders and schizophrenia with a bivariate ridge regression method, and found that jointly predicting the two phenotypes could substantially increase prediction accuracy as measured by the area under the receiver operating characteristic curve. We also found similar prediction accuracy improvements when we jointly analyzed GWAS data for Crohn's disease and ulcerative colitis. The empirical observations were substantiated through our comprehensive simulation studies, suggesting that a gain in prediction accuracy can be obtained by combining phenotypes with relatively high genetic correlations. Through both real data and simulation studies, we demonstrated that pleiotropy can be leveraged as a valuable asset that opens up a new opportunity to improve genetic risk prediction in the future.
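The accuracy measure named in this abstract, the area under the receiver operating characteristic curve (AUC), equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name `auc` and the scores and labels are hypothetical illustrations, not data or code from the paper.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise case/control comparison.

    labels: 1 for cases, 0 for controls (an assumed encoding).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Count a win when the case outranks the control, half a win on a tie.
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical predicted risks and true status.
risk = [0.9, 0.8, 0.3, 0.1]
status = [1, 0, 1, 0]
print(auc(risk, status))
```

The pairwise form is quadratic in sample size; for large cohorts a rank-based computation gives the same value more cheaply.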
Affiliation(s)
- Cong Li
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Can Yang
  - Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, Connecticut 06520, USA; Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- Joel Gelernter
  - Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA; VA CT Healthcare Center, Departments of Genetics and Neurobiology, Yale Univ. School of Medicine, West Haven, Connecticut 06516, USA
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, Connecticut 06520, USA; Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
9
Fowler KE, Pong-Wong R, Bauer J, Clemente EJ, Reitter CP, Affara NA, Waite S, Walling GA, Griffin DK. Genome wide analysis reveals single nucleotide polymorphisms associated with fatness and putative novel copy number variants in three pig breeds. BMC Genomics 2013; 14:784. [PMID: 24225222 PMCID: PMC3879217 DOI: 10.1186/1471-2164-14-784] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Received: 03/27/2013] [Accepted: 10/29/2013] [Indexed: 12/13/2022]
Abstract
Background: Obesity, excess fat tissue in the body, can underlie a variety of medical complaints including heart disease, stroke and cancer. The pig is an excellent model organism for the study of various human disorders, including obesity, as well as being the foremost agricultural species. In order to identify genetic variants associated with fatness, we used a selective genomic approach, sampling DNA from animals at the extreme ends of the fat and lean spectrum using estimated breeding values derived from a total population size of over 70,000 animals. DNA from 3 breeds (Sire Line Large White, Duroc and a white Pietrain composite line (Titan)) was used to interrogate the Illumina Porcine SNP60 Genotyping Beadchip in order to identify significant associations in terms of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs).
Results: By sampling animals at each end of the fat/lean EBV (estimated breeding value) spectrum, the whole population could be assessed using fewer than 300 animals, without losing statistical power. Indeed, several significant SNPs (at the 5% genome-wide significance level) were discovered, 4 of these linked to genes with ontologies that had previously been correlated with fatness (NTS, FABP6, SST and NR3C2). Quantitative analysis of the data identified putative CNV regions containing genes whose ontology suggested fatness-related functions (MCHR1, PPARα, SLC5A1 and SLC5A4).
Conclusions: Selective genotyping of EBVs at either end of the phenotypic spectrum proved to be a cost-effective means of identifying SNPs and CNVs associated with fatness and with estimated major effects in a large population of animals.
Affiliation(s)
- Katie E Fowler
  - School of Biosciences, University of Kent, Canterbury, Kent CT2 7NH, UK
- Ricardo Pong-Wong
  - Roslin Institute, The University of Edinburgh, Roslin Biocentre, Midlothian, Scotland EH25 9PS, UK
- Julien Bauer
  - Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK
- Emily J Clemente
  - Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK
- Christopher P Reitter
  - Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK
- Nabeel A Affara
  - Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK
- Stephen Waite
  - JSR Genetics, Southburn, Driffield, East Yorkshire YO25 9ED, UK
- Grant A Walling
  - JSR Genetics, Southburn, Driffield, East Yorkshire YO25 9ED, UK
- Darren K Griffin
  - School of Biosciences, University of Kent, Canterbury, Kent CT2 7NH, UK
10
Chen N, Balasenthil S, Reuther J, Frayna A, Wang Y, Chandler DS, Abruzzo LV, Rashid A, Rodriguez J, Lozano G, Cao Y, Lokken E, Chen J, Frazier ML, Sahin AA, Wistuba II, Sen S, Lott ST, Killary AM. DEAR1 is a chromosome 1p35 tumor suppressor and master regulator of TGF-β-driven epithelial-mesenchymal transition. Cancer Discov 2013; 3:1172-89. [PMID: 23838884 PMCID: PMC4107927 DOI: 10.1158/2159-8290.cd-12-0499] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Indexed: 12/22/2022]
Abstract
Deletion of chromosome 1p35 is a common event in epithelial malignancies. We report that DEAR1 (annotated as TRIM62) is a chromosome 1p35 tumor suppressor that undergoes mutation, copy number variation, and loss of expression in human tumors. Targeted disruption in the mouse recapitulates this human tumor spectrum, with both Dear1(-/-) and Dear1(+/-) mice developing primarily epithelial adenocarcinomas and lymphoma with evidence of metastasis in a subset of mice. DEAR1 loss of function in the presence of TGF-β results in failure of acinar morphogenesis, upregulation of epithelial-mesenchymal transition (EMT) markers, anoikis resistance, migration, and invasion. Furthermore, DEAR1 blocks TGF-β-SMAD3 signaling, resulting in decreased nuclear phosphorylated SMAD3 by binding to and promoting the ubiquitination of SMAD3, the major effector of TGF-β-induced EMT. Moreover, DEAR1 loss increases levels of SMAD3 downstream effectors SNAIL1 and SNAIL2, with genetic alteration of DEAR1/SNAIL2 serving as prognostic markers of overall poor survival in a cohort of 889 cases of invasive breast cancer.
Significance: Cumulative results provide compelling evidence that DEAR1 is a critical tumor suppressor involved in multiple human cancers and provide a novel paradigm for regulation of TGF-β-induced EMT through DEAR1's regulation of SMAD3 protein levels. DEAR1 loss of function has important therapeutic implications for targeted therapies aimed at the TGF-β-SMAD3 pathway.
Affiliation(s)
- Nanyue Chen
  - Department of Genetics, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
- Seetharaman Balasenthil
  - Department of Genetics, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
- Jacquelyn Reuther
  - Department of Genetics, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
- Aileen Frayna
  - Department of Genetics, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
- Ying Wang
  - Department of Genetics, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
- Dawn S. Chandler
  - Department of Genetics, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
- Lynne V. Abruzzo
  - Division of Pathology and Laboratory Medicine, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
- Asif Rashid
  - Division of Pathology and Laboratory Medicine, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
- Jaime Rodriguez
  - Division of Pathology and Laboratory Medicine, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
- Guillermina Lozano
  - Department of Genetics, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
- Yu Cao
  - Department of Genetics, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
- Erica Lokken
  - Department of Genetics, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
- Jinyun Chen
  - Department of Epidemiology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
- Marsha L. Frazier
  - Department of Epidemiology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
- Aysegul A. Sahin
  - Division of Pathology and Laboratory Medicine, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
- Ignacio I. Wistuba
  - Division of Pathology and Laboratory Medicine, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
- Subrata Sen
  - Division of Pathology and Laboratory Medicine, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
- Steven T. Lott
  - Department of Genetics, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
- Ann McNeill Killary
  - Department of Genetics, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
11
Glessner JT, Li J, Hakonarson H. ParseCNV integrative copy number variation association software with quality tracking. Nucleic Acids Res 2013; 41:e64. [PMID: 23293001 PMCID: PMC3597648 DOI: 10.1093/nar/gks1346] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Indexed: 01/29/2023]
Abstract
A number of copy number variation (CNV) calling algorithms exist; however, comprehensive software tools for CNV association studies are lacking. We describe ParseCNV, unique software that takes CNV calls and creates probe-based statistics for CNV occurrence in both case–control design and in family-based studies addressing both de novo and inheritance events, which are then summarized based on CNV regions (CNVRs). CNVRs are defined in a dynamic manner to allow for complex CNV overlap while maintaining a precise association region. Using this approach, we avoid the failure-to-converge and non-monotonic curve-fitting weaknesses of programs such as CNVtools and CNVassoc. Although Plink is easy to use, it only provides combined CNV state probe-based statistics, not state-specific CNVRs. Existing CNV association methods do not provide any quality tracking information to filter confident associations, a key issue which is fully addressed by ParseCNV. In addition, uncertainty in CNV calls underlying CNV associations is evaluated to verify significant results, including CNV overlap profiles, genomic context, number of probes supporting the CNV and single-probe intensities. When optimal quality control parameters are followed using ParseCNV, 90% of CNVs validate by polymerase chain reaction, an often problematic stage because of inadequate significant association review. ParseCNV is freely available at http://parsecnv.sourceforge.net.
Affiliation(s)
- Joseph T Glessner
  - Department of Pediatrics, Division of Human Genetics, The Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
12
Butler JL, Osborne Locke ME, Hill KA, Daley M. HD-CNV: hotspot detector for copy number variants. Bioinformatics 2012; 29:262-3. [PMID: 23129301 DOI: 10.1093/bioinformatics/bts650] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 01/17/2023]
Abstract
SUMMARY: Copy number variants (CNVs) are a major source of genetic variation. Comparing CNVs between samples is important in elucidating their potential effects in a wide variety of biological contexts. HD-CNV (hotspot detector for copy number variants) is a tool for downstream analysis of previously identified CNV regions from multiple samples, and it detects recurrent regions by finding cliques in an interval graph generated from the input. It creates a unique graphical representation of the data, as well as summary spreadsheets and UCSC (University of California, Santa Cruz) Genome Browser track files. The interval graph, when viewed with other software or by automated graph analysis, is useful in identifying genomic regions of interest for further study. AVAILABILITY AND IMPLEMENTATION: HD-CNV is open-source Java code and is freely available, with tutorials and sample data, from http://daleylab.org. CONTACT: jcamer7@uwo.ca
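The clique idea in this abstract can be illustrated with a small sketch. In an interval graph, a set of intervals forms a clique exactly when they all share a common point, so recurrent regions can be found with a sweep over interval endpoints. This is not HD-CNV's actual Java implementation: the function name `recurrent_regions`, the half-open coordinate convention, and the assumption that each sample contributes at most one CNV per locus are all illustrative choices.

```python
from typing import List, Tuple

# (sample_id, start, end) with half-open coordinates [start, end); hypothetical layout
Interval = Tuple[str, int, int]


def recurrent_regions(cnvs: List[Interval], min_samples: int) -> List[Tuple[int, int, int]]:
    """Return (start, end, depth) segments covered by at least min_samples CNVs.

    Assumes one CNV per sample at any locus, so depth equals the number of samples.
    """
    events = []
    for _sample, start, end in cnvs:
        events.append((start, 1))   # interval opens
        events.append((end, -1))    # interval closes
    # At equal positions, closes (-1) sort before opens (+1), so intervals
    # that merely touch are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    regions = []
    depth = 0
    prev_pos = None
    for pos, delta in events:
        if prev_pos is not None and pos > prev_pos and depth >= min_samples:
            regions.append((prev_pos, pos, depth))
        depth += delta
        prev_pos = pos
    return regions


if __name__ == "__main__":
    example = [("s1", 100, 200), ("s2", 150, 250), ("s3", 180, 300), ("s4", 400, 500)]
    for segment in recurrent_regions(example, min_samples=2):
        print(segment)
```

Each returned segment corresponds to a stretch where a fixed set of intervals mutually overlap; the segment with the highest depth marks the largest clique in the toy data.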
Affiliation(s)
- Jenna L Butler
- Department of Computer Science, The University of Western Ontario, London, ON, Canada N6A 3K7.
|
13
|
Kim SY, Kim JH, Chung YJ. Effect of Combining Multiple CNV Defining Algorithms on the Reliability of CNV Calls from SNP Genotyping Data. Genomics Inform 2012; 10:194-9. [PMID: 23166530 PMCID: PMC3492655 DOI: 10.5808/gi.2012.10.3.194] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 08/10/2012] [Revised: 08/20/2012] [Accepted: 08/23/2012] [Indexed: 01/11/2023]
Abstract
In addition to single-nucleotide polymorphisms (SNP), copy number variation (CNV) is a major component of human genetic diversity. Among many whole-genome analysis platforms, SNP arrays have been commonly used for genomewide CNV discovery. Recently, a number of CNV defining algorithms from SNP genotyping data have been developed; however, due to the fundamental limitation of SNP genotyping data for the measurement of signal intensity, there are still concerns regarding the possibility of false discovery or low sensitivity for detecting CNVs. In this study, we aimed to verify the effect of combining multiple CNV calling algorithms and set up the most reliable pipeline for CNV calling with Affymetrix Genomewide SNP 5.0 data. For this purpose, we selected the 3 most commonly used algorithms for CNV segmentation from SNP genotyping data: PennCNV, QuantiSNP, and BirdSuite. After defining the CNV loci using the 3 different algorithms, we assessed how many of them overlapped with each other, and we also validated the CNVs by genomic quantitative PCR. Through this analysis, we proposed that for reliable CNV-based genomewide association study using SNP array data, CNV calls must be performed with at least 3 different algorithms and that the CNVs consistently called from more than 2 algorithms must be used for association analysis, because they are more reliable than the CNVs called from a single algorithm. Our result will be helpful to set up the CNV analysis protocols for Affymetrix Genomewide SNP 5.0 genotyping data.
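The overlap assessment described in this abstract can be sketched as a filter that keeps calls supported by several algorithms. The abstract does not state how overlap was scored, so the 50% reciprocal-overlap criterion, the `min_callers` parameter, the function names, and the tuple layout below are all assumptions made for illustration, not the authors' pipeline.

```python
from typing import Dict, List, Tuple

# (chrom, start, end, cnv_type) with half-open coordinates; hypothetical layout
Call = Tuple[str, int, int, str]


def reciprocal_overlap(a: Call, b: Call) -> float:
    """Fraction of the shared span relative to the longer-covered call (0.0 if disjoint)."""
    if a[0] != b[0] or a[3] != b[3]:
        return 0.0  # different chromosome or different CNV type
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def supported_calls(
    calls_by_caller: Dict[str, List[Call]],
    min_callers: int,
    min_overlap: float = 0.5,  # assumed threshold, not taken from the abstract
) -> List[Tuple[str, Call, int]]:
    """Return (caller, call, support) for calls matched by at least min_callers algorithms."""
    kept = []
    for caller, calls in calls_by_caller.items():
        for call in calls:
            # The calling algorithm itself counts as one supporter.
            support = 1 + sum(
                any(reciprocal_overlap(call, other) >= min_overlap for other in other_calls)
                for name, other_calls in calls_by_caller.items()
                if name != caller
            )
            if support >= min_callers:
                kept.append((caller, call, support))
    return kept


if __name__ == "__main__":
    toy = {
        "PennCNV": [("chr1", 1000, 2000, "del"), ("chr2", 500, 900, "dup")],
        "QuantiSNP": [("chr1", 1100, 2100, "del")],
        "Birdsuite": [("chr1", 1050, 1950, "del"), ("chr3", 10, 20, "del")],
    }
    for caller, call, support in supported_calls(toy, min_callers=2):
        print(caller, call, support)
```

Raising `min_callers` trades sensitivity for reliability; the toy coordinates above are made up and carry no biological meaning.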
Affiliation(s)
- Soon-Young Kim
- Integrated Research Center for Genome Polymorphism, The Catholic University of Korea School of Medicine, Seoul 137-701, Korea; Department of Microbiology, The Catholic University of Korea School of Medicine, Seoul 137-701, Korea
|
14
|
Carr IM, Diggle CP, Khan K, Inglehearn C, McKibbin M, Bonthron DT, Markham AF, Anwar R, Dobbie A, Pena SDJ, Ali M. Rapid visualisation of microarray copy number data for the detection of structural variations linked to a disease phenotype. PLoS One 2012; 7:e43466. [PMID: 22912880 PMCID: PMC3422275 DOI: 10.1371/journal.pone.0043466] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/05/2012] [Accepted: 07/20/2012] [Indexed: 12/12/2022]
Abstract
Whilst the majority of inherited diseases have been found to be caused by single base substitutions, small insertions or deletions (<1Kb), a significant proportion of genetic variability is due to copy number variation (CNV). The possible role of CNV in monogenic and complex diseases has recently attracted considerable interest. However, until the development of whole genome, oligonucleotide micro-arrays, designed specifically to detect the presence of copy number variation, it was not easy to screen an individual for the presence of unknown deletions or duplications with sizes below the level of sensitivity of optical microscopy (3-5 Mb). Now that currently available oligonucleotide micro-arrays have in excess of a million probes, the problem of copy number analysis has moved from one of data production to that of data analysis. We have developed CNViewer, to identify copy number variation that co-segregates with a disease phenotype in small nuclear families, from genome-wide oligonucleotide micro-array data. This freely available program should constitute a useful addition to the diagnostic armamentarium of clinical geneticists.
Affiliation(s)
- Ian M Carr
- School of Medicine, University of Leeds, Leeds, United Kingdom.
|
15
|
Kim JH, Hu HJ, Yim SH, Bae JS, Kim SY, Chung YJ. CNVRuler: a copy number variation-based case-control association analysis tool. Bioinformatics 2012; 28:1790-2. [PMID: 22539667 DOI: 10.1093/bioinformatics/bts239] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.3] [Indexed: 11/14/2022]
Abstract
SUMMARY: The method for genome-wide association study (GWAS) based on copy number variation (CNV) is not as well established as that for single nucleotide polymorphism (SNP)-GWAS. Although there are several tools for CNV association studies, most of them do not provide appropriate definitions of CNV regions (CNVRs), which are essential for CNV-association studies. Here we present a user-friendly program called CNVRuler for CNV-association studies. Outputs from the 10 most common CNV defining algorithms can be directly used as input files for determining the three different definitions of CNVRs. Once CNVRs are defined, CNVRuler supports four kinds of statistical association tests and options for population stratification. CNVRuler is based on the open-source programs R and Java from Sun Microsystems. AVAILABILITY: CNVRuler software is available with an online manual at the website, www.ircgp.com/CNVRuler/index.html.
Affiliation(s)
- Ji-Hong Kim
- Integrated Research Center for Genome Polymorphism, Department of Microbiology, School of Medicine, Catholic University of Korea, Seoul 137-701, Korea
|
16
|
Grayson BL, Aune TM. A comparison of genomic copy number calls by Partek Genomics Suite, Genotyping Console and Birdsuite algorithms to quantitative PCR. BioData Min 2011; 4:8. [PMID: 21489293 PMCID: PMC3084167 DOI: 10.1186/1756-0381-4-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/11/2010] [Accepted: 04/13/2011] [Indexed: 01/31/2023]
Abstract
Background: Copy number variants are >1 kb genomic amplifications or deletions that can be identified using array platforms. However, arrays produce substantial background noise that contributes to high false discovery rates of variants. We hypothesized that quantitative PCR could finitely determine copy number and assess the validity of calling algorithms. Results: Using data from 29 Affymetrix SNP 6.0 arrays, we determined copy numbers using three programs: Partek Genomics Suite, Affymetrix Genotyping Console 2.0 and Birdsuite. We compared array calls at 25 chromosomal regions to those determined by qPCR and found nearly identical calls in regions of copy number 2. Conversely, agreement differed in regions called variant by at least one method. The highest overall agreement in calls, 91%, was between Birdsuite and quantitative PCR. Partek Genomics Suite calls agreed with quantitative PCR 76% of the time while the agreement of Affymetrix Genotyping Console 2.0 with quantitative PCR was 79%. Conclusions: In 38 independent samples, 96% of Birdsuite calls agreed with quantitative PCR. Analysis of three copy number calling programs and quantitative PCR showed Birdsuite to have the greatest agreement with quantitative PCR.
Affiliation(s)
- Britney L Grayson
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN 37232, USA.
|
17
|
Makowsky R, Pajewski NM, Klimentidis YC, Vazquez AI, Duarte CW, Allison DB, de los Campos G. Beyond missing heritability: prediction of complex traits. PLoS Genet 2011; 7:e1002051. [PMID: 21552331 PMCID: PMC3084207 DOI: 10.1371/journal.pgen.1002051] [Citation(s) in RCA: 210] [Impact Index Per Article: 16.2] [Received: 10/21/2010] [Accepted: 03/02/2011] [Indexed: 01/25/2023]
Abstract
Despite rapid advances in genomic technology, our ability to account for phenotypic variation using genetic information remains limited for many traits. This has unfortunately resulted in limited application of genetic data towards preventive and personalized medicine, one of the primary impetuses of genome-wide association studies. Recently, a large proportion of the "missing heritability" for human height was statistically explained by modeling thousands of single nucleotide polymorphisms concurrently. However, it is currently unclear how gains in explained genetic variance will translate to the prediction of yet-to-be observed phenotypes. Using data from the Framingham Heart Study, we explore the genomic prediction of human height in training and validation samples while varying the statistical approach used, the number of SNPs included in the model, the validation scheme, and the number of subjects used to train the model. In our training datasets, we are able to explain a large proportion of the variation in height (h² up to 0.83, R² up to 0.96). However, the proportion of variance accounted for in validation samples is much smaller (ranging from 0.15 to 0.36 depending on the degree of familial information used in the training dataset). While such R² values vastly exceed what has been previously reported using a reduced number of pre-selected markers (<0.10), given the heritability of the trait (~0.80), substantial room for improvement remains.
Affiliation(s)
- Robert Makowsky
- Department of Biostatistics, University of Alabama at Birmingham, Alabama, United States of America.
|