1
Delmonico L, Obenauer JC, Stockfisch TP, Fournier MV. Housekeeping genes involved in non-malignant breast phenotypes are widely expressed in multiple cancers and provide novel biomarkers of tumor classification. Braz J Med Biol Res 2021; 54:e10388. [PMID: 34008752 PMCID: PMC8130057 DOI: 10.1590/1414-431x2020e10388] [Received: 06/12/2020] [Accepted: 02/12/2021]
Abstract
Clinically relevant biomarkers are useful to determine cancer patients' prognosis and treatments. To discover new putative biomarkers, we performed in silico analysis of a 325-gene panel previously associated with breast epithelial cell biology and clinical outcomes. Sixteen public datasets of microarray samples representing 8 cancer types and a total of 3,663 patients' samples were used for the analyses. Feature selection was used to identify the best subsets of the 325 genes for each classification, and linear discriminant analysis was used to quantify the accuracy of the classifications. A subset of 102 of the 325 genes were found to be housekeeping (HK) genes, and the classifications were repeated using only the 102 HK subset. The 325-gene panel and 102 HK subset were able to distinguish colon, gastric, lung, ovarian, pancreatic, and prostate tumors and leukemia from normal adjacent tissue, and classify disease subtypes of breast and lung cancers and leukemia with 70% or higher accuracy. HK genes have been overlooked as potential biomarkers due to their relative stability. This study describes a set of HK genes as putative biomarkers applicable to multiple cancer types worth following in subsequent validation studies.
Affiliation(s)
- L Delmonico
- Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brasil
2
Delmonico L, Obenauer JC, Qureshi F, Alves G, Costa MASM, Martin KJ, Fournier MV. A Novel Panel of 80 RNA Biomarkers with Differential Expression in Multiple Human Solid Tumors against Healthy Blood Samples. Int J Mol Sci 2019; 20:ijms20194894. [PMID: 31581693 PMCID: PMC6802086 DOI: 10.3390/ijms20194894] [Received: 08/23/2019] [Revised: 09/26/2019] [Accepted: 09/27/2019]
Abstract
The aim of this study was to identify genes with higher expression in solid tumor cells by comparing human tumor biopsies with healthy blood samples using both in silico statistical analysis and experimental validations. This approach resulted in a novel panel of 80 RNA biomarkers with high discrimination power to detect circulating tumor cells in blood samples. To identify the 80 RNA biomarkers, Affymetrix HG-U133 Plus 2.0 microarray datasets were used to compare breast tumor tissue biopsies and breast cancer cell lines with blood samples from patients with conditions other than cancer. A total of 859 samples were analyzed at the discovery stage, consisting of 417 mammary tumors, 41 breast cancer cell lines, and 401 control samples. To confirm this discovery, external datasets of eight types of tumors were used, and experimental validation studies (NanoString nCounter gene expression assay) were performed, for a total of 5028 samples analyzed. In these analyses, the 80 biomarkers showed higher expression in all solid tumors analyzed relative to healthy blood samples. Experimental validation studies using the NanoString assay confirmed that the results did not depend on the gene expression platform. The panel of 80 RNA biomarkers described here has the potential to detect solid tumor cells of multiple tumor types present in the blood.
Affiliation(s)
- Lucas Delmonico
- Carlos Chagas Filho Institute of Biophysics, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro 21941-902, Brazil.
- Fatir Qureshi
- Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
- Gilda Alves
- Circulating Biomarkers Laboratory, Faculty of Medical Sciences, Rio de Janeiro State University, Rio de Janeiro 20550-170, Brazil.
3
Obenauer JC, Stockfisch TP, Fournier MV. Abstract 1659: Overcorrection of batch effects by ComBat can be avoided by using an equal medians method. Cancer Res 2019. [DOI: 10.1158/1538-7445.am2019-1659]
Abstract
Combining multiple data sets from the Gene Expression Omnibus (GEO) or other data repositories for an integrated analysis requires appropriate batch correction. ComBat, an empirical Bayesian method for batch correction of microarray data, is widely used and has been reported to be the best correction method. We combined cancer data from 16 public studies representing 8 tissue types and a total of 3,563 samples, used the R “sva” package and ComBat for batch correction, and examined 6 gene sets representing positive and negative controls. As positive controls, we extracted 4 gene sets from the Human Protein Atlas that were found to be expressed at least 5-fold higher in one tissue than in any of 35 other tissues, and we matched these genes to their Affymetrix U133A probesets. This resulted in 16 probesets specific for stomach, 18 for lung, 37 for pancreas, and 27 for prostate. A fifth positive control is a group of 85 genes called BA80 that we have found to be expressed much lower in blood than in solid tissues. As a negative control that we do not expect to change much between tissues, we used a list of 3,804 housekeeping (HK) genes that were reported to show less than a four-fold expression change across 16 tissue types. We compared the ComBat results to a new method we call equal medians. The equal medians method assumes that the 22,277 genes measured on the Affymetrix U133A microarrays can vary widely between tissues and batches, but that the median of the 22,277 genes is the same for every sample. We created boxplots of each gene set across the 16 studies before and after each method of batch correction. The reduction in batch effects was scored using the change in standard deviation of the HK genes. The preservation of biological variability was scored using the fold change of the positive controls, comparing the target tissue’s median to the nearest alternate tissue’s median. 
We used two GEO studies as independent representatives of each tissue type, so the two fold changes were averaged to create a single measure.
The results using the HK genes showed that ComBat removed 99.90% of the batch effects visible in the raw data, while equal medians removed 61.58%. However, equal medians did the best at preserving biological variability, with a fold change of 4.8 for stomach, 13.1 for lung, 42.3 for pancreas, 12.0 for prostate, and 3.9 for blood. The corresponding fold changes for ComBat were 1.4, 1.1, 2.2, 1.0, and 1.0.
We conclude that ComBat was best at removing batch effects, but at the undesirable cost of minimizing biological variation. We believe this is due to known and unknown sources of variability that are confounded with batches, which is one of ComBat’s known risks. Equal medians showed the opposite performance, preserving biological variation better while partially removing batch effects. We offer the equal medians method as an alternative batch correction method in cases where ComBat shows evidence of overcorrection.
Citation Format: John C. Obenauer, Thomas P. Stockfisch, Marcia V. Fournier. Overcorrection of batch effects by ComBat can be avoided by using an equal medians method [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 1659.
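The equal medians idea described in this abstract (every sample is assumed to share the same median across all measured genes) can be illustrated with a minimal sketch. The abstract does not state how the adjustment is applied, so the additive shift on log2 values and the choice of the common target (the median of the per-sample medians) are assumptions made here for illustration; the function name `equal_medians` and the toy sample names are hypothetical.

```python
from statistics import median


def equal_medians(samples, target=None):
    """Shift each sample so that all samples share one median.

    samples: dict mapping sample name -> list of log2 expression values.
    target:  common median to shift to. Defaults to the median of the
             per-sample medians (an assumption, not taken from the abstract).
    """
    medians = {name: median(values) for name, values in samples.items()}
    if target is None:
        target = median(medians.values())
    # Additive shift on the log2 scale (assumed); differences between
    # genes within a sample are left untouched.
    return {
        name: [v - medians[name] + target for v in values]
        for name, values in samples.items()
    }


# Toy data: two hypothetical samples from different studies whose
# values differ by a constant offset of 1.5 log2 units.
raw = {
    "study1_s1": [4.0, 6.0, 8.0, 10.0, 12.0],
    "study2_s1": [5.5, 7.5, 9.5, 11.5, 13.5],
}
corrected = equal_medians(raw)
for name, values in corrected.items():
    print(name, median(values), values)
```

Because only one number per sample is changed, a sketch like this cannot remove gene-specific batch effects, which is consistent with the abstract's report that equal medians removed batch effects only partially while preserving more biological variation.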
4
Huang R, Grishagin I, Wang Y, Zhao T, Greene J, Obenauer JC, Ngan D, Nguyen DT, Guha R, Jadhav A, Southall N, Simeonov A, Austin CP. The NCATS BioPlanet - An Integrated Platform for Exploring the Universe of Cellular Signaling Pathways for Toxicology, Systems Biology, and Chemical Genomics. Front Pharmacol 2019; 10:445. [PMID: 31133849 PMCID: PMC6524730 DOI: 10.3389/fphar.2019.00445] [Received: 11/16/2018] [Accepted: 04/08/2019]
Abstract
Chemical genomics aims to comprehensively define, and ultimately predict, the effects of small molecule compounds on biological systems. Chemical activity profiling approaches must consider chemical effects on all pathways operative in mammalian cells. To enable a strategic and maximally efficient chemical profiling of pathway space, we have created the NCATS BioPlanet, a comprehensive integrated pathway resource that incorporates the universe of 1,658 human pathways sourced from publicly available, manually curated sources, which have been subjected to thorough redundancy and consistency cross-evaluation. BioPlanet supports interactive browsing, retrieval, and analysis of pathways, exploration of pathway connections, and pathway search by gene targets, category, and availability of corresponding bioactivity assay, as well as visualization of pathways on a 3-dimensional globe, in which the distance between any two pathways is proportional to their degree of gene component overlap. Using this resource, we propose a strategy to identify a minimal set of 362 biological assays that can interrogate the universe of human pathways. The NCATS BioPlanet is a public resource, which will be continually expanded and updated, for systems biology, toxicology, and chemical genomics, available at http://tripod.nih.gov/bioplanet/.
Affiliation(s)
- Ruili Huang
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, United States
- Yuhong Wang
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, United States
- Tongan Zhao
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, United States
- Jon Greene
- Rancho BioSciences, San Diego, CA, United States
- Deborah Ngan
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, United States
- Dac-Trung Nguyen
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, United States
- Rajarshi Guha
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, United States
- Ajit Jadhav
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, United States
- Noel Southall
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, United States
- Anton Simeonov
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, United States
- Christopher P Austin
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, United States
5
Fournier MV, Obenauer JC, Maer A, Goodwin EC. Abstract LB-217: Genes involved in non-malignant breast phenotypes are widely expressed in multiple cancers and provide novel biomarkers of clinical outcomes and therapeutic response. Cancer Res 2018. [DOI: 10.1158/1538-7445.am2018-lb-217]
Abstract
From over two hundred driver mutations identified to date, only about a dozen are FDA-approved biomarkers, and there is an unmet need to discover novel suitable biomarkers. We selected novel biomarkers based on non-malignant breast epithelial cell phenotypes and identified 325 genes (BA325) showing 32 significant oncology drug associations. A total of 251 genes out of 325 are unique and not found in any of 9 other oncology panels investigated, suggesting that BA325 may yield novel insights regarding tumor biology, clinical outcomes, and novel therapeutic targets, not covered by current tools. While prior work has validated the utility of BA325 in breast cancer, the current study investigates BA325 expression in other tumor types beyond breast. We tested BA325 expression in 8 tumor types (breast, colon, lung, ovarian, prostate, pancreatic, gastric cancers, and leukemia), using two independent public data sets for each, totaling 3,563 samples in 16 datasets. All used Affymetrix HG-U133A or Plus 2.0 microarrays. Samples were normalized within each study using RMA and batch-corrected across studies. BA325 expression levels were investigated, showing at least 324 of the 325 genes expressed above background levels (log2 > 4) in all cases tested. The expression profile of BA325 differed in magnitude and pattern across tumor types from 1,000 randomly generated sets of 325 genes. Surprisingly, 102 of the BA325 genes were among 3,804 published housekeeping (HK) genes, with 7 of them present in a predictive signature developed for breast cancer chemotherapy response. The results suggest HK genes may play a role in the underlying biology of how cancer cells respond to treatments. A set of 119 genes showed tissue-specific expression (10 breast, 2 colon, 7 lung, 3 ovarian, and 42 prostate), while 87 genes showed the least variation across tumors (4-fold change or less), including the genes MPRIP, MUS81, and AKT1. 
Twenty-seven of the tissue-specific genes and 48 of the least-varying genes are also reported to be HK genes. Differential expression was tested by unsupervised clustering in 14 of the 16 studies. BA325 genes successfully distinguished tumor and normal samples in gastric and ovarian cancers, and subtypes of lung and breast cancers, with implications for treatment responses. We conclude that BA325 expression profiles in all datasets examined include both tissue-specific genes and genes with similar expression across tissues. Preliminary results indicate BA325 genes may have utility as biomarkers in a surprisingly wide variety of tumor types (including leukemia) in addition to breast cancer, with discriminatory power in at least gastric, ovarian, lung, and breast cancers. Thus, BA325 can greatly increase the biomarker repertoire beyond oncogenes or other driver genes and may provide relevant insight into novel oncology therapeutic targets.
Citation Format: Marcia V. Fournier, John C. Obenauer, Andreia Maer, Edward C. Goodwin. Genes involved in non-malignant breast phenotypes are widely expressed in multiple cancers and provide novel biomarkers of clinical outcomes and therapeutic response [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr LB-217.
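The comparison of the panel against 1,000 randomly generated gene sets of the same size, and the check for expression above background (log2 > 4), can be sketched as follows. The abstract does not say which summary statistic was compared, so the mean log2 expression used here is an assumption, and the gene names and values are toy data invented for illustration; `compare_to_random_sets` is a hypothetical helper name.

```python
import random
from statistics import mean


def compare_to_random_sets(expr, panel, n_sets=1000, seed=0):
    """Compare a panel's mean expression with equally sized random gene sets.

    expr:  dict mapping gene name -> log2 expression value.
    panel: list of gene names in the panel of interest.
    Returns the panel mean and the fraction of random sets whose mean is
    at least as high (an empirical, one-sided comparison).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    genes = sorted(expr)
    observed = mean(expr[g] for g in panel)
    null_means = [
        mean(expr[g] for g in rng.sample(genes, len(panel)))
        for _ in range(n_sets)
    ]
    fraction = sum(m >= observed for m in null_means) / n_sets
    return observed, fraction


# Toy data: 200 hypothetical genes; the panel is the 20 genes with the
# highest values, so it should stand out from random sets.
expr = {f"gene{i}": float(i % 10) for i in range(200)}
panel = [g for g, v in expr.items() if v == 9.0]

# Count panel genes above the background threshold quoted in the abstract.
above_background = sum(expr[g] > 4.0 for g in panel)

observed, fraction = compare_to_random_sets(expr, panel)
print(above_background, observed, fraction)
```

A small fraction indicates that few random sets reach the panel's mean expression; a comparison of expression pattern rather than magnitude would need a different statistic than the one assumed here.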
6
Kasper LH, Qu C, Obenauer JC, McGoldrick DJ, Brindle PK. Genome-wide and single-cell analyses reveal a context dependent relationship between CBP recruitment and gene expression. Nucleic Acids Res 2014; 42:11363-82. [PMID: 25249627 PMCID: PMC4191404 DOI: 10.1093/nar/gku827] [Received: 07/15/2014] [Revised: 08/27/2014] [Accepted: 09/01/2014]
Abstract
Genome-wide distribution of histone H3K18 and H3K27 acetyltransferases, CBP (CREBBP) and p300 (EP300), is used to map enhancers and promoters, but whether these elements functionally require CBP/p300 remains largely uncertain. Here we compared global CBP recruitment with gene expression in wild-type and CBP/p300 double-knockout (dKO) fibroblasts. ChIP-seq using CBP-null cells as a control revealed nearby CBP recruitment for 20% of constitutively-expressed genes, but surprisingly, three-quarters of these genes were unaffected or slightly activated in dKO cells. Computationally defined enhancer-promoter-units (EPUs) having a CBP peak near the enhancer-like element were more predictive, with CBP/p300 deletion attenuating expression of 40% of such constitutively-expressed genes. Examining signal-responsive (Hypoxia Inducible Factor) genes showed that 97% were within 50 kilobases of an inducible CBP peak, and 70% of these required CBP/p300 for full induction. Unexpectedly, most inducible CBP peaks occurred near signal-nonresponsive genes. Finally, single-cell expression analysis revealed additional context dependence where some signal-responsive genes were not uniformly dependent on CBP/p300 in individual cells. While CBP/p300 was needed for full induction of some genes in single-cells, for other genes CBP/p300 increased the probability of maximal expression. Thus, target gene context influences the transcriptional requirement for CBP/p300, possibly by multiple mechanisms.
Affiliation(s)
- Lawryn H Kasper
- Department of Biochemistry, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA
- Chunxu Qu
- Department of Computational Biology, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA
- John C Obenauer
- Department of Computational Biology, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA
- Daniel J McGoldrick
- Department of Computational Biology, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA
- Paul K Brindle
- Department of Biochemistry, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA
7
Wang J, Mullighan CG, Easton J, Roberts S, Heatley SL, Ma J, Rusch MC, Chen K, Harris CC, Ding L, Holmfeldt L, Payne-Turner D, Fan X, Wei L, Zhao D, Obenauer JC, Naeve C, Mardis ER, Wilson RK, Downing JR, Zhang J. CREST maps somatic structural variation in cancer genomes with base-pair resolution. Nat Methods 2011; 8:652-4. [PMID: 21666668 DOI: 10.1038/nmeth.1628] [Received: 11/16/2010] [Accepted: 05/19/2011]
Abstract
We developed 'clipping reveals structure' (CREST), an algorithm that uses next-generation sequencing reads with partial alignments to a reference genome to directly map structural variations at the nucleotide level of resolution. Application of CREST to whole-genome sequencing data from five pediatric T-lineage acute lymphoblastic leukemias (T-ALLs) and a human melanoma cell line, COLO-829, identified 160 somatic structural variations. Experimental validation exceeded 80%, demonstrating that CREST had a high predictive accuracy.
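Reads with partial alignments, as mentioned in this abstract, show up in SAM/BAM files as soft-clipped segments (the `S` operation in the CIGAR string). The sketch below only illustrates how such reads and the reference positions where their alignments stop could be picked out; it is not CREST's implementation, and the helper names and the `min_clip` threshold are hypothetical.

```python
import re

# One CIGAR element: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a SAM CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def soft_clips(cigar):
    """Return (leading, trailing) soft-clipped base counts."""
    # Hard clips (H) may sit outside soft clips, so ignore them first.
    core = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    lead = core[0][0] if core and core[0][1] == "S" else 0
    trail = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return lead, trail


def breakpoints(pos, cigar, min_clip=5):
    """Candidate breakpoint positions (1-based) implied by soft clips.

    pos: leftmost aligned reference position (the SAM POS field).
    min_clip: minimum clip length to report (an arbitrary choice here).
    """
    lead, trail = soft_clips(cigar)
    # Operations that consume reference bases: M, D, N, = and X.
    ref_len = sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")
    found = []
    if lead >= min_clip:
        found.append(("left", pos))                 # first aligned base
    if trail >= min_clip:
        found.append(("right", pos + ref_len - 1))  # last aligned base
    return found


print(breakpoints(1000, "30S70M"))       # clip on the left end of the read
print(breakpoints(1000, "60M2D10M30S"))  # clip on the right end of the read
print(breakpoints(1000, "100M"))         # fully aligned read, nothing reported
```

Several independent reads clipped at the same reference position would be stronger evidence of a real breakpoint than a single clipped read.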
Affiliation(s)
- Jianmin Wang
- Department of Information Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee, USA
8
Liu X, Nguyen P, Liu W, Cheng C, Steeves M, Obenauer JC, Ma J, Geiger TL. T cell receptor CDR3 sequence but not recognition characteristics distinguish autoreactive effector and Foxp3(+) regulatory T cells. Immunity 2009; 31:909-20. [PMID: 20005134 DOI: 10.1016/j.immuni.2009.09.023] [Received: 04/07/2009] [Revised: 09/21/2009] [Accepted: 09/25/2009]
Abstract
The source, specificity, and plasticity of the forkhead box transcription factor 3 (Foxp3)(+) regulatory T (Treg) and conventional T (Tconv) cell populations active at sites of autoimmune pathology are not well characterized. To evaluate this, we combined global repertoire analyses and functional assessments of isolated T cell receptors (TCR) from TCRalpha retrogenic mice with autoimmune encephalomyelitis. Treg and Tconv cell TCR repertoires were distinct, and autoantigen-specific Treg and Tconv cells were enriched in diseased tissue. Autoantigen sensitivity and fine specificity of these cells intersected, implying that differences in responsiveness were not responsible for lineage specification. Notably, autoreactive Treg and Tconv cells could be fully distinguished by an acidic versus aliphatic variation at a single TCR CDR3 residue. Our results imply that ontogenically distinct Treg and Tconv cell repertoires with convergent specificities for autoantigen respond during autoimmunity and argue against more than limited plasticity between Treg and Tconv cells during autoimmune inflammation.
Affiliation(s)
- Xin Liu
- Department of Pathology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
9
Galea CA, High AA, Obenauer JC, Mishra A, Park CG, Punta M, Schlessinger A, Ma J, Rost B, Slaughter CA, Kriwacki RW. Large-scale analysis of thermostable, mammalian proteins provides insights into the intrinsically disordered proteome. J Proteome Res 2009; 8:211-26. [PMID: 19067583 DOI: 10.1021/pr800308v]
Abstract
Intrinsically disordered proteins are predicted to be highly abundant and play broad biological roles in eukaryotic cells. In particular, by virtue of their structural malleability and propensity to interact with multiple binding partners, disordered proteins are thought to be specialized for roles in signaling and regulation. However, these concepts are based on in silico analyses of translated whole genome sequences, not on large-scale analyses of proteins expressed in living cells. Therefore, whether these concepts broadly apply to expressed proteins is currently unknown. Previous studies have shown that heat treatment of cell extracts leads to partial enrichment of soluble, disordered proteins. On the basis of this observation, we sought to address the current dearth of knowledge about expressed, disordered proteins by performing a large-scale proteomics study of thermostable proteins isolated from mouse fibroblast cells. With the use of novel multidimensional chromatography methods and mass spectrometry, we identified a total of 1320 thermostable proteins from these cells. Further, we used a variety of bioinformatics methods to analyze the structural and biological properties of these proteins. Interestingly, more than 900 of these expressed proteins were predicted to be substantially disordered. These were divided into two categories, with 514 predicted to be predominantly disordered and 395 predicted to exhibit both disordered and ordered/folded features. In addition, 411 of the thermostable proteins were predicted to be folded. Despite the use of heat treatment (60 min at 98 °C) to partially enrich for disordered proteins, which might have been expected to select for small proteins, the sequences of these proteins exhibited a wide range of lengths (622 ± 555 residues (average length ± standard deviation) for disordered proteins and 569 ± 598 residues for folded proteins).
Computational structural analyses revealed several unexpected features of the thermostable proteins: (1) disordered domains and coiled-coil domains occurred together in a large number of disordered proteins, suggesting functional interplay between these domains; and (2) more than 170 proteins contained lengthy domains (>300 residues) known to be folded. Reference to Gene Ontology Consortium functional annotations revealed that, while disordered proteins play diverse biological roles in mouse fibroblasts, they do exhibit heightened involvement in several functional categories, including cytoskeletal structure and cell movement, metabolic and biosynthetic processes, organelle structure, cell division, gene transcription, and ribonucleoprotein complexes. We believe that these results reflect the general properties of the mouse intrinsically disordered proteome (IDP-ome), although they also reflect the specialized physiology of fibroblast cells. Large-scale identification of expressed, thermostable proteins from other cell types in the future, grown under varied physiological conditions, will dramatically expand our understanding of the structural and biological properties of disordered eukaryotic proteins.
Affiliation(s)
- Charles A Galea
- Department of Structural Biology, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, Tennessee 38105, USA
10
Abstract
Avian influenza viruses have adapted to human hosts, causing pandemics in humans. The key host-specific amino acid mutations required for an avian influenza virus to function in humans are unknown. Through multiple-sequence alignment and statistical testing of each aligned amino acid, we identified markers that discriminate human influenza viruses from avian influenza viruses. We applied strict thresholds to select only markers which are highly preserved in human influenza virus isolates over time. We found that a subset of these persistent host markers exist in all human pandemic influenza virus sequences from 1918, 1957, and 1968, while others are acquired as the virus becomes a seasonal influenza virus. We also show that human H5N1 influenza viruses are significantly more likely to contain the amino acid predominant in human strains for a few persistent host markers than avian H5N1 influenza viruses. This sporadic enrichment of amino acids present in human-hosted viruses may indicate that some H5N1 viruses have made modest adaptations to their new hosts in the recent past. The markers reported here should be useful in monitoring potential pandemic influenza viruses.
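The column-by-column comparison of aligned sequences described in this abstract can be sketched as below. The abstract does not name the statistical test or the thresholds used, so a plain frequency cutoff (`min_freq`) stands in for them here; that cutoff, the function name `host_markers`, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def host_markers(human, avian, min_freq=0.9):
    """Find alignment columns whose dominant residue differs between hosts.

    human, avian: lists of aligned protein sequences of equal length.
    min_freq: minimum frequency of the dominant residue within each host
              group (a stand-in for the unspecified statistical test).
    Returns tuples of (1-based column, avian residue, human residue).
    """
    markers = []
    for col in range(len(human[0])):
        h_res, h_n = Counter(seq[col] for seq in human).most_common(1)[0]
        a_res, a_n = Counter(seq[col] for seq in avian).most_common(1)[0]
        if (
            h_res != a_res
            and h_n / len(human) >= min_freq
            and a_n / len(avian) >= min_freq
        ):
            markers.append((col + 1, a_res, h_res))
    return markers


# Toy alignment: column 2 separates the two host groups cleanly, while
# column 5 varies within the human group but keeps the same dominant residue.
human = ["MKTLV", "MKTLV", "MKTLI"]
avian = ["MRTLV", "MRTLV", "MRTLV"]
print(host_markers(human, avian))  # [(2, 'R', 'K')]
```

Checking persistence over time, as the abstract describes, would mean repeating this comparison on human isolates grouped by collection year and keeping only columns that pass in every group.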
Affiliation(s)
- David B Finkelstein
- Hartwell Center for Bioinformatics and Biotechnology, St. Jude Children's Research Hospital, and Department of Pathology, University of Tennessee Health Science Center, Memphis, TN 38105-2794, USA
11
Pottier N, Cheok MH, Yang W, Assem M, Tracey L, Obenauer JC, Panetta JC, Relling MV, Evans WE. Expression of SMARCB1 modulates steroid sensitivity in human lymphoblastoid cells: identification of a promoter SNP that alters PARP1 binding and SMARCB1 expression. Hum Mol Genet 2007; 16:2261-71. [PMID: 17616514 DOI: 10.1093/hmg/ddm178]
Abstract
Although the cure rate of childhood acute lymphoblastic leukemia (ALL) has surpassed 80%, drug resistance remains a major cause of treatment failure. We previously identified a panel of 33 genes differentially expressed in prednisolone-sensitive versus prednisolone-resistant ALL cells from newly diagnosed children. Here we used bioinformatics to identify resistance genes most likely to contain single nucleotide polymorphisms (SNPs) in their promoter region. The highest priority gene was SMARCB1, a core member of the SWI/SNF complex which promotes glucocorticoid effects through nucleosome remodeling. We identified several SNPs in the SMARCB1 promoter in lymphoblastoid cells from 90 individuals in the Centre d'Etude du Polymorphisme Humain (CEPH) panel. Among these SNPs, the -228G>T SNP (allele frequency 9.4%) was the only one that significantly increased reporter activity in human ALL cell lines. Furthermore, we identified poly (ADP-ribose) polymerase family, member 1 (PARP1) as a nuclear protein binding to the SMARCB1 promoter and showed that the -228 SNP significantly altered PARP1 binding affinity. The -228G>T SNP altered SMARCB1 mRNA and protein levels, and a positive association was found between the SMARCB1 mRNA level and both the -228 genotype and prednisolone sensitivity in CEPH cell lines. Finally, knockdown experiments performed in human ALL cell lines confirmed that lower SMARCB1 expression increased prednisolone resistance. In summary, we provide functional evidence that SMARCB1 is involved in prednisolone resistance and identify a promoter SNP that alters the level of SMARCB1 mRNA and protein expression and the binding of PARP1 to the SMARCB1 promoter.
MESH Headings
- Amino Acid Sequence
- Blotting, Western
- Cell Line
- Chromosomal Proteins, Non-Histone/genetics
- Chromosomal Proteins, Non-Histone/metabolism
- Chromosome Mapping
- DNA-Binding Proteins/genetics
- DNA-Binding Proteins/metabolism
- Electrophoretic Mobility Shift Assay
- Gene Expression/drug effects
- Gene Frequency
- Genotype
- Humans
- Lymphocytes/cytology
- Lymphocytes/drug effects
- Lymphocytes/metabolism
- Molecular Sequence Data
- Mutagenesis, Site-Directed
- Poly (ADP-Ribose) Polymerase-1
- Poly(ADP-ribose) Polymerases/genetics
- Poly(ADP-ribose) Polymerases/metabolism
- Polymorphism, Single Nucleotide
- Prednisolone/pharmacology
- Promoter Regions, Genetic/genetics
- Protein Binding
- RNA Interference
- RNA, Messenger/genetics
- RNA, Messenger/metabolism
- RNA, Small Interfering/genetics
- SMARCB1 Protein
- Sequence Alignment
- Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
- Steroids/pharmacology
- Transcription Factors/genetics
- Transcription Factors/metabolism
Affiliation(s)
- Nicolas Pottier
- Department of Pharmaceutical Sciences, St Jude Children's Research Hospital, Memphis, TN 38105, USA
12
Abstract
Intrinsically unstructured proteins (IUPs) represent an important class of proteins primarily involved in cellular signaling and regulation. The aim of this study was to develop methodology for the enrichment and identification of IUPs. We show that heat treatment of NIH3T3 mouse fibroblast cell extracts at 98 °C selects for IUPs. The majority of these IUPs were cytosolic or nuclear proteins involved in cell signaling or regulation. These studies represent the first large-scale experimental investigation of the intrinsically unstructured mammalian proteome.
Affiliation(s)
- Charles A Galea
- Department of Structural Biology, St. Jude Children's Research Hospital, and Department of Molecular Sciences, University of Tennessee Health Sciences Center, Memphis, Tennessee 38163, USA
|
13
|
Affiliation(s)
- John C. Obenauer
- Hartwell Center for Bioinformatics and Biotechnology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
- Department of Pathology, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Yiping Fan
- Hartwell Center for Bioinformatics and Biotechnology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
- Department of Pathology, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Clayton W. Naeve
- Hartwell Center for Bioinformatics and Biotechnology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
- Department of Pathology, University of Tennessee Health Science Center, Memphis, TN 38163, USA
|
14
|
Obenauer JC, Denson J, Mehta PK, Su X, Mukatira S, Finkelstein DB, Xu X, Wang J, Ma J, Fan Y, Rakestraw KM, Webster RG, Hoffmann E, Krauss S, Zheng J, Zhang Z, Naeve CW. Large-scale sequence analysis of avian influenza isolates. Science 2006; 311:1576-80. [PMID: 16439620 DOI: 10.1126/science.1121586] [Citation(s) in RCA: 461] [Impact Index Per Article: 25.6] [Indexed: 11/02/2022]
Abstract
The spread of H5N1 avian influenza viruses (AIVs) from China to Europe has raised global concern about their potential to infect humans and cause a pandemic. In spite of their substantial threat to human health, remarkably little AIV whole-genome information is available. We report here a preliminary analysis of the first large-scale sequencing of AIVs, including 2196 AIV genes and 169 complete genomes. We combine this new information with public AIV data to identify new gene alleles, persistent genotypes, compensatory mutations, and a potential virulence determinant.
MESH Headings
- Animals
- Birds/virology
- Computational Biology
- Genes, Viral
- Genome, Viral
- Humans
- Influenza A Virus, H1N1 Subtype/genetics
- Influenza A Virus, H2N2 Subtype/genetics
- Influenza A Virus, H3N2 Subtype/genetics
- Influenza A Virus, H3N8 Subtype/genetics
- Influenza A Virus, H5N1 Subtype/chemistry
- Influenza A Virus, H5N1 Subtype/genetics
- Influenza A Virus, H5N1 Subtype/pathogenicity
- Influenza A Virus, H5N2 Subtype/genetics
- Influenza A Virus, H7N7 Subtype/genetics
- Influenza A Virus, H9N2 Subtype/genetics
- Influenza A virus/chemistry
- Influenza A virus/genetics
- Influenza A virus/isolation & purification
- Influenza A virus/pathogenicity
- Influenza in Birds/virology
- Influenza, Human/virology
- Molecular Sequence Data
- Mutation
- Phylogeny
- RNA, Viral/genetics
- Reassortant Viruses/genetics
- Sequence Analysis, DNA
- Viral Nonstructural Proteins/chemistry
- Viral Nonstructural Proteins/genetics
- Viral Proteins/chemistry
- Viral Proteins/genetics
- Virulence Factors/chemistry
- Virulence Factors/genetics
Affiliation(s)
- John C Obenauer
- Hartwell Center for Bioinformatics and Biotechnology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
|
15
|
Wadkins RM, Hyatt JL, Wei X, Yoon KJP, Wierdl M, Edwards CC, Morton CL, Obenauer JC, Damodaran K, Beroza P, Danks MK, Potter PM. Identification and characterization of novel benzil (diphenylethane-1,2-dione) analogues as inhibitors of mammalian carboxylesterases. J Med Chem 2005; 48:2906-15. [PMID: 15828829 DOI: 10.1021/jm049011j] [Citation(s) in RCA: 142] [Impact Index Per Article: 7.5] [Indexed: 11/30/2022]
Abstract
Carboxylesterases (CEs) are ubiquitous enzymes responsible for the metabolism of xenobiotics. Because of the structural and amino acid homology among esterases of different classes, the identification of selective inhibitors of these proteins has proved problematic. Using Telik's target-related affinity profiling (TRAP) technology, we have identified a class of compounds based on benzil (1,2-diphenylethane-1,2-dione) that are potent CE inhibitors, with Ki values in the low nanomolar range. Benzil and 30 analogues demonstrated selective inhibition of CEs, with no inhibitory activity toward human acetylcholinesterase or butyrylcholinesterase. Analysis of structurally related compounds indicated that the ethane-1,2-dione moiety was essential for enzyme inhibition and that potency was dependent on the presence of, and substitution within, the benzene ring. 3D-QSAR analyses of these benzil analogues for three different mammalian CEs demonstrated excellent correlations of observed versus predicted Ki (r² > 0.91), with cross-validation coefficients (q²) of 0.9. Overall, these results suggest that selective inhibitors of CEs with potential for use in clinical applications can be designed.
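The two statistics quoted in this abstract, the correlation of observed versus predicted values (r²) and the cross-validation coefficient (q²), can be illustrated with a minimal sketch. It assumes a single hypothetical descriptor and ordinary least squares with leave-one-out cross-validation; the 3D-QSAR models in the paper are not reproduced here, and the function names and toy numbers are illustrative assumptions, not data from the study.

```python
from statistics import mean
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(xs: List[float], ys: List[float]) -> float:
    """Fitted r^2: 1 - SS_res / SS_tot, using a model trained on all points."""
    slope, intercept = fit_line(xs, ys)
    my = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def q_squared_loo(xs: List[float], ys: List[float]) -> float:
    """Leave-one-out q^2: 1 - PRESS / SS_tot.

    Each point is predicted by a model fitted without that point.
    """
    my = mean(ys)
    press = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / ss_tot


if __name__ == "__main__":
    # Toy values only (hypothetical descriptor vs. hypothetical activity).
    descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
    activity = [1.1, 1.9, 3.2, 3.8, 5.1]
    print("r^2 =", round(r_squared(descriptor, activity), 3))
    print("q^2 =", round(q_squared_loo(descriptor, activity), 3))
```

For a least-squares model, q² computed this way never exceeds the fitted r², which is why a q² close to r² is usually read as a sign that the model is not overfitted.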
Affiliation(s)
- Randy M Wadkins
- Department of Chemistry and Biochemistry, University of Mississippi, University, Mississippi 38677, USA
|
16
|
Abstract
Eukaryotic proteins typically contain one or more modular domains such as kinases, phosphatases, and phosphopeptide-binding domains, as well as characteristic sequence motifs that direct post-translational modifications such as phosphorylation, or mediate binding to specific modular domains. A computational approach to predict protein interactions on a proteome-wide basis would therefore consist of identifying modular domains and sequence motifs from protein primary sequence data, creating sequence specificity-based algorithms to connect a domain in one protein with a motif in another in "interaction space," and then graphically constructing possible interaction networks. Computational methods for predicting modular domains in proteins have been quite successful, but identifying the short sequence motifs these domains recognize has been more difficult. We are developing improved methods to identify these motifs by combining experimental and computational techniques with databases of sequences and binding information. Scansite is a web-accessible program that predicts interactions between proteins using experimental binding data from peptide library and phage display experiments. This program focuses on domains important in cell signaling, but it can, in principle, be used for other interactions if the domains and binding motifs are known. This chapter describes in detail how to use Scansite to predict the binding partners of an input protein, and how to find all proteins that contain a given sequence motif.
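The idea of connecting a domain in one protein with a motif in another can be sketched as a small lookup over annotated proteins. This is an illustration under stated assumptions only: the protein names, the annotation tables, and the `predict_edges` helper are hypothetical and are not part of Scansite.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical annotations: which modular domains each protein contains ...
DOMAINS: Dict[str, Set[str]] = {
    "ProtA": {"SH2"},
    "ProtB": {"14-3-3"},
    "ProtC": set(),
}

# ... and which short sequence motifs were found in each protein.
MOTIFS: Dict[str, Set[str]] = {
    "ProtA": {"14-3-3_site"},
    "ProtB": set(),
    "ProtC": {"SH2_site", "14-3-3_site"},
}

# Hypothetical specificity table: the motif class each domain recognizes.
RECOGNIZES: Dict[str, str] = {
    "SH2": "SH2_site",
    "14-3-3": "14-3-3_site",
}


def predict_edges(
    domains: Dict[str, Set[str]],
    motifs: Dict[str, Set[str]],
    recognizes: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Return (binder, domain, partner) triples for every domain-motif match."""
    edges = []
    for binder, binder_domains in domains.items():
        for domain in binder_domains:
            target_motif = recognizes.get(domain)
            if target_motif is None:
                continue  # no known specificity for this domain
            for partner, partner_motifs in motifs.items():
                # Self-interactions are skipped here as a simplifying choice.
                if partner != binder and target_motif in partner_motifs:
                    edges.append((binder, domain, partner))
    return sorted(edges)


if __name__ == "__main__":
    for binder, domain, partner in predict_edges(DOMAINS, MOTIFS, RECOGNIZES):
        print(f"{binder} ({domain}) -> {partner}")
```

The resulting triples form the edge list of a candidate interaction network, which could then be drawn with any graph tool.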
Affiliation(s)
- John C Obenauer
- Center for Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
|
17
|
Abstract
Scansite identifies short protein sequence motifs that are recognized by modular signaling domains, phosphorylated by protein Ser/Thr- or Tyr-kinases or mediate specific interactions with protein or phospholipid ligands. Each sequence motif is represented as a position-specific scoring matrix (PSSM) based on results from oriented peptide library and phage display experiments. Predicted domain-motif interactions from Scansite can be sequentially combined, allowing segments of biological pathways to be constructed in silico. The current release of Scansite, version 2.0, includes 62 motifs characterizing the binding and/or substrate specificities of many families of Ser/Thr- or Tyr-kinases, SH2, SH3, PDZ, 14-3-3 and PTB domains, together with signature motifs for PtdIns(3,4,5)P3-specific PH domains. Scansite 2.0 contains significant improvements to its original interface, including a number of new generalized user features and significantly enhanced performance. Searches of all SWISS-PROT, TrEMBL, Genpept and Ensembl protein database entries are now possible with run times reduced by approximately 60% when compared with Scansite version 1.0. Scansite 2.0 allows restricted searching of species-specific proteins, as well as isoelectric point and molecular weight sorting to facilitate comparison of predictions with results from two-dimensional gel electrophoresis experiments. Support for user-defined motifs has been increased, allowing easier input of user-defined matrices and permitting user-defined motifs to be combined with pre-compiled Scansite motifs for dual motif searching. In addition, a new series of Sequence Match programs for non-quantitative user-defined motifs has been implemented. Scansite is available via the World Wide Web at http://scansite.mit.edu.
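As a rough illustration of how a position-specific scoring matrix can be used to scan a protein sequence, the sketch below slides a window along the sequence and sums per-position scores. The three-column matrix, the default score, the threshold, and the function names are hypothetical; the sketch uses a generic additive score and does not reproduce Scansite's own matrices or scoring scheme.

```python
from typing import Dict, List, Tuple

# Hypothetical 3-position PSSM: one dict per motif position, mapping
# residue -> score. Residues not listed receive DEFAULT_SCORE.
PSSM: List[Dict[str, float]] = [
    {"R": 2.0, "K": 1.0},
    {"S": 3.0, "T": 2.0},
    {"P": 1.5},
]
DEFAULT_SCORE = -1.0


def score_window(window: str, pssm: List[Dict[str, float]],
                 default: float = DEFAULT_SCORE) -> float:
    """Sum the per-position scores of one window against the matrix."""
    return sum(column.get(residue, default)
               for residue, column in zip(window, pssm))


def scan(sequence: str, pssm: List[Dict[str, float]],
         threshold: float) -> List[Tuple[int, str, float]]:
    """Return (start index, window, score) for windows scoring >= threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = score_window(window, pssm)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    for start, window, score in scan("MARSPKTP", PSSM, threshold=4.0):
        print(start, window, score)
```

Restricting hits by a threshold is one simple way to trade sensitivity for specificity; a real tool would typically also rank hits against a background distribution of scores.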
Affiliation(s)
- John C Obenauer
- Center for Cancer Research, E18-580, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
|