1
Lader AS, Ramoni MF, Zetter BR, Kohane IS, Kwiatkowski DJ. Identification of a transcriptional profile associated with in vitro invasion in non-small cell lung cancer cell lines. Cancer Biol Ther 2004; 3:624-31. [PMID: 15153803 DOI: 10.4161/cbt.3.7.914] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 11/19/2022]
Abstract
Although much has been learned about basic mechanisms of cell invasion, the genes whose expression malignant cell lines require for this process have remained obscure. We assessed invasion through Matrigel, using EGF as a chemoattractant, and gene expression profiles, using oligonucleotide microarrays, for 22 non-small cell lung cancer cell lines. The expression of 22 genes was significantly correlated (p < 0.001) with the measured invasion index. Cluster analysis demonstrated that gene expression profiles classify the cell lines into low and high invasive subgroups. Treating invasiveness as a dichotomous variable, we used Bayesian analysis to identify genes that have the highest probability of being differentially expressed between the high and low invasion groups. This analysis identified 16 genes whose expression was associated with invasiveness. "Leave one out" cross validation was 91% accurate. Nine genes were identified by both the correlation and the Bayesian analyses. Seven of the nine genes were negatively associated with invasion, and four of those genes encode plasma membrane proteins. The two genes with the highest inverse association with invasion, TACSTD1 and CLDN3, are involved in cell adhesion and cell-cell interactions, respectively. Interestingly, the gene with the highest positive association with invasion, SERPINE1 (PAI-1), encodes a protease inhibitor. These and the other genes identified by both analyses represent targets for further study to assess their importance in non-small cell lung cancer invasion and metastasis.
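The abstract reports 91% "leave one out" cross-validation accuracy but does not name the classifier. The following is a minimal sketch of the leave-one-out loop itself, assuming a nearest-centroid classifier as a stand-in (a hypothetical choice, not the authors' method) and plain lists of per-cell-line expression values labeled by invasion group.

```python
from statistics import mean


def nearest_centroid_predict(train_x, train_y, x):
    """Hypothetical stand-in classifier: assign x to the class whose mean
    expression profile is closest in squared Euclidean distance."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [row for row, y in zip(train_x, train_y) if y == label]
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(X, y, predict=nearest_centroid_predict):
    """Leave-one-out cross-validation: hold out each sample once, train on
    the remaining samples, and score the prediction for the held-out one."""
    correct = 0
    for i in range(len(X)):
        train_x = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_x, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)
```

Any gene-selection step (such as the correlation or Bayesian filtering described above) would have to be repeated inside each fold; selecting genes on all samples first tends to make the leave-one-out estimate optimistic.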
Affiliation(s)
- Alan S Lader
- Hematology Division, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA.
2
Abstract
Background Pre-symptomatic prediction of disease and drug response based on genetic testing is a critical component of personalized medicine. Previous work has demonstrated that the predictive capacity of genetic testing is constrained by the heritability and prevalence of the tested trait, although these constraints have only been approximated under the assumption that genetic risk is normally distributed. Results Here, we mathematically derive the absolute limits that these factors impose on test accuracy without any distributional assumptions on risk. We present these limits in terms of the best-case receiver operating characteristic (ROC) curve, consisting of the best-case test sensitivities and specificities, and the area under the curve (AUC) as a measure of accuracy. We apply our method to genetic prediction of type 2 diabetes and breast cancer, and we additionally show the best possible accuracy that can be obtained from integrated predictors, which can incorporate non-genetic features. Conclusion Knowledge of such limits is valuable for understanding the implications of genetic testing even before additional associations are identified.
3
Vigetti D, Rizzi M, Moretto P, Deleonibus S, Dreyfuss JM, Karousou E, Viola M, Clerici M, Hascall VC, Ramoni MF, De Luca G, Passi A. Glycosaminoglycans and glucose prevent apoptosis in 4-methylumbelliferone-treated human aortic smooth muscle cells. J Biol Chem 2011; 286:34497-503. [PMID: 21768115 DOI: 10.1074/jbc.m111.266312] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Indexed: 01/10/2023]
Abstract
Smooth muscle cells (SMCs) have a pivotal role in cardiovascular diseases and are responsible for hyaluronan (HA) deposition in thickening vessel walls. HA regulates SMC proliferation, migration, and inflammation, which accelerates neointima formation. We used the HA synthesis inhibitor 4-methylumbelliferone (4-MU) to reduce HA production in human aortic SMCs and found a significant increase in apoptotic cells. Interestingly, adding exogenous HA together with 4-MU reduced apoptosis. A similar anti-apoptotic effect was also observed when other glycosaminoglycans and glucose were added to 4-MU-treated cells. Furthermore, the anti-apoptotic effect of HA was mediated by Toll-like receptor 4, CD44, and PI3K but not by ERK1/2.
Affiliation(s)
- Davide Vigetti
- Dipartimento di Scienze Biomediche Sperimentali e Cliniche, Università degli Studi dell'Insubria, via JH Dunant 5, 21100 Varese, Italy
4
Ferrazzi F, Engel FB, Wu E, Moseman AP, Kohane IS, Bellazzi R, Ramoni MF. Inferring cell cycle feedback regulation from gene expression data. J Biomed Inform 2011; 44:565-75. [PMID: 21310265 DOI: 10.1016/j.jbi.2011.02.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 05/10/2010] [Revised: 02/02/2011] [Accepted: 02/03/2011] [Indexed: 12/01/2022]
Abstract
Feedback control is an important regulatory process in biological systems that confers robustness against external and internal disturbances. Genes involved in feedback structures are therefore likely to have a major role in regulating cellular processes. Here we rely on a dynamic Bayesian network approach to identify feedback loops in cell cycle regulation. We analyzed the transcriptional profile of the cell cycle in HeLa cancer cells and identified a feedback loop structure composed of 10 genes. In silico analyses showed that these genes play important roles in the system's dynamics. The results of published experimental assays confirmed the central role of 8 of the identified feedback loop genes in cell cycle regulation. In conclusion, we provide a novel approach to identify genes critical to the dynamics of biological processes. This may lead to the identification of therapeutic targets in diseases that involve perturbations of these dynamics.
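Once a dynamic Bayesian network has been learned, its lagged dependencies (gene A at time t influencing gene B at time t+1) can be summarized as a directed graph, in which a feedback loop corresponds to a directed cycle. A minimal sketch, assuming the learned network is available as a list of (regulator, target) edges (a hypothetical input format), flags every gene lying on a cycle using Tarjan's strongly connected components algorithm. It does not reproduce the network-learning step or the authors' in silico analyses.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Tarjan's algorithm: return the strongly connected components of a directed graph."""
    graph = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        nodes.update((src, dst))

    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:  # unvisited successor: recurse into it
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:  # edge back into the component being built
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:  # v is the root of a component: pop it off
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return components


def feedback_genes(edges):
    """Genes on at least one directed cycle: multi-gene components plus self-loops."""
    genes = {src for src, dst in edges if src == dst}
    for component in strongly_connected_components(edges):
        if len(component) > 1:
            genes |= component
    return genes
```

For example, with the hypothetical edges A→B, B→C, C→A, C→D, D→E, E→E, `feedback_genes` returns {A, B, C, E}: D receives input from the loop but feeds nothing back. The recursion depth grows with the longest path, which is harmless for networks of a few hundred genes.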
Affiliation(s)
- Fulvia Ferrazzi
- Dipartimento di Informatica e Sistemistica, Università degli Studi di Pavia, Pavia, Italy.
5
Zollanvari A, Saccone NL, Bierut LJ, Ramoni MF, Alterovitz G. Is the reduction of dimensionality to a small number of features always necessary in constructing predictive models for analysis of complex diseases or behaviours? Annu Int Conf IEEE Eng Med Biol Soc 2011; 2011:3573-3576. [PMID: 22255111 DOI: 10.1109/iembs.2011.6090596] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/31/2023]
Abstract
Gene expression and genome-wide association data have given researchers the opportunity to study many complex traits and diseases. When designing prognostic and predictive models capable of phenotypic classification in this area, significant reduction of dimensionality through stringent filtering and/or feature selection is often deemed imperative. This work challenges that presumption through both theoretical and empirical analysis. It demonstrates that, by properly balancing the structure of the selected model against the number of features, one can achieve better performance even at high dimensionality. Including many genes/variants in the classification rules can help shed new light on the analysis of complex traits, that is, traits that are typically determined by many causal variants with small effect sizes.
Affiliation(s)
- Amin Zollanvari
- Children’s Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, and Partners Healthcare Center for Personalized Genetic Medicine, Boston, MA, USA.
6
Uhl GR, Drgon T, Johnson C, Ramoni MF, Behm FM, Rose JE. Genome-wide association for smoking cessation success in a trial of precessation nicotine replacement. Mol Med 2010; 16:513-26. [PMID: 20811658 PMCID: PMC2972392 DOI: 10.2119/molmed.2010.00052] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Received: 04/26/2010] [Accepted: 08/23/2010] [Indexed: 02/06/2023]
Abstract
The ability to quit smoking successfully shows substantial evidence of heritability in classic and molecular genetic studies. Genome-wide association (GWA) studies have identified single-nucleotide polymorphisms (SNPs) and haplotypes that distinguish successful quitters from individuals who were unable to quit smoking, both in clinical trial participants and in community samples. Many of the subjects in these clinical trial samples were aided by nicotine replacement therapy (NRT). We now report novel GWA results from participants in a clinical trial that sought dose/response relationships for "precessation" NRT. In this trial, 369 European-American smokers were randomized to 21 or 42 mg NRT, initiated 2 weeks before target quit dates. Ten-week continuous smoking abstinence was assessed on the basis of self-reports and carbon monoxide levels. SNP genotyping used Affymetrix 6.0 arrays. No P value from the GWA analysis of smoking cessation success reached "genome-wide" significance. Compared with chance, however, these results do identify (a) more clustering of nominally positive results within small genomic regions, (b) more overlap between these genomic regions and those identified in six prior successful smoking cessation GWA studies, and (c) sets of genes that fall into gene ontology categories that appear to be biologically relevant. The 1,000 SNPs with the strongest associations form a plausible Bayesian network; no such network is formed by randomly selected sets of SNPs. The data provide independent support, based on individual genotyping, for many loci previously nominated on the basis of genotyping in pooled DNA samples. These results provide further support for the idea that aid for smoking cessation may be personalized on the basis of genetic predictors of outcome.
Affiliation(s)
- George R Uhl
- Molecular Neurobiology Branch, National Institutes of Health Intramural Research Program, National Institute on Drug Abuse (NIH-IRP, NIDA), Baltimore, Maryland, United States of America
- Tomas Drgon
- Molecular Neurobiology Branch, National Institutes of Health Intramural Research Program, National Institute on Drug Abuse (NIH-IRP, NIDA), Baltimore, Maryland, United States of America
- Catherine Johnson
- Molecular Neurobiology Branch, National Institutes of Health Intramural Research Program, National Institute on Drug Abuse (NIH-IRP, NIDA), Baltimore, Maryland, United States of America
- Marco F Ramoni
- Children’s Hospital Informatics Program, Harvard–Massachusetts Institute of Technology (MIT) Division of Health Sciences and Technology, Boston, Massachusetts, United States of America
- Frederique M Behm
- Department of Psychiatry and Center for Nicotine and Smoking Cessation Research, Duke University, Durham, North Carolina, United States of America
- Jed E Rose
- Department of Psychiatry and Center for Nicotine and Smoking Cessation Research, Duke University, Durham, North Carolina, United States of America
7
Abstract
BACKGROUND Identification of expression quantitative trait loci (eQTLs) is an emerging area in genomic studies. The task requires an integrated analysis of genome-wide single nucleotide polymorphism (SNP) data and gene expression data, which raises a new computational challenge because of the sheer size of the data. RESULTS We develop a method to identify eQTLs. The method represents eQTLs as information flux between genetic variants and transcripts. We use information theory to interrogate SNP and gene expression data simultaneously, resulting in a Transcriptional Information Map (TIM) that captures the network of transcriptional information linking genetic variations, gene expression and regulatory mechanisms. These maps are able to identify both cis- and trans-regulating eQTLs. Application to a dataset of leukemia patients identifies eQTLs in the regions of the GART, PCP4, DSCAM, and RIPK4 genes that regulate ADAMTS1, a known leukemia correlate. CONCLUSIONS The information theory approach presented in this paper is able to infer the dependence networks between SNPs and transcripts, which in turn can identify cis- and trans-eQTLs. The application of our method to the leukemia study explains how genetic variants and gene expression are linked to leukemia.
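The abstract describes eQTLs as information flux between variants and transcripts but gives no formulas. As a hedged illustration of the underlying quantity, the sketch below computes a plug-in estimate of the mutual information, in bits, between a SNP's genotype calls (assumed coded 0/1/2) and a transcript's expression level after discretization into categories (an assumed preprocessing step). It is not the authors' TIM construction.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete observations:
    sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) * p(y)))."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), nxy in joint.items():
        # With counts: p(x,y) / (p(x) p(y)) = nxy * n / (nx * ny)
        mi += (nxy / n) * log2(nxy * n / (px[x] * py[y]))
    return mi
```

A value near zero suggests the genotype carries little information about expression. With small samples the plug-in estimate is biased upward, so in practice it would typically be compared against a permutation-based null.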
Affiliation(s)
- Hsun-Hsien Chang
- Children's Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, Massachusetts, USA.
8
Lee JJ, Essers JB, Kugathasan S, Escher JC, Lettre G, Butler JL, Stephens MC, Ramoni MF, Grand RJ, Hirschhorn J. Association of linear growth impairment in pediatric Crohn's disease and a known height locus: a pilot study. Ann Hum Genet 2010; 74:489-97. [PMID: 20846217 DOI: 10.1111/j.1469-1809.2010.00606.x] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
The etiology of growth impairment in Crohn's disease (CD) has been inadequately explained by nutritional, hormonal, and/or disease-related factors, suggesting that genetics may be an additional contributor. The aim of this cross-sectional study was to investigate genetic variants associated with linear growth in pediatric-onset CD. We genotyped 951 subjects (317 CD patient-parent trios) for 64 polymorphisms within 14 CD-susceptibility and 23 stature-associated loci. Patient height-for-age Z-score < -1.64 was used to dichotomize probands into growth-impaired and non-growth-impaired groups. The transmission disequilibrium test (TDT) was used to study association with growth impairment. There was a significant association between growth impairment in CD (height-for-age Z-score < -1.64) and a stature-related polymorphism in the dymeclin gene DYM (rs8099594) (OR = 3.2, CI [1.57-6.51], p = 0.0007). In addition, there was nominal over-transmission of two CD-susceptibility alleles, 10q21.1 intergenic region (rs10761659) and ATG16L1 (rs10210302), in growth-impaired CD children (OR = 2.36, CI [1.26-4.41], p = 0.0056 and OR = 2.45, CI [1.22-4.95], p = 0.0094, respectively). Our data indicate that genetic influences due to stature-associated and possibly CD risk alleles may predispose CD patients to alterations in linear growth. This is the first report of a link between a stature-associated locus and growth impairment in CD.
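The transmission disequilibrium test mentioned above compares, among heterozygous parents, how often the tested allele is transmitted to the affected child (b) versus not transmitted (c). The standard McNemar-type statistic is (b − c)²/(b + c), referred to a chi-square distribution with 1 degree of freedom, and the transmission odds ratio is b/c. A minimal sketch with hypothetical counts (not the study's data):

```python
from math import erfc, sqrt


def tdt(transmitted, untransmitted):
    """McNemar-type transmission disequilibrium test for one biallelic marker.

    transmitted:   heterozygous parents who transmitted the tested allele (b)
    untransmitted: heterozygous parents who transmitted the other allele (c)
    Returns (chi-square statistic with 1 df, p-value, transmission odds ratio b/c).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = erfc(sqrt(chi2 / 2))
    odds_ratio = b / c if c > 0 else float("inf")
    return chi2, p_value, odds_ratio
```

For instance, 30 transmissions against 10 non-transmissions give a statistic of 10.0 and an odds ratio of 3.0, with p ≈ 0.0016. Confidence intervals are not computed in this sketch.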
9
Chang HH, Dreyfuss JM, Ramoni MF. A transcriptional network signature characterizes lung cancer subtypes. Cancer 2010; 117:353-60. [PMID: 20839314 DOI: 10.1002/cncr.25592] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 04/22/2010] [Revised: 07/20/2010] [Accepted: 07/20/2010] [Indexed: 11/07/2022]
Abstract
BACKGROUND Transcriptional networks play a central role in cancer development. The authors described a systems biology approach to cancer classification based on the reverse engineering of the transcriptional network surrounding the 2 most common types of lung cancer: adenocarcinoma (AC) and squamous cell carcinoma (SCC). METHODS A transcriptional network classifier was inferred from the molecular profiles of 111 human lung carcinomas. The authors tested its classification accuracy in 7 independent cohorts, for a total of 422 subjects of Caucasian, African, and Asian descent. RESULTS The model for distinguishing AC from SCC was a 25-gene network signature, which achieved 95.2% classification accuracy on the 7 independent cohorts. Even more surprisingly, 95% of this accuracy was explained by the interplay of 3 genes (KRT6A, KRT6B, KRT6C) on a narrow cytoband of chromosome 12. The role of this chromosomal region in distinguishing AC and SCC was further confirmed by the analysis of another group of 28 independent subjects assayed by DNA copy number changes. The copy number variations of bands 12q12, 12q13, and 12q12-13 discriminated these samples with 84% accuracy. CONCLUSIONS These results suggest the existence of a robust signature localized in a relatively small area of the genome, and show the clinical potential of reverse engineering transcriptional networks from molecular profiles.
Affiliation(s)
- Hsun-Hsien Chang
- Children's Hospital Informatics Program, Harvard-Massachusetts Institute of Technology Division of Health Sciences and Technology, Harvard Medical School, Boston, Massachusetts 02115, USA.
10
McGeachie M, Ramoni RLB, Mychaleckyj JC, Furie KL, Dreyfuss JM, Liu Y, Herrington D, Guo X, Lima JA, Post W, Rotter JI, Rich S, Sale M, Ramoni MF. Integrative predictive model of coronary artery calcification in atherosclerosis. Circulation 2009; 120:2448-54. [PMID: 19948975 DOI: 10.1161/circulationaha.109.865501] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Indexed: 12/20/2022]
Abstract
BACKGROUND Many different genetic and clinical factors have been identified as causes of or contributors to atherosclerosis. We present a model of preclinical atherosclerosis based on genetic and clinical data that predicts the presence of coronary artery calcification in healthy Americans of European descent 45 to 84 years of age in the Multi-Ethnic Study of Atherosclerosis (MESA). METHODS AND RESULTS We assessed 712 individuals for the presence or absence of coronary artery calcification and genotyped them for 2882 single-nucleotide polymorphisms. Using these single-nucleotide polymorphisms and relevant clinical data, we constructed a Bayesian network that predicts the presence of coronary calcification. The model contained 13 single-nucleotide polymorphisms (from genes AGTR1, ALOX15, INSR, PRKAB1, IL1R2, ESR2, KCNK1, FBLN5, PPARA, VEGFA, PON1, TDRD6, PLA2G7, and 1 ancestry informative marker) and 5 clinical variables (sex, age, weight, smoking, and diabetes mellitus) and achieved 85% predictive accuracy, as measured by the area under the receiver operating characteristic curve. This is a significant (P<0.001) improvement over models that use only the single-nucleotide polymorphism data or only the clinical variables. CONCLUSIONS We present an investigation of joint genetic and clinical factors associated with atherosclerosis that shows predictive value for the genetic and the clinical factors separately, as well as enhanced performance for their combination.
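Several entries in this list report accuracy as the area under the receiver operating characteristic curve. A minimal sketch of how that number can be computed from a model's predicted scores, using the Mann-Whitney formulation (the probability that a randomly chosen case scores higher than a randomly chosen control, counting ties as one half); the scores and labels are hypothetical, with 1 marking a case and 0 a control:

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney formulation: the fraction of
    (case, control) pairs in which the case has the higher score, ties counting 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

For example, scores 0.1, 0.4, 0.35, 0.8 with labels 0, 0, 1, 1 give 0.75, because the case scored 0.35 outranks only one of the two controls. The pairwise loop is quadratic in cohort size; a rank-based formula gives the same value in O(n log n) for large cohorts.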
Affiliation(s)
- Michael McGeachie
- Harvard Partners Center for Genetics and Genomics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
11
Himes BE, Wu AC, Duan QL, Klanderman B, Litonjua AA, Tantisira K, Ramoni MF, Weiss ST. Predicting response to short-acting bronchodilator medication using Bayesian networks. Pharmacogenomics 2009; 10:1393-412. [PMID: 19761364 DOI: 10.2217/pgs.09.93] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 11/21/2022]
Abstract
AIMS Bronchodilator response tests measure the effect of beta(2)-agonists, the most commonly used short-acting reliever drugs for asthma. We sought to relate candidate gene SNP data with bronchodilator response and measure the predictive accuracy of a model constructed with genetic variants. MATERIALS & METHODS Bayesian networks, multivariate models that are able to account for simultaneous associations and interactions among variables, were used to create a predictive model of bronchodilator response using candidate gene SNP data from 308 Childhood Asthma Management Program Caucasian subjects. RESULTS The model found that 15 SNPs in 15 genes predict bronchodilator response with fair accuracy, as established by a fivefold cross-validation area under the receiver-operating characteristic curve of 0.75 (standard error: 0.03). CONCLUSION Bayesian networks are an attractive approach to analyze large-scale pharmacogenetic SNP data because of their ability to automatically learn complex models that can be used for the prediction and discovery of novel biological hypotheses.
Affiliation(s)
- Blanca E Himes
- Harvard-MIT Division of Health Sciences and Technology, MA, USA.
12
Alterovitz G, Muso T, Ramoni MF. The challenges of informatics in synthetic biology: from biomolecular networks to artificial organisms. Brief Bioinform 2009; 11:80-95. [PMID: 19906839 DOI: 10.1093/bib/bbp054] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022]
Abstract
The field of synthetic biology holds an inspiring vision for the future: it integrates computational analysis, biological data and the systems engineering paradigm in the design of new biological machines and systems. These biological machines are built from basic biomolecular components analogous to electrical devices, and the information flow among these components requires augmenting biological insight with a formal approach to information management. Here we review the informatics challenges in synthetic biology along three dimensions: in silico, in vitro and in vivo. First, we describe the state of the art in in silico support for synthetic biology, from specific data exchange formats to the most popular software platforms and algorithms. Next, we cast in vitro synthetic biology in terms of information flow and discuss genetic fidelity in DNA manipulation, development strategies for biological parts and the regulation of biomolecular networks. Finally, we explore how the engineering chassis can manipulate biological circuitries in vivo to give rise to future artificial organisms.
Affiliation(s)
- Gil Alterovitz
- Children's Hospital Informatics Program, Harvard/MIT Division of Health Sciences and Technology, USA
13
Yang G, Thieu K, Tsai KY, Piris A, Udayakumar D, Njauw CNJ, Ramoni MF, Tsao H. Dynamic gene expression analysis links melanocyte growth arrest with nevogenesis. Cancer Res 2009; 69:9029-37. [PMID: 19903842 DOI: 10.1158/0008-5472.can-09-0783] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/16/2022]
Abstract
Like all primary cells in vitro, normal human melanocytes exhibit a physiologic decay in proliferative potential as they transition to a growth-arrested state. The underlying transcriptional program(s) that regulate this phenotypic change are largely unknown. To identify molecular determinants of this process, we performed a Bayesian-based dynamic gene expression analysis on primary melanocytes undergoing proliferative arrest. This analysis revealed several related clusters whose expression behavior correlated with melanocyte growth kinetics; we designated these clusters the melanocyte growth arrest program (MGAP). These MGAP genes were preferentially represented in benign melanocytic nevi over melanomas and selectively mapped to the hepatocyte fibrosis pathway. This transcriptional relationship between melanocyte growth stasis, nevus biology, and fibrogenic signaling was further validated in vivo by the demonstration of strong pericellular collagen deposition within benign nevi but not melanomas. Taken together, our study provides a novel view of fibroplasia in both melanocyte biology and nevogenesis.
Affiliation(s)
- Guang Yang
- Wellman Center for Photomedicine, Massachusetts General Hospital, Boston, Massachusetts 02114-2696, USA
14
Abstract
Background Gene interactions play a central role in transcriptional networks. Many studies have performed genome-wide expression analysis to reconstruct regulatory networks and investigate disease processes. Since biological processes are outcomes of regulatory gene interactions, this paper develops a systems biology approach to infer function-dependent transcriptional networks modulating phenotypic traits, which serve as classifiers to identify tissue states. Because the analysis takes gene interactions into account, we can achieve higher classification accuracy than existing methods. Results Our systems biology approach is carried out in the Bayesian network framework. The algorithm consists of two steps: gene filtering by Bayes factor, followed by collinearity elimination via network learning. We validate our approach with two clinical datasets. In the study of lung cancer subtype discrimination, we obtain a 25-gene classifier from 111 training samples, and the test on 422 independent samples achieves 95% classification accuracy. In the study of thoracic aortic aneurysm (TAA) diagnosis, 61 samples determine a 34-gene classifier, whose diagnostic accuracy on 33 independent samples reaches 82%. Performance comparisons with three other popular methods, PCA/LDA, PAM, and Weighted Voting, confirm that our approach yields superior classification accuracy and a more compact signature. Conclusions The systems biology approach presented in this paper is able to infer function-dependent transcriptional networks, which in turn can classify biological samples with high accuracy. The validation of our classifier using clinical data demonstrates the promising value of our proposed approach for disease diagnosis.
Affiliation(s)
- Hsun-Hsien Chang
- Children's Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, Massachusetts, USA.
15
Himes BE, Dai Y, Kohane IS, Weiss ST, Ramoni MF. Prediction of chronic obstructive pulmonary disease (COPD) in asthma patients using electronic medical records. J Am Med Inform Assoc 2009; 16:371-9. [PMID: 19261943 PMCID: PMC2732240 DOI: 10.1197/jamia.m2846] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.4] [Received: 05/04/2008] [Accepted: 01/30/2009] [Indexed: 11/10/2022]
Abstract
OBJECTIVE To identify clinical factors that modulate the risk of progression to COPD among asthma patients, using data extracted from electronic medical records. DESIGN Demographic information and comorbidities of adult asthma patients who were observed for at least 5 years, with initial observation dates between 1988 and 1998, were extracted from electronic medical records of the Partners Healthcare System using tools of the National Center for Biomedical Computing "Informatics for Integrating Biology to the Bedside" (i2b2). MEASUREMENTS A predictive model of COPD was constructed from a set of 9,349 patients (843 cases, 8,506 controls) using Bayesian networks. The model's predictive accuracy was tested by using it to predict COPD in a later, independent set of asthma patients (992 patients; 46 cases, 946 controls) who had initial observation dates between 1999 and 2002. RESULTS A Bayesian network model composed of age, sex, race, smoking history, and 8 comorbidity variables is able to predict COPD in the independent set of patients with an accuracy of 83.3%, computed as the area under the Receiver Operating Characteristic curve (AUROC). CONCLUSIONS Our results demonstrate that data extracted from electronic medical records can be used to create predictive models. With improvements in data extraction and inclusion of more variables, such models may prove to be clinically useful.
Affiliation(s)
- Blanca E Himes
- Channing Laboratory, 181 Longwood Ave, Boston, MA 02115, USA.
16
Abstract
Individuals' dependence on nicotine, primarily through cigarette smoking, is a major source of morbidity and mortality worldwide. Many smokers attempt but fail to quit smoking, motivating researchers to identify the origins of this dependence. Because of the known heritability of nicotine-dependence phenotypes, considerable interest has been focused on discovering the genetic factors underpinning the trait. This goal, however, is not easily attained: no single factor is likely to explain any great proportion of dependence because nicotine dependence is thought to be a complex trait (i.e., the result of many interacting factors). Genomewide association studies are powerful tools in the search for the genomic bases of complex traits, and in this context, novel candidate genes have been identified through single nucleotide polymorphism (SNP) association analyses. Beyond association, however, genetic data can be used to generate predictive models of nicotine dependence. As expected in the context of a complex trait, individual SNPs fail to accurately predict nicotine dependence, demanding the use of multivariate models. Standard approaches, such as logistic regression, are unable to consider large numbers of SNPs given existing sample sizes. However, using Bayesian networks, one can overcome these limitations to generate a multivariate predictive model, which has markedly enhanced predictive accuracy on fitted values relative to that of individual SNPs. This approach, combined with the data being generated by genomewide association studies, promises to shed new light on the common, complex trait of nicotine dependence.
Affiliation(s)
- Rachel Badovinac Ramoni
- Department of Developmental Biology, Harvard School of Dental Medicine, Boston, Massachusetts, USA
17
Abstract
Cardioembolic stroke is a complex disease resulting from the interaction of numerous factors. Using data from the Genes Affecting Stroke Risk and Outcome Study (GASROS), we show that a multivariate predictive model built using Bayesian networks achieves a predictive accuracy of 86% on the fitted values, as computed by the area under the receiver operating characteristic curve, compared with an area under the curve of 60% for the individual single nucleotide polymorphism with the highest prognostic performance.
Affiliation(s)
- Rachel Badovinac Ramoni
- Harvard-Partners Center for Genetics and Genomics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA.
18
Himes BE, Kohane IS, Ramoni MF, Weiss ST. Characterization of patients who suffer asthma exacerbations using data extracted from electronic medical records. AMIA Annu Symp Proc 2008; 2008:308-12. [PMID: 18999057 PMCID: PMC2655929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2008] [Revised: 07/16/2008] [Indexed: 05/27/2023]
Abstract
The increasing availability of electronic medical records offers opportunities to better characterize patient populations and create predictive tools to individualize health care. We determined which asthma patients suffer exacerbations using data extracted from electronic medical records of the Partners Healthcare System using Natural Language Processing tools from the "Informatics for Integrating Biology to the Bedside" center (i2b2). Univariable and multivariable analysis of data for 11,356 patients (1,394 cases, 9,962 controls) found that race, BMI, smoking history, and age at initial observation are predictors of asthma exacerbations. The area under the receiver operating characteristic curve (AUROC) corresponding to prediction of exacerbations in an independent group of 1,436 asthma patients (106 cases, 1,330 controls) is 0.67. Our findings are consistent with previous characterizations of asthma patients in epidemiological studies, and demonstrate that data extracted by natural language processing from electronic medical records is suitable for the characterization of patient populations.
Affiliation(s)
- Blanca E Himes
- Harvard-MIT Division of Health Sciences and Technology, Cambridge, MA, USA
19
English SB, Shih SC, Ramoni MF, Smith LE, Butte AJ. Use of Bayesian networks to probabilistically model and improve the likelihood of validation of microarray findings by RT-PCR. J Biomed Inform 2008; 42:287-95. [PMID: 18790084 DOI: 10.1016/j.jbi.2008.08.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/07/2008] [Revised: 05/31/2008] [Accepted: 08/17/2008] [Indexed: 12/16/2022]
Abstract
Though genome-wide technologies, such as microarrays, are widely used, data from these methods are considered noisy; there is still varied success in downstream biological validation. We report a method that increases the likelihood of successfully validating microarray findings using real time RT-PCR, including genes at low expression levels and with small differences. We use a Bayesian network to identify the most relevant sources of noise based on the successes and failures in validation for an initial set of selected genes, and then improve our subsequent selection of genes for validation based on eliminating these sources of noise. The network displays the significant sources of noise in an experiment, and scores the likelihood of validation for every gene. We show how the method can significantly increase validation success rates. In conclusion, in this study, we have successfully added a new automated step to determine the contributory sources of noise that determine successful or unsuccessful downstream biological validation.
Affiliation(s)
- Sangeeta B English
- Stanford Center for Biomedical Informatics Research (BMIR), Stanford University School of Medicine, 251 Campus Drive, Stanford, CA 94305, USA.
20
Abstract
Many bioinformatics solutions suffer from the lack of a usable interface or platform from which results can be analyzed and visualized. Overcoming this hurdle would allow for more widespread dissemination of bioinformatics algorithms within the biological and medical communities. The algorithms should be accessible without extensive technical support or programming knowledge. Here, we propose a dynamic wizard platform that provides users with a Graphical User Interface (GUI) for most Java bioinformatics library toolkits. The application interface is generated in real time based on the original source code. This platform lets developers focus on designing algorithms and biologists/physicians on testing hypotheses and analyzing results. AVAILABILITY The open source code can be downloaded from: http://bcl.med.harvard.edu/proteomics/proj/APBA/.
Affiliation(s)
- Gil Alterovitz
- Children's Hospital Informatics Program at the Division of Health Sciences and Technology, Harvard University and Massachusetts Institute of Technology, USA.
21
Alterovitz G, Xiang M, Liu J, Chang A, Ramoni MF. System-wide peripheral biomarker discovery using information theory. Pac Symp Biocomput 2008:231-242. [PMID: 18229689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
The identification of reliable peripheral biomarkers for clinical diagnosis, patient prognosis, and biological functional studies would allow for access to biological information currently available only through invasive methods. Traditional approaches have so far considered aspects of tissues and biofluid markers independently. Here we introduce an information theoretic framework for biomarker discovery, integrating biofluid and tissue information. This allows us to identify tissue information in peripheral biofluids. We treat tissue-biofluid interactions as an information channel through functional space using 26 proteomes from 45 different sources to determine quantitatively the correspondence of each biofluid for specific tissues via relative entropy calculation of proteomes mapped onto phenotype, function, and drug space. Next, we identify candidate biofluids and biomarkers responsible for functional information transfer (p < 0.01). A total of 851 unique candidate biomarker proxies were identified. The biomarkers were found to be significant functional tissue proxies compared to random proteins (p < 0.001). This proxy link is found to be further enhanced by filtering the biofluid proteins to include only significant tissue-biofluid information channels and is further validated by gene expression. Furthermore, many of the candidate biomarkers are novel and have yet to be explored. In addition to characterizing proteins and their interactions with a systemic perspective, our work can be used as a roadmap to guide biomedical investigation, from suggesting biofluids for study to constraining the search for biomarkers. This work has applications in disease screening, diagnosis, and protein function studies.
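Relative entropy (the Kullback-Leibler divergence) compares how two sets of proteins are distributed over the categories of a shared annotation space. A minimal sketch, assuming each proteome is summarized as counts of proteins per functional category and that a pseudocount of 1 is added so every category has nonzero probability; the smoothing choice and the category names are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def to_distribution(counts: Counter, categories: Iterable[str],
                    pseudocount: float = 1.0) -> Dict[str, float]:
    """Turn per-category protein counts into probabilities with additive smoothing."""
    cats = list(categories)
    total = sum(counts[c] for c in cats) + pseudocount * len(cats)
    return {c: (counts[c] + pseudocount) / total for c in cats}


def relative_entropy(p: Dict[str, float], q: Dict[str, float]) -> float:
    """D(P || Q) in bits; assumes q[k] > 0 wherever p[k] > 0."""
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)


# Hypothetical category counts for one tissue and one biofluid proteome.
categories = ["coagulation", "immune response", "transport"]
tissue = Counter({"coagulation": 12, "immune response": 30, "transport": 8})
biofluid = Counter({"coagulation": 25, "immune response": 20, "transport": 5})
p = to_distribution(tissue, categories)
q = to_distribution(biofluid, categories)
print(round(relative_entropy(p, q), 4))
```

A smaller divergence means the biofluid's functional profile more closely resembles the tissue's. The measure is asymmetric, so D(tissue || biofluid) and D(biofluid || tissue) generally differ.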
Affiliation(s)
- Gil Alterovitz
- Division of Health Sciences and Technology, Harvard University/Massachusetts Institute of Technology, Cambridge, MA, USA
22
Schachter AD, Ramoni MF, Baio G, Roberts TG, Finkelstein SN. Economic evaluation of a Bayesian model to predict late-phase success of new chemical entities. Value Health 2007; 10:377-85. [PMID: 17888102 DOI: 10.1111/j.1524-4733.2007.00191.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 05/17/2023]
Abstract
OBJECTIVE To evaluate the economic impact of a Bayesian network model designed to predict clinical success of a new chemical entity (NCE) based on pre-phase III data. METHODS We trained our Bayesian network model on publicly accessible data on 503 NCEs, stratified by therapeutic class. We evaluated the sensitivity, specificity and accuracy of our model on an independent data set of 18 NCE-indication pairs, using prior probability data for the antineoplastic NCEs within the training set. We performed Monte Carlo simulations to evaluate the economic performance of our model relative to reported pharmaceutical industry performance, taking into account reported capitalized phase costs, cumulative revenues for a postapproval period of 7 years, and the range of possible false negative and true negative rates for terminated NCEs within the pharmaceutical industry. RESULTS Our model predicted outcomes on the independent validation set of oncology agents with 78% accuracy (80% sensitivity and 76% specificity). In comparison with the pharmaceutical industry's reported success rates, on average our model significantly reduced capitalized expenditures from $727 million/successful NCE to $444 million/successful NCE (P < 0.001), and significantly improved revenues from $347 million/phase III trial to $507 million/phase III trial (P < 0.001) during the first 7 years post launch. These results indicate that our model identified successful NCEs more efficiently than the currently reported pharmaceutical industry performance. CONCLUSIONS Accurate prediction of NCE outcomes is computationally feasible, significantly increasing the proportion of successful NCEs, and likely eliminating ineffective and unsafe NCEs.
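Sensitivity, specificity and accuracy follow the standard confusion-matrix definitions. Accuracy is also a prevalence-weighted average of sensitivity and specificity, which is why the reported 78% falls between 76% and 80%. The counts in this sketch are hypothetical, chosen only to illustrate the arithmetic; they are not the study's validation results.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # predicted success, actual success
    fn: int  # predicted failure, actual success
    tn: int  # predicted failure, actual failure
    fp: int  # predicted success, actual failure

    def total(self) -> int:
        return self.tp + self.fn + self.tn + self.fp

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total()

    def prevalence(self) -> float:
        return (self.tp + self.fn) / self.total()


cm = ConfusionMatrix(tp=8, fn=2, tn=6, fp=2)  # hypothetical counts
w = cm.prevalence()
# accuracy = prevalence * sensitivity + (1 - prevalence) * specificity
print(cm.accuracy(), w * cm.sensitivity() + (1 - w) * cm.specificity())
```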
23
Abstract
Biological and medical data have been growing exponentially over the past several years [1, 2]. In particular, proteomics has seen automation dramatically change the rate at which data are generated [3]. Analysis that systemically incorporates prior information is becoming essential to making inferences about the myriad, complex data [4-6]. A Bayesian approach can help capture such information and incorporate it seamlessly through a rigorous, probabilistic framework. This paper starts with a review of the background mathematics behind the Bayesian methodology: from parameter estimation to Bayesian networks. The article then goes on to discuss how emerging Bayesian approaches have already been successfully applied to research across proteomics, a field for which Bayesian methods are particularly well suited [7-9]. After reviewing the literature on the subject of Bayesian methods in biological contexts, the article discusses some of the recent applications in proteomics and emerging directions in the field.
Affiliation(s)
- Gil Alterovitz
- Division of Health Sciences and Technology, Harvard University and Massachusetts Institute of Technology, Boston, MA, USA.
24
25
Ferrazzi F, Sebastiani P, Ramoni MF, Bellazzi R. Bayesian approaches to reverse engineer cellular systems: a simulation study on nonlinear Gaussian networks. BMC Bioinformatics 2007; 8 Suppl 5:S2. [PMID: 17570861 PMCID: PMC1892090 DOI: 10.1186/1471-2105-8-s5-s2] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Indexed: 11/10/2022]
Abstract
BACKGROUND Reverse engineering cellular networks is currently one of the most challenging problems in systems biology. Dynamic Bayesian networks (DBNs) seem to be particularly suitable for inferring relationships between cellular variables from the analysis of time series measurements of mRNA or protein concentrations. As evaluating inference results on a real dataset is controversial, the use of simulated data has been proposed. However, DBN approaches that use continuous variables, thus avoiding the information loss associated with discretization, have not yet been extensively assessed, and most of the proposed approaches have dealt with linear Gaussian models. RESULTS We propose a generalization of dynamic Gaussian networks to accommodate nonlinear dependencies between variables. As a benchmark dataset to test the new approach, we used data from a mathematical model of cell cycle control in budding yeast that realistically reproduces the complexity of a cellular system. We evaluated the ability of the networks to describe the dynamics of cellular systems and their precision in reconstructing the true underlying causal relationships between variables. We also tested the robustness of the results by analyzing the effect of noise on the data, and the impact of a different sampling time. CONCLUSION The results confirmed that DBNs with Gaussian models can be effectively exploited for a first level analysis of data from complex cellular systems. The inferred models are parsimonious and have a satisfying goodness of fit. Furthermore, the networks not only offer a phenomenological description of the dynamics of cellular systems, but are also able to suggest hypotheses concerning the causal interactions between variables. The proposed nonlinear generalization of Gaussian models yielded models characterized by a slightly lower goodness of fit than the linear model, but a better ability to recover the true underlying connections between variables.
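The linear Gaussian case that this paper generalizes can be sketched compactly. A first-order model assumes each variable at time t+1 is a linear combination of all variables at time t plus Gaussian noise, and nonzero coefficients are read as directed edges. The Python (NumPy) sketch below fits that model by least squares on simulated data and thresholds the coefficients; the threshold, the simulated matrix and the function names are illustrative assumptions, and the sketch omits both the nonlinear terms and the Bayesian model search described in the abstract.

```python
import numpy as np


def simulate(A: np.ndarray, T: int, noise_sd: float, seed: int = 0) -> np.ndarray:
    """Simulate x[t] = A @ x[t-1] + Gaussian noise for T steps."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = np.zeros((T, n))
    X[0] = rng.normal(size=n)
    for t in range(1, T):
        X[t] = A @ X[t - 1] + rng.normal(scale=noise_sd, size=n)
    return X


def fit_linear_dbn(X: np.ndarray) -> np.ndarray:
    """Least-squares estimate of A in x[t] ~ A @ x[t-1] from a (T, n) series."""
    past, future = X[:-1], X[1:]
    # lstsq solves past @ B ~ future, so B is the transpose of A
    B, *_ = np.linalg.lstsq(past, future, rcond=None)
    return B.T


def edges(A_hat: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """edges[i, j] is True when variable j at t-1 is inferred to drive variable i at t."""
    return np.abs(A_hat) > threshold


A_true = np.array([[0.8, 0.0],
                   [0.5, 0.3]])  # variable 0 drives variable 1, not vice versa
A_hat = fit_linear_dbn(simulate(A_true, T=2000, noise_sd=0.1))
print(np.round(A_hat, 2))
print(edges(A_hat))
```

Least squares is the maximum-likelihood estimate for this linear Gaussian model; the paper's point is that adding nonlinear terms slightly lowered goodness of fit but improved recovery of the true connections.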
Affiliation(s)
- Fulvia Ferrazzi
- Dipartimento di Informatica e Sistemistica, Università degli Studi di Pavia, via Ferrata 1, 27100 Pavia, Italy
- Children's Hospital Informatics Program, Division of Health Sciences and Technology, Harvard Medical School and Massachusetts Institute of Technology, 300 Longwood Avenue, Boston MA 02115, USA
- Paola Sebastiani
- Department of Biostatistics, Boston University School of Public Health, 715 Albany Street, Boston MA 02118, USA
- Marco F Ramoni
- Children's Hospital Informatics Program, Division of Health Sciences and Technology, Harvard Medical School and Massachusetts Institute of Technology, 300 Longwood Avenue, Boston MA 02115, USA
- Riccardo Bellazzi
- Dipartimento di Informatica e Sistemistica, Università degli Studi di Pavia, via Ferrata 1, 27100 Pavia, Italy
26
Funke BH, Brown AC, Ramoni MF, Regan ME, Baglieri C, Finn CT, Babcock M, Shprintzen RJ, Morrow BE, Kucherlapati R. A Novel, Single Nucleotide Polymorphism-Based Assay to Detect 22q11 Deletions. Genet Test 2007; 11:91-100. [PMID: 17394398 DOI: 10.1089/gte.2006.0507] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 12/06/2022]
Abstract
Velocardiofacial syndrome, DiGeorge syndrome, and conotruncal anomaly face syndrome, now collectively referred to as 22q11 deletion syndrome (22q11DS), are caused by microdeletions on chromosome 22q11. The great majority (approximately 90%) of these deletions are 3 Mb in size. The remaining deleted patients have nested breakpoints resulting in overlapping regions of hemizygosity. Diagnostic testing for the disorder is traditionally done by fluorescent in situ hybridization (FISH) using probes located in the proximal half of the region common to all deletions. We developed a novel, high-resolution single-nucleotide polymorphism (SNP) genotyping assay to detect 22q11 deletions. We validated this assay using DNA from 110 nondeleted controls and 77 patients with 22q11DS that had previously been tested by FISH. The assay was 100% sensitive (all deletions were correctly identified). Our assay was also able to detect a case of segmental uniparental disomy at 22q11 that was not detected by the FISH assay. We used Bayesian networks to identify a set of 17 SNPs that are sufficient to ascertain unambiguously the deletion status of 22q11DS patients. Our SNP-based assay is a highly accurate, sensitive, and specific method for the diagnosis of 22q11 deletion syndrome.
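The logic behind SNP-based deletion detection can be illustrated with a simplification. A hemizygous (deleted) region carries only one allele, so apart from genotyping errors it should show no heterozygous calls, whereas a non-deleted individual is heterozygous at a SNP with probability 2p(1-p) under Hardy-Weinberg equilibrium. The sketch below computes how likely a non-deleted individual is to look homozygous at every SNP in a panel purely by chance. It assumes independent SNPs (ignoring linkage disequilibrium) and hypothetical allele frequencies, and it is not the Bayesian network procedure the authors used to select their 17 SNPs. Note also that loss of heterozygosity alone cannot separate a deletion from segmental uniparental isodisomy, which likewise removes heterozygosity.

```python
from typing import Sequence


def prob_all_homozygous(allele_freqs: Sequence[float]) -> float:
    """Chance that a non-deleted person is homozygous at every SNP.

    Assumes Hardy-Weinberg equilibrium and independent SNPs.
    """
    prob = 1.0
    for p in allele_freqs:
        heterozygosity = 2 * p * (1 - p)
        prob *= 1 - heterozygosity
    return prob


def no_heterozygous_calls(genotypes: Sequence[str]) -> bool:
    """True when every two-letter genotype call (e.g. 'AG') is homozygous."""
    return all(g[0] == g[1] for g in genotypes)


# Hypothetical panel: 17 SNPs, each with allele frequency 0.5.
print(prob_all_homozygous([0.5] * 17))
print(no_heterozygous_calls(["AA", "GG", "CC"]), no_heterozygous_calls(["AG", "CC"]))
```

The more informative (high-heterozygosity) SNPs a panel contains, the less likely a non-deleted sample is to mimic a deletion by chance.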
Affiliation(s)
- Birgit H Funke
- Harvard Medical School-Partners Healthcare Center for Genetics and Genomics, Cambridge, MA 02139, USA.
27
28
Allocco DJ, Song Q, Gibbons GH, Ramoni MF, Kohane IS. Geography and genography: prediction of continental origin using randomly selected single nucleotide polymorphisms. BMC Genomics 2007; 8:68. [PMID: 17349058 PMCID: PMC1828730 DOI: 10.1186/1471-2164-8-68] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 06/29/2006] [Accepted: 03/10/2007] [Indexed: 02/08/2023]
Abstract
BACKGROUND Recent studies have shown that when individuals are grouped on the basis of genetic similarity, group membership corresponds closely to continental origin. There has been considerable debate about the implications of these findings in the context of larger debates about race and the extent of genetic variation between groups. Some have argued that clustering according to continental origin demonstrates the existence of significant genetic differences between groups and that these differences may have important implications for differences in health and disease. Others argue that clustering according to continental origin requires the use of large amounts of genetic data or specifically chosen markers and is indicative only of very subtle genetic differences that are unlikely to have biomedical significance. RESULTS We used small numbers of randomly selected single nucleotide polymorphisms (SNPs) from the International HapMap Project to train naïve Bayes classifiers for prediction of ancestral continent of origin. Predictive accuracy was tested on two independent data sets. Genetically similar groups should be difficult to distinguish, especially if only a small number of genetic markers are used. The genetic differences between continentally defined groups are sufficiently large that one can accurately predict ancestral continent of origin using only a minute, randomly selected fraction of the genetic variation present in the human genome. Genotype data from only 50 random SNPs was sufficient to predict ancestral continent of origin in our primary test data set with an average accuracy of 95%. Genetic variations informative about ancestry were common and widely distributed throughout the genome. CONCLUSION Accurate characterization of ancestry is possible using small numbers of randomly selected SNPs. 
The results presented here show how investigators conducting genetic association studies can use small numbers of arbitrarily chosen SNPs to identify stratification in study subjects and avoid false positive genotype-phenotype associations. Our findings also demonstrate the extent of variation between continentally defined groups and argue strongly against the contention that genetic differences between groups are too small to have biomedical significance.
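A naive Bayes classifier treats each SNP as conditionally independent given the class, so the log posterior of a continent is its log prior plus one log-likelihood term per SNP. The sketch below assumes genotypes coded as 0, 1 or 2 copies of a reference allele and Laplace (add-alpha) smoothing; the class name, coding, smoothing and toy data are assumptions for illustration, not the authors' implementation or HapMap data.

```python
import math
from typing import Sequence


class GenotypeNaiveBayes:
    """Naive Bayes over SNP genotypes coded 0, 1, 2 (copies of a reference allele)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # pseudocount so unseen genotypes keep nonzero probability

    def fit(self, X: Sequence[Sequence[int]], y: Sequence[str]) -> "GenotypeNaiveBayes":
        self.classes = sorted(set(y))
        n_snps = len(X[0])
        self.log_prior = {}
        self.log_lik = {}  # class -> per-SNP list of log P(genotype | class)
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            denom = len(rows) + 3 * self.alpha
            table = []
            for j in range(n_snps):
                counts = [0, 0, 0]
                for x in rows:
                    counts[x[j]] += 1
                table.append([math.log((k + self.alpha) / denom) for k in counts])
            self.log_lik[c] = table
        return self

    def predict(self, x: Sequence[int]) -> str:
        def log_posterior(c: str) -> float:
            return self.log_prior[c] + sum(self.log_lik[c][j][g] for j, g in enumerate(x))
        return max(self.classes, key=log_posterior)


# Hypothetical toy data: two groups, two SNPs.
X = [[0, 0], [0, 1], [2, 2], [2, 1]]
y = ["A", "A", "B", "B"]
model = GenotypeNaiveBayes().fit(X, y)
print(model.predict([0, 0]), model.predict([2, 2]))  # A B
```

Working in log space avoids numerical underflow when many SNPs are multiplied together, and the independence assumption is what lets the model use many markers with modest sample sizes.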
Affiliation(s)
- Dominic J Allocco
- Children's Hospital Informatics Program at Harvard-MIT Division of Health Sciences and Technology, Boston, MA, USA
- Division of Cardiology, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Qing Song
- Cardiovascular Research Institute, Morehouse School of Medicine, Atlanta, GA, USA
- Gary H Gibbons
- Cardiovascular Research Institute, Morehouse School of Medicine, Atlanta, GA, USA
- Marco F Ramoni
- Children's Hospital Informatics Program at Harvard-MIT Division of Health Sciences and Technology, Boston, MA, USA
- Harvard Partners Center for Genetics and Genomics, Boston, MA, USA
- Isaac S Kohane
- Children's Hospital Informatics Program at Harvard-MIT Division of Health Sciences and Technology, Boston, MA, USA
- Harvard Partners Center for Genetics and Genomics, Boston, MA, USA
29
Shirvani SM, Mookanamparambil L, Ramoni MF, Chin MT. Transcription factor CHF1/Hey2 regulates the global transcriptional response to platelet-derived growth factor in vascular smooth muscle cells. Physiol Genomics 2007; 30:61-8. [PMID: 17327490 DOI: 10.1152/physiolgenomics.00277.2006] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 01/18/2023]
Abstract
The cardiovascular restricted transcription factor CHF1/Hey2 has been previously shown to regulate the smooth muscle response to growth factors. To determine how CHF1/Hey2 affects the smooth muscle response to growth factors, we performed a genomic screen for transcripts that are differentially expressed in wild-type and knockout smooth muscle cells after stimulation with platelet-derived growth factor. We screened 45,101 probes representing >39,000 transcripts derived from at least 34,000 genes, at eight different time points. We analyzed the expression data utilizing an algorithm based on Bayesian statistics to derive the best polynomial clustering model to fit the expression data. We found that in a total of 9,827 transcripts the normalized ratio of knockout to wild-type expression diverged more than threefold from baseline in at least one time point, and these transcripts separated into 17 distinct clusters. Further analysis of each cluster revealed distinct alterations in gene expression patterns for immediate early genes, transcription factors, matrix metalloproteinases, signaling molecules, and other molecules important in vascular biology. Our findings demonstrate that CHF1/Hey2 profoundly affects vascular smooth muscle phenotype by altering both the absolute expression level of a variety of genes and the kinetics of growth factor-induced gene expression.
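The selection criterion in this abstract (a knockout-to-wild-type ratio that diverges more than threefold from baseline at one or more time points) amounts to a simple filter on log-ratios. A minimal NumPy sketch, assuming the ratios are already normalized so that baseline equals 1; the input layout and function name are hypothetical, and the Bayesian polynomial clustering step is not reproduced here.

```python
import numpy as np


def diverging_transcripts(ratios: np.ndarray, fold: float = 3.0) -> np.ndarray:
    """Flag rows whose ratio exceeds `fold` in either direction at any time point.

    ratios: array of shape (n_transcripts, n_timepoints), baseline-normalized so
    that 1.0 means no change. On the log2 scale up- and down-regulation are
    symmetric: a ratio of 4 and a ratio of 1/4 are equally far from 1.
    """
    log_ratios = np.log2(ratios)
    return np.any(np.abs(log_ratios) > np.log2(fold), axis=1)


ratios = np.array([
    [1.0, 1.5, 2.0],   # never beyond threefold
    [1.0, 4.0, 1.0],   # up more than threefold at one time point
    [1.0, 0.25, 1.0],  # down more than threefold
    [1.0, 2.9, 0.35],  # close to, but within, threefold
])
print(diverging_transcripts(ratios))
```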
MESH Headings
- Animals
- Basic Helix-Loop-Helix Transcription Factors/genetics
- Cells, Cultured
- Mice
- Mice, Knockout
- Muscle, Smooth, Vascular/cytology
- Muscle, Smooth, Vascular/drug effects
- Muscle, Smooth, Vascular/metabolism
- Mutation
- Myocytes, Smooth Muscle/cytology
- Myocytes, Smooth Muscle/drug effects
- Myocytes, Smooth Muscle/metabolism
- Oligonucleotide Array Sequence Analysis
- Platelet-Derived Growth Factor/pharmacology
- Repressor Proteins/genetics
- Reverse Transcriptase Polymerase Chain Reaction
- Transcription, Genetic/drug effects
Affiliation(s)
- Shervin M Shirvani
- Vascular Medicine Research, Cardiovascular Division, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
30
Aivado M, Spentzos D, Germing U, Alterovitz G, Meng XY, Grall F, Giagounidis AAN, Klement G, Steidl U, Otu HH, Czibere A, Prall WC, Iking-Konert C, Shayne M, Ramoni MF, Gattermann N, Haas R, Mitsiades CS, Fung ET, Libermann TA. Serum proteome profiling detects myelodysplastic syndromes and identifies CXC chemokine ligands 4 and 7 as markers for advanced disease. Proc Natl Acad Sci U S A 2007; 104:1307-12. [PMID: 17220270 PMCID: PMC1783137 DOI: 10.1073/pnas.0610330104] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.8] [Indexed: 11/18/2022]
Abstract
Myelodysplastic syndromes (MDS) are among the most frequent hematologic malignancies. Patients have a short survival and often progress to acute myeloid leukemia. The diagnosis of MDS can be difficult; there is a paucity of molecular markers, and the pathophysiology is largely unknown. Therefore, we conducted a multicenter study investigating whether serum proteome profiling may serve as a noninvasive platform to discover novel molecular markers for MDS. We generated serum proteome profiles from 218 individuals by MS and identified a profile that distinguishes MDS from non-MDS cytopenias in a learning sample set. This profile was validated by testing its ability to predict MDS in a first independent validation set and a second, prospectively collected, independent validation set run 5 months apart. Accuracy was 80.5% in the first and 79.0% in the second validation set. Peptide mass fingerprinting and quadrupole TOF MS identified two differential proteins: CXC chemokine ligands 4 (CXCL4) and 7 (CXCL7), both of which had significantly decreased serum levels in MDS, as confirmed with independent antibody assays. Western blot analyses of platelet lysates for these two platelet-derived molecules revealed a lack of CXCL4 and CXCL7 in MDS. Subtype analyses revealed that these two proteins have decreased serum levels in advanced MDS, suggesting the possibility of a concerted disturbance of transcription or translation of these chemokines in advanced MDS.
31
Abstract
Gene Ontology (GO) has been widely used to infer functional significance associated with sets of genes in order to automate discoveries within large-scale genetic studies. The level of a term in GO's directed acyclic graph is often assumed to indicate its specificity, although other work has suggested that this assumption does not hold. Unfortunately, quantitative analysis of biological functions based on nodes at the same level (as is common in gene enrichment analysis tools) can lead to incorrect conclusions as well as missed discoveries due to inefficient use of available information. This paper addresses these problems using an information-theoretic approach, encoded in the GO Partition Database, that guarantees maximal information for gene enrichment analysis. The GO Partition Database was designed to feature ontology partitions with GO terms of similar specificity. The GO partitions comprise varying numbers of nodes and present relevant information theoretic statistics, so researchers can choose to analyze datasets at arbitrary levels of specificity. The GO Partition Database, featuring GO partition sets for functional analysis of genes from human and 10 other commonly studied organisms with a total of 131 972 genes, is available on the internet at: . The site also includes an online tutorial.
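One common way to quantify a GO term's specificity independently of its depth in the graph is information content, IC(t) = -log2(n_t / N), where n_t is the number of genes annotated to term t (including annotations inherited from its descendants) and N is the number annotated to the root. The sketch below computes IC and groups terms into bins of similar specificity. It illustrates partitioning by specificity under these assumptions and is not the algorithm used to build the GO Partition Database; the term names and counts are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List


def information_content(term_counts: Dict[str, int], total_genes: int) -> Dict[str, float]:
    """IC(t) = -log2(n_t / N): rarer (more specific) terms carry more information."""
    return {t: -math.log2(n / total_genes) for t, n in term_counts.items()}


def partition_by_specificity(ic: Dict[str, float], bin_width: float) -> Dict[int, List[str]]:
    """Group terms whose IC falls in the same [k*w, (k+1)*w) interval."""
    parts: Dict[int, List[str]] = defaultdict(list)
    for term, value in ic.items():
        parts[int(value // bin_width)].append(term)
    return {k: sorted(v) for k, v in sorted(parts.items())}


# Hypothetical annotation counts; "root" annotates all 1000 genes.
counts = {"root": 1000, "termA": 500, "termB": 250, "termC": 125}
ic = information_content(counts, total_genes=1000)
print(partition_by_specificity(ic, bin_width=2.0))
```

Two terms at the same depth can land in different bins if one annotates far fewer genes, which is the mismatch between graph level and specificity that the abstract describes.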
Affiliation(s)
- Gil Alterovitz
- Division of Health Sciences and Technology Harvard Medical School and Massachusetts Institute of Technology, Boston, MA, USA.
32
Abstract
The severe acute respiratory syndrome (SARS) epidemic, the growing fear of an influenza pandemic and the recent shortage of flu vaccine highlight the need for surveillance systems able to provide early, quantitative predictions of epidemic events. We use dynamic Bayesian networks to discover the interplay among four data sources that are monitored for influenza surveillance. By integrating these different data sources into a dynamic model, we identify in children and infants presenting to the pediatric emergency department with respiratory syndromes an early indicator of impending influenza morbidity and mortality. Our findings show the importance of modelling the complex dynamics of data collected for influenza surveillance, and suggest that dynamic Bayesian networks could be suitable modelling tools for developing epidemic surveillance systems.
Affiliation(s)
- Paola Sebastiani
- Department of Biostatistics, Boston University, Boston, MA, USA.
33
Frank NY, Kho AT, Schatton T, Murphy GF, Molloy MJ, Zhan Q, Ramoni MF, Frank MH, Kohane IS, Gussoni E. Regulation of myogenic progenitor proliferation in human fetal skeletal muscle by BMP4 and its antagonist Gremlin. J Cell Biol 2006; 175:99-110. [PMID: 17015616 PMCID: PMC2064502 DOI: 10.1083/jcb.200511036] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Indexed: 01/06/2023]
Abstract
Skeletal muscle side population (SP) cells are thought to be “stem”-like cells. Despite reports confirming the ability of muscle SP cells to give rise to differentiated progeny in vitro and in vivo, the molecular mechanisms defining their phenotype remain unclear. In this study, gene expression analyses of human fetal skeletal muscle demonstrate that bone morphogenetic protein 4 (BMP4) is highly expressed in SP cells but not in main population (MP) mononuclear muscle-derived cells. Functional studies revealed that BMP4 specifically induces proliferation of BMP receptor 1a–positive MP cells but has no effect on SP cells, which are BMPR1a-negative. In contrast, the BMP4 antagonist Gremlin, specifically up-regulated in MP cells, counteracts the stimulatory effects of BMP4 and inhibits proliferation of BMPR1a-positive muscle cells. In vivo, BMP4-positive cells can be found in the proximity of BMPR1a-positive cells in the interstitial spaces between myofibers. Gremlin is expressed by mature myofibers and interstitial cells, which are separate from BMP4-expressing cells. Together, these studies propose that BMP4 and Gremlin, which are highly expressed by human fetal skeletal muscle SP and MP cells, respectively, are regulators of myogenic progenitor proliferation.
Affiliation(s)
- Natasha Y Frank
- Division of Genetics, Children's Hospital Boston, Boston, MA 02115, USA
34
Abstract
BACKGROUND There are no specific bacterial profiles or diagnostic tests capable of identifying refractory periodontitis patients before a treatment regimen is initiated. Therefore, in this high-risk cohort of patients who do not respond appropriately, host factors that might be partly under genetic control may play a crucial role in their susceptibility. Specifically, we tested the hypothesis that patients with refractory periodontitis have multiple upregulated and/or downregulated genes that might be important in influencing clinical risk. METHODS Oral subepithelial connective tissues were harvested aseptically from seven refractory periodontitis and seven periodontally well-maintained patients. An RNA isolation kit was used to isolate total RNA from tissue samples that had been stabilized in the RNA stabilizing reagent. The isolated total RNA was then subjected to gene expression profiling using the microarray to measure gene expression levels. The retrieved data were analyzed with a computer program for the differential analysis of gene expression microarray experiments. In addition, real-time polymerase chain reaction (PCR) analysis was performed on selected samples to confirm the microarray data's gene expression patterns. RESULTS Of the 22,283 genes analyzed, 68 upregulated and six downregulated genes were differentially expressed at least two-fold. The selected model provided a 93% intrinsic validation along with a 93% extrinsic validation. To validate the microarray data, five upregulated genes (lactotransferrin [LTF], matrix metalloproteinase-1 [MMP-1], MMP-3, interferon induced-15 [IFI-15], and Homo sapiens hypothetical protein MGC5566) and two downregulated genes (keratin 2A [KRT2A] and desmocollin-1 [DSC-1]) were randomly selected for further analysis by real-time PCR. The relative RNA expression levels of these genes measured by real-time PCR were similar to those measured by microarrays.
CONCLUSION The combined use of microarray technology with the computer program for the differential analysis of gene expression microarray experiments provided a set of candidate genes that may serve as novel therapeutic intervention points and improved diagnostic and screening procedures for high-risk individuals.
Affiliation(s)
- David M Kim
- Department of Oral Medicine, Infection and Immunity, Harvard School of Dental Medicine, Boston, MA 02115, USA
35
Cerletti M, Molloy MJ, Tomczak KK, Yoon S, Ramoni MF, Kho AT, Beggs AH, Gussoni E. Melanoma cell adhesion molecule is a novel marker for human fetal myogenic cells and affects myoblast fusion. J Cell Sci 2006; 119:3117-27. [PMID: 16835268 PMCID: PMC1578761 DOI: 10.1242/jcs.03056] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Indexed: 01/16/2023]
Abstract
Myoblast fusion is a highly regulated process that is important during muscle development and myofiber repair and is also likely to play a key role in the incorporation of donor cells in myofibers for cell-based therapy. Although several proteins involved in muscle cell fusion in Drosophila are known, less information is available on the regulation of this process in vertebrates, including humans. To identify proteins that are regulated during fusion of human myoblasts, microarray studies were performed on samples obtained from human fetal skeletal muscle of seven individuals. Primary muscle cells were isolated, expanded, induced to fuse in vitro, and gene expression comparisons were performed between myoblasts and early or late myotubes. Among the regulated genes, melanoma cell adhesion molecule (M-CAM) was found to be significantly downregulated during human fetal muscle cell fusion. M-CAM expression was confirmed on activated myoblasts, both in vitro and in vivo, and on myoendothelial cells (M-CAM(+) CD31(+)), which were positive for the myogenic markers desmin and MyoD. Lastly, in vitro functional studies using M-CAM RNA knockdown demonstrated that inhibition of M-CAM expression enhances myoblast fusion. These studies identify M-CAM as a novel marker for myogenic progenitors in human fetal muscle and confirm that downregulation of this protein promotes myoblast fusion.
Affiliation(s)
- Marco F. Ramoni
- Bioinformatics Program, Children’s Hospital Boston, 320 Longwood Avenue, Boston, MA 02115, USA
- Alvin T. Kho
- Bioinformatics Program, Children’s Hospital Boston, 320 Longwood Avenue, Boston, MA 02115, USA
- Emanuela Gussoni
- Division of Genetics and Program in Genomics and
- Author for correspondence (e-mail: )
|
36
|
Abstract
The speed of the human genome project (Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C. et al., Nature 2001, 409, 860-921) was made possible, in part, by developments in automation of sequencing technologies. Before these technologies, sequencing was a laborious, expensive, and personnel-intensive task. Similarly, automation and robotics are changing the field of proteomics today. Proteomics is defined as the effort to understand and characterize proteins in the categories of structure, function and interaction (Englbrecht, C. C., Facius, A., Comb. Chem. High Throughput Screen. 2005, 8, 705-715). As such, this field nicely lends itself to automation technologies since these methods often require large economies of scale in order to achieve cost and time-saving benefits. This article describes some of the technologies and methods being applied in proteomics in order to facilitate automation within the field as well as in linking proteomics-based information with other related research areas.
Affiliation(s)
- Gil Alterovitz
- Division of Health Sciences and Technology, HST, Harvard Medical School and Massachusetts Institute of Technology, Boston, MA 02115, USA.
|
37
|
Traum AZ, Wells MP, Aivado M, Libermann TA, Ramoni MF, Schachter AD. SELDI-TOF MS of quadruplicate urine and serum samples to evaluate changes related to storage conditions. Proteomics 2006; 6:1676-80. [PMID: 16447157 PMCID: PMC1447593 DOI: 10.1002/pmic.200500174] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Indexed: 11/09/2022]
Abstract
Proteomic profiling with SELDI-TOF MS has facilitated the discovery of disease-specific protein profiles. However, multicenter studies are often hindered by the logistics required for prompt deep-freezing of samples in liquid nitrogen or dry ice within the clinic setting prior to shipping. We report high concordance between MS profiles within sets of quadruplicate split urine and serum samples deep-frozen at 0, 2, 6, and 24 h after sample collection. Gage R&R results confirm that deep-freezing times are not a statistically significant source of SELDI-TOF MS variability for either blood or urine.
Affiliation(s)
- Avram Z. Traum
- Division of Nephrology, Children’s Hospital Boston, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Meghan P. Wells
- BIDMC Genomics Center and Dana Farber/Harvard Cancer Center Cancer Proteomics Core, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Manuel Aivado
- BIDMC Genomics Center and Dana Farber/Harvard Cancer Center Cancer Proteomics Core, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Towia A. Libermann
- BIDMC Genomics Center and Dana Farber/Harvard Cancer Center Cancer Proteomics Core, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Marco F. Ramoni
- Children’s Hospital Informatics Program at Harvard-MIT Health Sciences and Technology, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Asher D. Schachter
- Division of Nephrology, Children’s Hospital Boston, Boston, MA, USA
- Children’s Hospital Informatics Program at Harvard-MIT Health Sciences and Technology, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Correspondence: Dr. Asher D. Schachter, MD, MMSc, MS, Division of Nephrology and Children’s Hospital Informatics Program, 300 Longwood Avenue, Boston, MA, USA, E-mail:, Fax: +1-617-730-0569
|
38
|
Sebastiani P, Mandl KD, Szolovits P, Kohane IS, Ramoni MF. A Bayesian dynamic model for influenza surveillance. Stat Med 2006. [DOI: 10.1002/sim.2564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|
39
|
Alterovitz G, Ramoni MF. Discovering biological guilds through topological abstraction. AMIA Annu Symp Proc 2006; 2006:1-5. [PMID: 17238291 PMCID: PMC1839326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2023]
Abstract
High-throughput generation of new types of relational biological datasets is creating a demand for methods that provide insight into their complexity. Such networks are often too large to interpret visually and too complicated to be explained solely on the basis of local topological properties. One way to make sense of such complex networks is to transform them into discernible abstracts, or summaries, of the original networks, in which important components become more readily visible. This work presents such an approach for understanding networks via abstraction of global network connectivity using compression. This made possible the discovery of a new type of topological class, referred to herein as a guild, that captures global connectivity similarity. Lastly, the correspondence of these guilds to biological function is validated on an E. coli gene regulation network. This resulted in biological findings that could not be derived from the local topology of the original network.
Affiliation(s)
- Gil Alterovitz
- Division of Health Science and Technology, Massachusetts Institute of Technology/Harvard University, Cambridge, MA, USA
|
40
|
|
41
|
Wang L, Ramoni MF, Mandl KD, Sebastiani P. Factors affecting automated syndromic surveillance. Artif Intell Med 2005; 34:269-78. [PMID: 16023563 DOI: 10.1016/j.artmed.2004.11.002] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 06/18/2004] [Revised: 11/08/2004] [Accepted: 11/11/2004] [Indexed: 10/25/2022]
Abstract
OBJECTIVE The increased threat of bioterrorist attacks and epidemic events requires the development of accurate and timely outbreak detection systems for early identification of anomalies in public health data. MATERIAL AND METHODS We propose an automated outbreak detection system based on syndromic data. This system uses an autoregressive model with seasonal components to monitor online the daily counts of chief complaints for respiratory syndromes at the emergency departments of two major metropolitan hospitals. We evaluate this system by estimating the false positive rate in real data under the assumption that there were no outbreaks of disease, and the true positive rate in real baseline data into which we injected stochastically simulated outbreaks of different shapes and sizes. We then use directed graphical models to account for the effect of exogenous factors on the detection performance of the system. RESULTS Our study shows that for a week-long outbreak, our model has an overall 84.8% true detection accuracy across all outbreak shapes, while outbreak size influences how early the outbreak is detected. The false and true positive rates are also associated with the exogenous factors, and knowledge about these factors can help to improve detection accuracy. CONCLUSION This study suggests that the integration of multiple data sources can significantly improve the detection accuracy of syndromic surveillance systems.
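To make the detection step concrete, the sketch below flags a day whose count exceeds a one-step forecast from a seasonal autoregressive model. It is a simplified two-stage stand-in (day-of-week means, then AR(1) on the deseasonalized residuals), not the authors' model; the weekly period, the z = 2.58 alarm threshold, the function names, and the baseline data are all assumptions for illustration.

```python
import math
from statistics import mean


def fit_seasonal_ar1(counts, period=7):
    """Two-stage fit: day-of-week means, then AR(1) on the deseasonalized residuals."""
    seasonal = [mean(counts[d::period]) for d in range(period)]
    resid = [y - seasonal[t % period] for t, y in enumerate(counts)]
    # Least-squares AR(1) coefficient for the residual series.
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(r * r for r in resid[:-1])
    phi = num / den if den else 0.0
    # One-step-ahead prediction errors give the noise scale.
    errors = [resid[t] - phi * resid[t - 1] for t in range(1, len(resid))]
    sigma = math.sqrt(sum(e * e for e in errors) / len(errors))
    return seasonal, phi, sigma, resid[-1]


def is_alarm(counts, observed, z=2.58, period=7):
    """Flag the next day's count if it exceeds the one-step forecast by z sigma."""
    seasonal, phi, sigma, last_resid = fit_seasonal_ar1(counts, period)
    t_next = len(counts)
    forecast = seasonal[t_next % period] + phi * last_resid
    return observed > forecast + z * sigma


# Hypothetical baseline: 8 weeks with a weekly pattern plus small noise.
weekly = [20, 22, 21, 23, 25, 30, 28]
counts = [weekly[t % 7] + (t % 3) - 1 for t in range(56)]
print(is_alarm(counts, 40), is_alarm(counts, 20))
```

The evaluation described above (false positives on outbreak-free data, true positives after injecting simulated outbreaks) could be run by calling such a detector day by day over the baseline with and without added counts.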
Affiliation(s)
- Ling Wang
- Department of Biostatistics, Boston University School of Public Health, 715 Albany Street, Boston, MA 02118, USA
|
42
|
Sebastiani P, Ramoni MF, Nolan V, Baldwin CT, Steinberg MH. Genetic dissection and prognostic modeling of overt stroke in sickle cell anemia. Nat Genet 2005; 37:435-40. [PMID: 15778708 PMCID: PMC2896308 DOI: 10.1038/ng1533] [Citation(s) in RCA: 248] [Impact Index Per Article: 13.1] [Received: 07/15/2004] [Accepted: 02/08/2005] [Indexed: 01/01/2023]
Abstract
Sickle cell anemia (SCA) is a paradigmatic single gene disorder caused by homozygosity with respect to a unique mutation at the beta-globin locus. SCA is phenotypically complex, with different clinical courses ranging from early childhood mortality to a virtually unrecognized condition. Overt stroke is a severe complication affecting 6-8% of individuals with SCA. Modifier genes might interact to determine the susceptibility to stroke, but such genes have not yet been identified. Using Bayesian networks, we analyzed 108 SNPs in 39 candidate genes in 1,398 individuals with SCA. We found that 31 SNPs in 12 genes interact with fetal hemoglobin to modulate the risk of stroke. This network of interactions includes three genes in the TGF-beta pathway and SELP, which is associated with stroke in the general population. We validated this model in a different population by predicting the occurrence of stroke in 114 individuals with 98.2% accuracy.
Affiliation(s)
- Paola Sebastiani
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts 02118, USA
|
43
|
Aivado M, Spentzos D, Alterovitz G, Otu HH, Grall F, Giagounidis AAN, Wells M, Cho JY, Germing U, Czibere A, Prall WC, Porter C, Ramoni MF, Libermann TA. Optimization and evaluation of surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) with reversed-phase protein arrays for protein profiling. Clin Chem Lab Med 2005; 43:133-40. [PMID: 15843205 DOI: 10.1515/cclm.2005.022] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.0] [Indexed: 11/15/2022]
Abstract
Surface-enhanced laser desorption/ionization (SELDI) time-of-flight mass spectrometry with protein arrays has facilitated the discovery of disease-specific protein profiles in serum. Such results raise hopes that protein profiles may become a powerful diagnostic tool. To this end, reliable and reproducible protein profiles need to be generated from many samples, accurate mass peak heights are necessary, and the experimental variation of the profiles must be known. We adapted the entire processing of protein arrays to a robotics system, thus improving the intra-assay coefficients of variation (CVs) from 45.1% to 27.8% (p<0.001). In addition, we assessed up to 16 technical replicates and demonstrated that analysis of 2–4 replicates significantly increases the reliability of the protein profiles. A recent report of limited long-term reproducibility seemed consistent with our initial inter-assay CVs, which varied widely and reached up to 56.7%. However, we discovered that the inter-assay CV is strongly dependent on the drying time before application of the matrix molecule. Therefore, we devised a standardized drying process and demonstrated that our optimized SELDI procedure generates reliable and long-term reproducible protein profiles with CVs ranging from 25.7% to 32.6%, depending on the signal-to-noise ratio threshold used.
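The reproducibility figures above are coefficients of variation of peak heights across replicates. The sketch below computes a percent CV per peak and averages it over peaks; it assumes the peaks are already aligned across replicate spectra, and the choice to summarize with the mean (the abstract does not say how per-peak CVs were aggregated), the function names, and the peak heights are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Percent CV of one peak across replicates: sample SD / mean * 100."""
    return 100.0 * stdev(values) / mean(values)


def mean_peak_cv(replicates):
    """Average per-peak CV across technical replicates.

    replicates: one list of peak heights per replicate spectrum, with the
    same peaks in the same order in every list (i.e. already aligned).
    """
    per_peak = [coefficient_of_variation(heights) for heights in zip(*replicates)]
    return mean(per_peak)


# Hypothetical peak heights for three replicates of the same sample.
replicates = [[10.0, 100.0], [12.0, 110.0], [14.0, 90.0]]
print(round(mean_peak_cv(replicates), 2))
```

Because the reported CV range depends on the signal-to-noise threshold, dropping peaks below a chosen S/N before calling mean_peak_cv would be the natural place to apply one.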
Affiliation(s)
- Manuel Aivado
- BIDMC Genomics Center and Bioinformatics Core, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02115, USA
|
44
|
Spentzos D, Levine DA, Ramoni MF, Joseph M, Gu X, Boyd J, Libermann TA, Cannistra SA. Gene expression signature with independent prognostic significance in epithelial ovarian cancer. J Clin Oncol 2004; 22:4700-10. [PMID: 15505275 DOI: 10.1200/jco.2004.04.070] [Citation(s) in RCA: 181] [Impact Index Per Article: 9.1] [Indexed: 12/16/2022]
Abstract
Purpose Currently available clinical and molecular prognostic factors provide an imperfect assessment of prognosis for patients with epithelial ovarian cancer (EOC). In this study, we investigated whether tumor transcription profiling could be used as a prognostic tool in this disease. Methods Tumor tissue from 68 patients was profiled with oligonucleotide microarrays. Samples were randomly split into training and validation sets. A three-step training procedure was used to discover a statistically significant Kaplan-Meier split in the training set. The resultant prognostic signature was then tested on an independent validation set for confirmation. Results In the training set, a 115-gene signature referred to as the Ovarian Cancer Prognostic Profile (OCPP) was identified. When applied to the validation set, the OCPP distinguished between patients with unfavorable and favorable overall survival (median, 30 months v not yet reached, respectively; log-rank P = .004). The signature maintained independent prognostic value in multivariate analysis, controlling for other known prognostic factors such as age, stage, grade, and debulking status. The hazard ratio for death in the unfavorable OCPP group was 4.8 (P = .021 by Cox proportional hazards analysis). Conclusion The OCPP is an independent prognostic determinant of outcome in EOC. The use of gene profiling may ultimately permit identification of EOC patients appropriate for investigational treatment approaches, based on a low likelihood of achieving prolonged survival with standard first-line platinum-based therapy.
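The validation result above is a log-rank comparison of overall survival between the two OCPP groups. As a hedged illustration of that test only (not the authors' three-step training procedure), here is a stdlib-only sketch of the standard two-group log-rank statistic; the function name and the toy survival data are hypothetical.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test (1 degree of freedom).

    times:  follow-up time per patient
    events: 1 if death was observed, 0 if censored
    groups: 1 for the group of interest, 0 otherwise
    Returns (chi_square, p_value).
    """
    patients = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in patients if e})
    o_minus_e = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in patients if tt >= t]
        n = len(at_risk)
        n1 = sum(1 for _, _, g in at_risk if g == 1)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d1 = sum(1 for tt, e, g in at_risk if tt == t and e and g == 1)
        # Observed minus expected deaths in group 1 at this event time.
        o_minus_e += d1 - d * n1 / n
        # Hypergeometric variance of d1; it contributes nothing when n == 1.
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative event times")
    chi_square = o_minus_e ** 2 / variance
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi_square)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical toy data: group 0 dies early, group 1 dies late.
chi2, p = logrank_test([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1])
print(round(chi2, 3), round(p, 4))
```

The Cox hazard ratio reported alongside it requires fitting a proportional hazards model, which this sketch does not attempt.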
MESH Headings
- Adult
- Aged
- Biomarkers, Tumor/metabolism
- Biopsy, Needle
- Chemotherapy, Adjuvant
- Combined Modality Therapy
- DNA, Complementary/analysis
- Female
- Gene Expression Regulation, Neoplastic
- Genetic Predisposition to Disease
- Humans
- Immunohistochemistry
- Middle Aged
- Neoplasm Staging
- Neoplasms, Glandular and Epithelial/genetics
- Neoplasms, Glandular and Epithelial/mortality
- Neoplasms, Glandular and Epithelial/pathology
- Neoplasms, Glandular and Epithelial/therapy
- Ovarian Neoplasms/genetics
- Ovarian Neoplasms/mortality
- Ovarian Neoplasms/pathology
- Ovarian Neoplasms/therapy
- Ovariectomy/methods
- Predictive Value of Tests
- Prognosis
- RNA, Neoplasm/analysis
- Risk Assessment
- Sensitivity and Specificity
- Survival Analysis
- Treatment Outcome
Affiliation(s)
- Dimitrios Spentzos
- Program of Gynecologic Medical Oncology, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
|
45
|
Abstract
With the completion of the Human Genome Project and the growing computational challenges presented by the large amount of genomic data available today, machine learning is becoming an integral part of biomedical research and plays a major role in the emerging fields of bioinformatics and computational biology. This situation offers unparalleled opportunities and unprecedented challenges to machine learning research in general and to Bayesian learning methods in particular. This paper outlines some of the opportunities and challenges of this endeavor, describes where the efforts of "cracking the code of life" can most benefit from a Bayesian approach, and identifies some potential applications of Bayesian machine learning methods to the genomic analysis of squamous cell carcinomas of the head and neck.
Affiliation(s)
- P Sebastiani
- Department of Biostatistics, Boston University School of Public Health, MA USA
|
46
|
Abstract
The success rate of association studies can be improved by selecting better genetic markers for genotyping or by providing better leads for identifying pathogenic single nucleotide polymorphisms (SNPs) in the regions of linkage disequilibrium with positive disease associations. We have developed a novel algorithm to predict pathogenic single amino acid changes, either nonsynonymous SNPs (nsSNPs) or missense mutations, in conserved protein domains. Using a Bayesian framework, we found that the probability of a microbial missense mutation causing a significant change in phenotype depended on how much difference it made in several phylogenetic, biochemical, and structural features related to the single amino acid substitution. We tested our model on pathogenic allelic variants (missense mutations or nsSNPs) included in OMIM, and on the other nsSNPs in the same genes (from dbSNP) as the nonpathogenic variants. As a result, our model predicted pathogenic variants with a 10% false-positive rate. The high specificity of our prediction algorithm should make it valuable in genetic association studies aimed at identifying pathogenic SNPs.
Affiliation(s)
- Zhaohui Cai
- Children's Hospital Boston, Harvard Medical School, Boston, Massachusetts, USA
|
47
|
Tomczak KK, Marinescu VD, Ramoni MF, Sanoudou D, Montanaro F, Han M, Kunkel LM, Kohane IS, Beggs AH. Expression profiling and identification of novel genes involved in myogenic differentiation. FASEB J 2003; 18:403-5. [PMID: 14688207 DOI: 10.1096/fj.03-0568fje] [Citation(s) in RCA: 149] [Impact Index Per Article: 7.1] [Indexed: 11/11/2022]
Abstract
Skeletal muscle differentiation is a complex, highly coordinated process that relies on precise temporal gene expression patterns. To better understand this cascade of transcriptional events, we used expression profiling to analyze gene expression in a 12-day time course of differentiating C2C12 myoblasts. Cluster analysis specific for time-ordered microarray experiments classified 2895 genes and ESTs with variable expression levels between proliferating and differentiating cells into 22 clusters with distinct expression patterns during myogenesis. Expression patterns for several known and novel genes were independently confirmed by real-time quantitative RT-PCR and/or Western blotting and immunofluorescence. MyoD and MEF family members exhibited unique expression kinetics that were highly coordinated with cell-cycle withdrawal regulators. Among genes with peak expression levels during cell cycle withdrawal were Vcam1, Itgb3, Itga5, Vcl, as well as Ptger4, a gene not previously associated with the process of myogenesis. One interesting uncharacterized transcript that is highly induced during myogenesis encodes several immunoglobulin repeats with sequence similarity to titin, a large sarcomeric protein. These data sets identify many additional uncharacterized transcripts that may play important functions in muscle cell proliferation and differentiation and provide a baseline for comparison with C2C12 cells expressing various mutant genes involved in myopathic disorders.
Affiliation(s)
- Kinga K Tomczak
- Genetics Division, Children's Hospital, Boston, Massachusetts 02115, USA
|
48
|
Abstract
The high frequency of single-nucleotide polymorphisms (SNPs) in the human genome presents an unparalleled opportunity to track down the genetic basis of common diseases. At the same time, the sheer number of SNPs makes genome-wide disease association studies unfeasible. The haplotypic nature of the human genome, however, lends itself to the selection of a parsimonious set of SNPs, called haplotype tagging SNPs (htSNPs), able to distinguish the haplotypic variations in a population. Current approaches rely on statistical analysis of transmission rates to identify htSNPs. In contrast to these approximate methods, this contribution describes an exact, analytical, and lossless method, called BEST (Best Enumeration of SNP Tags), able to identify the minimum set of SNPs tagging an arbitrary set of haplotypes from either pedigree or independent samples. Our results confirm that a small proportion of SNPs is sufficient to capture the haplotypic variations in a population and that this proportion decreases exponentially as the haplotype length increases. We used BEST to tag the haplotypes of 105 genes in an African-American and a European-American sample. An interesting finding of this analysis is that the vast majority (95%) of the htSNPs in the European-American sample are also htSNPs in the African-American sample. This result seems to provide further evidence that a severe bottleneck occurred during the founding of Europe and the conjectured "Out of Africa" event.
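The objective described above (the smallest SNP subset whose alleles still tell every haplotype apart) can be illustrated by brute force. This sketch is not BEST, whose algorithm the abstract does not detail; it simply tries subsets in order of increasing size, so it is exact but scales exponentially with the number of SNPs. The function name and haplotype strings are hypothetical.

```python
from itertools import combinations


def minimum_tag_snps(haplotypes):
    """Smallest set of SNP positions that still distinguishes every haplotype.

    haplotypes: equal-length allele strings, one character per SNP.
    Subsets are tried in order of increasing size, so the first hit is minimal.
    """
    distinct = sorted(set(haplotypes))
    n_snps = len(distinct[0])
    for size in range(n_snps + 1):
        for subset in combinations(range(n_snps), size):
            # Project each haplotype onto the candidate SNPs.
            projected = {tuple(h[i] for i in subset) for h in distinct}
            if len(projected) == len(distinct):
                return list(subset)
    # Not reached: the full SNP set always separates distinct haplotypes.
    return list(range(n_snps))


# Hypothetical haplotypes over five SNPs (0/1 alleles).
haplotypes = ["00110", "01100", "10101", "00110", "11011"]
print(minimum_tag_snps(haplotypes))
```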
Affiliation(s)
- Paola Sebastiani
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
|
49
|
|
50
|
Abstract
This article presents a Bayesian method for model-based clustering of gene expression dynamics. The method represents gene-expression dynamics as autoregressive equations and uses an agglomerative procedure to search for the most probable set of clusters given the available data. The main contributions of this approach are the ability to take into account the dynamic nature of gene expression time series during clustering and a principled way to identify the number of distinct clusters. As the number of possible clustering models grows exponentially with the number of observed time series, we have devised a distance-based heuristic search procedure able to render the search process feasible. In this way, the method retains the important visualization capability of traditional distance-based clustering and acquires an independent, principled measure to decide when two series are different enough to belong to different clusters. The reliance of this method on an explicit statistical representation of gene expression dynamics makes it possible to use standard statistical techniques to assess the goodness of fit of the resulting model and validate the underlying assumptions. A set of gene-expression time series, collected to study the response of human fibroblasts to serum, is used to identify the properties of the method.
Affiliation(s)
- Marco F Ramoni
- Children's Hospital Informatics Program, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA
|