101
|
Gandhi T, Fusetti F, Wiederhold E, Breitling R, Poolman B, Permentier HP. Apex Peptide Elution Chain Selection: A New Strategy for Selecting Precursors in 2D-LC−MALDI-TOF/TOF Experiments on Complex Biological Samples. J Proteome Res 2010; 9:5922-8. [DOI: 10.1021/pr1006944] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
|
102
|
t'Kindt R, Jankevics A, Scheltema RA, Zheng L, Watson DG, Dujardin JC, Breitling R, Coombs GH, Decuypere S. Towards an unbiased metabolic profiling of protozoan parasites: optimisation of a Leishmania sampling protocol for HILIC-orbitrap analysis. Anal Bioanal Chem 2010; 398:2059-69. [PMID: 20824428 DOI: 10.1007/s00216-010-4139-0] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Received: 06/04/2010] [Revised: 08/13/2010] [Accepted: 08/17/2010] [Indexed: 01/12/2023]
Abstract
Comparative metabolomics of Leishmania species requires the simultaneous identification and quantification of a large number of intracellular metabolites. Here, we describe the optimisation of a comprehensive metabolite extraction protocol for Leishmania parasites and the subsequent optimisation of the analytical approach, consisting of hydrophilic interaction liquid chromatography coupled to LTQ-orbitrap mass spectrometry. The final optimised protocol starts with a rapid quenching of parasite cells to 0 °C, followed by a triplicate washing step in phosphate-buffered saline. The intracellular metabolome of 4 × 10^7 parasites is then extracted in cold chloroform/methanol/water 20/60/20 (v/v/v) for 1 h at 4 °C, resulting in both cell disruption and comprehensive metabolite dissolution. Our developed metabolomics platform can detect approximately 20% of the predicted Leishmania metabolome in a single experiment in positive and negative ionisation mode.
|
103
|
Medema MH, Trefzer A, Kovalchuk A, van den Berg M, Müller U, Heijne W, Wu L, Alam MT, Ronning CM, Nierman WC, Bovenberg RAL, Breitling R, Takano E. The sequence of a 1.8-Mb bacterial linear plasmid reveals a rich evolutionary reservoir of secondary metabolic pathways. Genome Biol Evol 2010; 2:212-24. [PMID: 20624727 PMCID: PMC2997539 DOI: 10.1093/gbe/evq013] [Citation(s) in RCA: 156] [Impact Index Per Article: 11.1] [Indexed: 12/22/2022] Open
Abstract
Plasmids are mobile genetic elements that play a key role in the evolution of bacteria by mediating genome plasticity and lateral transfer of useful genetic information. Although originally considered to be exclusively circular, linear plasmids have also been identified in certain bacterial phyla, notably the actinomycetes. In some cases, linear plasmids engage with chromosomes in an intricate evolutionary interplay, facilitating the emergence of new genome configurations by transfer and recombination or plasmid integration. Genome sequencing of Streptomyces clavuligerus ATCC 27064, a Gram-positive soil bacterium known for its production of a diverse array of biotechnologically important secondary metabolites, revealed a giant linear plasmid of 1.8 Mb in length. This megaplasmid (pSCL4) is one of the largest plasmids ever identified and the largest linear plasmid to be sequenced. It contains more than 20% of the putative protein-coding genes of the species, but none of these is predicted to be essential for primary metabolism. Instead, the plasmid is densely packed with an exceptionally large number of gene clusters for the potential production of secondary metabolites, including a large number of putative antibiotics, such as staurosporine, moenomycin, β-lactams, and enediynes. Interestingly, cross-regulation occurs between chromosomal and plasmid-encoded genes. Several factors suggest that the megaplasmid came into existence through recombination of a smaller plasmid with the arms of the main chromosome. Phylogenetic analysis indicates that heavy traffic of genetic information between Streptomyces plasmids and chromosomes may facilitate the rapid evolution of secondary metabolite repertoires in these bacteria.
|
104
|
Zhang M, Schafer WR, Breitling R. A circuit model of the temporal pattern generator of Caenorhabditis egg-laying behavior. BMC Syst Biol 2010; 4:81. [PMID: 20529297 PMCID: PMC2887794 DOI: 10.1186/1752-0509-4-81] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 05/08/2009] [Accepted: 06/07/2010] [Indexed: 11/24/2022]
Abstract
Background: Egg-laying behavior in the nematode C. elegans displays a distinct clustered temporal pattern: egg-laying events occur primarily in bursts or active phases, separated by inactive phases during which eggs are retained. The onset of the active phase can be modeled as a Poisson process with a time constant of approximately 20 minutes, while egg-laying events within an active phase occur with a faster time constant of approximately 20 seconds. Here we propose a cellular model for how the temporal pattern of egg-laying might be generated, based on genetic and cell-biological experiments and statistical analyses. Results: We suggest that the HSN neuron is the executive neuron driving egg-laying events. We propose that the VC neurons act as "single egg counters" that inhibit HSN activity for short periods in response to individual egg-laying events. We further propose that the uv1 neuroendocrine cells are "cluster counters", which inhibit HSN activity for longer periods and are responsible for the time constant of the inactive phase. Together they form an integrated circuit that drives the clustered egg-laying pattern. Conclusions: The detailed predictions derived from this model can now be tested by straightforward validation experiments.
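The two time scales quoted in this abstract can be made concrete with a small simulation. The sketch below is an illustration only, not the authors' circuit model: it draws exponentially distributed waiting times with the two means given in the abstract (about 20 minutes before a cluster starts, about 20 seconds between eggs within a cluster). The probability `p_continue` that a cluster continues after each egg, and all function and parameter names, are hypothetical and chosen purely for illustration.

```python
import random


def simulate_egg_laying(n_events, inter_mean_s=1200.0, intra_mean_s=20.0,
                        p_continue=0.5, seed=0):
    """Return n_events egg-laying times (in seconds) from a two-rate process.

    inter_mean_s and intra_mean_s follow the abstract (about 20 min and 20 s).
    p_continue is a hypothetical parameter, not a value from the paper.
    """
    rng = random.Random(seed)
    times = []
    t = 0.0
    in_cluster = False
    for _ in range(n_events):
        mean = intra_mean_s if in_cluster else inter_mean_s
        # In a Poisson process, waiting times are exponential with this mean.
        t += rng.expovariate(1.0 / mean)
        times.append(t)
        # After each egg the cluster either continues or ends (assumption).
        in_cluster = rng.random() < p_continue
    return times


if __name__ == "__main__":
    print([round(x) for x in simulate_egg_laying(10)])
```

Plotting the intervals between successive events from such a run on a logarithmic axis would be expected to show two populations, short and long, which is the clustered pattern the abstract describes.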
|
105
|
Abstract
Systems biology is increasingly popular, but to many biologists it remains unclear what this new discipline actually encompasses. This brief personal perspective starts by outlining the aesthetic qualities that motivate systems biologists, discusses which activities do not belong to the core of systems biology, and finally explores the crucial link with synthetic biology. It concludes by attempting to define systems biology as the research endeavor that aims at providing the scientific foundation for successful synthetic biology.
|
106
|
Alam MT, Merlo ME, Hodgson DA, Wellington EMH, Takano E, Breitling R. Metabolic modeling and analysis of the metabolic switch in Streptomyces coelicolor. BMC Genomics 2010; 11:202. [PMID: 20338070 PMCID: PMC2853524 DOI: 10.1186/1471-2164-11-202] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Received: 10/22/2009] [Accepted: 03/26/2010] [Indexed: 12/19/2022] Open
Abstract
Background: The transition from exponential to stationary phase in Streptomyces coelicolor is accompanied by a major metabolic switch and results in a strong activation of secondary metabolism. Here we have explored the underlying reorganization of the metabolome by combining computational predictions based on constraint-based modeling and detailed transcriptomics time course observations. Results: We reconstructed the stoichiometric matrix of S. coelicolor, including the major antibiotic biosynthesis pathways, and performed flux balance analysis to predict flux changes that occur when the cell switches from biomass to antibiotic production. We defined the model input based on observed fermenter culture data and used a dynamically varying objective function to represent the metabolic switch. The predicted fluxes of many genes show highly significant correlation to the time series of the corresponding gene expression data. Individual mispredictions identify novel links between antibiotic production and primary metabolism. Conclusion: Our results show the usefulness of constraint-based modeling for providing a detailed interpretation of time course gene expression data.
|
107
|
van Ham TJ, Breitling R, Swertz MA, Nollen EAA. Neurodegenerative diseases: Lessons from genome-wide screens in small model organisms. EMBO Mol Med 2010; 1:360-70. [PMID: 20049741 PMCID: PMC3378155 DOI: 10.1002/emmm.200900051] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Indexed: 01/09/2023] Open
Abstract
Various age-related neurodegenerative diseases, including Parkinson's disease, polyglutamine expansion diseases and Alzheimer's disease, are associated with the accumulation of misfolded proteins in aggregates in the brain. How and why these proteins form aggregates and cause disease is still poorly understood. Small model organisms—the baker's yeast Saccharomyces cerevisiae, the nematode worm Caenorhabditis elegans and the fruit fly Drosophila melanogaster—have been used to model these diseases and high-throughput genetic screens using these models have led to the identification of a large number of genes that modify aggregation and toxicity of the disease proteins. In this review, we revisit these models and provide a comprehensive comparison of the genetic screens performed so far. Our integrative analysis highlights alterations of a wide variety of basic cellular processes. Not all disease proteins are influenced by alterations in the same cellular processes and despite the unifying theme of protein misfolding and aggregation, the pathology of each of the age-related misfolding disorders can be induced or influenced by a disease-protein-specific subset of molecular processes.
|
108
|
Armengaud P, Breitling R, Amtmann A. Coronatine-insensitive 1 (COI1) mediates transcriptional responses of Arabidopsis thaliana to external potassium supply. Mol Plant 2010; 3:390-405. [PMID: 20339157 PMCID: PMC2845782 DOI: 10.1093/mp/ssq012] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Received: 10/21/2009] [Accepted: 12/27/2009] [Indexed: 05/18/2023]
Abstract
The ability to adjust growth and development to the availability of mineral nutrients in the soil is an essential life skill of plants, but the underlying signaling pathways are poorly understood. In Arabidopsis thaliana, shortage of potassium (K) induces a number of genes related to the phytohormone jasmonic acid (JA). Using comparative microarray analysis of wild-type and coi1-16 mutant plants, we classified transcriptional responses to K with respect to their dependence on COI1, a central component of oxylipin signaling. Expression profiles obtained in a short-term experiment clearly distinguished between COI1-dependent and COI1-independent K-responsive genes, and identified both known and novel targets of JA-COI1-signaling. During long-term K-deficiency, coi1-16 mutants displayed de novo responses covering similar functions as COI1-targets except for defense. A putative role of JA for enhancing the defense potential of K-deficient plants was further supported by the observation that plants grown on low K were less damaged by thrips than plants grown with sufficient K.
|
109
|
Alam MT, Merlo ME, Takano E, Breitling R. Genome-based phylogenetic analysis of Streptomyces and its relatives. Mol Phylogenet Evol 2009; 54:763-72. [PMID: 19948233 DOI: 10.1016/j.ympev.2009.11.019] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Received: 03/03/2009] [Revised: 11/18/2009] [Accepted: 11/19/2009] [Indexed: 11/18/2022]
Abstract
Motivation: Streptomyces is one of the best-studied genera of the order Actinomycetales due to its great importance in medical science, ecology and the biotechnology industry. A comprehensive, detailed and robust phylogeny of Streptomyces and its relatives is needed for understanding how this group emerged and maintained such a vast diversity throughout evolution and how soil-living mycelial forms (e.g., Streptomyces s. str.) are related to parasitic, unicellular pathogens (e.g., Mycobacterium tuberculosis) or marine species (e.g., Salinispora tropica). The most important application area of such a phylogenetic analysis will be in the comparative re-annotation of genome sequences and the reconstruction of Streptomyces metabolic networks for biotechnology. Methods: Classical 16S-rRNA-based phylogenetic reconstruction is not guaranteed to produce well-resolved robust trees that reflect the overall relationship between bacterial species with widespread horizontal gene transfer. In our study we therefore combine three whole genome-based phylogenies with eight different, highly informative single-gene phylogenies to determine a new robust consensus tree of 45 Actinomycetales species with completely sequenced genomes. Results: None of the individual methods achieved a resolved phylogeny of Streptomyces and its relatives. Single-gene approaches failed to yield a detailed phylogeny; even though the single trees are in good agreement with each other, they show very low resolution of inner branches. The three whole genome-based methods improve resolution considerably. Only by combining the phylogenies from single gene-based and genome-based approaches did we finally obtain a consensus tree with well-resolved branches for the entire set of Actinomycetales species. This phylogenetic information is stable and informative enough for application to the system-wide comparative modeling of bacterial physiology.
|
110
|
Michaelis M, Klassert D, Barth S, Suhan T, Breitling R, Mayer B, Hinsch N, Doerr HW, Cinatl J, Cinatl J. Chemoresistance acquisition induces a global shift of expression of aniogenesis-associated genes and increased pro-angogenic activity in neuroblastoma cells. Mol Cancer 2009; 8:80. [PMID: 19788758 PMCID: PMC2761864 DOI: 10.1186/1476-4598-8-80] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 07/03/2009] [Accepted: 09/29/2009] [Indexed: 01/13/2023] Open
Abstract
Background: Chemoresistance acquisition may influence cancer cell biology. Here, bioinformatics analysis of gene expression data was used to identify chemoresistance-associated changes in neuroblastoma biology. Results: Bioinformatics analysis of gene expression data revealed that expression of angiogenesis-associated genes significantly differs between chemosensitive and chemoresistant neuroblastoma cells. A subsequent systematic analysis of a panel of 14 chemosensitive and chemoresistant neuroblastoma cell lines in vitro and in animal experiments indicated a consistent shift to a more pro-angiogenic phenotype in chemoresistant neuroblastoma cells. The molecular mechanisms underlying increased pro-angiogenic activity of neuroblastoma cells are individual and differ between the investigated chemoresistant cell lines. Treatment of animals carrying doxorubicin-resistant neuroblastoma xenografts with doxorubicin, a cytotoxic drug known to exert anti-angiogenic activity, resulted in decreased tumour vessel formation and growth, indicating that chemoresistance-associated enhanced pro-angiogenic activity is relevant for tumour progression and represents a potential therapeutic target. Conclusion: A bioinformatics approach allowed the identification of a relevant chemoresistance-associated shift in neuroblastoma cell biology. The chemoresistance-associated enhanced pro-angiogenic activity observed in neuroblastoma cells is relevant for tumour progression and represents a potential therapeutic target.
|
111
|
Li Y, Swertz MA, Vera G, Fu J, Breitling R, Jansen RC. designGG: an R-package and web tool for the optimal design of genetical genomics experiments. BMC Bioinformatics 2009; 10:188. [PMID: 19538731 PMCID: PMC2706229 DOI: 10.1186/1471-2105-10-188] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 04/23/2009] [Accepted: 06/18/2009] [Indexed: 11/25/2022] Open
Abstract
Background: High-dimensional biomolecular profiling of genetically different individuals in one or more environmental conditions is an increasingly popular strategy for exploring the functioning of complex biological systems. The optimal design of such genetical genomics experiments in a cost-efficient and effective way is not trivial. Results: This paper presents designGG, an R package for designing optimal genetical genomics experiments. A web implementation for designGG is available at http://gbic.biol.rug.nl/designGG. All software, including source code and documentation, is freely available. Conclusion: DesignGG allows users to intelligently select and allocate individuals to experimental units and conditions such as drug treatment. The user can maximize the power and resolution of detecting genetic, environmental and interaction effects in a genome-wide or local mode by giving more weight to genome regions of special interest, such as previously detected phenotypic quantitative trait loci. This will help to achieve high power and more accurate estimates of the effects of interesting factors, and thus yield a more reliable biological interpretation of data. DesignGG is applicable to linkage analysis of experimental crosses, e.g. recombinant inbred lines, as well as to association analysis of natural populations.
|
112
|
Breitling R. Robust signaling networks of the adipose secretome. Trends Endocrinol Metab 2009; 20:1-7. [PMID: 18930409 DOI: 10.1016/j.tem.2008.08.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Received: 04/02/2008] [Revised: 08/27/2008] [Accepted: 08/27/2008] [Indexed: 12/27/2022]
Abstract
Type 2 diabetes is a prototypical complex systems disease that has a strong hereditary component and etiologic links with a sedentary lifestyle, overeating and obesity. Adipose tissue has been shown to be a central driver of type 2 diabetes progression, establishing and maintaining a chronic state of low-level inflammation. The number and diversity of identified endocrine factors from adipose tissue (adipokines) is growing rapidly. Here, I argue that a systems biology approach to understanding the robust multi-level signaling networks established by the adipose secretome will be crucial for developing efficient type 2 diabetes treatment. Recent advances in whole-genome association studies, global molecular profiling and quantitative modeling are currently fueling the emergence of this novel research strategy.
|
113
|
Rogers S, Scheltema RA, Girolami M, Breitling R. Probabilistic assignment of formulas to mass peaks in metabolomics experiments. Bioinformatics 2008; 25:512-8. [DOI: 10.1093/bioinformatics/btn642] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Indexed: 12/27/2022] Open
|
114
|
Wiederhold E, Gandhi T, Permentier HP, Breitling R, Poolman B, Slotboom DJ. The yeast vacuolar membrane proteome. Mol Cell Proteomics 2008; 8:380-92. [PMID: 19001347 DOI: 10.1074/mcp.m800372-mcp200] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.3] [Indexed: 01/26/2023] Open
Abstract
Transport of solutes between the cytosol and the vacuolar lumen is of crucial importance for various functions of vacuoles, including ion homeostasis; detoxification; storage of different molecules such as amino acids, phosphate, and calcium ions; and proteolysis. To identify proteins that catalyze solute transport across the vacuolar membrane, the membrane proteome of purified Saccharomyces cerevisiae vacuoles was analyzed. Subtractive proteomics was used to distinguish contaminants from true vacuolar proteins by comparing the relative abundances of proteins in pure and crude preparations. A robust statistical analysis combining enrichment ranking with the double boundary iterative group analysis revealed that 148 proteins were significantly enriched in the pure vacuolar preparations. Among these proteins were well characterized vacuolar proteins, such as the subunits of the vacuolar H⁺-ATPase, but also proteins that had not previously been assigned to a cellular location, many of which are likely novel vacuolar membrane transporters, e.g. for nucleosides and oligopeptides. Although the majority of contaminating proteins from other organelles were depleted from the pure vacuolar membranes, some proteins annotated to reside in other cellular locations were enriched along with the vacuolar proteins. In many cases the enrichment of these proteins is biologically relevant, and we discuss that a large group is involved in membrane fusion and protein trafficking to vacuoles and may have multiple localizations. Other proteins are degraded in vacuoles, and in some cases database annotations are likely to be incomplete or incorrect. Our work provides a wealth of information on vacuolar biology and a solid basis for further characterization of vacuolar functions.
|
115
|
Scheltema RA, Kamleh A, Wildridge D, Ebikeme C, Watson DG, Barrett MP, Jansen RC, Breitling R. Increasing the mass accuracy of high-resolution LC-MS data using background ions - a case study on the LTQ-Orbitrap. Proteomics 2008; 8:4647-56. [DOI: 10.1002/pmic.200800314] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Indexed: 11/11/2022]
|
116
|
Breitling R, Li Y, Tesson BM, Fu J, Wu C, Wiltshire T, Gerrits A, Bystrykh LV, de Haan G, Su AI, Jansen RC. Genetical genomics: spotlight on QTL hotspots. PLoS Genet 2008; 4:e1000232. [PMID: 18949031 PMCID: PMC2563687 DOI: 10.1371/journal.pgen.1000232] [Citation(s) in RCA: 164] [Impact Index Per Article: 10.3] [Indexed: 01/28/2023] Open
|
117
|
Blom EJ, Breitling R, Hofstede KJ, Roerdink JBTM, van Hijum SAFT, Kuipers OP. Prosecutor: parameter-free inference of gene function for prokaryotes using DNA microarray data, genomic context and multiple gene annotation sources. BMC Genomics 2008; 9:495. [PMID: 18939968 PMCID: PMC2585105 DOI: 10.1186/1471-2164-9-495] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 06/18/2008] [Accepted: 10/21/2008] [Indexed: 01/23/2023] Open
Abstract
Background: Despite a plethora of functional genomic efforts, the function of many genes in sequenced genomes remains unknown. The increasing amount of microarray data for many species allows employing the guilt-by-association principle to predict function on a large scale: genes exhibiting similar expression patterns are more likely to participate in shared biological processes. Results: We developed Prosecutor, an application that enables researchers to rapidly infer gene function based on available gene expression data and functional annotations. Our parameter-free functional prediction method uses a sensitive algorithm to achieve a high association rate of linking genes with unknown function to annotated genes. Furthermore, Prosecutor utilizes additional biological information such as genomic context and known regulatory mechanisms that are specific for prokaryotes. We analyzed publicly available transcriptome data sets and used literature sources to validate putative functions suggested by Prosecutor. We supply the complete results of our analysis for 11 prokaryotic organisms on a dedicated website. Conclusion: The Prosecutor software and supplementary datasets, which are available online, allow researchers working on any of the analyzed organisms to quickly identify the putative functions of their genes of interest. A de novo analysis allows new organisms to be studied.
|
118
|
Li Y, Breitling R, Jansen RC. Generalizing genetical genomics: getting added value from environmental perturbation. Trends Genet 2008; 24:518-24. [PMID: 18774198 DOI: 10.1016/j.tig.2008.08.001] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 11/05/2007] [Revised: 08/08/2008] [Accepted: 08/09/2008] [Indexed: 11/29/2022]
Abstract
Genetical genomics is a useful approach for studying the effect of genetic perturbations on biological systems at the molecular level. However, molecular networks depend on the environmental conditions and, thus, a comprehensive understanding of biological systems requires studying them across multiple environments. We propose a generalization of genetical genomics, which combines genetic and sensibly chosen environmental perturbations, to study the plasticity of molecular networks. This strategy forms a crucial step toward understanding why individuals respond differently to drugs, toxins, pathogens, nutrients and other environmental influences. Here we outline a strategy for selecting and allocating individuals to particular treatments, and we discuss the promises and pitfalls of the generalized genetical genomics approach.
|
119
|
Breitling R, Gilbert D, Heiner M, Orton R. A structured approach for the engineering of biochemical network models, illustrated for signalling pathways. Brief Bioinform 2008; 9:404-21. [PMID: 18573813 DOI: 10.1093/bib/bbn026] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Indexed: 11/12/2022] Open
Abstract
Quantitative models of biochemical networks (signal transduction cascades, metabolic pathways, gene regulatory circuits) are a central component of modern systems biology. Building and managing these complex models is a major challenge that can benefit from the application of formal methods adopted from theoretical computing science. Here we provide a general introduction to the field of formal modelling, which emphasizes the intuitive biochemical basis of the modelling process, but is also accessible for an audience with a background in computing science and/or model engineering. We show how signal transduction cascades can be modelled in a modular fashion, using both a qualitative approach (qualitative Petri nets) and quantitative approaches (continuous Petri nets and ordinary differential equations, ODEs). We review the major elementary building blocks of a cellular signalling model, discuss which critical design decisions have to be made during model building, and present a number of novel computational tools that can help to explore alternative modular models in an easy and intuitive manner. These tools, which are based on Petri net theory, offer convenient ways of composing hierarchical ODE models, and permit a qualitative analysis of their behaviour. We illustrate the central concepts using signal transduction as our main example. The ultimate aim is to introduce a general approach that provides the foundations for a structured formal engineering of large-scale models of biochemical networks.
|
120
|
van Ham TJ, Thijssen KL, Breitling R, Hofstra RMW, Plasterk RHA, Nollen EAA. C. elegans model identifies genetic modifiers of alpha-synuclein inclusion formation during aging. PLoS Genet 2008; 4:e1000027. [PMID: 18369446 PMCID: PMC2265412 DOI: 10.1371/journal.pgen.1000027] [Citation(s) in RCA: 305] [Impact Index Per Article: 19.1] [Received: 10/02/2007] [Accepted: 02/08/2008] [Indexed: 11/19/2022] Open
Abstract
Inclusions in the brain containing α-synuclein are the pathological hallmark of Parkinson's disease, but how these inclusions are formed and how this links to disease is poorly understood. We have developed a C. elegans model that makes it possible to monitor, in living animals, the formation of α-synuclein inclusions. In worms of old age, inclusions contain aggregated α-synuclein, resembling a critical pathological feature. We used genome-wide RNA interference to identify processes involved in inclusion formation, and identified 80 genes that, when knocked down, resulted in a premature increase in the number of inclusions. Quality control and vesicle-trafficking genes expressed in the ER/Golgi complex and vesicular compartments were overrepresented, indicating a specific role for these processes in α-synuclein inclusion formation. Suppressors include aging-associated genes, such as sir-2.1/SIRT1 and lagr-1/LASS2. Altogether, our data suggest a link between α-synuclein inclusion formation and cellular aging, likely through an endomembrane-related mechanism. The processes and genes identified here present a framework for further study of the disease mechanism and provide candidate susceptibility genes and drug targets for Parkinson's disease and other α-synuclein related disorders.
Parkinson's disease is the second most common brain disorder of the elderly. It is thought to be caused by environmental and genetic factors. However, little is known about the genes and processes involved. Pathologically, Parkinson's disease is recognized by inclusions in the brain that contain a disease-specific protein: alpha-synuclein. We created a small animal model (C. elegans) in which we could follow the formation of alpha-synuclein inclusions in living and aging animals. With a genome-wide RNAi screen we identified 80 genes whose expression influences inclusion formation. These genes include evolutionarily conserved regulators of longevity, suggesting a link between inclusion formation and the molecular mechanism of aging. Our results offer a refined understanding of how Parkinson's disease arises during aging and we identify processes and genes that may underlie an increased susceptibility for the disease, which is important for improving diagnostics and developing strategies for therapeutic intervention.
|
121
|
Breitling R. Greased hedgehogs: new links between hedgehog signaling and cholesterol metabolism. Bioessays 2008; 29:1085-94. [PMID: 17935218 DOI: 10.1002/bies.20663] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Abstract
The close link between signaling by the developmental regulators of the Hedgehog family and cholesterol biochemistry has been known for some time. The morphogen is covalently attached to cholesterol in a peculiar autocatalytic reaction and embryonal disruption of cholesterol synthesis leads to malformations that mimic Hh signaling defects. Recently, it was furthermore shown that secreted Hh could hitchhike on lipoprotein particles to establish its morphogenic gradient in the developing embryo. Additionally, there is new evidence that the Hh-receptor Patched transmits the Hh signal by modulating the secretion of an inhibitory sterol molecule from the receiving cells. Here we present some of the most recent discoveries on the Hh-sterol link and discuss their implications from a systems design perspective. We predict that a robust functioning of the Hh pathway will require the involvement of more sterol metabolites, and these should be the subject of future research.
|
122
|
Hong F, Breitling R. A comparison of meta-analysis methods for detecting differentially expressed genes in microarray experiments. Bioinformatics 2008; 24:374-82. [PMID: 18204063 DOI: 10.1093/bioinformatics/btm620] [Citation(s) in RCA: 161] [Impact Index Per Article: 10.1] [Indexed: 11/14/2022]
Abstract
MOTIVATION: The proliferation of public data repositories creates a need for meta-analysis methods to efficiently evaluate, integrate and validate related datasets produced by independent groups. A t-based approach has been proposed to integrate effect size from multiple studies by modeling both intra- and between-study variation. Recently, a non-parametric 'rank product' method, which is derived based on biological reasoning of fold-change criteria, has been applied to directly combine multiple datasets into one meta study. Fisher's inverse chi-square method, which only depends on P-values from individual analyses of each dataset, has been used in a couple of medical studies. While these methods address the question from different angles, it is not clear how they compare with each other. RESULTS: We comparatively evaluate the three methods: t-based hierarchical modeling, rank products and Fisher's inverse chi-square test with P-values from either the t-based or the rank product method. A simulation study shows that the rank product method, in general, has higher sensitivity and selectivity than the t-based method in both individual and meta-analysis, especially in the setting of small sample size and/or large between-study variation. Not surprisingly, Fisher's chi-square method highly depends on the method used in the individual analysis. Application to real datasets demonstrates that meta-analysis achieves more reliable identification than an individual analysis, and rank products are more robust in gene ranking, which leads to a much higher reproducibility among independent studies. Though t-based meta-analysis greatly improves over the individual analysis, it suffers from a potentially large number of false positives when P-values serve as threshold. We conclude that careful meta-analysis is a powerful tool for integrating multiple array studies.
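As an illustration of the P-value combination step mentioned in this abstract, the following is a minimal Python sketch of Fisher's inverse chi-square method. It is not taken from the paper: the function name `fisher_combined_pvalue` is hypothetical, and the sketch assumes the per-study P-values are independent. The statistic is X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis; for even degrees of freedom the tail probability has a closed form, so only the standard library is needed.

```python
import math

def fisher_combined_pvalue(pvalues):
    """Combine independent P-values (hypothetical helper, illustration only)."""
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("P-values must lie in (0, 1]")
    k = len(pvalues)
    # half_stat is X / 2, where X = -2 * sum(ln p_i).
    half_stat = -sum(math.log(p) for p in pvalues)
    # Chi-square survival function for 2k degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

if __name__ == "__main__":
    # Three studies reporting P-values for the same gene (made-up numbers).
    print(fisher_combined_pvalue([0.04, 0.10, 0.03]))
```

With a single study the combined value reduces to the input P-value, which is a convenient sanity check.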
|
123
|
Jourdan F, Breitling R, Barrett MP, Gilbert D. MetaNetter: inference and visualization of high-resolution metabolomic networks. Bioinformatics 2007; 24:143-5. [DOI: 10.1093/bioinformatics/btm536] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Indexed: 11/13/2022] Open
|
124
|
Roelofsen H, Alvarez-Llamas G, Dijkstra M, Breitling R, Havenga K, Bijzet J, Zandbergen W, de Vries MP, Ploeg RJ, Vonk RJ. Analyses of intricate kinetics of the serum proteome during and after colon surgery by protein expression time series. Proteomics 2007; 7:3219-28. [PMID: 17806085 DOI: 10.1002/pmic.200601047] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 01/22/2023]
Abstract
Monitoring changes in serum protein expression in response to acute events such as trauma, infection or drug intervention may reveal key proteins of great value in predicting recovery or treatment response. Concerted actions of many proteins are expected. Proteins sharing similar expression changes may function in the same physiological process. As a model we analyzed expression changes in serum of colon cancer patients, before, during, and after laparoscopic colon resection. Eight samples were taken from each of four patients before, during, and up to 5 days after surgery. Total serum and a low molecular weight fraction were analyzed by SELDI-TOF-MS. In total 146 masses were detected. A principal components analysis (PCA) illustrates the temporal variation in the postsurgery proteome. Time series for each mass could be clustered into four distinct groups based on similarity in expression pattern. Two masses of 11.4 and 11.6 kDa, part of a slow response cluster, were identified as forms of the acute phase protein serum amyloid A (SAA). Fourteen more proteins belong to this cluster and may also function in acute phase response. We present an approach to analyze temporal variation in the proteome. This approach may be useful to evaluate surgical, nutritional, and pharmacological interventions.
|
125
|
Alberts R, Terpstra P, Li Y, Breitling R, Nap JP, Jansen RC. Sequence polymorphisms cause many false cis eQTLs. PLoS One 2007; 2:e622. [PMID: 17637838 PMCID: PMC1906859 DOI: 10.1371/journal.pone.0000622] [Citation(s) in RCA: 108] [Impact Index Per Article: 6.4] [Received: 01/09/2007] [Accepted: 05/29/2007] [Indexed: 11/23/2022] Open
Abstract
Many investigations have reported the successful mapping of quantitative trait loci (QTLs) for gene expression phenotypes (eQTLs). Local eQTLs, where expression phenotypes map to the genes themselves, are of especially great interest, because they are direct candidates for previously mapped physiological QTLs. Here we show that many mapped local eQTLs in genetical genomics experiments do not reflect actual expression differences caused by sequence polymorphisms in cis-acting factors changing mRNA levels. Instead they indicate hybridization differences caused by sequence polymorphisms in the mRNA region that is targeted by the microarray probes. Many such polymorphisms can be detected by a sensitive and novel statistical approach that takes the individual probe signals into account. Applying this approach to recent mouse and human eQTL data, we demonstrate that indeed many local eQTLs are falsely reported as “cis-acting” or “cis” and can be successfully detected and eliminated with this approach.
|
126
|
Alberts R, Terpstra P, Hardonk M, Bystrykh LV, de Haan G, Breitling R, Nap JP, Jansen RC. A verification protocol for the probe sequences of Affymetrix genome arrays reveals high probe accuracy for studies in mouse, human and rat. BMC Bioinformatics 2007; 8:132. [PMID: 17448222 PMCID: PMC1865557 DOI: 10.1186/1471-2105-8-132] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 09/12/2006] [Accepted: 04/20/2007] [Indexed: 01/09/2023] Open
Abstract
Background: The Affymetrix GeneChip technology uses multiple probes per gene to measure its expression level. Individual probe signals can vary widely, which hampers proper interpretation. This variation can be caused by probes that do not properly match their target gene or that match multiple genes. To determine the accuracy of Affymetrix arrays, we developed an extensive verification protocol for mouse arrays, incorporating the NCBI RefSeq, NCBI UniGene Unique, NIA Mouse Gene Index, and UCSC mouse genome databases. Results: Applying this protocol to Affymetrix Mouse Genome arrays (the earlier U74Av2 and the newer 430 2.0 array), the number of sequence-verified probes with perfect matches was no less than 85% and 95%, respectively; and for 74% and 85% of the probe sets all probes were sequence verified. The latter percentages increased to 80% and 94% after discarding one or two unverifiable probes per probe set, and even further to 84% and 97% when, in addition, allowing for one or two mismatches between probe and target gene. Similar results were obtained for other mouse arrays, as well as for human and rat arrays. Based on these data, refined chip definition files for all arrays are provided online. Researchers can choose the version appropriate for their study to (re)analyze expression data. Conclusion: The accuracy of Affymetrix probe sequences is higher than previously reported, particularly on newer arrays. Yet, refined probe set definitions have clear effects on the detection of differentially expressed genes. We demonstrate that the interpretation of the results of Affymetrix arrays is improved when the new chip definition files are used.
|
127
|
Al-Shahib A, Breitling R, Gilbert DR. Predicting protein function by machine learning on amino acid sequences--a critical evaluation. BMC Genomics 2007; 8:78. [PMID: 17374164 PMCID: PMC1847686 DOI: 10.1186/1471-2164-8-78] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 11/29/2006] [Accepted: 03/20/2007] [Indexed: 11/10/2022] Open
Abstract
Background: Predicting the function of newly discovered proteins by simply inspecting their amino acid sequence is one of the major challenges of post-genomic computational biology, especially when done without recourse to experimentation or homology information. Machine learning classifiers are able to discriminate between proteins belonging to different functional classes. Until now, however, it has been unclear if this ability would be transferable to proteins of unknown function, which may show distinct biases compared to experimentally more tractable proteins. Results: Here we show that proteins with known and unknown function do indeed differ significantly. We then show that proteins from different bacterial species also differ to an even larger and very surprising extent, but that functional classifiers nonetheless generalize successfully across species boundaries. We also show that in the case of highly specialized proteomes, classifiers from a different, but more conventional, species may in fact outperform the endogenous species-specific classifier. Conclusion: We conclude that there is a very good prospect of successfully predicting the function of as yet uncharacterized proteins using machine learning classifiers trained on proteins of known function.
|
128
|
Kammenga JE, Herman MA, Ouborg NJ, Johnson L, Breitling R. Microarray challenges in ecology. Trends Ecol Evol 2007; 22:273-9. [PMID: 17296243 DOI: 10.1016/j.tree.2007.01.013] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.9] [Received: 03/01/2006] [Revised: 01/11/2007] [Accepted: 01/29/2007] [Indexed: 01/03/2023]
Abstract
Microarrays are used to measure simultaneously the amount of mRNAs transcribed from many genes. They were originally designed for gene expression profiling in relatively simple biological systems, such as cell lines and model systems under constant laboratory conditions. This poses a challenge to ecologists who increasingly want to use microarrays to unravel the genetic mechanisms underlying complex interactions among organisms and between organisms and their environment. Here, we discuss typical experimental and statistical problems that arise when analyzing genome-wide expression profiles in an ecological context. We show that experimental design and environmental confounders greatly influence the identification of candidate genes in ecological microarray studies, and that following several simple recommendations could facilitate the analysis of microarray data in ecological settings.
|
129
|
Blom EJ, Bosman DWJ, van Hijum SAFT, Breitling R, Tijsma L, Silvis R, Roerdink JBTM, Kuipers OP. FIVA: Functional Information Viewer and Analyzer extracting biological knowledge from transcriptome data of prokaryotes. Bioinformatics 2007; 23:1161-3. [PMID: 17237043 DOI: 10.1093/bioinformatics/btl658] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Indexed: 11/13/2022] Open
Abstract
FIVA (Function Information Viewer and Analyzer) aids researchers in the prokaryotic community to quickly identify relevant biological processes following transcriptome analysis. Our software assists in functional profiling of large sets of genes and generates a comprehensive overview of affected biological processes. AVAILABILITY: http://bioinformatics.biol.rug.nl/standalone/fiva/
|
130
|
van Ham T, Thijssen K, Breitling R, Hofstra R, Plasterk R, Nollen E. 2.314 Identification of modifiers of alpha-synuclein inclusion in a C. elegans model by genome-wide RNAi. Parkinsonism Relat Disord 2007. [DOI: 10.1016/s1353-8020(08)70712-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
|
131
|
van Ham T, Thijssen K, Breitling R, Hofstra R, Plasterk R, Nollen E. 2.019 Identification of modifiers of alpha-synuclein inclusion in a C. elegans model by genome-wide RNAi. Parkinsonism Relat Disord 2007. [DOI: 10.1016/s1353-8020(08)70587-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
|
132
|
Abstract
Complex diseases, such as allergy, diabetes and obesity depend on altered interactions between multiple genes, rather than changes in a single causal gene. DNA microarray studies of a complex disease often implicate hundreds of genes in the pathogenesis. This indicates that many different mechanisms and pathways are involved. How can we understand such complexity? How can hypotheses be formulated and tested? One approach is to organize the data in network models and to analyze these in a top-down manner. Globally, networks in nature are often characterized by a small number of highly connected nodes, while the majority of nodes have few connections. The highly connected nodes serve as hubs that affect many other nodes. Such hubs have key roles in the network. In yeast cells, for example, deletion of highly connected proteins is associated with increased lethality, compared to deletion of less connected proteins. This suggests the biological relevance of networks. Moving down in the network structure, there may be sub-networks or modules with specific functions. These modules may be further dissected to analyze individual nodes. In the context of DNA microarray studies of complex diseases, gene-interaction networks may contain modules of co-regulated or interacting genes that have distinct biological functions. Such modules may be linked to specific gene polymorphisms, transcription factors, cellular functions and disease mechanisms. Genes that are reliably active only in the context of their modules can be considered markers for the activity of the modules and may thus be promising candidates for biomarkers or therapeutic targets. This review aims to give an introduction to network theory and how it can be applied to microarray studies of complex diseases.
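The hub concept described in this abstract can be made concrete with a short Python sketch. It is an illustration only: the function names `node_degrees` and `hubs` and the toy gene identifiers are hypothetical, and a hub is simply taken to be a node with the highest number of connections (degree) in an undirected interaction list.

```python
from collections import Counter

def node_degrees(edges):
    """Count connections per node in an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

def hubs(edges, top=1):
    """Return the `top` most highly connected nodes (hypothetical helper)."""
    return [node for node, _ in node_degrees(edges).most_common(top)]

if __name__ == "__main__":
    # Toy gene-interaction network with made-up identifiers.
    toy_edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g4", "g5")]
    print(node_degrees(toy_edges))
    print(hubs(toy_edges))
```

Real analyses would typically go further, for example by detecting modules, but degree counting is enough to show what "highly connected" means here.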
|
133
|
Al-Shahib A, Breitling R, Gilbert D. Feature selection and the class imbalance problem in predicting protein function from sequence. Appl Bioinformatics 2006; 4:195-203. [PMID: 16231961 DOI: 10.2165/00822942-200504030-00004] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.4] [Indexed: 11/02/2022]
Abstract
When the standard approach to predict protein function by sequence homology fails, other alternative methods can be used that require only the amino acid sequence for predicting function. One such approach uses machine learning to predict protein function directly from amino acid sequence features. However, there are two issues to consider before successful functional prediction can take place: identifying discriminatory features, and overcoming the challenge of a large imbalance in the training data. We show that by applying feature subset selection followed by undersampling of the majority class, significantly better support vector machine (SVM) classifiers are generated compared with standard machine learning approaches. As well as revealing that the features selected could have the potential to advance our understanding of the relationship between sequence and function, we also show that undersampling to produce fully balanced data significantly improves performance. The best discriminating ability is achieved using SVMs together with feature selection and full undersampling; this approach strongly outperforms other competitive learning algorithms. We conclude that this combined approach can generate powerful machine learning classifiers for predicting protein function directly from sequence.
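The undersampling step described in this abstract can be sketched in a few lines of Python. This is a generic random-undersampling illustration under stated assumptions, not the authors' implementation: the function name `undersample_majority` is hypothetical, every class is reduced to the size of the smallest class ("fully balanced" data), and feature selection and SVM training are left out.

```python
import random

def undersample_majority(samples, labels, seed=0):
    """Randomly drop examples so each class matches the smallest class size."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        # rng.sample draws n_min distinct examples without replacement.
        for x in rng.sample(xs, n_min):
            balanced.append((x, y))
    rng.shuffle(balanced)
    return balanced

if __name__ == "__main__":
    # Two positive and eight negative examples (made-up data).
    data = list(range(10))
    classes = [1, 1] + [0] * 8
    print(undersample_majority(data, classes))
```

The returned list of (sample, label) pairs can then be passed to whatever classifier is being trained.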
|
134
|
Breitling R, Pitt AR, Barrett MP. Precision mapping of the metabolome. Trends Biotechnol 2006; 24:543-8. [PMID: 17064801 DOI: 10.1016/j.tibtech.2006.10.006] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.9] [Received: 09/11/2006] [Accepted: 10/11/2006] [Indexed: 12/27/2022]
Abstract
The global study of the structure and dynamics of metabolic networks has been hindered by a lack of techniques that identify metabolites and their biochemical relationship in complex mixtures. The recent application of Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS) to metabolomic analysis suggests a way to tackle the problem. A lower-cost alternative to high-field FTICR-MS, the Orbitrap mass analyzer, promises accelerated activity in this area. Here, we show how the ultra-high mass accuracy and resolution provided by this new generation of mass spectrometers can help to identify metabolites and connect them into metabolic networks. Data from perturbation studies and isotope-tracking experiments can complement this information to create metabolic maps de novo and chart unexplored areas of metabolism.
|
135
|
Li Y, Álvarez OA, Gutteling EW, Tijsterman M, Fu J, Riksen JAG, Hazendonk E, Prins P, Plasterk RHA, Jansen RC, Breitling R, Kammenga JE. Mapping determinants of gene expression plasticity by genetical genomics in C. elegans. PLoS Genet 2006; 2:e222. [PMID: 17196041 PMCID: PMC1756913 DOI: 10.1371/journal.pgen.0020222] [Citation(s) in RCA: 236] [Impact Index Per Article: 13.1] [Received: 08/18/2006] [Accepted: 11/09/2006] [Indexed: 11/18/2022] Open
Abstract
Recent genetical genomics studies have provided intimate views on gene regulatory networks. Gene expression variations between genetically different individuals have been mapped to the causal regulatory regions, termed expression quantitative trait loci. Whether the environment-induced plastic response of gene expression also shows heritable difference has not yet been studied. Here we show that differential expression induced by temperatures of 16 °C and 24 °C has a strong genetic component in Caenorhabditis elegans recombinant inbred strains derived from a cross between strains CB4856 (Hawaii) and N2 (Bristol). No less than 59% of 308 trans-acting genes showed a significant eQTL-by-environment interaction, here termed plasticity quantitative trait loci. In contrast, only 8% of an estimated 188 cis-acting genes showed such interaction. This indicates that heritable differences in plastic responses of gene expression are largely regulated in trans. This regulation is spread over many different regulators. However, for one group of trans-genes we found prominent evidence for a common master regulator: a transband of 66 coregulated genes appeared at 24 °C. Our results suggest widespread genetic variation of differential expression responses to environmental impacts and demonstrate the potential of genetical genomics for mapping the molecular determinants of phenotypic plasticity. It is widely documented that environmental changes will induce differential expression of genes, yet it is unknown how these patterns of environment-induced expression plasticity are inherited and how they differ between genetically divergent individuals of a biological species. In this paper the authors used recombinant inbred lines of the nematode worm C. elegans that were derived from parental lines originally collected in Bristol (United Kingdom) and Hawaii, and measured genome-wide gene expression at two different temperatures. 
Using statistical analysis tools developed for quantitative trait locus mapping, they found genes with genetically determined differences in their plastic response to temperature changes. A majority of them were found to be regulated by genes at a different genome position (regulated in trans). A striking observation was a group of 66 genes that share a common potential regulator and may be related to differences in fertility plasticity. These results show that differential responses of different genotypes to environmental changes are widespread. Because all species are subjected to environmental change, both at individual and evolutionary time scales, the authors' work calls for studying the heritable component of plasticity of gene regulation in other organisms to enhance understanding of the environmental forces that drive evolutionary adaptation.
|
136
|
Hong F, Breitling R, McEntee CW, Wittner BS, Nemhauser JL, Chory J. RankProd: a bioconductor package for detecting differentially expressed genes in meta-analysis. Bioinformatics 2006; 22:2825-7. [PMID: 16982708 DOI: 10.1093/bioinformatics/btl476] [Citation(s) in RCA: 541] [Impact Index Per Article: 30.1] [Indexed: 11/14/2022]
Abstract
While meta-analysis provides a powerful tool for analyzing microarray experiments by combining data from multiple studies, it presents unique computational challenges. The Bioconductor package RankProd provides a new and intuitive tool for this purpose in detecting differentially expressed genes under two experimental conditions. The package modifies and extends the rank product method proposed by Breitling et al. [(2004) FEBS Lett., 573, 83-92] to integrate multiple microarray studies from different laboratories and/or platforms. It offers several advantages over t-test based methods and accepts pre-processed expression datasets produced from a wide variety of platforms. The significance of the detection is assessed by a non-parametric permutation test, and the associated P-value and false discovery rate (FDR) are included in the output alongside the genes that are detected by user-defined criteria. A visualization plot is provided to view actual expression levels for each gene with estimated significance measurements. AVAILABILITY: RankProd is available at Bioconductor http://www.bioconductor.org. A web-based interface will soon be available at http://cactus.salk.edu/RankProd
|
137
|
Breitling R. Biological microarray interpretation: The rules of engagement. Biochim Biophys Acta 2006; 1759:319-27. [PMID: 16904203 DOI: 10.1016/j.bbaexp.2006.06.003] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.1] [Received: 05/08/2006] [Revised: 06/30/2006] [Accepted: 06/30/2006] [Indexed: 11/25/2022]
Abstract
Gene expression microarrays are now established as a standard tool in biological and biochemical laboratories. Interpreting the masses of data generated by this technology poses a number of unusual new challenges. Over the past few years a consensus has begun to emerge concerning the most important pitfalls and the proper ways to avoid them. This review provides an overview of these ideas, beginning with relevant aspects of experimental design and normalization, but focusing in particular on the various tools and concepts that help to interpret microarray results. These new approaches make it much easier to extract biologically relevant and reliable hypotheses in an objective and reasonably unbiased fashion.
|
138
|
Morrison JL, Breitling R, Higham DJ, Gilbert DR. A lock-and-key model for protein-protein interactions. Bioinformatics 2006; 22:2012-9. [PMID: 16787977 DOI: 10.1093/bioinformatics/btl338] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Protein-protein interaction networks are one of the major post-genomic data sources available to molecular biologists. They provide a comprehensive view of the global interaction structure of an organism's proteome, as well as detailed information on specific interactions. Here we suggest a physical model of protein interactions that can be used to extract additional information at an intermediate level: it enables us to identify proteins which share biological interaction motifs, and also to identify potentially missing or spurious interactions. RESULTS: Our new graph model explains observed interactions between proteins by an underlying interaction of complementary binding domains (lock-and-key model). This leads to a novel graph-theoretical algorithm to identify bipartite subgraphs within protein-protein interaction networks where the underlying data are taken from yeast two-hybrid experimental results. By testing on synthetic data, we demonstrate that under certain modelling assumptions, the algorithm will return correct domain information about each protein in the network. Tests on data from various model organisms show that the local and global patterns predicted by the model are indeed found in experimental data. Using functional and protein structure annotations, we show that bipartite subnetworks can be identified that correspond to biologically relevant interaction motifs. Some of these are novel and we discuss an example involving SH3 domains from the Saccharomyces cerevisiae interactome. AVAILABILITY: The algorithm (in Matlab format) is available (see http://www.maths.strath.ac.uk/~aas96106/lock_key.html).
|
139
|
Keller B, Ohnesorg T, Mindnich R, Gloeckner CJ, Breitling R, Scharfe M, Moeller G, Blöcker H, Adamski J. Interspecies comparison of gene structure and computational analysis of gene regulation of 17beta-hydroxysteroid dehydrogenase type 1. Mol Cell Endocrinol 2006; 248:168-71. [PMID: 16337734 DOI: 10.1016/j.mce.2005.10.014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/25/2022]
Abstract
17Beta-hydroxysteroid dehydrogenase type 1 (HSD17B1) is a key enzyme of 17beta-estradiol biosynthesis, and in rodents is additionally involved in testosterone biosynthesis. The human HSD17B1 gene, located on chromosome 17q12-21, is duplicated in tandem, with the 3'-copy being the functional gene. Here we show by sequencing the gene from a diverse set of related species that this duplication is of very recent evolutionary origin, having occurred in the common ancestor of Hominoidae (apes and humans) while being absent in the closely related Old World monkeys (Macaca) and the outgroup species Tupaia belangeri and Mus musculus. By computational analysis of the conserved regulatory elements in the 5'-untranslated (5'-UTR) and putative promoter region of the HSD17B1 gene and, where present, pseudogene, across our broad sample of species we can show significant differences that might point to the origin of the divergent substrate specificity of human and rodent HSD17B1 and highlight potential functionally relevant differences in regulatory patterns in different evolutionary lineages.
|
140
|
Hoeller D, Crosetto N, Blagoev B, Raiborg C, Tikkanen R, Wagner S, Kowanetz K, Breitling R, Mann M, Stenmark H, Dikic I. Regulation of ubiquitin-binding proteins by monoubiquitination. Nat Cell Biol 2006; 8:163-9. [PMID: 16429130 DOI: 10.1038/ncb1354] [Citation(s) in RCA: 261] [Impact Index Per Article: 14.5] [Received: 10/25/2005] [Accepted: 12/19/2005] [Indexed: 11/09/2022]
Abstract
Proteins containing ubiquitin-binding domains (UBDs) interact with ubiquitinated targets and regulate diverse biological processes, including endocytosis, signal transduction, transcription and DNA repair. Many of the UBD-containing proteins are also themselves monoubiquitinated, but the functional role and the mechanisms that underlie this modification are less well understood. Here, we demonstrate that monoubiquitination of the endocytic proteins Sts1, Sts2, Eps15 and Hrs results in intramolecular interactions between ubiquitin and their UBDs, thereby preventing them from binding in trans to ubiquitinated targets. Permanent monoubiquitination of these proteins, mimicked by the fusion of ubiquitin to their carboxyl termini, impairs their ability to regulate trafficking of ubiquitinated receptors. Moreover, we mapped the in vivo monoubiquitination site in Sts2 and demonstrated that its mutation enhances the Sts2-mediated effects of epidermal-growth-factor-receptor downregulation. We propose that monoubiquitination of ubiquitin-binding proteins inhibits their capacity to bind to and control the functions of ubiquitinated targets in vivo.
|
141
|
Breitling R, Herzyk P. Rank-based methods as a non-parametric alternative of the T-statistic for the analysis of biological microarray data. J Bioinform Comput Biol 2006; 3:1171-89. [PMID: 16278953 DOI: 10.1142/s0219720005001442] [Citation(s) in RCA: 113] [Impact Index Per Article: 6.3] [Received: 01/25/2005] [Revised: 05/11/2005] [Accepted: 05/13/2005] [Indexed: 11/18/2022]
Abstract
We have recently introduced a rank-based test statistic, RankProducts (RP), as a new non-parametric method for detecting differentially expressed genes in microarray experiments. It has been shown to generate surprisingly good results with biological datasets. The basis for this performance and the limits of the method are, however, little understood. Here we explore the performance of such rank-based approaches under a variety of conditions using simulated microarray data, and compare it with classical Wilcoxon rank sums and t-statistics, which form the basis of most alternative differential gene expression detection techniques. We show that for realistic simulated microarray datasets, RP is more powerful and accurate for sorting genes by differential expression than t-statistics or Wilcoxon rank sums, in particular for replicate numbers below 10, which are most commonly used in biological experiments. Its relative performance is particularly strong when the data are contaminated by non-normal random noise or when the samples are very inhomogeneous, e.g. because they come from different time points or contain a mixture of affected and unaffected cells. However, RP assumes equal measurement variance for all genes and tends to give overly optimistic p-values when this assumption is violated. It is therefore essential that proper variance stabilizing normalization is performed on the data before calculating the RP values. Where this is impossible, another rank-based variant of RP (average ranks) provides a useful alternative with very similar overall performance. The Perl scripts implementing the simulation and evaluation are available upon request. Implementations of the RP method are available for download from the authors' website (http://www.brc.dcs.gla.ac.uk/glama).
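To make the rank product idea concrete, here is a minimal Python sketch. It is not the authors' Perl or Bioconductor code. It assumes one common formulation, in which each gene is ranked by log fold change within every replicate (rank 1 = most strongly up-regulated) and the rank product is the geometric mean of those ranks. The function name `rank_products` and the toy data are hypothetical; ties are broken by input order, and no significance estimate is computed.

```python
import math

def rank_products(fold_changes):
    """Map gene -> geometric mean of its per-replicate ranks.

    fold_changes: dict of gene -> list of log fold changes, one per replicate.
    Small values indicate consistently strong up-regulation.
    """
    genes = list(fold_changes)
    n_reps = len(fold_changes[genes[0]])
    log_sum = {g: 0.0 for g in genes}
    for r in range(n_reps):
        # Sort descending so the largest fold change gets rank 1.
        ordered = sorted(genes, key=lambda g: fold_changes[g][r], reverse=True)
        for rank, g in enumerate(ordered, start=1):
            log_sum[g] += math.log(rank)  # sum of logs avoids overflow
    return {g: math.exp(log_sum[g] / n_reps) for g in genes}

if __name__ == "__main__":
    # Three genes measured in three replicates (made-up values).
    toy = {"a": [3.0, 2.5, 2.8], "b": [0.1, 2.9, -0.2], "c": [-1.0, 0.0, 0.3]}
    for gene, rp in sorted(rank_products(toy).items(), key=lambda kv: kv[1]):
        print(gene, round(rp, 3))
```

Because only ranks enter the calculation, the result is unaffected by any monotone rescaling applied within a replicate, which is one reason rank-based statistics tolerate heterogeneous data.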
|
142
|
Breitling R, Ritchie S, Goodenowe D, Stewart ML, Barrett MP. Ab initio prediction of metabolic networks using Fourier transform mass spectrometry data. Metabolomics 2006; 2:155-164. [PMID: 24489532 PMCID: PMC3906711 DOI: 10.1007/s11306-006-0029-z] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.4] [Received: 02/01/2006] [Accepted: 05/19/2006] [Indexed: 01/31/2023]
Abstract
Fourier transform mass spectrometry has recently been introduced into the field of metabolomics as a technique that enables the mass separation of complex mixtures at very high resolution and with ultra high mass accuracy. Here we show that this enhanced mass accuracy can be exploited to predict large metabolic networks ab initio, based only on the observed metabolites without recourse to predictions based on the literature. The resulting networks are highly information-rich and clearly non-random. They can be used to infer the chemical identity of metabolites and to obtain a global picture of the structure of cellular metabolic networks. This represents the first reconstruction of metabolic networks based on unbiased metabolomic data and offers a breakthrough in the systems-wide analysis of cellular metabolism.
|
143
|
Al-Shahib A, Breitling R, Gilbert D. FrankSum: new feature selection method for protein function prediction. Int J Neural Syst 2005; 15:259-75. [PMID: 16187402 DOI: 10.1142/s0129065705000281] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022]
Abstract
In the study of in silico functional genomics, improving the performance of protein function prediction is the ultimate goal for identifying proteins associated with defined cellular functions. The classical prediction approach is to employ pairwise sequence alignments. However, this method often faces difficulties when no statistically significant homologous sequences are identified. An alternative is to predict protein function from sequence-derived features using machine learning. In this case, the choice of features that can be derived from the sequence is of vital importance to ensure adequate discrimination to predict function. In this paper we have successfully selected biologically significant features for protein function prediction. This was performed using a new feature selection method (FrankSum) that avoids data distribution assumptions, uses a data-independent measurement (p-value) within the feature, identifies redundancy between features and uses an appropriate ranking criterion for feature selection. We have shown that classifiers generated from features selected by FrankSum outperform classifiers generated from full feature sets, randomly selected features and features selected by the Wrapper method. We have also shown that the features are concordant across all species and that top-ranking features are biologically informative. We conclude that feature selection is vital for successful protein function prediction and that FrankSum is one of the feature selection methods that can be applied successfully to such a domain.
|
144
|
Breitling R, Herzyk P. Biological master games: using biologists' reasoning to guide algorithm development for integrated functional genomics. OMICS 2005; 9:225-32. [PMID: 16209637 DOI: 10.1089/omi.2005.9.225] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
We review some powerful new algorithms that build on intuitive biological interpretation techniques for the statistical analysis of functional genomics experiments. Although they were originally designed for transcriptomics, we argue that these algorithms are applicable to any type of -omics study (transcriptomics, proteomics, metabolomics). The algorithms are: (1) Rank Products (RP), a strictly non-parametric test statistic to detect differentially regulated elements (genes, proteins, metabolites) in genome-wide screens, which is particularly powerful for noisy data and low numbers of replicates and makes full use of the large number of parallel measurements that is typical of modern large-scale experiments; (2) Iterative Group Analysis (iGA), a statistical method that makes the transition from regulated single elements to significant classes of elements, and thus provides an automatic functional annotation of an experiment; and (3) Graph-based iGA (GiGA), an extension of iGA that combines experimental data with a broad variety of biological annotations to highlight physiologically relevant regions in a given "evidence graph" (e.g., metabolic networks, signaling pathway diagrams, protein interaction maps). The sequential application of these techniques yields an increasingly abstract interpretation of experimental data that is at the same time quantitative, statistically rigorous, and biologically significant. The results can be used either as helpful tools to guide data visualization and exploration, or as the input for downstream computational applications in a systems biology framework.
|
145
|
Breitling R, Hoeller D. Current challenges in quantitative modeling of epidermal growth factor signaling. FEBS Lett 2005; 579:6289-94. [PMID: 16288752 DOI: 10.1016/j.febslet.2005.10.034] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 07/19/2005] [Revised: 10/18/2005] [Accepted: 10/18/2005] [Indexed: 10/25/2022]
Abstract
Over the last decade, epidermal growth factor (EGF) signaling has been used repeatedly as a test-bed for pioneering computational systems biology. Recent breakthroughs in our molecular understanding of EGF signaling pose new challenges for mathematical modeling strategies. Three key areas emerge as particularly relevant: the pervasive importance of compartmentalization and endosomal trafficking; the complexity of signalosome complexes; and the regulatory influence of diffusion and spatiality. Each of them demands a drastic change in current computational approaches. We discuss recent developments in the field that address these emerging aspects in a new generation of more realistic, and potentially more useful, models of EGF signaling.
|
146
|
Morrison JL, Breitling R, Higham DJ, Gilbert DR. GeneRank: using search engine technology for the analysis of microarray experiments. BMC Bioinformatics 2005; 6:233. [PMID: 16176585 PMCID: PMC1261158 DOI: 10.1186/1471-2105-6-233] [Citation(s) in RCA: 186] [Impact Index Per Article: 9.8] [Received: 04/19/2005] [Accepted: 09/21/2005] [Indexed: 12/16/2022] Open
Abstract
Background: Interpretation of simple microarray experiments is usually based on the fold-change of gene expression between a reference and a "treated" sample where the treatment can be of many types from drug exposure to genetic variation. Interpretation of the results usually combines lists of differentially expressed genes with previous knowledge about their biological function. Here we evaluate a method – based on the PageRank algorithm employed by the popular search engine Google – that tries to automate some of this procedure to generate prioritized gene lists by exploiting biological background information. Results: GeneRank is an intuitive modification of PageRank that maintains many of its mathematical properties. It combines gene expression information with a network structure derived from gene annotations (gene ontologies) or expression profile correlations. Using both simulated and real data we find that the algorithm offers an improved ranking of genes compared to pure expression change rankings. Conclusion: Our modification of the PageRank algorithm provides an alternative method of evaluating microarray experimental results which combines prior knowledge about the underlying network. GeneRank offers an improvement compared to assessing the importance of a gene based on its experimentally observed fold-change alone and may be used as a basis for further analytical developments.
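The idea of combining expression changes with a gene network in a PageRank-style ranking can be illustrated with a minimal sketch. The abstract does not give the update rule, so the form used here (a damped iteration in which each gene keeps a share of its own normalized absolute expression change and receives the rest from its network neighbours), the damping value `d=0.5`, and all names (`gene_rank`, `example_adjacency`, `example_expression`) are assumptions made for illustration, not the published implementation.

```python
def gene_rank(adjacency, expression, d=0.5, tol=1e-10, max_iter=1000):
    """PageRank-style gene ranking (illustrative sketch, not the published code).

    adjacency[i][j] is 1 if genes i and j are linked (symmetric), else 0.
    expression[j] is the expression change of gene j (e.g. a log fold change).
    d is an assumed damping factor: d = 0 ranks by expression change alone,
    larger d gives the network more weight.
    """
    n = len(expression)
    total = sum(abs(e) for e in expression)
    # Normalized absolute expression change plays the role of the
    # "personalization" vector in PageRank.
    ex = [abs(e) / total for e in expression]
    degree = [sum(row) for row in adjacency]
    r = ex[:]
    for _ in range(max_iter):
        new = []
        for j in range(n):
            # Each neighbour i passes on its score divided by its degree.
            inflow = sum(
                adjacency[i][j] * r[i] / degree[i]
                for i in range(n)
                if degree[i] > 0
            )
            new.append((1 - d) * ex[j] + d * inflow)
        if max(abs(a - b) for a, b in zip(new, r)) < tol:
            return new
        r = new
    return r


# Hypothetical toy network: genes 0-1-2 form a chain, gene 3 is isolated.
example_adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
# Genes 0 and 3 have the same expression change, but gene 0 is linked
# to the strongly changed gene 1.
example_expression = [1.0, 3.0, 0.5, 1.0]

scores = gene_rank(example_adjacency, example_expression)
for gene, score in enumerate(scores):
    print(gene, round(score, 4))
```

In this toy case genes 0 and 3 have identical fold changes, but gene 0 ends up with the higher score because it is connected to a strongly changed neighbour, which is the kind of reprioritization the abstract describes.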
|
147
|
Breitling R, Armengaud P, Amtmann A. Vector analysis as a fast and easy method to compare gene expression responses between different experimental backgrounds. BMC Bioinformatics 2005; 6:181. [PMID: 16029491 PMCID: PMC1190156 DOI: 10.1186/1471-2105-6-181] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 04/04/2005] [Accepted: 07/19/2005] [Indexed: 12/04/2022] Open
Abstract
Background: Gene expression studies increasingly compare expression responses between different experimental backgrounds (genetic, physiological, or phylogenetic). By focusing on dynamic responses rather than a direct comparison of static expression levels, this type of study allows a finer dissection of primary and secondary regulatory effects in the various backgrounds. Usually, results of such experiments are presented in the form of Venn diagrams, which are intuitive and visually appealing, but lack a statistical foundation. Results: Here we introduce Vector Analysis (VA) as a simple, yet principled, approach to comparing expression responses in different experimental backgrounds. VA enables the automatic assignment of genes to response prototypes and provides statistical significance estimates to eliminate spurious response patterns. The application of VA to a real dataset, comparing nutrient starvation responses in wild type and mutant Arabidopsis plants, reveals that consistent patterns of expression behavior are present in the data and are reliably detected by the algorithm. Conclusion: Vector analysis is a flexible, easy-to-use technique to compare gene expression patterns in different experimental backgrounds. It compares favorably with the classical Venn diagram approach and can be implemented manually using spreadsheets, such as Excel, or automatically by using the supplied software.
|
148
|
Rogers S, Girolami M, Campbell C, Breitling R. The latent process decomposition of cDNA microarray data sets. IEEE/ACM Trans Comput Biol Bioinform 2005; 2:143-56. [PMID: 17044179 DOI: 10.1109/tcbb.2005.29] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Indexed: 05/12/2023]
Abstract
We present a new computational technique (a software implementation, data sets, and supplementary information are available at http://www.enm.bris.ac.uk/lpd/) which enables the probabilistic analysis of cDNA microarray data and we demonstrate its effectiveness in identifying features of biomedical importance. A hierarchical Bayesian model, called Latent Process Decomposition (LPD), is introduced in which each sample in the data set is represented as a combinatorial mixture over a finite set of latent processes, which are expected to correspond to biological processes. Parameters in the model are estimated using efficient variational methods. This type of probabilistic model is most appropriate for the interpretation of measurement data generated by cDNA microarray technology. For determining informative substructure in such data sets, the proposed model has several important advantages over the standard use of dendrograms. First, the ability to objectively assess the optimal number of sample clusters. Second, the ability to represent samples and gene expression levels using a common set of latent variables (dendrograms cluster samples and gene expression values separately, which amounts to two distinct reduced space representations). Third, in contrast to standard cluster models, observations are not assigned to a single cluster and, thus, for example, gene expression levels are modeled via combinations of the latent processes identified by the algorithm. We show this new method compares favorably with alternative cluster analysis methods. To illustrate its potential, we apply the proposed technique to several microarray data sets for cancer. For these data sets it successfully decomposes the data into known subtypes and indicates possible further taxonomic subdivision in addition to highlighting, in a wholly unsupervised manner, the importance of certain genes which are known to be medically significant. To illustrate its wider applicability, we also demonstrate its performance on a microarray data set for yeast.
|
149
|
Deluca D, Krazeisen A, Breitling R, Prehn C, Möller G, Adamski J. Inhibition of 17beta-hydroxysteroid dehydrogenases by phytoestrogens: comparison with other steroid metabolizing enzymes. J Steroid Biochem Mol Biol 2005; 93:285-92. [PMID: 15860272 DOI: 10.1016/j.jsbmb.2004.12.035] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/21/2022]
Abstract
Effects of phytoestrogens on human health have been reported for decades. These include not only beneficial action in cancer prevention but also endocrine disruption in males. Since then many molecular mechanisms underlying these effects have been identified. Targets of phytoestrogens comprise steroid receptors, steroid metabolising enzymes, elements of signal transduction and apoptosis pathways, and even the DNA processing machinery. Understanding the specific versus pleiotropic effects of selected phytoestrogens will be crucial for their biomedical application. This review will concentrate on the influence of phytoestrogens on 17beta-hydroxysteroid dehydrogenases from a comparative perspective with other steroid metabolizing enzymes.
|
150
|
Breitling R, Armengaud P, Amtmann A, Herzyk P. Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett 2004; 573:83-92. [PMID: 15327980 DOI: 10.1016/j.febslet.2004.07.055] [Citation(s) in RCA: 1110] [Impact Index Per Article: 55.5] [Received: 04/28/2004] [Revised: 07/20/2004] [Accepted: 07/22/2004] [Indexed: 10/26/2022]
Abstract
One of the main objectives in the analysis of microarray experiments is the identification of genes that are differentially expressed under two experimental conditions. This task is complicated by the noisiness of the data and the large number of genes that are examined simultaneously. Here, we present a novel technique for identifying differentially expressed genes that does not originate from a sophisticated statistical model but rather from an analysis of biological reasoning. The new technique, which is based on calculating rank products (RP) from replicate experiments, is fast and simple. At the same time, it provides a straightforward and statistically stringent way to determine the significance level for each gene and allows for the flexible control of the false-detection rate and familywise error rate in the multiple testing situation of a microarray experiment. We use the RP technique on three biological data sets and show that in each case it performs more reliably and consistently than the non-parametric t-test variant implemented in Tusher et al.'s significance analysis of microarrays (SAM). We also show that the RP results are reliable in highly noisy data. An analysis of the physiological function of the identified genes indicates that the RP approach is powerful for identifying biologically relevant expression changes. In addition, using RP can lead to a sharp reduction in the number of replicate experiments needed to obtain reproducible results.
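The core calculation named in the abstract, a rank product across replicate experiments, can be sketched in a few lines. The version below ranks genes within each replicate by log fold change (rank 1 = most up-regulated), takes the geometric mean of each gene's ranks, and estimates significance by shuffling values within each replicate. The permutation scheme, the tie handling, and all names (`rank_products`, `permutation_pvalues`, `example_log_ratios`) are assumptions for illustration; the abstract does not specify how significance is determined, and this is not the authors' implementation.

```python
import bisect
import math
import random


def ranks_descending(values):
    """Return ranks where 1 = largest value; ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_products(log_ratios):
    """log_ratios[k][g] is the log fold change of gene g in replicate k.

    Returns the geometric mean of each gene's ranks across replicates;
    small values indicate consistently strong up-regulation.
    """
    n_rep = len(log_ratios)
    n_genes = len(log_ratios[0])
    per_rep = [ranks_descending(rep) for rep in log_ratios]
    return [
        math.exp(sum(math.log(per_rep[k][g]) for k in range(n_rep)) / n_rep)
        for g in range(n_genes)
    ]


def permutation_pvalues(log_ratios, n_perm=1000, seed=0):
    """Assumed significance estimate: the fraction of rank products from
    within-replicate shuffles that are <= the observed rank product."""
    rng = random.Random(seed)
    observed = rank_products(log_ratios)
    n_genes = len(observed)
    counts = [0] * n_genes
    for _ in range(n_perm):
        shuffled = [rng.sample(rep, len(rep)) for rep in log_ratios]
        null = sorted(rank_products(shuffled))
        for g, rp in enumerate(observed):
            counts[g] += bisect.bisect_right(null, rp)
    return [c / (n_perm * n_genes) for c in counts]


# Hypothetical data: 3 replicates x 5 genes; gene 0 is consistently high.
example_log_ratios = [
    [2.1, 0.1, -0.3, 1.5, 0.0],
    [1.8, -0.2, 0.2, 1.9, 0.1],
    [2.5, 0.3, 0.0, 0.9, -0.1],
]

rp = rank_products(example_log_ratios)
pvals = permutation_pvalues(example_log_ratios, n_perm=200)
for gene in sorted(range(len(rp)), key=lambda g: rp[g]):
    print(gene, round(rp[gene], 3), round(pvals[gene], 3))
```

Because only ranks enter the calculation, a gene needs to be near the top in every replicate to obtain a small rank product, which is why the statistic tolerates noisy measurements and small replicate numbers. Detecting down-regulated genes would use the same procedure with ranks taken in ascending order.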
|