1
Kaderbhai NN, Broadhurst DI, Ellis DI, Goodacre R, Kell DB. Functional genomics via metabolic footprinting: monitoring metabolite secretion by Escherichia coli tryptophan metabolism mutants using FT-IR and direct injection electrospray mass spectrometry. Comp Funct Genomics 2003; 4:376-91. [PMID: 18629082 PMCID: PMC2447367 DOI: 10.1002/cfg.302] [Citation(s) in RCA: 101] [Impact Index Per Article: 7.2] [Received: 01/15/2003] [Revised: 04/23/2003] [Accepted: 05/22/2003] [Indexed: 12/14/2022] Open
Abstract
We sought to test the hypothesis that mutant bacterial strains could be discriminated from each other on the basis of the metabolites they secrete into the medium (their ‘metabolic footprint’), using two methods of ‘global’ metabolite analysis (FT–IR and direct injection electrospray mass spectrometry). The biological system used was based on a published study of Escherichia coli tryptophan mutants that had been analysed and discriminated by Yanofsky and colleagues using transcriptome analysis. Wild-type strains supplemented with tryptophan or analogues could be discriminated from controls using FT–IR of 24 h broths, as could each of the mutant strains in both minimal and supplemented media. Direct injection electrospray mass spectrometry with unit mass resolution could also be used to discriminate the strains from each other, and had the advantage that the discrimination required the use of just two or three masses in each case. These were determined via a genetic algorithm. Both methods are rapid, reagentless, reproducible and cheap, and might beneficially be extended to the analysis of gene knockout libraries.
Affiliation(s)
- Naheed N Kaderbhai
- Institute of Biological Sciences, University of Wales, Aberystwyth, Wales Ceredigion SY23 3DD, UK
2
Blom EJ, Breitling R, Hofstede KJ, Roerdink JBTM, van Hijum SAFT, Kuipers OP. Prosecutor: parameter-free inference of gene function for prokaryotes using DNA microarray data, genomic context and multiple gene annotation sources. BMC Genomics 2008; 9:495. [PMID: 18939968 PMCID: PMC2585105 DOI: 10.1186/1471-2164-9-495] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 06/18/2008] [Accepted: 10/21/2008] [Indexed: 01/23/2023] Open
Abstract
Background: Despite a plethora of functional genomic efforts, the function of many genes in sequenced genomes remains unknown. The increasing amount of microarray data for many species allows employing the guilt-by-association principle to predict function on a large scale: genes exhibiting similar expression patterns are more likely to participate in shared biological processes. Results: We developed Prosecutor, an application that enables researchers to rapidly infer gene function based on available gene expression data and functional annotations. Our parameter-free functional prediction method uses a sensitive algorithm to achieve a high association rate of linking genes with unknown function to annotated genes. Furthermore, Prosecutor utilizes additional biological information such as genomic context and known regulatory mechanisms that are specific for prokaryotes. We analyzed publicly available transcriptome data sets and used literature sources to validate putative functions suggested by Prosecutor. We supply the complete results of our analysis for 11 prokaryotic organisms on a dedicated website. Conclusion: The Prosecutor software and supplementary datasets available at allow researchers working on any of the analyzed organisms to quickly identify the putative functions of their genes of interest. A de novo analysis allows new organisms to be studied.
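The guilt-by-association principle described in this abstract can be illustrated with a minimal sketch. This is not Prosecutor's algorithm (which the abstract describes as parameter-free); it is a plain correlation-and-vote scheme, and the function names, the `top_k` parameter and the toy expression profiles are all assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A flat profile has no defined correlation; treat it as uncorrelated.
    return cov / (sx * sy) if sx and sy else 0.0

def predict_function(query_profile, annotated, top_k=2):
    """Vote on a function label among the top_k most co-expressed annotated genes."""
    ranked = sorted(
        annotated,
        key=lambda gene: pearson(query_profile, gene["profile"]),
        reverse=True,
    )
    votes = {}
    for gene in ranked[:top_k]:
        votes[gene["function"]] = votes.get(gene["function"], 0) + 1
    return max(votes, key=votes.get)

# Hypothetical toy data: three annotated genes measured under four conditions.
annotated = [
    {"name": "geneA", "function": "tryptophan biosynthesis", "profile": [1.0, 2.0, 3.0, 4.0]},
    {"name": "geneB", "function": "tryptophan biosynthesis", "profile": [2.0, 4.1, 6.2, 7.9]},
    {"name": "geneC", "function": "stress response", "profile": [4.0, 3.0, 2.0, 1.0]},
]
unknown = [0.5, 1.1, 1.4, 2.1]  # expression profile of a gene with no annotation
```

Calling `predict_function(unknown, annotated)` ranks the annotated genes by co-expression with the unknown gene and returns the majority label among the closest ones. A real analysis would also need to handle ties, noise and multiple-testing issues, which this sketch ignores.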
Affiliation(s)
- Evert Jan Blom
- Molecular Genetics, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, the Netherlands.
3
Kell DB. Theodor Bücher Lecture. Metabolomics, modelling and machine learning in systems biology - towards an understanding of the languages of cells. Delivered on 3 July 2005 at the 30th FEBS Congress and the 9th IUBMB conference in Budapest. FEBS J 2006; 273:873-94. [PMID: 16478464 DOI: 10.1111/j.1742-4658.2006.05136.x] [Citation(s) in RCA: 130] [Impact Index Per Article: 7.2] [Indexed: 11/28/2022]
Abstract
The newly emerging field of systems biology involves a judicious interplay between high-throughput 'wet' experimentation, computational modelling and technology development, coupled to the world of ideas and theory. This interplay involves iterative cycles, such that systems biology is not at all confined to hypothesis-dependent studies, with intelligent, principled, hypothesis-generating studies being of high importance and consequently very far from aimless fishing expeditions. I seek to illustrate each of these facets. Novel technology development in metabolomics can increase substantially the dynamic range and number of metabolites that one can detect, and these can be exploited as disease markers and in the consequent and principled generation of hypotheses that are consistent with the data and achieve this in a value-free manner. Much of classical biochemistry and signalling pathway analysis has concentrated on the analyses of changes in the concentrations of intermediates, with 'local' equations - such as that of Michaelis and Menten, $v = V_{\max} S / (S + K_m)$ - that describe individual steps being based solely on the instantaneous values of these concentrations. Recent work using single cells (that are not subject to the intellectually unsupportable averaging of the variable displayed by heterogeneous cells possessing nonlinear kinetics) has led to the recognition that some protein signalling pathways may encode their signals not (just) as concentrations (AM or amplitude-modulated in a radio analogy) but via changes in the dynamics of those concentrations (the signals are FM or frequency-modulated). This contributes in principle to a straightforward solution of the crosstalk problem, leads to a profound reassessment of how to understand the downstream effects of dynamic changes in the concentrations of elements in these pathways, and stresses the role of signal processing (and not merely the intermediates) in biological signalling.
It is this signal processing that lies at the heart of understanding the languages of cells. The resolution of many of the modern and postgenomic problems of biochemistry requires the development of a myriad of new technologies (and maybe a new culture), and thus regular input from the physical sciences, engineering, mathematics and computer science. One solution, that we are adopting in the Manchester Interdisciplinary Biocentre (http://www.mib.ac.uk/) and the Manchester Centre for Integrative Systems Biology (http://www.mcisb.org/), is thus to colocate individuals with the necessary combinations of skills. Novel disciplines that require such an integrative approach continue to emerge. These include fields such as chemical genomics, synthetic biology, distributed computational environments for biological data and modelling, single cell diagnostics/bionanotechnology, and computational linguistics/text mining.
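As a small illustration of what the abstract means by a 'local' equation, the Michaelis-Menten rate law it quotes can be written as a function of the instantaneous substrate concentration alone. This is only a sketch; the function name and the numerical values are assumptions chosen for illustration and do not come from the lecture.

```python
def michaelis_menten(s, vmax, km):
    """Reaction rate v = Vmax * S / (S + Km) at substrate concentration s.

    The rate depends only on the current value of s, not on how s changed
    over time, which is what makes the equation 'local'.
    """
    if s < 0 or km <= 0:
        raise ValueError("s must be non-negative and km must be positive")
    return vmax * s / (s + km)

# Hypothetical parameters: Vmax = 10 (rate units), Km = 2 (concentration units).
half_max_rate = michaelis_menten(2.0, vmax=10.0, km=2.0)   # S equal to Km
near_saturation = michaelis_menten(1e9, vmax=10.0, km=2.0)  # S far above Km
```

When the substrate concentration equals `km` the rate is half of `vmax`, and the rate approaches `vmax` as the concentration grows. Because the function has no memory of earlier concentrations, it cannot by itself capture the frequency-encoded signals that the abstract goes on to discuss.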
Affiliation(s)
- Douglas B Kell
- School of Chemistry, Faraday Building, The University of Manchester, UK.
4
Affiliation(s)
- Roger Brent
- Molecular Sciences Institute, Berkeley, CA 94704, USA.
5
Kell DB, Oliver SG. Here is the evidence, now what is the hypothesis? The complementary roles of inductive and hypothesis-driven science in the post-genomic era. Bioessays 2004; 26:99-105. [PMID: 14696046 DOI: 10.1002/bies.10385] [Citation(s) in RCA: 279] [Impact Index Per Article: 14.0] [Indexed: 11/10/2022]
Abstract
It is considered in some quarters that hypothesis-driven methods are the only valuable, reliable or significant means of scientific advance. Data-driven or 'inductive' advances in scientific knowledge are then seen as marginal, irrelevant, insecure or wrong-headed, while the development of technology--which is not of itself 'hypothesis-led' (beyond the recognition that such tools might be of value)--must be seen as equally irrelevant to the hypothetico-deductive scientific agenda. We argue here that data- and technology-driven programmes are not alternatives to hypothesis-led studies in scientific knowledge discovery but are complementary and iterative partners with them. Many fields are data-rich but hypothesis-poor. Here, computational methods of data analysis, which may be automated, provide the means of generating novel hypotheses, especially in the post-genomic era.
6
Kell DB. Metabolomics and machine learning: explanatory analysis of complex metabolome data using genetic programming to produce simple, robust rules. Mol Biol Rep 2003; 29:237-41. [PMID: 12241064 DOI: 10.1023/a:1020342216314] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.2] [Indexed: 12/29/2022]
Affiliation(s)
- Douglas B Kell
- Institute of Biological Sciences, University of Wales, Aberystwyth, UK.
7
Massoud TF, Gambhir SS. Molecular imaging in living subjects: seeing fundamental biological processes in a new light. Genes Dev 2003; 17:545-80. [PMID: 12629038 DOI: 10.1101/gad.1047403] [Citation(s) in RCA: 1414] [Impact Index Per Article: 67.3] [Indexed: 12/14/2022]
Affiliation(s)
- Tarik F Massoud
- The Crump Institute for Molecular Imaging, David Geffen School of Medicine at University of California at Los Angeles, Los Angeles, California 90095, USA
8
Abstract
The effects of genes on phenotype are mediated by processes that are typically unknown but whose determination is desirable. The conversion from gene to phenotype is not a simple function of individual genes, but involves the complex interactions of many genes; it is what is known as a nonlinear mapping problem. A computational method called genetic programming allows the representation of candidate nonlinear mappings in several possible trees. To find the best model, the trees are 'evolved' by processes akin to mutation and recombination, and the trees that more closely represent the actual data are preferentially selected. The result is an improved tree of rules that represent the nonlinear mapping directly. In this way, the encoding of cellular and higher-order activities by genes is seen as directly analogous to computer programs. This analogy is of utility in biological genetics and in problems of genotype-phenotype mapping.
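The evolutionary loop outlined in this abstract (candidate trees, mutation, recombination, preferential selection of trees that fit the data) can be sketched in a few lines. This is a toy symbolic-regression example under stated assumptions, not the method of the cited paper: the tuple-based tree encoding, the operator set, the population settings, the node cap and the target data are all hypothetical.

```python
import operator
import random

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def random_tree(rng, depth=3):
    """Grow a random expression tree over the input 'x' and small constants."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else float(rng.randint(-2, 2))
    return (rng.choice(list(OPS)), random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Evaluate a tree: 'x' is the input, floats are constants, tuples are operators."""
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def error(tree, data):
    """Sum of squared differences between the tree's output and the observations."""
    return sum((evaluate(tree, x) - y) * (evaluate(tree, x) - y) for x, y in data)

def paths(tree, path=()):
    """Yield the path to every node so that one can be picked at random."""
    yield path
    if isinstance(tree, tuple):
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))

def get(tree, path):
    for index in path:
        tree = tree[index]
    return tree

def replace(tree, path, new):
    """Return a copy of tree with the node at path swapped for new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def mutate(rng, tree):
    """Mutation: replace a random subtree with a freshly grown one."""
    return replace(tree, rng.choice(list(paths(tree))), random_tree(rng, depth=2))

def crossover(rng, a, b):
    """Recombination: graft a random subtree of b into a random position in a."""
    return replace(a, rng.choice(list(paths(a))), get(b, rng.choice(list(paths(b)))))

def evolve(data, generations=30, pop_size=60, max_nodes=50, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda t: error(t, data))
        history.append(error(population[0], data))
        survivors = population[: pop_size // 4]  # fitter trees are kept unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            if rng.random() < 0.5:
                child = mutate(rng, rng.choice(survivors))
            else:
                child = crossover(rng, rng.choice(survivors), rng.choice(survivors))
            if sum(1 for _ in paths(child)) <= max_nodes:  # crude guard against bloat
                children.append(child)
        population = survivors + children
    population.sort(key=lambda t: error(t, data))
    history.append(error(population[0], data))
    return population[0], history

# Hypothetical 'genotype to phenotype' data generated by y = x*x + x.
data = [(x, x * x + x) for x in range(-3, 4)]
best, history = evolve(data)
```

Because the fittest quarter of each generation is carried over unchanged, the best error recorded in `history` never increases from one generation to the next. Whether the run recovers the exact generating expression depends on the seed and settings, so no particular result is claimed here.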
9
Xu H, el-Gewely MR. P53-responsive genes and the potential for cancer diagnostics and therapeutics development. Biotechnology Annual Review 2002; 7:131-64. [PMID: 11686042 DOI: 10.1016/s1387-2656(01)07035-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.4] [Indexed: 01/15/2023]
Abstract
P53 protein regulates cell responses to DNA damage to maintain genomic stability by transactivation and trans-repression of its downstream target genes. P53 protein also has activators, inactivators, or co-factors via interaction with other proteins. Both the p53-regulated genes and the interacting proteins form a huge network. As tumors usually escape from proliferation controls by means of accumulation of genetic alterations, p53 is one of the most important tumor suppressor genes that can be targeted for diagnosis, prognosis, and therapeutic intervention. Reviewing the p53 network is of great importance. In this review, we focus on cancer-related p53 downstream-regulated genes. Various methods dealing with the discovery of p53-regulated genes by the detection of gene expression have been applied. Recently, high-throughput functional genomics methods, such as DNA microarray, serial analysis of gene expression (SAGE), differential display, and protein two-dimensional gel electrophoresis, have provided a wealth of information on the dynamics of cell context responses. Hundreds of genes have been discovered whose transcription is regulated by p53 protein. They were grouped, based on their functions, into sub-classes including cell-cycle regulation, DNA repair, angiogenesis, metastasis, and multidrug resistance. P53 plays a pivotal role in maintaining genomic stability and tumor suppression. The deeper we investigate the cell responses mediated by p53, the more complex the p53 network becomes. However, understanding the p53 network offers great opportunities to develop more sensitive and accurate diagnostic/prognostic tools, as well as more efficient therapies for cancer.
Affiliation(s)
- H Xu
- Department of Biotechnology, Institute of Medical Biology, University of Tromsø, 9037 Tromsø, Norway
10
Kell DB, Darby RM, Draper J. Genomic computing. Explanatory analysis of plant expression profiling data using machine learning. Plant Physiology 2001; 126:943-951. [PMID: 11457944 PMCID: PMC1540126 DOI: 10.1104/pp.126.3.943] [Citation(s) in RCA: 44] [Impact Index Per Article: 1.9] [Indexed: 05/23/2023]
Affiliation(s)
- D B Kell
- Institute of Biological Sciences, University of Wales, Aberystwyth SY23 3DD, United Kingdom
11
Lucchini S, Thompson A, Hinton JCD. Microarrays for microbiologists. Microbiology (Reading, England) 2001; 147:1403-1414. [PMID: 11390672 DOI: 10.1099/00221287-147-6-1403] [Citation(s) in RCA: 89] [Impact Index Per Article: 3.9] [Indexed: 12/26/2022]
Affiliation(s)
- S Lucchini
- Molecular Microbiology, Institute of Food Research, Norwich Research Park, Norwich NR4 7UA, UK
- A Thompson
- Molecular Microbiology, Institute of Food Research, Norwich Research Park, Norwich NR4 7UA, UK
- J C D Hinton
- Molecular Microbiology, Institute of Food Research, Norwich Research Park, Norwich NR4 7UA, UK
12
Affiliation(s)
- K M Weiss
- Departments of Anthropology and Biology, Penn State University, University Park, Pennsylvania, USA.
13
Albelda SM, Sheppard D. Functional genomics and expression profiling: be there or be square. Am J Respir Cell Mol Biol 2000; 23:265-9. [PMID: 10970813 DOI: 10.1165/ajrcmb.23.3.f196] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 11/24/2022] Open
Affiliation(s)
- S M Albelda
- Pulmonary, Allergy, and Critical Care Division, Department of Medicine, University of Pennsylvania Medical Center, Philadelphia, Pennsylvania, USA.
14
Kell DB, King RD. On the optimization of classes for the assignment of unidentified reading frames in functional genomics programmes: the need for machine learning. Trends Biotechnol 2000; 18:93-8. [PMID: 10675895 DOI: 10.1016/s0167-7799(99)01407-9] [Citation(s) in RCA: 62] [Impact Index Per Article: 2.6] [Indexed: 11/29/2022]
Abstract
At present, the assignment of function to novel genes uncovered by the systematic genome-sequencing programmes is a problem. Many studies anticipate that this can be achieved by analysing patterns of gene expression via the transcriptome, proteome and metabolome. Thus, functional genomics is, in part, an exercise in pattern classification. Because many genes have known functional classes, the problem of predicting their functional class is a supervised learning problem. However, most pattern classification methods that have been applied to the problem have been unsupervised clustering methods. Consequently, the best classification tools have not always been used. Furthermore, the present functional classes are suboptimal and new unsupervised clustering methods are needed to improve them. Better-structured functional classes will facilitate the prediction of biochemically testable functions.
Affiliation(s)
- D B Kell
- Institute of Biological Sciences, University of Wales, Aberystwyth, UK SY23 3DD.
15
Abstract
The advent of rapid DNA sequencing technologies is generating vast quantities of raw genomic information ranging from in-depth analysis of the expressed genes to complete sequencing of genomes at an increasing rate (bioinformatics). However, it is the functional characterisation of a specific gene product that is the key limiting factor for validation as targets for high throughput assay development. The challenge is to obtain the raw genomic information from parasites of economic importance and to effectively integrate broad technologies such as gene disruption and over-expression, DNA arrays, proteomics, antisense RNAs, with bioinformatics in a timely fashion to identify relevant biological targets. Screening of validated targets in a strategy that includes large numbers of chemistries with high diversity and predictive in vitro and in vivo assays should permit the successful identification of novel chemical entities with high specificity to the target parasite. It is proposed that this rational approach will permit the identification of new antiparasitic therapies able to surpass the current toxicological, environmental, and economic challenges of the marketplace.
Affiliation(s)
- J A Gutierrez
- Elanco Animal Health. A division of Eli Lilly and Company, P.O. Box 708, Greenfield, IN 46140, USA.
16
King RD, Karwath A, Clare A, Dehaspe L. Accurate prediction of protein functional class from sequence in the Mycobacterium tuberculosis and Escherichia coli genomes using data mining. Yeast 2000; 17:283-93. [PMID: 11119305 PMCID: PMC2448385 DOI: 10.1002/1097-0061(200012)17:4<283::aid-yea52>3.0.co;2-f] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.3] [Indexed: 11/07/2022] Open
Abstract
The analysis of genomics data needs to become as automated as its generation. Here we present a novel data-mining approach to predicting protein functional class from sequence. This method is based on a combination of inductive logic programming clustering and rule learning. We demonstrate the effectiveness of this approach on the M. tuberculosis and E. coli genomes, and identify biologically interpretable rules which predict protein functional class from information only available from the sequence. These rules predict 65% of the ORFs with no assigned function in M. tuberculosis and 24% of those in E. coli, with an estimated accuracy of 60-80% (depending on the level of functional assignment). The rules are founded on a combination of detection of remote homology, convergent evolution and horizontal gene transfer. We identify rules that predict protein functional class even in the absence of detectable sequence or structural homology. These rules give insight into the evolutionary history of M. tuberculosis and E. coli.
Affiliation(s)
- R D King
- Department of Computer Science, University of Wales, Aberystwyth, Penglais, Aberystwyth, Ceredigion SY23 3DB, UK