1
Min L, Nie M, Zhang A, Wen J, Noel SD, Lee V, Carroll RS, Kaiser UB. Computational Analysis of Missense Variants of G Protein-Coupled Receptors Involved in the Neuroendocrine Regulation of Reproduction. Neuroendocrinology 2016; 103:230-9. [PMID: 26088945 PMCID: PMC4684493 DOI: 10.1159/000435884] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 04/01/2015] [Accepted: 06/10/2015] [Indexed: 01/13/2023]
Abstract
INTRODUCTION: Many missense variants in G protein-coupled receptors (GPCRs) involved in the neuroendocrine regulation of reproduction have been identified by phenotype-driven or large-scale exome sequencing. Computational functional prediction analysis is commonly performed to evaluate their impact on receptor function. METHODS: To assess the performance and outcome of functional prediction analyses for these GPCRs, we performed a statistical analysis of the prediction performance of SIFT and PolyPhen-2 for variants with documented biological function as well as variants retrieved from Ensembl. We obtained missense variants with documented biological function testing from patients with reproductive disorders from a comprehensive literature search. Missense variants from individuals with known reproductive disorders were retrieved from the Human Gene Mutation Database. Missense variants from the general population were retrieved from the Ensembl genome database. RESULTS: The accuracies of SIFT and PolyPhen-2 were 83% and 85%, respectively. The performance of both prediction tools was greater in predicting loss-of-function variants (SIFT: 92%; PolyPhen-2: 95%) than in predicting variants that did not affect function (SIFT: 54%; PolyPhen-2: 57%). Concordance between SIFT and PolyPhen-2 did not improve accuracy. Surprisingly, approximately half of the variants retrieved from Ensembl were predicted as loss-of-function variants by SIFT (47%) and PolyPhen-2 (54%). CONCLUSION: Our findings provide new guidance for interpreting the results and limitations of computational functional prediction analyses for GPCRs and will help to determine which variants require biological function testing. In addition, our findings raise important questions regarding the link between genotype and phenotype in the general population.
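The accuracy, loss-of-function detection rate and neutral-variant detection rate quoted in this abstract are ordinary confusion-matrix ratios. The sketch below shows how such figures are typically derived; the helper name `prediction_metrics` and the counts are hypothetical and are not taken from the study.

```python
def prediction_metrics(tp, fn, tn, fp):
    """Summarise a binary predictor against functional-assay results.

    tp: loss-of-function variants predicted damaging
    fn: loss-of-function variants predicted tolerated
    tn: neutral variants predicted tolerated
    fp: neutral variants predicted damaging
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,       # all correct calls
        "sensitivity": tp / (tp + fn),       # loss-of-function variants caught
        "specificity": tn / (tn + fp),       # neutral variants recognised
    }


# Illustrative counts only; they are NOT data from the paper.
metrics = prediction_metrics(tp=90, fn=10, tn=12, fp=8)
```

With a test set dominated by loss-of-function variants, as in this toy example, overall accuracy can stay high even when specificity is poor, which is the pattern the abstract reports.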
Affiliation(s)
- Le Min
- Division of Endocrinology, Diabetes and Hypertension, Brigham and Women’s Hospital, Harvard Medical School, 221 Longwood Avenue, Boston, MA, 02115 USA
- To whom correspondence and reprint requests should be addressed: Le Min, M.D., Ph.D., Division of Endocrinology, Diabetes and Hypertension, Brigham and Women’s Hospital, 221 Longwood Avenue, Boston, Massachusetts 02115.
- Min Nie
- Division of Endocrinology, Diabetes and Hypertension, Brigham and Women’s Hospital, Harvard Medical School, 221 Longwood Avenue, Boston, MA, 02115 USA
- Anna Zhang
- Division of Endocrinology, Diabetes and Hypertension, Brigham and Women’s Hospital, Harvard Medical School, 221 Longwood Avenue, Boston, MA, 02115 USA
- Junping Wen
- Division of Endocrinology, Diabetes and Hypertension, Brigham and Women’s Hospital, Harvard Medical School, 221 Longwood Avenue, Boston, MA, 02115 USA
- Sekoni D. Noel
- Division of Endocrinology, Diabetes and Hypertension, Brigham and Women’s Hospital, Harvard Medical School, 221 Longwood Avenue, Boston, MA, 02115 USA
- Vivian Lee
- Division of Endocrinology, Diabetes and Hypertension, Brigham and Women’s Hospital, Harvard Medical School, 221 Longwood Avenue, Boston, MA, 02115 USA
- Rona S. Carroll
- Division of Endocrinology, Diabetes and Hypertension, Brigham and Women’s Hospital, Harvard Medical School, 221 Longwood Avenue, Boston, MA, 02115 USA
- Ursula B. Kaiser
- Division of Endocrinology, Diabetes and Hypertension, Brigham and Women’s Hospital, Harvard Medical School, 221 Longwood Avenue, Boston, MA, 02115 USA
2
Jiang LH, Li YX, Liu Q. Reconstruction of Gene Regulatory Networks by Integrating ChIP-chip, Knock out and Expression Data. Prog Biochem Biophys 2010. [DOI: 10.3724/sp.j.1206.2010.00184] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/25/2022]
3
Jiang Y, Cukic B, Adjeroh DA, Skinner HD, Lin J, Shen QJ, Jiang BH. An algorithm for identifying novel targets of transcription factor families: application to hypoxia-inducible factor 1 targets. Cancer Inform 2009; 7:75-89. [PMID: 19352460 PMCID: PMC2664698 DOI: 10.4137/cin.s1054] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/27/2022]
Abstract
Efficient and effective analysis of the growing genomic databases requires the development of adequate computational tools. We introduce a fast method based on the suffix tree data structure for predicting novel targets of hypoxia-inducible factor 1 (HIF-1) from huge genome databases. The suffix tree data structure has two powerful applications here: one is to extract unknown patterns from multiple strings/sequences in linear time; the other is to search multiple strings/sequences using multiple patterns in linear time. Using 15 known HIF-1 target gene sequences as a training set, we extracted 105 common patterns that all occur in the 15 training genes using suffix trees. Using these 105 common patterns along with known subsequences surrounding HIF-1 binding sites from the literature, the algorithm searches a genome database that contains 2,078,786 DNA sequences. It reported 258 potentially novel HIF-1 targets including 25 known HIF-1 targets. Based on microarray studies from the literature, 17 putative genes were confirmed to be upregulated by HIF-1 or hypoxia inside these 258 genes. We further studied one of the potential targets, COX-2, in the biological lab; and showed that it was a biologically relevant HIF-1 target. These results demonstrate that our methodology is an effective computational approach for identifying novel HIF-1 targets.
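The two steps described above, extracting patterns shared by all training sequences and then searching a database with them, can be illustrated with a much simpler stand-in. The sketch below uses plain set intersection of fixed-length k-mers rather than a suffix tree, so it does not have the linear-time behaviour the abstract claims; the sequences, gene names and function names are made up for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def common_kmers(seqs, k):
    """k-mers present in every sequence of the training set."""
    shared = kmers(seqs[0], k)
    for s in seqs[1:]:
        shared &= kmers(s, k)
    return shared


def scan(database, patterns):
    """Map each database sequence name to the patterns it contains."""
    hits = {}
    for name, seq in database.items():
        found = sorted(p for p in patterns if p in seq)
        if found:
            hits[name] = found
    return hits


# Toy training set and database (hypothetical sequences).
training = ["TTACGTGCA", "GGACGTGAT", "CACGTGTT"]
patterns = common_kmers(training, 5)
database = {"geneA": "AAACGTGAA", "geneB": "TTTTTTTT"}
hits = scan(database, patterns)
```

A generalized suffix tree would find shared substrings of every length in one pass instead of requiring a fixed k, which matters at the database sizes the abstract mentions.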
Affiliation(s)
- Yue Jiang
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA.
4
Abstract
Network analysis of living systems is an essential component of contemporary systems biology. It aims to assemble the mutual dependencies between interacting system elements into an integrated view of whole-system functioning. In this chapter we describe the existing classification of what are referred to as biological networks and show how complex interdependencies in biological systems can be represented in the simpler form of network graphs. Further structural analysis of the assembled biological network yields knowledge about the functioning of the entire biological system. We cover such aspects of network structure as the connectivity of network elements and the connectivity degree distribution, node centralities, clustering coefficient, network diameter and average path length. Networks may be analyzed as static entities, or the dynamical behavior of the underlying biological systems may be considered. We describe mathematical and computational approaches for determining the dynamics of regulatory networks. Causality, another characteristic feature of a dynamically functioning biosystem, can also be addressed in the reconstruction of biological networks; we give examples of how this integration is accomplished. Further questions about network dynamics and evolution can be approached by means of network comparison. Network analysis gives rise to new global hypotheses on system functionality and to reductionist findings of novel molecular interactions; both depend on the reliability of the network reconstructions, which has to be tested in subsequent experiments. We provide a collection of useful links for the analysis of biological networks.
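Several of the structural measures named in this abstract (node degree, clustering coefficient, average path length and diameter) are easy to compute on a small graph. The following is a minimal sketch on a toy undirected network stored as a dictionary of neighbour sets; the function names and the example graph are illustrative assumptions, not material from the chapter.

```python
from collections import deque


def degree(graph, node):
    """Number of direct neighbours of a node."""
    return len(graph[node])


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in graph[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length_and_diameter(graph):
    """Assumes the graph is connected and undirected."""
    lengths = [d for s in graph
               for t, d in shortest_path_lengths(graph, s).items() if t != s]
    return sum(lengths) / len(lengths), max(lengths)


# Toy network: a triangle A-B-C with a tail C-D.
graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
avg_len, diameter = average_path_length_and_diameter(graph)
```

Here node C has degree 3 but a low clustering coefficient because only one of its three neighbour pairs is linked, while A and B sit in a fully connected neighbourhood.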
Affiliation(s)
- Victoria J Nikiforova
- Max-Planck-Institut für Molekulare Pflanzenphysiologie, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany.
5
Ho Sui SJ, Fulton DL, Arenillas DJ, Kwon AT, Wasserman WW. oPOSSUM: integrated tools for analysis of regulatory motif over-representation. Nucleic Acids Res 2007; 35:W245-52. [PMID: 17576675 PMCID: PMC1933229 DOI: 10.1093/nar/gkm427] [Citation(s) in RCA: 132] [Impact Index Per Article: 7.8] [Indexed: 11/14/2022]
Abstract
The identification of over-represented transcription factor binding sites from sets of co-expressed genes provides insights into the mechanisms of regulation for diverse biological contexts. oPOSSUM, an internet-based system for such studies of regulation, has been improved and expanded in this new release. New features include a worm-specific version for investigating binding sites conserved between Caenorhabditis elegans and C. briggsae, as well as a yeast-specific version for the analysis of co-expressed sets of Saccharomyces cerevisiae genes. The human and mouse applications feature improvements in ortholog mapping, sequence alignments and the delineation of multiple alternative promoters. oPOSSUM2, introduced for the analysis of over-represented combinations of motifs in human and mouse genes, has been integrated with the original oPOSSUM system. Analysis using user-defined background gene sets is now supported. The transcription factor binding site models have been updated to include new profiles from the JASPAR database. oPOSSUM is available at http://www.cisreg.ca/oPOSSUM/
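One common way to score motif over-representation in a co-expressed gene set against a background set is a one-sided hypergeometric test. The abstract does not state which statistics oPOSSUM applies, so the sketch below is a generic illustration under that assumption; the function name and the counts are hypothetical, and the foreground set is assumed to be a subset of the background.

```python
from math import comb


def hypergeom_tail(hits_fg, n_fg, hits_bg, n_bg):
    """P(X >= hits_fg) when n_fg genes are drawn without replacement from
    n_bg background genes, hits_bg of which contain the binding site."""
    upper = min(n_fg, hits_bg)
    total = comb(n_bg, n_fg)
    favourable = sum(comb(hits_bg, k) * comb(n_bg - hits_bg, n_fg - k)
                     for k in range(hits_fg, upper + 1))
    return favourable / total


# Toy numbers: 3 of 10 background genes carry the site, and 2 of the
# 4 co-expressed genes do.
p_value = hypergeom_tail(hits_fg=2, n_fg=4, hits_bg=3, n_bg=10)
```

A small tail probability suggests the site occurs in the co-expressed set more often than expected from the background; a user-defined background, as supported in the release described above, simply changes `hits_bg` and `n_bg`.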
Affiliation(s)
- Shannan J. Ho Sui
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Genetics Graduate Program and Department of Medical Genetics, University of British Columbia, Vancouver BC, Canada
- Debra L. Fulton
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Genetics Graduate Program and Department of Medical Genetics, University of British Columbia, Vancouver BC, Canada
- David J. Arenillas
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Genetics Graduate Program and Department of Medical Genetics, University of British Columbia, Vancouver BC, Canada
- Andrew T. Kwon
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Genetics Graduate Program and Department of Medical Genetics, University of British Columbia, Vancouver BC, Canada
- Wyeth W. Wasserman
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Genetics Graduate Program and Department of Medical Genetics, University of British Columbia, Vancouver BC, Canada
- *To whom correspondence should be addressed. +1 604 875 3812; +1 604 875 3819
6
Abstract
Functional knowledge of individual genes encoding components of the cell signaling, metabolic and regulatory pathways is crucial to our understanding of physiology and pathophysiology. A central challenge in functional genomics is the creation of a working map delineating how eukaryotic cells coordinate and govern patterns of gene expression. This coordination is often depicted as an intertwined network or circuit of genes that alternately activate and repress each other. Multiple bioinformatic and high-throughput experimental approaches exist to aid in the reconstruction of gene networks. Albeit far from being complete, the ability to recreate gene networks from experimental data facilitates the systematic dissection of cell function at the molecular and genetic level. In this review, several different genomic technologies are discussed, and example studies that are promoting new discoveries and hypotheses are detailed.
Affiliation(s)
- Norman H Lee
- The Institute for Genomic Research, Department of Functional Genomics, Rockville, MD 20850, USA.
7
Vishnevsky OV, Kolchanov NA. ARGO: a web system for the detection of degenerate motifs and large-scale recognition of eukaryotic promoters. Nucleic Acids Res 2005; 33:W417-22. [PMID: 15980502 PMCID: PMC1160220 DOI: 10.1093/nar/gki459] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 02/14/2005] [Revised: 04/13/2005] [Accepted: 04/13/2005] [Indexed: 11/13/2022]
Abstract
Reliable recognition of the promoters in eukaryotic genomes remains an open issue. This is largely owing to the poor understanding of the features of the structural-functional organization of the eukaryotic promoters essential for their function and recognition. However, it was demonstrated that detection of ensembles of regulatory signals characteristic of specific promoter groups increases the accuracy of promoter recognition and prediction of specific expression features of the queried genes. The ARGO_Motifs package was developed for the detection of sets of region-specific degenerate oligonucleotide motifs in the regulatory regions of the eukaryotic genes. The ARGO_Viewer package was developed for the recognition of tissue-specific gene promoters based on the presence and distribution of oligonucleotide motifs obtained by the ARGO_Motifs program. Analysis and recognition of tissue-specific promoters in five gene samples demonstrated high quality of promoter recognition. The public version of the ARGO system is available at http://wwwmgs2.bionet.nsc.ru/argo/ and http://emj-pc.ics.uci.edu/argo/.
Affiliation(s)
- Oleg V Vishnevsky
- Institute of Cytology and Genetics, SB RAS Lavrentyev Avenue, 10, Novosibirsk, 630090, Russia.
8
Bhardwaj N, Lu H. Correlation between gene expression profiles and protein-protein interactions within and across genomes. Bioinformatics 2005; 21:2730-8. [PMID: 15797912 DOI: 10.1093/bioinformatics/bti398] [Citation(s) in RCA: 112] [Impact Index Per Article: 5.9] [Indexed: 12/24/2022]
Abstract
MOTIVATION: Function annotation of an unclassified protein on the basis of its interaction partners is well documented in the literature. Reliable predictions of interactions from other data sources such as gene expression measurements would provide a useful route to function annotation. We investigate the global relationship of protein-protein interactions with gene expression. This relationship is studied in four evolutionarily diverse species, for which substantial information regarding their interactions and expression is available: human, mouse, yeast and Escherichia coli. RESULTS: In E. coli the expression of interacting pairs is highly correlated in comparison to random pairs, while in the other three species, the correlation of expression of interacting pairs is only slightly stronger than that of random pairs. To strengthen the correlation, we developed a protocol to integrate ortholog information into the interaction and expression datasets. In all four genomes, the likelihood of predicting protein interactions from highly correlated expression data is increased using our protocol. In yeast, for example, the likelihood of predicting a true interaction, when the correlation is > 0.9, increases from 1.4 to 9.4. The improvement demonstrates that protein interactions are reflected in gene expression and the correlation between the two is strengthened by evolution information. The results establish that co-expression of interacting protein pairs is more conserved than that of random ones.
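The comparison at the heart of this abstract, expression correlation for interacting pairs versus other pairs, can be sketched with a plain Pearson correlation. The abstract does not specify the correlation measure, so Pearson is an assumption here, and the expression profiles, gene names and function names below are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def mean_pair_correlation(expr, pairs):
    """Average expression correlation over a list of gene pairs."""
    return sum(pearson(expr[a], expr[b]) for a, b in pairs) / len(pairs)


# Hypothetical expression profiles across four conditions.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
interacting = [("g1", "g2")]   # pair assumed to interact
control = [("g1", "g3")]       # pair assumed not to interact

r_interacting = mean_pair_correlation(expr, interacting)
r_control = mean_pair_correlation(expr, control)
```

In a real analysis the control set would be many randomly sampled pairs, and the two distributions of correlation values would be compared rather than two single numbers.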
Affiliation(s)
- Nitin Bhardwaj
- Bioinformatics Program, Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA
9
Zhang MQ. Prediction, annotation, and analysis of human promoters. Cold Spring Harb Symp Quant Biol 2004; 68:217-25. [PMID: 15338621 DOI: 10.1101/sqb.2003.68.217] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/24/2022]
Affiliation(s)
- M Q Zhang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
10
Kellis M, Patterson N, Birren B, Berger B, Lander ES. Methods in comparative genomics: genome correspondence, gene identification and regulatory motif discovery. J Comput Biol 2004; 11:319-55. [PMID: 15285895 DOI: 10.1089/1066527041410319] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.4] [Indexed: 11/12/2022]
Abstract
In Kellis et al. (2003), we reported the genome sequences of S. paradoxus, S. mikatae, and S. bayanus and compared these three yeast species to their close relative, S. cerevisiae. Genomewide comparative analysis allowed the identification of functionally important sequences, both coding and noncoding. In this companion paper we describe the mathematical and algorithmic results underpinning the analysis of these genomes. (1) We present methods for the automatic determination of genome correspondence. The algorithms enabled the automatic identification of orthologs for more than 90% of genes and intergenic regions across the four species despite the large number of duplicated genes in the yeast genome. The remaining ambiguities in the gene correspondence revealed recent gene family expansions in regions of rapid genomic change. (2) We present methods for the identification of protein-coding genes based on their patterns of nucleotide conservation across related species. We observed the pressure to conserve the reading frame of functional proteins and developed a test for gene identification with high sensitivity and specificity. We used this test to revisit the genome of S. cerevisiae, reducing the overall gene count by 500 genes (10% of previously annotated genes) and refining the gene structure of hundreds of genes. (3) We present novel methods for the systematic de novo identification of regulatory motifs. The methods do not rely on previous knowledge of gene function and in that way differ from the current literature on computational motif discovery. Based on genomewide conservation patterns of known motifs, we developed three conservation criteria that we used to discover novel motifs. We used an enumeration approach to select strongly conserved motif cores, which we extended and collapsed into a small number of candidate regulatory motifs. These include most previously known regulatory motifs as well as several noteworthy novel motifs. 
The majority of discovered motifs are enriched in functionally related genes, allowing us to infer a candidate function for novel motifs. Our results demonstrate the power of comparative genomics to further our understanding of any species. Our methods are validated by the extensive experimental knowledge in yeast and will be invaluable in the study of complex genomes like that of the human.
Affiliation(s)
- Manolis Kellis
- Whitehead Institute Center for Genome Research, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
11
Chen G, Hata N, Zhang MQ. Transcription factor binding element detection using functional clustering of mutant expression data. Nucleic Acids Res 2004; 32:2362-71. [PMID: 15115798 PMCID: PMC419446 DOI: 10.1093/nar/gkh557] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]
Abstract
As a powerful tool to reveal gene functions, gene mutation has been used extensively in molecular biology studies. With high throughput technologies, such as DNA microarray, genome-wide gene expression changes can be monitored in mutants. Here we present a simple approach to detect the transcription-factor-binding motif using microarray expression data from a mutant in which the relevant transcription factor is deleted. A core part of our approach is clustering of differentially expressed genes based on functional annotations, such as Gene Ontology (GO). We tested our method with eight microarray data sets from the Rosetta Compendium and were able to detect canonical binding motifs for at least four transcription factors. With the support of chromatin IP chip data, we also predict a possible variant of the Swi4 binding motif and recover a core motif for Arg80. Our approach should be readily applicable to microarray experiments using other types of molecular biology techniques, such as conditional knockout/overexpression or RNAi-mediated 'knockdown', to perturb the expression of a transcription factor. Functional clustering included in our approach may also provide new insights into the function of the relevant transcription factor.
Affiliation(s)
- Gengxin Chen
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
12
Vazquez A, Flammini A, Maritan A, Vespignani A. Global protein function prediction from protein-protein interaction networks. Nat Biotechnol 2003; 21:697-700. [PMID: 12740586 DOI: 10.1038/nbt825] [Citation(s) in RCA: 490] [Impact Index Per Article: 23.3] [Received: 08/06/2002] [Accepted: 02/21/2003] [Indexed: 11/08/2022]
Abstract
Determining protein function is one of the most challenging problems of the post-genomic era. The availability of entire genome sequences and of high-throughput capabilities to determine gene coexpression patterns has shifted the research focus from the study of single proteins or small complexes to that of the entire proteome. In this context, the search for reliable methods for assigning protein function is of primary importance. There are various approaches available for deducing the function of proteins of unknown function using information derived from sequence similarity or clustering patterns of co-regulated genes, phylogenetic profiles, protein-protein interactions (refs. 5-8 and Samanta, M.P. and Liang, S., unpublished data), and protein complexes. Here we propose the assignment of proteins to functional classes on the basis of their network of physical interactions as determined by minimizing the number of protein interactions among different functional categories. Function assignment is proteome-wide and is determined by the global connectivity pattern of the protein network. The approach results in multiple functional assignments, a consequence of the existence of multiple equivalent solutions. We apply the method to analyze the yeast Saccharomyces cerevisiae protein-protein interaction network. The robustness of the approach is tested in a system containing a high percentage of unclassified proteins and also in cases of deletion and insertion of specific protein interactions.
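The quantity this abstract proposes to minimize, the number of interactions linking proteins assigned to different functional categories, is simple to state in code. The sketch below computes that count and pairs it with a greedy neighbour-majority update for unclassified proteins. The greedy update is a deliberately simplified local heuristic standing in for the paper's global optimization, which the abstract does not detail; all names and the toy network are hypothetical.

```python
from collections import Counter


def cross_category_edges(edges, labels):
    """Number of interactions joining proteins with different functions."""
    return sum(1 for a, b in edges if labels[a] != labels[b])


def greedy_assign(edges, known, unknown, sweeps=10):
    """Repeatedly give each unclassified protein the most common label
    among its currently labelled neighbours (simplified heuristic)."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    labels = dict(known)
    for _ in range(sweeps):
        changed = False
        for p in unknown:
            # Sorted iteration keeps the vote order reproducible.
            votes = Counter(labels[n] for n in sorted(nbrs.get(p, ()))
                            if n in labels)
            if votes:
                best = votes.most_common(1)[0][0]
                if labels.get(p) != best:
                    labels[p] = best
                    changed = True
        if not changed:
            break
    return labels


# Toy network: p1, p2, p3 are classified; u1 and u2 are not.
edges = [("p1", "u1"), ("p2", "u1"), ("u1", "u2"), ("p3", "u2"), ("p1", "u2")]
known = {"p1": "transport", "p2": "transport", "p3": "metabolism"}
labels = greedy_assign(edges, known, unknown=["u1", "u2"])
cost = cross_category_edges(edges, labels)
```

Because several assignments can reach the same minimal count, a global method can return multiple equivalent solutions, which is consistent with the multiple functional assignments mentioned in the abstract; this greedy pass returns only one.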
Affiliation(s)
- Alexei Vazquez
- Department of Physics, University of Notre Dame, Notre Dame, Indiana 46556, USA.
13
Abstract
We used a biochemical genomics method of assaying Saccharomyces cerevisiae proteins, derived from a nearly complete set of glutathione S-transferase fusions, to develop an approach that is able to identify proteins that bind to a DNA element. Using the upstream activation sequence (UAS) of the promoter for the invertase gene, SUC2, we identified both specific and nonspecific binding activities, which could be classified based on whether they bound with equivalent affinity to a nonspecific DNA competitor. Three transcription factors, Mig1, Yer028c, and Rgt1, were found to be binding activities specific to the SUC2 UAS. Mig1 and Yer028c had been reported previously to bind to elements within the SUC2 UAS, validating the ability of the method to identify sequence-specific factors. The third activity, Rgt1, had not been reported previously to bind to SUC2. Additional gel shift assays narrowed the Rgt1 binding site to the SUC2-B element within the SUC2 UAS, which is similar to previously identified Rgt1 binding sites present in other genes. In vivo levels of invertase activity in an rgt1Δ strain were reduced relative to an isogenic RGT+ strain when these strains were grown under inducing (low glucose) conditions, suggesting that Rgt1 may have a role in the activated transcription of SUC2. This report demonstrates the feasibility of identifying DNA binding activities by rapidly assaying a large fraction of the predicted open reading frames of an organism for binding to a regulatory DNA motif.
Affiliation(s)
- Tony R Hazbun
- Department of Genome Sciences, Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA.
14
Banerjee N, Zhang MQ. Functional genomics as applied to mapping transcription regulatory networks. Curr Opin Microbiol 2002; 5:313-7. [PMID: 12057687 DOI: 10.1016/s1369-5274(02)00322-3] [Citation(s) in RCA: 62] [Impact Index Per Article: 2.8] [Indexed: 12/18/2022]
Abstract
The sequencing of the human genome and the entire genomes of many model organisms has resulted in the identification of many genes. Many large-scale experiments for generating gene disruptions and analyzing the phenotypes are underway to ascertain gene function. A future challenge will be to determine interaction and regulation of all the genes of an organism. Recent advances in functional genomic technology have begun to shine light on such gene network problems at both transcriptomic and proteomic levels. Functional genomics will not only elucidate what the genes do, but will also help determine when, where and how they are expressed as an orchestrated system. In this review, we discuss the functional genomics approaches to extract knowledge about transcription regulatory mechanisms from combinations of sequence data, microarray data and ChIP data. We focus in particular on the budding yeast Saccharomyces cerevisiae.
Affiliation(s)
- Nila Banerjee
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, New York 11724, USA
15
Lescot M, Déhais P, Thijs G, Marchal K, Moreau Y, Van de Peer Y, Rouzé P, Rombauts S. PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. Nucleic Acids Res 2002; 30:325-7. [PMID: 11752327 PMCID: PMC99092 DOI: 10.1093/nar/30.1.325] [Citation(s) in RCA: 3630] [Impact Index Per Article: 165.0] [Indexed: 11/14/2022]
Abstract
PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors. Regulatory elements are represented by positional matrices, consensus sequences and individual sites on particular promoter sequences. Links to the EMBL, TRANSFAC and MEDLINE databases are provided when available. Data about the transcription sites are extracted mainly from the literature, supplemented with an increasing number of in silico predicted data. Apart from a general description for specific transcription factor sites, levels of confidence for the experimental evidence, functional information and the position on the promoter are given as well. New features have been implemented to search for plant cis-acting regulatory elements in a query sequence. Furthermore, links are now provided to a new clustering and motif search method to investigate clusters of co-expressed genes. New regulatory elements can be sent automatically and will be added to the database after curation. The PlantCARE relational database is available via the World Wide Web at http://sphinx.rug.ac.be:8080/PlantCARE/.
Affiliation(s)
- Magali Lescot
- Vakgroep Moleculaire Genetica, Departement Plantengenetica, Vlaams Interuniversitair Instituut voor Biotechnologie, Universiteit Gent, K. L. Ledeganckstraat 35, B-9000 Gent, Belgium
16
Wade C, Shea KA, Jensen RV, McAlear MA. EBP2 is a member of the yeast RRB regulon, a transcriptionally coregulated set of genes that are required for ribosome and rRNA biosynthesis. Mol Cell Biol 2001; 21:8638-50. [PMID: 11713296 PMCID: PMC100024 DOI: 10.1128/mcb.21.24.8638-8650.2001] [Citation(s) in RCA: 57] [Impact Index Per Article: 2.5] [Received: 07/06/2001] [Accepted: 09/10/2001] [Indexed: 11/20/2022]
Abstract
In an effort to identify sets of yeast genes that are coregulated across various cellular transitions, gene expression data sets derived from yeast cells progressing through the cell cycle, sporulation, and diauxic shift were analyzed. A partitioning algorithm was used to divide each data set into 24 clusters of similar expression profiles, and the membership of the clusters was compared across the three experiments. A single cluster of 189 genes from the cell cycle experiment was found to share 65 genes with a cluster of 159 genes from the sporulation data set. Many of these genes were found to be clustered in the diauxic-shift experiment as well. The overlapping set was enriched for genes required for rRNA biosynthesis and included genes encoding RNA helicases, subunits of RNA polymerases I and III, and rRNA processing factors. A subset of the 65 genes was tested for expression by a quantitative-relative reverse transcriptase PCR technique, and they were found to be coregulated after release from alpha factor arrest, heat shock, and tunicamycin treatment. Promoter scanning analysis revealed that the 65 genes within this ribosome and rRNA biosynthesis (RRB) regulon were enriched for two motifs: the 13-base GCGATGAGATGAG and the 11-base TGAAAAATTTT consensus sequences. Both motifs were found to be important for promoting gene expression after release from alpha factor arrest in a test rRNA processing gene (EBP2), which suggests that these consensus sequences may function broadly in the regulation of a set of genes required for ribosome and rRNA biosynthesis.
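The cross-experiment comparison described above amounts to intersecting cluster memberships from independently clustered data sets. A minimal sketch, assuming each experiment's clusters are already available as sets of gene identifiers; the cluster names, gene identifiers and function name are placeholders, not data from the study.

```python
def best_overlaps(clusters_a, clusters_b):
    """For each cluster in experiment A, find the experiment-B cluster
    sharing the most genes, and return it with the shared genes."""
    result = {}
    for name_a, genes_a in clusters_a.items():
        name_b = max(clusters_b, key=lambda n: len(genes_a & clusters_b[n]))
        result[name_a] = (name_b, genes_a & clusters_b[name_b])
    return result


# Hypothetical cluster memberships from two experiments.
cell_cycle = {"cc1": {"g1", "g2", "g3"}, "cc2": {"g4", "g5"}}
sporulation = {"sp1": {"g1", "g2", "g6"}, "sp2": {"g5", "g7"}}
overlaps = best_overlaps(cell_cycle, sporulation)
```

The shared set for a strongly overlapping pair (65 genes in the abstract's case) is then the natural input for downstream steps such as promoter scanning.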
Affiliation(s)
- C Wade
- Molecular Biology and Biochemistry Department, Wesleyan University, Middletown, CT 06459, USA
17
Abstract
Microarray technologies for measuring mRNA abundances in cells allow monitoring of gene expression levels for tens of thousands of genes in parallel. By measuring expression responses across hundreds of different conditions or timepoints, a relatively detailed gene expression map starts to emerge. Using cluster analysis techniques, it is possible to identify genes that are consistently coexpressed under several different conditions or treatments. These sets of coexpressed genes can then be compared to existing knowledge about biochemical or signalling pathways; the function of unknown genes can be hypothesised by comparing them to other genes with characterised function, or from general trends in expression profiles, that is, why the cell needs to transcribe or silence the genes during a particular treatment. The regulation of genes at the DNA level is largely guided by particular sequence features, the transcription factor binding sites, and other signals encoded in DNA. By analyzing the regulatory regions of consistently coexpressed genes, we can discover potential signals hidden in DNA by computational analysis methods. The prerequisite for this kind of analysis is the existence of genomic DNA sequence, knowledge about gene locations, and experimental gene expression measurements for a variety of conditions. This article surveys some of the analysis methods and studies for such a computational discovery approach for the yeast Saccharomyces cerevisiae.
Affiliation(s)
- J Vilo
- European Bioinformatics Institute EBI, EMBL Outstation - Hinxton, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK.
18
Fujibuchi W, Anderson JS, Landsman D. PROSPECT improves cis-acting regulatory element prediction by integrating expression profile data with consensus pattern searches. Nucleic Acids Res 2001; 29:3988-96. [PMID: 11574681 PMCID: PMC60241 DOI: 10.1093/nar/29.19.3988] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022] Open
Abstract
Consensus pattern and matrix-based searches designed to predict cis-acting transcriptional regulatory sequences have historically been subject to large numbers of false positives. We sought to decrease false positives by incorporating expression profile data into a consensus pattern-based search method. We have systematically analyzed the expression phenotypes of over 6000 yeast genes, across 121 expression profile experiments, and correlated them with the distribution of 14 known regulatory elements over sequences upstream of the genes. Our method is based on a metric we term probabilistic element assessment (PEA), which is a ranking of potential sites based on sequence similarity in the upstream regions of genes with similar expression phenotypes. For eight of the 14 known elements that we examined, our method had a much higher selectivity than a naïve consensus pattern search. Based on our analysis, we have developed a web-based tool called PROSPECT, which allows consensus pattern-based searching of gene clusters obtained from microarray data.
Affiliation(s)
- W Fujibuchi
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20894, USA
19
Birnbaum K, Benfey PN, Shasha DE. cis element/transcription factor analysis (cis/TF): a method for discovering transcription factor/cis element relationships. Genome Res 2001; 11:1567-73. [PMID: 11544201 PMCID: PMC311103 DOI: 10.1101/gr.158301] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.4] [Received: 02/27/2001] [Accepted: 06/13/2001] [Indexed: 11/25/2022]
Abstract
We report a simple new algorithm, cis/TF, that uses genomewide expression data and the full genomic sequence to match transcription factors to their binding sites. Most previous computational methods discovered binding sites by clustering genes having similar expression patterns and then identifying over-represented subsequences in the promoter regions of those genes. By contrast, cis/TF asserts that B is a likely binding site of a transcription factor T if the expression pattern of T is correlated to the composite expression patterns of all genes containing B, even when those genes are not mutually correlated. Thus, our method focuses on binding sites rather than genes. The algorithm has successfully identified experimentally-supported transcription factor binding relationships in tests on several data sets from Saccharomyces cerevisiae.
Affiliation(s)
- K Birnbaum
- Department of Biology, New York University, New York, New York 10003, USA
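The core test described in the cis/TF abstract above (correlating a transcription factor's expression profile with the composite profile of all genes carrying a candidate site) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the abstract: the "composite" profile is taken to be a plain average, the similarity measure is Pearson correlation, site matching is exact substring search, and the function names and toy data are hypothetical.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)

def composite_profile(expression, promoters, site):
    """Average profile of all genes whose promoter contains `site`.

    Assumption: the composite is a simple per-condition mean.
    """
    carriers = [g for g, seq in promoters.items() if site in seq and g in expression]
    if not carriers:
        return None
    n_conditions = len(expression[carriers[0]])
    return [mean(expression[g][i] for g in carriers) for i in range(n_conditions)]

def score_site(tf_profile, expression, promoters, site):
    """Correlation between a TF profile and the composite profile for `site`."""
    comp = composite_profile(expression, promoters, site)
    return None if comp is None else pearson(tf_profile, comp)

# Toy, hypothetical data: three genes measured under four conditions.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 2.0, 4.0, 4.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
promoters = {"g1": "AAACGCGTTT", "g2": "TTACGCGTAA", "g3": "TTTTTTTTTT"}
tf_profile = [0.5, 1.0, 1.5, 2.0]

score = score_site(tf_profile, expression, promoters, "ACGCGT")
missing = score_site(tf_profile, expression, promoters, "GGGGGG")
```

Because the score is computed on the composite rather than on individual genes, a site can score well even when the genes that carry it are only loosely correlated with one another, which is the point the abstract emphasizes.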
20
Lucchini S, Thompson A, Hinton JCD. Microarrays for microbiologists. Microbiology (Reading, England) 2001; 147:1403-1414. [PMID: 11390672 DOI: 10.1099/00221287-147-6-1403] [Citation(s) in RCA: 89] [Impact Index Per Article: 3.9] [Indexed: 12/26/2022]
Affiliation(s)
- S Lucchini
- Molecular Microbiology, Institute of Food Research, Norwich Research Park, Norwich NR4 7UA, UK
- A Thompson
- Molecular Microbiology, Institute of Food Research, Norwich Research Park, Norwich NR4 7UA, UK
- J C D Hinton
- Molecular Microbiology, Institute of Food Research, Norwich Research Park, Norwich NR4 7UA, UK
21
Kel AE, Kel-Margoulis OV, Farnham PJ, Bartley SM, Wingender E, Zhang MQ. Computer-assisted identification of cell cycle-related genes: new targets for E2F transcription factors. J Mol Biol 2001; 309:99-120. [PMID: 11491305 DOI: 10.1006/jmbi.2001.4650] [Citation(s) in RCA: 133] [Impact Index Per Article: 5.8] [Indexed: 11/22/2022]
Abstract
The processes that take place during development and differentiation are directed through coordinated regulation of expression of a large number of genes. One such gene regulatory network provides cell cycle control in eukaryotic organisms. In this work, we have studied the structural features of the 5' regulatory regions of cell cycle-related genes. We developed a new method for identifying composite substructures (modules) in regulatory regions of genes consisting of a binding site for a key transcription factor and additional contextual motifs: potential targets for other transcription factors that may synergistically regulate gene transcription. Applying this method to cell cycle-related promoters, we created a program for context-specific identification of binding sites for transcription factors of the E2F family, which are key regulators of the cell cycle. We found that E2F composite modules occur at a high frequency and in close proximity to the start of transcription in cell cycle-related promoters in comparison with other promoters. Using this information, we then searched for E2F sites in genomic sequences with the goal of identifying new genes which play important roles in controlling cell proliferation, differentiation and apoptosis. Using a chromatin immunoprecipitation assay, we then experimentally verified the binding of E2F in vivo to the promoters predicted by the computer-assisted methods. Our identification of new E2F target genes provides new insight into gene regulatory networks and provides a framework for continued analysis of the role of contextual promoter features in transcriptional regulation. The tools described are available at http://compel.bionet.nsc.ru/FunSite/SiteScan.html.
Affiliation(s)
- A E Kel
- Institute of Cytology and Genetics, Novosibirsk, Russia.
22
Abstract
The year 2000 stands as a landmark in modern biology: the first draft of the human genome sequence has been completed. For the pharmaceutical industry, this achievement provides tremendous opportunities because the genomic sequence exposes all human drug targets for therapeutic intervention. The challenge for the pharmaceutical companies is to exploit this definitive resource for the identification of potential molecular targets, rapid characterization of their function and validation of their involvement in disease pathology. Bioinformatics approaches provide increasingly crucial tools to systematically support this exploratory target drug discovery activity.
Affiliation(s)
- P Sanseau
- Target Bioinformatics, Glaxo SmithKline, Gunnels Wood Road, SG1 2NY, Stevenage, UK
23
Abstract
With the continuing accomplishments of the human genome project, high-throughput strategies to identify DNA sequences that are important in mammalian gene regulation are becoming increasingly feasible. In contrast to the historic, labour-intensive, wet-laboratory methods for identifying regulatory sequences, many modern approaches are heavily focused on the computational analysis of large genomic data sets. Data from inter-species genomic sequence comparisons and genome-wide expression profiling, integrated with various computational tools, are poised to contribute to the decoding of genomic sequence and to the identification of those sequences that orchestrate gene regulation. In this review, we highlight several genomic approaches that are being used to identify regulatory sequences in mammalian genomes.
Affiliation(s)
- L A Pennacchio
- Genome Sciences Department, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California 94720, USA
24
25
Abstract
DNA microarrays are powerful tools for the analysis of the organization and regulation of the brain, in both illness and health. Such messenger RNA expression methods are outgrowths of a marriage between the several genome sequencing projects and a wide variety of physical, chemical, optical, and electronic systems. The advantages of microarray analyses include the ability to study the regulation of several genes or even the entire genome in a single experiment. However, there are substantive issues associated with the use of these tools that need to be considered before drawing conclusions about the genomic regulation of the brain. These issues include the loss of most anatomic (i.e., cellular and circuit) specificity, only fair sensitivity, lack of absolute quantitative data, poor comparability between studies, and high variability in sample values, to mention the most obvious. In this review we point to some of the solutions proposed for these problems and novel techniques and approaches for newer methods. Among these are methods for making arrays more sensitive, including nonarray messenger RNA expression systems. The future of this field and its links to deeper protein and cell biology are both emphasized.
Affiliation(s)
- S J Watson
- Department of Psychiatry and The Mental Health Research Institute, University of Michigan, Ann Arbor, Michigan 48109-0720, USA
26
Girke T, Todd J, Ruuska S, White J, Benning C, Ohlrogge J. Microarray analysis of developing Arabidopsis seeds. Plant Physiology 2000; 124:1570-81. [PMID: 11115875 PMCID: PMC59856 DOI: 10.1104/pp.124.4.1570] [Citation(s) in RCA: 213] [Impact Index Per Article: 8.9] [Received: 05/17/2000] [Revised: 06/20/2000] [Accepted: 09/13/2000] [Indexed: 05/18/2023]
Abstract
To provide a broad analysis of gene expression in developing Arabidopsis seeds, microarrays have been produced that display approximately 2,600 seed-expressed genes. DNA for genes spotted on the arrays was selected from >10,000 clones partially sequenced from a cDNA library of developing seeds. Based on a series of controls, sensitivity of the arrays was estimated at one to two copies of mRNA per cell, and cross hybridization was estimated to occur if closely related genes have >70% to 80% sequence identity. These arrays have been hybridized in a series of experiments with probes derived from seeds, leaves, and roots of Arabidopsis. Analysis of expression ratios between the different tissues has allowed the tissue-specific expression patterns of many hundreds of genes to be described for the first time. Approximately 25% of the 2,600 genes were expressed at ratios ≥2-fold higher in seeds than leaves or roots and 10% at ratios ≥10. Included in this list are a large number of proteins of unknown function, and potential regulatory factors such as protein kinases, phosphatases, and transcription factors. The Arabidopsis arrays were also found to be useful for transcriptional profiling of mRNA isolated from developing oilseed rape (Brassica napus) seeds, and expression patterns correlated well between the two species.
Affiliation(s)
- T Girke
- Department of Botany and Plant Pathology, Michigan State University, East Lansing, Michigan 48824, USA
27
Abstract
A global analysis of 2,709 published interactions between proteins of the yeast Saccharomyces cerevisiae has been performed, enabling the establishment of a single large network of 2,358 interactions among 1,548 proteins. Proteins of known function and cellular location tend to cluster together, with 63% of the interactions occurring between proteins with a common functional assignment and 76% occurring between proteins found in the same subcellular compartment. Possible functions can be assigned to a protein based on the known functions of its interacting partners. This approach correctly predicts a functional category for 72% of the 1,393 characterized proteins with at least one partner of known function, and has been applied to predict functions for 364 previously uncharacterized proteins.
Affiliation(s)
- B Schwikowski
- The Institute for Systems Biology, 4225 Roosevelt Way NE, Suite 200, Seattle, WA 98105, USA
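The idea in the abstract above, assigning a function to a protein from the known functions of its interaction partners, can be sketched as a simple vote among neighbours. This is only an illustrative Python sketch under stated assumptions: the voting rule (most frequent category among direct partners, ties broken alphabetically), the function name `predict_functions`, and the protein identifiers `P1` to `P4` are all hypothetical and are not taken from the paper.

```python
from collections import Counter

def predict_functions(interactions, known_functions, top_n=1):
    """Predict functions for uncharacterized proteins from their partners.

    interactions: iterable of (protein_a, protein_b) pairs (undirected).
    known_functions: dict mapping protein -> set of functional categories.
    Returns a dict mapping each uncharacterized protein that has at least one
    annotated partner to its `top_n` most frequent partner categories.
    """
    # Build an undirected adjacency map.
    neighbors = {}
    for a, b in interactions:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    predictions = {}
    for protein, partners in neighbors.items():
        if protein in known_functions:
            continue  # already characterized
        votes = Counter()
        for partner in partners:
            votes.update(known_functions.get(partner, ()))
        if votes:
            # Sort by descending count, then name, so ties are deterministic.
            ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
            predictions[protein] = [name for name, _ in ranked[:top_n]]
    return predictions

# Toy, hypothetical network: P4 is uncharacterized.
interactions = [("P4", "P1"), ("P4", "P2"), ("P4", "P3"), ("P1", "P2")]
known = {"P1": {"cell cycle"}, "P2": {"cell cycle"}, "P3": {"cytoskeleton"}}
predictions = predict_functions(interactions, known)
```

In this toy network, two of the three partners of `P4` are annotated "cell cycle", so that category wins the vote.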
28
Getz G, Levine E, Domany E. Coupled two-way clustering analysis of gene microarray data. Proc Natl Acad Sci U S A 2000; 97:12079-84. [PMID: 11035779 PMCID: PMC17297 DOI: 10.1073/pnas.210134797] [Citation(s) in RCA: 309] [Impact Index Per Article: 12.9] [Indexed: 11/18/2022] Open
Abstract
We present a coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples, such that when one of these is used to cluster the other, stable and significant partitions emerge. The search for such subsets is a computationally complex task. We present an algorithm, based on iterative clustering, that performs such a search. This analysis is especially suitable for gene microarray data, where the contributions of a variety of biological mechanisms to the gene expression levels are entangled in a large body of experimental data. The method was applied to two gene microarray data sets, on colon cancer and leukemia. By identifying relevant subsets of the data and focusing on them we were able to discover partitions and correlations that were masked and hidden when the full dataset was used in the analysis. Some of these partitions have clear biological interpretation; others can serve to identify possible directions for future research.
Affiliation(s)
- G Getz
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel
29
van Helden J, Rios AF, Collado-Vides J. Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. Nucleic Acids Res 2000; 28:1808-18. [PMID: 10734201 PMCID: PMC102821 DOI: 10.1093/nar/28.8.1808] [Citation(s) in RCA: 214] [Impact Index Per Article: 8.9] [Indexed: 11/14/2022] Open
Abstract
The application of microarray and related technologies is currently generating a systematic catalog of the transcriptional response of any single gene to a multiplicity of experimental conditions. Clustering genes according to the similarity of their transcriptional response provides a direct hint to the regulons of the different transcription factors, many of which have still not been characterized. We have developed a new method for deciphering the mechanism underlying the common transcriptional response of a set of genes, i.e. discovering cis-acting regulatory elements from a set of unaligned upstream sequences. This method, called dyad analysis, is based on the observation that many regulatory sites consist of a pair of highly conserved trinucleotides, spaced by a non-conserved region of fixed width. The approach is to count the number of occurrences of each possible spaced pair of trinucleotides, and to assess its statistical significance. The method is highly efficient in the detection of sites bound by C6Zn2 binuclear cluster proteins, as well as other transcription factors. In addition, we show that the dyad and single-word analyses are efficient for the detection of regulatory patterns in gene clusters from DNA chip experiments. In combination, these programs should provide a fast and efficient way to discover new regulatory sites for as yet unknown transcription factors.
Affiliation(s)
- J van Helden
- Unité de Conformation des Macromolécules Biologiques, Université Libre de Bruxelles, CP 160/16, 50 av. F. D. Roosevelt, B-1050 Bruxelles, Belgium.
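The counting step of the dyad analysis described in the abstract above (tallying every pair of trinucleotides separated by a fixed-width spacer) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the spacing range of 0 to 16, the function name `count_dyads`, and the toy sequences are assumptions, only the given strand is scanned, and the statistical significance assessment mentioned in the abstract is omitted.

```python
from collections import Counter

def count_dyads(sequences, min_spacing=0, max_spacing=16):
    """Count occurrences of spaced trinucleotide pairs in a set of sequences.

    Each dyad is keyed as (left_trinucleotide, spacing, right_trinucleotide),
    where `spacing` is the number of unconstrained bases between the two words.
    Windows containing characters other than A, C, G, T are skipped.
    """
    valid = set("ACGT")
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for spacing in range(min_spacing, max_spacing + 1):
            span = 3 + spacing + 3  # total width of the dyad window
            for i in range(len(seq) - span + 1):
                left = seq[i:i + 3]
                right = seq[i + 3 + spacing:i + span]
                if set(left + right) <= valid:
                    counts[(left, spacing, right)] += 1
    return counts

# Toy upstream sequences, each carrying one CGG-N11-CCG dyad.
sequences = [
    "TT" + "CGG" + "A" * 11 + "CCG" + "TT",
    "GG" + "CGG" + "T" * 11 + "CCG" + "AA",
]
counts = count_dyads(sequences)
shared = counts[("CGG", 11, "CCG")]
```

A full analysis would compare each observed count against an expected frequency estimated from background sequences, which is where the significance assessment mentioned in the abstract comes in.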
30
Richmond T, Somerville S. Chasing the dream: plant EST microarrays. Current Opinion in Plant Biology 2000; 3:108-116. [PMID: 10712953 DOI: 10.1016/s1369-5266(99)00049-7] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.3] [Indexed: 05/23/2023]
Abstract
DNA microarray technology is poised to make an important contribution to the field of plant biology. Stimulated by recent funding programs, expressed sequence tag sequencing and microarray production either has begun or is being contemplated for most economically important plant species. Although the DNA microarray technology is still being refined, the basic methods are well established. The real challenges lie in data analysis and data management. To fully realize the value of this technology, centralized databases that are capable of storing microarray expression data and managing information from a variety of sources will be needed. These information resources are under development and will help usher in a new era in plant functional genomics.
Affiliation(s)
- T Richmond
- Department of Plant Biology, Carnegie Institution of Washington, Stanford 94305, USA.
31
Abstract
Bioinformatics has, out of necessity, become a key aspect of drug discovery in the genomic revolution, contributing to both target discovery and target validation. The author describes the role that bioinformatics has played and will continue to play in response to the waves of genome-wide data sources that have become available to the industry, including expressed sequence tags, microbial genome sequences, model organism sequences, polymorphisms, gene expression data and proteomics. However, these knowledge sources must be intelligently integrated.
32
Abstract
Genome-wide analysis techniques such as chromosome painting, comparative genomic hybridization, representational difference analysis, restriction landmark genome scanning and high-throughput analysis of LOH are now accelerating high-resolution genome aberration localization in human tumors. These techniques are complemented by procedures for detection of differentially expressed genes such as differential display, nucleic acid subtraction, serial analysis of gene expression and expression microarray analysis. These efforts are enabled by work from the human genome program in physical map development, cDNA library production/sequencing and in genome sequencing. This review covers several commonly used large-scale genome and gene expression analysis techniques, outlines genomic approaches to gene discovery and summarizes information that has come from large-scale analyses of human solid tumors.
Affiliation(s)
- J W Gray
- UCSF Cancer Center, 2340 Sutter Street, University of California San Francisco, San Francisco, CA 94143-0808, USA.