1
Zhan Q, Fu Y, Jiang Q, Liu B, Peng J, Wang Y. SpliVert: A Protein Multiple Sequence Alignment Refinement Method Based on Splitting-Splicing Vertically. Protein Pept Lett 2020; 27:295-302. [PMID: 31385760 DOI: 10.2174/0929866526666190806143959] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/06/2019] [Revised: 04/26/2019] [Accepted: 06/14/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND Multiple Sequence Alignment (MSA) is a fundamental task in bioinformatics and is required for many biological analysis tasks. The more accurate the alignments are, the more credible the downstream analyses. Most protein MSA algorithms refine an alignment by dividing it horizontally into two groups and then realigning the two groups. However, this strategy does not consider that different regions of the sequences have different degrees of conservation; this property may lead to incorrect residue-residue or residue-gap pairs, which cannot be corrected by this strategy. OBJECTIVE In this article, our motivation is to develop a novel refinement method based on splitting-splicing vertically. METHODS Here, we present a novel refinement method based on splitting-splicing vertically, called SpliVert. For an alignment, we split it vertically into 3 parts, remove the gap characters in the middle part, realign the middle part alone, and splice the realigned middle part with the other two initial pieces to obtain a refined alignment. In the realignment procedure of our method, the aligner focuses only on a certain part, ignoring the disturbance of the other parts, which could help fix the incorrect pairs. RESULTS We tested our refinement strategy for 2 leading MSA tools on 3 standard benchmarks, according to the commonly used average SP (and TC) scores. The results show that, given appropriate proportions for splitting the initial alignment, the average scores after using our method are comparable or slightly higher. We also compared the alignments refined by our method with alignments directly refined by the original alignment tools. The results suggest that using our SpliVert method to refine alignments can also outperform direct use of the original alignment tools. CONCLUSION The results reveal that splitting vertically and realigning part of the alignment is a good strategy for the refinement of protein multiple sequence alignments.
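The split, realign and splice steps described in the METHODS part of this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the column indices `start` and `end`, and the placeholder aligner `pad_realign` (which only right-pads with gaps) are all assumptions made for the example. In practice the `realign` argument would wrap a real MSA tool.

```python
from typing import Callable, List

Alignment = List[str]  # equal-length rows; '-' marks a gap


def pad_realign(seqs: List[str]) -> Alignment:
    """Hypothetical placeholder aligner: right-pads sequences with gaps.

    A real pipeline would call an MSA tool here instead.
    """
    width = max((len(s) for s in seqs), default=0)
    return [s.ljust(width, "-") for s in seqs]


def splivert_refine(
    aln: Alignment,
    start: int,
    end: int,
    realign: Callable[[List[str]], Alignment] = pad_realign,
) -> Alignment:
    """Split columns into [0, start), [start, end), [end, ...),
    realign the middle block alone, then splice the three parts."""
    left = [row[:start] for row in aln]
    right = [row[end:] for row in aln]
    # Remove gap characters from the middle block before realigning it.
    middle = [row[start:end].replace("-", "") for row in aln]
    realigned = realign(middle)
    return [l + m + r for l, m, r in zip(left, realigned, right)]


initial = ["AC-GT-A", "ACG-TTA"]
refined = splivert_refine(initial, start=2, end=5)
print(refined)  # ['ACGT-A', 'ACGTTA']
```

Because only the middle block is regapped, the left and right blocks keep their original columns, and each sequence keeps its residues in order.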
Affiliation(s)
- Qing Zhan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yilei Fu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Bo Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
2
Zhan Q, Wang N, Jin S, Tan R, Jiang Q, Wang Y. ProbPFP: a multiple sequence alignment algorithm combining hidden Markov model optimized by particle swarm optimization with partition function. BMC Bioinformatics 2019; 20:573. [PMID: 31760933 PMCID: PMC6876095 DOI: 10.1186/s12859-019-3132-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In multiple sequence alignment, the substitution score of pairwise alignment is essential. To compute adaptive scores for alignment, researchers usually use hidden Markov models (HMMs) or probabilistic consistency methods such as the partition function. Recent studies show that optimizing the parameters of the hidden Markov model, as well as integrating the hidden Markov model with the partition function, can raise the accuracy of alignment. However, these studies ignored the combination of the partition function and an optimized HMM, which could further improve alignment accuracy. RESULTS A novel MSA algorithm called ProbPFP is presented in this paper. It integrates an HMM optimized by particle swarm optimization (PSO) with the partition function. PSO was applied to optimize the HMM's parameters. The posterior probability obtained by the HMM was then combined with the one obtained by the partition function to calculate an integrated substitution score for alignment. To evaluate the effectiveness of ProbPFP, we compared it with 13 outstanding or classic MSA methods. The results demonstrate that the alignments obtained by ProbPFP achieved the highest mean TC and SP scores on two benchmark datasets, SABmark and OXBench, and the second highest mean TC and SP scores on the benchmark dataset BAliBASE. ProbPFP was also compared with 4 other outstanding methods by reconstructing the phylogenetic trees for six protein families extracted from the TreeFam database, based on the alignments obtained by these 5 methods. The result indicates that the phylogenetic trees reconstructed from the alignments obtained by ProbPFP are closer to the reference trees than those of the other methods. CONCLUSIONS We propose a new multiple sequence alignment method combining an optimized HMM and the partition function. The performance shows that this method could substantially improve alignment accuracy.
Affiliation(s)
- Qing Zhan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
- Nan Wang
- Department of Mathematics, Harbin Institute of Technology, Harbin, 150001, China
- Shuilin Jin
- Department of Mathematics, Harbin Institute of Technology, Harbin, 150001, China
- Renjie Tan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
- Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
3
Yu Z, Ding Y, Yin J, Yu D, Zhang J, Zhang M, Ding M, Zhong W, Qiu J, Li J. Dissemination of Genetic Acquisition/Loss Provides a Variety of Quorum Sensing Regulatory Properties in Pseudoalteromonas. Int J Mol Sci 2018; 19:E3636. [PMID: 30453700 PMCID: PMC6275029 DOI: 10.3390/ijms19113636] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 10/08/2018] [Revised: 11/15/2018] [Accepted: 11/15/2018] [Indexed: 01/20/2023] Open
Abstract
Quorum sensing (QS) enables single-celled bacteria to communicate with chemical signals in order to synchronize group-level bacterial behavior. Pseudoalteromonas are marine bacteria found in versatile environments, and knowledge of the QS regulation underlying their habitat adaptation is extremely fragmentary. To distinguish genes required for QS regulation in Pseudoalteromonas, comparative genomics was deployed to define the pan-genomics of twelve isolates and previously sequenced genomes, for which acyl-homoserine lactone (AHL)-based QS traits were characterized. Additionally, transposon mutagenesis was used to identify the essential QS regulatory genes in the selected Pseudoalteromonas isolate. A remarkable finding was that the AHL-based colorization intensity of biosensors induced by Pseudoalteromonas most likely correlates with the genetic heterogeneity of QS regulators within the genus. This is supported by the relative expression levels of two of the main QS regulatory genes (luxO and rpoN) analyzed in representative Pseudoalteromonas isolates. Notably, the comprehensive QS regulatory schema and the working model proposed in Pseudoalteromonas seem to phylogenetically include the network architectures derived from Escherichia coli, Pseudomonas, and Vibrio. Several associated genes were mapped by transposon mutagenesis. Among them, a right origin-binding protein-encoding gene (robp) was functionally identified as a positive QS regulatory gene. This gene lies in a genomically unstable region and exists in the aforementioned bioinformatically recruited QS regulatory schema. The obtained data emphasize that distinctly and hierarchically organized mechanisms probably target QS association in the dynamic genomes of Pseudoalteromonas, thus giving the bacteria the ability to accommodate their adaptive fitness and survival advantages.
Affiliation(s)
- Zhiliang Yu
- College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou 310014, China.
- Yajuan Ding
- College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou 310014, China.
- Jianhua Yin
- College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou 310014, China.
- Dongliang Yu
- College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou 310014, China.
- Jiadi Zhang
- College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou 310014, China.
- Mengting Zhang
- College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou 310014, China.
- Mengdan Ding
- College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou 310014, China.
- Weihong Zhong
- College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou 310014, China.
- Juanping Qiu
- College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou 310014, China.
- Jun Li
- College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou 310014, China.
4
Boutet I, Ripp R, Lecompte O, Dossat C, Corre E, Tanguy A, Lallier FH. Conjugating effects of symbionts and environmental factors on gene expression in deep-sea hydrothermal vent mussels. BMC Genomics 2011; 12:530. [PMID: 22034982 PMCID: PMC3218092 DOI: 10.1186/1471-2164-12-530] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 06/29/2011] [Accepted: 10/28/2011] [Indexed: 11/17/2022] Open
Abstract
Background The deep-sea hydrothermal vent mussel Bathymodiolus azoricus harbors thiotrophic and methanotrophic symbiotic bacteria in its gills. While the symbiotic relationship between this hydrothermal mussel and these chemoautotrophic bacteria has been described, the molecular processes involved in the cross-talk between symbionts and host, in the maintenance of the symbiosis, in the influence of environmental parameters on gene expression, and in transcriptome variation across individuals remain poorly understood. In an attempt to understand how, and to what extent, this double symbiosis affects host gene expression, we used a transcriptomic approach to identify genes potentially regulated by symbiont characteristics, environmental conditions or both. This study was done on mussels from two contrasting populations. Results Subtractive libraries allowed the identification of about 1000 genes putatively regulated by symbiosis and/or environmental factors. Microarray analysis showed that 120 genes (3.5% of all genes) were differentially expressed between the Menez Gwen (MG) and Rainbow (Rb) vent fields. The total number of regulated genes in mussels harboring a high versus a low symbiont content did not differ significantly. With regard to the impact of symbiont content, only 1% of all genes were regulated by thiotrophic (SOX) and methanotrophic (MOX) bacteria content in MG mussels whereas 5.6% were regulated in mussels collected at Rb. MOX symbionts also impacted a higher proportion of genes than SOX in both vent fields. When host transcriptome expression was analyzed with respect to symbiont gene expression, it was related to symbiont quantity in each field. Conclusions Our study has produced a preliminary description of a transcriptomic response in a hydrothermal vent mussel host of both thiotrophic and methanotrophic symbiotic bacteria. This model can help to identify genes involved in the maintenance of symbiosis or regulated by environmental parameters. Our results provide evidence of a symbiont effect on transcriptome regulation, with differences related to the type of symbiont, even though the relative percentage of genes involved remains limited. Differences observed between the vent sites indicate that environment strongly influences transcriptome regulation and impacts both the activity and the relative abundance of each symbiont. Among all these genes, those participating in recognition, the immune system, oxidative stress, and energy metabolism constitute promising new targets for extended studies on symbiosis and the effect of environmental parameters on the symbiotic relationships in B. azoricus.
Affiliation(s)
- Isabelle Boutet
- CNRS, UMR 7144, Adaptation et Diversité en Milieu Marin, Station Biologique de Roscoff, 29682 Roscoff, France.
5
Nian Chua H. Prediction of Protein Function. Genomics 2010. [DOI: 10.1002/9780470711675.ch9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
6
Gagnière N, Jollivet D, Boutet I, Brélivet Y, Busso D, Da Silva C, Gaill F, Higuet D, Hourdez S, Knoops B, Lallier F, Leize-Wagner E, Mary J, Moras D, Perrodou E, Rees JF, Segurens B, Shillito B, Tanguy A, Thierry JC, Weissenbach J, Wincker P, Zal F, Poch O, Lecompte O. Insights into metazoan evolution from Alvinella pompejana cDNAs. BMC Genomics 2010; 11:634. [PMID: 21080938 PMCID: PMC3018142 DOI: 10.1186/1471-2164-11-634] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Received: 02/09/2010] [Accepted: 11/16/2010] [Indexed: 11/29/2022] Open
Abstract
Background Alvinella pompejana is a representative of Annelids, a key phylum for evo-devo studies that is still poorly studied at the sequence level. A. pompejana inhabits deep-sea hydrothermal vents and is currently known as one of the most thermotolerant Eukaryotes in marine environments, withstanding the largest known chemical and thermal ranges (from 5 to 105°C). This tube-dwelling worm forms dense colonies on the surface of hydrothermal chimneys and can withstand long periods of hypo/anoxia and long phases of exposure to hydrogen sulphides. A. pompejana specifically inhabits chimney walls of hydrothermal vents on the East Pacific Rise. To survive, Alvinella has developed numerous adaptations at the physiological and molecular levels, such as an increase in the thermostability of proteins and protein complexes. It represents an outstanding model organism for studying adaptation to harsh physicochemical conditions and for isolating stable macromolecules resistant to high temperatures. Results We have constructed four full length enriched cDNA libraries to investigate the biology and evolution of this intriguing animal. Analysis of more than 75,000 high quality reads led to the identification of 15,858 transcripts and 9,221 putative protein sequences. Our annotation reveals a good coverage of most animal pathways and networks with a prevalence of transcripts involved in oxidative stress resistance, detoxification, anti-bacterial defence, and heat shock protection. Alvinella proteins seem to show a slow evolutionary rate and a higher similarity with proteins from Vertebrates compared to proteins from Arthropods or Nematodes. Their composition shows enrichment in positively charged amino acids that might contribute to their thermostability. 
The gene content of Alvinella reveals that an important pool of genes previously considered to be specific to Deuterostomes were in fact already present in the last common ancestor of the Bilaterian animals, but have been secondarily lost in model invertebrates. This pool is enriched in glycoproteins that play a key role in intercellular communication, hormonal regulation and immunity. Conclusions Our study starts to unravel the gene content and sequence evolution of a deep-sea annelid, revealing key features in eukaryote adaptation to extreme environmental conditions and highlighting the proximity of Annelids and Vertebrates.
Affiliation(s)
- Nicolas Gagnière
- Department of Structural Biology and Genomics, Institut de Génétique et de Biologie Moléculaire et Cellulaire, CERBM F-67400 Illkirch, France
7
Jung J, Yi G, Sukno SA, Thon MR. PoGO: Prediction of Gene Ontology terms for fungal proteins. BMC Bioinformatics 2010; 11:215. [PMID: 20429880 PMCID: PMC2882390 DOI: 10.1186/1471-2105-11-215] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 01/20/2010] [Accepted: 04/29/2010] [Indexed: 11/10/2022] Open
Abstract
Background Automated protein function prediction methods are the only practical approach for assigning functions to genes obtained from model organisms. Many of the previously reported function annotation methods are of limited utility for fungal protein annotation. They are often trained only to one species, are not available for high-volume data processing, or require the use of data derived by experiments such as microarray analysis. To meet the increasing need for high throughput, automated annotation of fungal genomes, we have developed a tool for annotating fungal protein sequences with terms from the Gene Ontology. Results We describe a classifier called PoGO (Prediction of Gene Ontology terms) that uses statistical pattern recognition methods to assign Gene Ontology (GO) terms to proteins from filamentous fungi. PoGO is organized as a meta-classifier in which each evidence source (sequence similarity, protein domains, protein structure and biochemical properties) is used to train independent base-level classifiers. The outputs of the base classifiers are used to train a meta-classifier, which provides the final assignment of GO terms. An independent classifier is trained for each GO term, making the system amenable to updating, without having to re-train the whole system. The resulting system is robust. It provides better accuracy and can assign GO terms to a higher percentage of unannotated protein sequences than other methods that we tested. Conclusions Our annotation system overcomes many of the shortcomings that we found in other methods. We also provide a web server where users can submit protein sequences to be annotated.
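The two-level design described in this abstract (base classifier outputs feeding a meta-classifier, with one independent classifier per GO term) can be illustrated with a small sketch. Everything here is an assumption for illustration: the base classifier scores are taken as given numbers in [0, 1], a simple perceptron stands in for the meta-level learner (the abstract does not say which learner PoGO uses), and the GO identifiers and training data are toy values.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Base-classifier outputs for one protein and one GO term:
# evidence source -> hypothetical score in [0, 1].
BaseOutputs = Dict[str, float]
SOURCES = ["similarity", "domains", "structure", "biochemical"]


@dataclass
class MetaClassifier:
    weights: Dict[str, float]
    bias: float

    def predict(self, outputs: BaseOutputs) -> bool:
        score = self.bias + sum(w * outputs[s] for s, w in self.weights.items())
        return score > 0


def train_meta(
    examples: List[Tuple[BaseOutputs, int]], epochs: int = 20, lr: float = 0.1
) -> MetaClassifier:
    """Perceptron stand-in for the meta-level learner (an assumption)."""
    model = MetaClassifier({s: 0.0 for s in SOURCES}, 0.0)
    for _ in range(epochs):
        for outputs, label in examples:
            error = label - int(model.predict(outputs))
            model.bias += lr * error
            for s in SOURCES:
                model.weights[s] += lr * error * outputs[s]
    return model


# Toy base-level outputs: strong evidence vs. weak evidence.
strong = {"similarity": 0.9, "domains": 0.8, "structure": 0.7, "biochemical": 0.6}
weak = {"similarity": 0.1, "domains": 0.2, "structure": 0.3, "biochemical": 0.4}

# One training set per GO term (identifiers used only as example keys).
training = {
    "GO:0003824": [(strong, 1), (weak, 0)],
    "GO:0005215": [(weak, 0), (strong, 1)],
}
models = {term: train_meta(examples) for term, examples in training.items()}

# Retraining one term leaves every other term's classifier untouched.
untouched = models["GO:0005215"]
models["GO:0003824"] = train_meta(training["GO:0003824"])
assigned = [term for term, m in models.items() if m.predict(strong)]
print(assigned)
```

Keeping one model per term in a dictionary is what makes the per-term update cheap: replacing a single entry does not require retraining the rest.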
Affiliation(s)
- Jaehee Jung
- Centro Hispano-Luso de Investigaciones Agrarias (CIALE), Department of Microbiology and Genetics, University of Salamanca, Villamayor 37185, Spain
8
Talukdar V, Konar A, Datta A, Choudhury AR. Changing from computing grid to knowledge grid in life-science grid. Biotechnol J 2009; 4:1244-52. [PMID: 19579217 DOI: 10.1002/biot.200800073] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
Abstract
Grid computing has great potential to become a standard cyber infrastructure for the life sciences, which often require high-performance computing and large-scale data handling that exceed the computing capacity of a single institution. Grid computing applies the resources of many computers in a network to a single problem at the same time. It is useful for scientific problems that require a great number of computer processing cycles or access to a large amount of data. As biologists, we are constantly discovering millions of genes and genome features, which are assembled in a library and distributed on computers around the world. This means that new, innovative methods must be developed that exploit the resources available for extensive calculations, for example grid computing. This survey reviews the latest grid technologies from the viewpoints of computing grid, data grid and knowledge grid. Computing grid technologies have matured enough to solve high-throughput real-world life science problems. Data grid technologies are strong candidates for realizing a "resourceome" for bioinformatics. Knowledge grids should be designed not only for sharing explicit knowledge on computers but also for community formation, so that tacit knowledge can be shared among a community. By extending the concept of grid from computing grid to knowledge grid, it is possible to make use of a grid not only as sharable computing resources, but also as a time and place in which people work together, create knowledge, and share knowledge and experiences in a community.
Affiliation(s)
- Veera Talukdar
- Department of NCMT, NSHM-Knowledge Campus, Kolkata, India.
9
Tang ZQ, Lin HH, Zhang HL, Han LY, Chen X, Chen YZ. Prediction of functional class of proteins and peptides irrespective of sequence homology by support vector machines. Bioinform Biol Insights 2009; 1:19-47. [PMID: 20066123 PMCID: PMC2789692 DOI: 10.4137/bbi.s315] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022] Open
Abstract
Various computational methods have been used for the prediction of protein and peptide function based on their sequences. A particular challenge is to derive functional properties from sequences that show low or no homology to proteins of known function. Recently, a machine learning method, the support vector machine (SVM), has been explored for predicting the functional class of proteins and peptides from amino acid sequence-derived properties independent of sequence similarity, and has shown promising potential for a wide spectrum of protein and peptide classes, including some low- and non-homologous proteins. This method can thus be explored as a potential tool to complement alignment-based, clustering-based, and structure-based methods for predicting protein function. This article reviews the strategies, current progress, and underlying difficulties in using SVM for predicting the functional class of proteins. The relevant software and web servers are described. The reported prediction performances in the application of these methods are also presented.
Affiliation(s)
- Zhi Qun Tang
- Department of Pharmacy and Department of Computational Science, National University of Singapore, Republic of Singapore, 117543
- Hong Huang Lin
- Department of Pharmacy and Department of Computational Science, National University of Singapore, Republic of Singapore, 117543
- Hai Lei Zhang
- Department of Pharmacy and Department of Computational Science, National University of Singapore, Republic of Singapore, 117543
- Lian Yi Han
- Department of Pharmacy and Department of Computational Science, National University of Singapore, Republic of Singapore, 117543
- Xin Chen
- Department of Biotechnology, Zhejiang University, Hang Zhou, Zhejiang Province, P. R. China, 310029
- Yu Zong Chen
- Department of Pharmacy and Department of Computational Science, National University of Singapore, Republic of Singapore, 117543
- Shanghai Center for Bioinformatics Technology, Shanghai, P. R. China, 201203
10
Befort K, Filliol D, Ghate A, Darcq E, Matifas A, Muller J, Lardenois A, Thibault C, Dembele D, Le Merrer J, Becker JAJ, Poch O, Kieffer BL. Mu-opioid receptor activation induces transcriptional plasticity in the central extended amygdala. Eur J Neurosci 2008; 27:2973-84. [PMID: 18588537 DOI: 10.1111/j.1460-9568.2008.06273.x] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.6] [Indexed: 01/30/2023]
Abstract
Addiction develops from the gradual adaptation of the brain to chronic drug exposure, and involves genetic reprogramming of neuronal function. The central extended amygdala (EAc) is a network formed by the central amygdala and the bed nucleus of the stria terminalis. This key site controls drug craving and seeking behaviors, and has not been investigated at the gene regulation level. We used Affymetrix microarrays to analyze transcriptional activity in the murine EAc, with a focus on mu-opioid receptor-associated events because these receptors mediate drug reward and dependence. We identified 132 genes whose expression is regulated by a chronic escalating morphine regimen in the EAc from wild-type but not mu-opioid receptor knockout mice. These modifications are mostly EAc-specific. Gene ontology analysis reveals an overrepresentation of neurogenesis, cell growth and signaling protein categories. A separate quantitative PCR analysis of genes in the last of these groups confirms the dysregulation of both orphan (Gpr88) and known (DrD1A, Adora2A, Cnr1, Grm5, Gpr6) G protein-coupled receptors, scaffolding (PSD95, Homer1) and signaling (Sgk, Cap1) proteins, and neuropeptides (CCK, galanin). These transcriptional modifications do not occur following a single morphine injection, and hence result from long-term adaptation to excessive mu receptor activation. Proteins encoded by these genes are classically associated with spine modules function in other brain areas, and therefore our data suggest a remodeling of EAc circuits at sites where glutamatergic and monoaminergic afferences interact. Together, mu receptor-dependent genes identified in this study potentially contribute to drug-induced neural plasticity, and provide a unique molecular repertoire towards understanding drug craving and relapse.
Affiliation(s)
- K Befort
- IGBMC (Institut de Génétique et de Biologie Moléculaire et Cellulaire), Département Neurobiologie et Génétique, Illkirch, F-67400 France.
11
Becker JAJ, Befort K, Blad C, Filliol D, Ghate A, Dembele D, Thibault C, Koch M, Muller J, Lardenois A, Poch O, Kieffer BL. Transcriptome analysis identifies genes with enriched expression in the mouse central extended amygdala. Neuroscience 2008; 156:950-65. [PMID: 18786617 DOI: 10.1016/j.neuroscience.2008.07.070] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Received: 05/15/2008] [Revised: 07/18/2008] [Accepted: 07/30/2008] [Indexed: 01/18/2023]
Abstract
The central extended amygdala (EAc) is an ensemble of highly interconnected limbic structures of the anterior brain, and forms a cellular continuum including the bed nucleus of the stria terminalis (BNST), the central nucleus of the amygdala (CeA) and the nucleus accumbens shell (AcbSh). This neural network is a key site for interactions between brain reward and stress systems, and has been implicated in several aspects of drug abuse. In order to increase our understanding of EAc function at the molecular level, we undertook a genome-wide screen (Affymetrix) to identify genes whose expression is enriched in the mouse EAc. We focused on the less well-known BNST-CeA areas of the EAc, and identified 121 genes that exhibit more than twofold higher expression level in the EAc compared with whole brain. Among these, 43 genes have never been described as expressed in the EAc. We mapped these genes throughout the brain, using non-radioactive in situ hybridization, and identified eight genes with a unique and distinct rostro-caudal expression pattern along AcbSh, BNST and CeA. Q-PCR analysis performed in brain and peripheral organ tissues indicated that, with the exception of one (Spata13), all these genes are predominantly expressed in brain. These genes encode signaling proteins (Adora2, GPR88, Arpp21 and Rem2), a transcription factor (Limh6) or proteins of unknown function (Rik130, Spata13 and Wfs1). The identification of genes with enriched expression expands our knowledge of EAc at a molecular level, and provides useful information toward genetic manipulations within the EAc.
Affiliation(s)
- J A J Becker
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Département Neurobiologie et Génétique, Illkirch, France.
12
Götz S, García-Gómez JM, Terol J, Williams TD, Nagaraj SH, Nueda MJ, Robles M, Talón M, Dopazo J, Conesa A. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res 2008; 36:3420-35. [PMID: 18445632 PMCID: PMC2425479 DOI: 10.1093/nar/gkn176] [Citation(s) in RCA: 2904] [Impact Index Per Article: 181.5] [Indexed: 12/14/2022] Open
Abstract
Functional genomics technologies have been widely adopted in the biological research of both model and non-model species. An efficient functional annotation of DNA or protein sequences is a major requirement for the successful application of these approaches, as functional information on gene products is often the key to the interpretation of experimental results. Therefore, there is an increasing need for bioinformatics resources that can cope with large amounts of sequence data, produce valuable annotation results and are easily accessible to laboratories where functional genomics projects are being undertaken. We present the Blast2GO suite as an integrated and biologist-oriented solution for the high-throughput and automatic functional annotation of DNA or protein sequences based on the Gene Ontology vocabulary. The most outstanding Blast2GO features are: (i) the combination of various annotation strategies and tools controlling type and intensity of annotation, (ii) the numerous graphical features such as the interactive GO-graph visualization for gene-set function profiling or descriptive charts, (iii) the general sequence management features and (iv) high-throughput capabilities. We used the Blast2GO framework to carry out a detailed analysis of annotation behaviour through homology transfer and its impact in functional genomics research. Our aim is to offer biologists useful information to take into account when addressing the task of functionally characterizing their sequence data.
Affiliation(s)
- Stefan Götz
- Bioinformatics Department, Centro de Investigación Principe Felipe, Valencia, Spain
13
Nair R, Rost B. Protein subcellular localization prediction using artificial intelligence technology. Methods Mol Biol 2008; 484:435-63. [PMID: 18592195 DOI: 10.1007/978-1-59745-398-1_27] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 05/20/2023]
Abstract
Proteins perform many important tasks in living organisms, such as catalysis of biochemical reactions, transport of nutrients, and recognition and transmission of signals. The plethora of aspects of the role of any particular protein is referred to as its "function." One aspect of protein function that has been the target of intensive research by computational biologists is its subcellular localization. Proteins must be localized in the same subcellular compartment to cooperate toward a common physiological function. Aberrant subcellular localization of proteins can result in several diseases, including kidney stones, cancer, and Alzheimer's disease. To date, sequence homology remains the most widely used method for inferring the function of a protein. However, the application of advanced artificial intelligence (AI)-based techniques in recent years has resulted in significant improvements in our ability to predict the subcellular localization of a protein. The prediction accuracy has risen steadily over the years, in large part due to the application of AI-based methods such as hidden Markov models (HMMs), neural networks (NNs), and support vector machines (SVMs), although the availability of larger experimental datasets has also played a role. Automatic methods that mine textual information from the biological literature and molecular biology databases have considerably sped up the process of annotation for proteins for which some information regarding function is available in the literature. State-of-the-art methods based on NNs and HMMs can predict the presence of N-terminal sorting signals extremely accurately. Ab initio methods that predict subcellular localization for any protein sequence using only the native amino acid sequence and features predicted from the native sequence have shown the most remarkable improvements. The prediction accuracy of these methods has increased by over 30% in the past decade. 
The accuracy of these methods is now on par with high-throughput methods for predicting localization, and they are beginning to play an important role in directing experimental research. In this chapter, we review some of the most important methods for the prediction of subcellular localization.
Affiliation(s)
- Rajesh Nair
- CUBIC Department of Biochemistry and Molecular Biophysics and Center for Computational Biology and Bioinformatics, Columbia University, New York, NY, USA
|
14
|
Conesa A, Götz S. Blast2GO: A Comprehensive Suite for Functional Analysis in Plant Genomics. INTERNATIONAL JOURNAL OF PLANT GENOMICS 2008; 2008:619832. [PMID: 18483572 PMCID: PMC2375974 DOI: 10.1155/2008/619832] [Citation(s) in RCA: 1344] [Impact Index Per Article: 84.0] [Received: 10/05/2007] [Accepted: 11/26/2007] [Indexed: 05/09/2023]
Abstract
Functional annotation of novel sequence data is a primary requirement for the utilization of functional genomics approaches in plant research. In this paper, we describe the Blast2GO suite as a comprehensive bioinformatics tool for functional annotation of sequences and data mining on the resulting annotations, primarily based on the gene ontology (GO) vocabulary. Blast2GO optimizes function transfer from homologous sequences through an elaborate algorithm that considers similarity, the extension of the homology, the database of choice, the GO hierarchy, and the quality of the original annotations. The tool includes numerous functions for the visualization, management, and statistical analysis of annotation results, including gene set enrichment analysis. The application supports InterPro, enzyme codes, KEGG pathways, GO directed acyclic graphs (DAGs), and GOSlim. Blast2GO is a suitable tool for plant genomics research because of its versatility, easy installation, and user-friendliness.
Affiliation(s)
- Ana Conesa
- Bioinformatics Department, Centro de Investigación Príncipe Felipe, 4012 Valencia, Spain
- Stefan Götz
- Bioinformatics Department, Centro de Investigación Príncipe Felipe, 4012 Valencia, Spain
|
15
|
Muller J, Mehlen A, Vetter G, Yatskou M, Muller A, Chalmel F, Poch O, Friederich E, Vallar L. Design and evaluation of Actichip, a thematic microarray for the study of the actin cytoskeleton. BMC Genomics 2007; 8:294. [PMID: 17727702 PMCID: PMC2077341 DOI: 10.1186/1471-2164-8-294] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 01/29/2007] [Accepted: 08/29/2007] [Indexed: 01/07/2023]
Abstract
Background The actin cytoskeleton plays a crucial role in supporting and regulating numerous cellular processes. Mutations or alterations in the expression levels affecting the actin cytoskeleton system or related regulatory mechanisms are often associated with complex diseases such as cancer. Understanding how qualitative or quantitative changes in expression of the set of actin cytoskeleton genes are integrated to control actin dynamics and organisation is currently a challenge and should provide insights in identifying potential targets for drug discovery. Here we report the development of a dedicated microarray, the Actichip, containing 60-mer oligonucleotide probes for 327 genes selected for transcriptome analysis of the human actin cytoskeleton. Results Genomic data and sequence analysis features were retrieved from GenBank and stored in an integrative database called Actinome. From these data, probes were designed using a home-made program (CADO4MI) allowing sequence refinement and improved probe specificity by combining the complementary information recovered from the UniGene and RefSeq databases. Actichip performance was analysed by hybridisation with RNAs extracted from epithelial MCF-7 cells and human skeletal muscle. Using thoroughly standardised procedures, we obtained microarray images with excellent quality resulting in high data reproducibility. Actichip displayed a large dynamic range extending over three logs with a limit of sensitivity between one and ten copies of transcript per cell. The array allowed accurate detection of small changes in gene expression and reliable classification of samples based on the expression profiles of tissue-specific genes. When compared to two other oligonucleotide microarray platforms, Actichip showed similar sensitivity and concordant expression ratios. Moreover, Actichip was able to discriminate the highly similar actin isoforms whereas the two other platforms did not. 
Conclusion Our data demonstrate that Actichip is a powerful alternative to commercial high density microarrays for cytoskeleton gene profiling in normal or pathological samples. Actichip is available upon request.
Affiliation(s)
- Jean Muller
- Laboratoire de Biologie Moléculaire, d'Analyse Génique et de Modélisation, Centre de Recherche Public-Santé, 84 rue Val Fleuri, L-1526 Luxembourg, Luxembourg
- Laboratoire de Bioinformatique et Génomique Intégratives, Institut de Génétique et de Biologie Moléculaire et Cellulaire; Inserm, U596; CNRS, UMR7104, F-67400 Illkirch, Université Louis Pasteur, F-67000 Strasbourg, France
- Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
- André Mehlen
- Laboratoire de Biologie Moléculaire, d'Analyse Génique et de Modélisation, Centre de Recherche Public-Santé, 84 rue Val Fleuri, L-1526 Luxembourg, Luxembourg
- Guillaume Vetter
- Laboratoire de Biologie Moléculaire, d'Analyse Génique et de Modélisation, Centre de Recherche Public-Santé, 84 rue Val Fleuri, L-1526 Luxembourg, Luxembourg
- Cytoskeleton and cell plasticity laboratory, Life Sciences RU, University of Luxembourg, 162a Avenue de la faïencerie, L-1511 Luxembourg, Luxembourg
- Mikalai Yatskou
- Laboratoire de Biologie Moléculaire, d'Analyse Génique et de Modélisation, Centre de Recherche Public-Santé, 84 rue Val Fleuri, L-1526 Luxembourg, Luxembourg
- Arnaud Muller
- Laboratoire de Biologie Moléculaire, d'Analyse Génique et de Modélisation, Centre de Recherche Public-Santé, 84 rue Val Fleuri, L-1526 Luxembourg, Luxembourg
- Frédéric Chalmel
- Laboratoire de Bioinformatique et Génomique Intégratives, Institut de Génétique et de Biologie Moléculaire et Cellulaire; Inserm, U596; CNRS, UMR7104, F-67400 Illkirch, Université Louis Pasteur, F-67000 Strasbourg, France
- GERHM-Inserm U625, Université Rennes I, Campus de Beaulieu, Bt 13, Avenue du Général Leclerc, F-35042 Rennes cedex, France
- Olivier Poch
- Laboratoire de Bioinformatique et Génomique Intégratives, Institut de Génétique et de Biologie Moléculaire et Cellulaire; Inserm, U596; CNRS, UMR7104, F-67400 Illkirch, Université Louis Pasteur, F-67000 Strasbourg, France
- Evelyne Friederich
- Laboratoire de Biologie Moléculaire, d'Analyse Génique et de Modélisation, Centre de Recherche Public-Santé, 84 rue Val Fleuri, L-1526 Luxembourg, Luxembourg
- Cytoskeleton and cell plasticity laboratory, Life Sciences RU, University of Luxembourg, 162a Avenue de la faïencerie, L-1511 Luxembourg, Luxembourg
- Laurent Vallar
- Laboratoire de Biologie Moléculaire, d'Analyse Génique et de Modélisation, Centre de Recherche Public-Santé, 84 rue Val Fleuri, L-1526 Luxembourg, Luxembourg
|
16
|
Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks. BMC Bioinformatics 2007; 8:243. [PMID: 17620146 PMCID: PMC1940026 DOI: 10.1186/1471-2105-8-243] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Received: 09/29/2006] [Accepted: 07/10/2007] [Indexed: 11/23/2022]
Abstract
Background Uncovering the cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work as well as, often, sophisticated data mining and processing tools. A protein's functions, often referred to as its annotations, are believed to manifest themselves through the topology of networks of inter-protein interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins with other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to having general biological significance, such a demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interaction datasets. Results We developed a method for automatic extraction of protein functional annotation from scientific text based on Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates, and compared the performance of the automatic extraction technology to that of manual curation used in public Gene Ontology (GO) annotation. In the second part of our presentation, we reported a large-scale investigation into the correspondence between communities in the literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly denser linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology.
An increase in the number and size of GO groups without any noticeable decrease of the link density within the groups indicated that this expansion significantly broadens the public GO annotation without diluting its quality. We revealed that functional GO annotation correlates mostly with clustering in a physical interaction protein network, while its overlap with indirect regulatory network communities is two to three times smaller. Conclusion Protein functional annotations extracted by the NLP technology expand and enrich the existing GO annotation system. The GO functional modularity correlates mostly with the clustering in the physical interaction network, suggesting an essential role for the structural organization maintained by these interactions. Reciprocally, clustering of proteins in physical interaction networks can serve as evidence for their functional similarity.
|
17
|
Han L, Cui J, Lin H, Ji Z, Cao Z, Li Y, Chen Y. Recent progresses in the application of machine learning approach for predicting protein functional class independent of sequence similarity. Proteomics 2006; 6:4023-37. [PMID: 16791826 DOI: 10.1002/pmic.200500938] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.8] [Indexed: 01/22/2023]
Abstract
A protein's sequence contains clues to its function. Functional prediction from sequence presents a challenge, particularly for proteins that have low or no sequence similarity to proteins of known function. Recently, machine learning methods have been explored for predicting the functional class of proteins from sequence-derived properties independent of sequence similarity, and these have shown promising potential for low- and non-homologous proteins. These methods can thus be explored as potential tools to complement alignment- and clustering-based methods for predicting protein function. This article reviews the strategies, current progress, and underlying difficulties in using machine learning methods for predicting the functional class of proteins. The relevant software and web servers are described. The reported prediction performances of these methods are also presented; they need to be interpreted with caution, as they depend on such factors as the datasets used and the choice of parameters.
Affiliation(s)
- Lianyi Han
- Department of Computational Science, National University of Singapore, Singapore, Singapore
|
18
|
Thompson JD, Muller A, Waterhouse A, Procter J, Barton GJ, Plewniak F, Poch O. MACSIMS: multiple alignment of complete sequences information management system. BMC Bioinformatics 2006; 7:318. [PMID: 16792820 PMCID: PMC1539025 DOI: 10.1186/1471-2105-7-318] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Received: 04/18/2006] [Accepted: 06/23/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. RESULTS MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences within these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIMS results, while a graphical display using the JalView program allows manual analysis. CONCLUSION MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at http://bips.u-strasbg.fr/MACSIMS/.
Affiliation(s)
- Julie D Thompson
- Laboratoire de Biologie et Genomique Structurales, Institut de Génétique et de Biologie Moléculaire et Cellulaire, Illkirch, France
- Arnaud Muller
- The Laboratory of Molecular Biology, Genetic Analysis & Modelling, Luxembourg
- Andrew Waterhouse
- Post Genomics & Molecular Interactions Centre, School of Life Sciences, University of Dundee, UK
- Jim Procter
- Post Genomics & Molecular Interactions Centre, School of Life Sciences, University of Dundee, UK
- Geoffrey J Barton
- Post Genomics & Molecular Interactions Centre, School of Life Sciences, University of Dundee, UK
- Frédéric Plewniak
- Laboratoire de Biologie et Genomique Structurales, Institut de Génétique et de Biologie Moléculaire et Cellulaire, Illkirch, France
- Olivier Poch
- Laboratoire de Biologie et Genomique Structurales, Institut de Génétique et de Biologie Moléculaire et Cellulaire, Illkirch, France
|
19
|
Carles A, Millon R, Cromer A, Ganguli G, Lemaire F, Young J, Wasylyk C, Muller D, Schultz I, Rabouel Y, Dembélé D, Zhao C, Marchal P, Ducray C, Bracco L, Abecassis J, Poch O, Wasylyk B. Head and neck squamous cell carcinoma transcriptome analysis by comprehensive validated differential display. Oncogene 2006; 25:1821-31. [PMID: 16261155 DOI: 10.1038/sj.onc.1209203] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.9] [Indexed: 11/09/2022]
Abstract
Head and neck squamous cell carcinoma (HNSCC) is common worldwide and is associated with a poor rate of survival. Identification of new markers and therapeutic targets, and understanding the complex transformation process, will require a comprehensive description of genome expression, that can only be achieved by combining different methodologies. We report here the HNSCC transcriptome that was determined by exhaustive differential display (DD) analysis coupled with validation by different methods on the same patient samples. The resulting 820 nonredundant sequences were analysed by high throughput bioinformatics analysis. Human proteins were identified for 73% (596) of the DD sequences. A large proportion (>50%) of the remaining unassigned sequences match ESTs (expressed sequence tags) from human tumours. For the functionally annotated proteins, there is significant enrichment for relevant biological processes, including cell motility, protein biosynthesis, stress and immune responses, cell death, cell cycle, cell proliferation and/or maintenance and transport. Three of the novel proteins (TMEM16A, PHLDB2 and ARHGAP21) were analysed further to show that they have the potential to be developed as therapeutic targets.
Affiliation(s)
- A Carles
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, CNRS/INSERM/ULP, 67404 Illkirch Cedex, France
|
20
|
Abou-Sleymane G, Chalmel F, Helmlinger D, Lardenois A, Thibault C, Weber C, Mérienne K, Mandel JL, Poch O, Devys D, Trottier Y. Polyglutamine expansion causes neurodegeneration by altering the neuronal differentiation program. Hum Mol Genet 2006; 15:691-703. [PMID: 16434483 DOI: 10.1093/hmg/ddi483] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Indexed: 12/31/2022]
Abstract
Huntington's disease (HD) and spinocerebellar ataxia type 7 (SCA7) belong to a group of inherited neurodegenerative diseases caused by polyglutamine (polyQ) expansion in corresponding proteins. Transcriptional alteration is a unifying feature of polyQ disorders; however, the relationship between polyQ-induced gene expression deregulation and degenerative processes remains unclear. R6/2 and R7E mouse models of HD and SCA7, respectively, present a comparable retinal degeneration characterized by progressive reduction of electroretinograph activity and important morphological changes of rod photoreceptors. The retina, which is a simple central nervous system tissue, allows correlating functional, morphological and molecular defects. Taking advantage of comparing polyQ-induced degeneration in two retina models, we combined gene expression profiling and molecular biology techniques to decipher the molecular pathways underlying polyQ expansion toxicity. We show that R7E and R6/2 retinal phenotype strongly correlates with loss of expression of a large cohort of genes specifically involved in phototransduction function and morphogenesis of differentiated rod photoreceptors. Accordingly, three key transcription factors (Nrl, Crx and Nr2e3) controlling rod differentiation genes, hence expression of photoreceptor specific traits, are down-regulated. Interestingly, other transcription factors known to cause inhibitory effects on photoreceptor differentiation when mis-expressed, such as Stat3, are aberrantly re-activated. Thus, our results suggest that independently from the protein context, polyQ expansion overrides the control of neuronal differentiation and maintenance, thereby causing dysfunction and degeneration.
Affiliation(s)
- Gretta Abou-Sleymane
- Department of Molecular Pathology, Institut de Génétique et Biologie Moléculaire et Cellulaire (IGBMC), CNRS/INSERM/ULP, BP10142, 67404 Illkirch Cédex, CU de Strasbourg, France
|
21
|
Thompson JD, Holbrook SR, Katoh K, Koehl P, Moras D, Westhof E, Poch O. MAO: a Multiple Alignment Ontology for nucleic acid and protein sequences. Nucleic Acids Res 2005; 33:4164-71. [PMID: 16043635 PMCID: PMC1180671 DOI: 10.1093/nar/gki735] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022]
Abstract
The application of high-throughput techniques such as genomics, proteomics or transcriptomics means that vast amounts of heterogeneous data are now available in the public databases. Bioinformatics is responding to the challenge with new integrated management systems for data collection, validation and analysis. Multiple alignments of genomic and protein sequences provide an ideal environment for the integration of this mass of information. In the context of the sequence family, structural and functional data can be evaluated and propagated from known to unknown sequences. However, effective integration is being hindered by syntactic and semantic differences between the different data resources and the alignment techniques employed. One solution to this problem is the development of an ontology that systematically defines the terms used in a specific domain. Ontologies are used to share data from different resources, to automatically analyse information and to represent domain knowledge for non-experts. Here, we present MAO, a new ontology for multiple alignments of nucleic and protein sequences. MAO is designed to improve interoperation and data sharing between different alignment protocols for the construction of a high quality, reliable multiple alignment in order to facilitate knowledge extraction and the presentation of the most pertinent information to the biologist.
Affiliation(s)
- Julie D Thompson
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, 1 rue Laurent Fries, B.P. 10142, 67404 Illkirch Cedex, France
|
22
|
Hodges E, Redelius JS, Wu W, Höög C. Accelerated discovery of novel protein function in cultured human cells. Mol Cell Proteomics 2005; 4:1319-27. [PMID: 15965266 DOI: 10.1074/mcp.m500117-mcp200] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Indexed: 11/06/2022]
Abstract
Experimental approaches that enable direct investigation of human protein function are necessary for comprehensive annotation of the human proteome. We introduce a cell-based platform for rapid and unbiased functional annotation of undercharacterized human proteins. Utilizing a library of antibody biomarkers, the full-length proteins are investigated by tracking phenotypic changes caused by overexpression in human cell lines. We combine reverse transfection and immunodetection by fluorescence microscopy to facilitate this procedure at high resolution. Demonstrating the advantage of this approach, new annotations are provided for two novel proteins: 1) a membrane-bound O-acyltransferase protein (C3F) that, when overexpressed, disrupts Golgi and endosome integrity due likely to an endoplasmic reticulum-Golgi transport block and 2) a tumor marker (BC-2) that prompts a redistribution of a transcriptional silencing protein (BMI1) and a mitogen-activated protein kinase mediator (Rac1) to distinct nuclear regions that undergo chromatin compaction. Our strategy is an immediate application for directly addressing those proteins whose molecular function remains unknown.
Affiliation(s)
- Emily Hodges
- Center for Genomics and Bioinformatics, Karolinska Institute, SE-171 77 Stockholm, Sweden
|