1
Idhaya T, Suruliandi A, Raja SP. A Comprehensive Review on Machine Learning Techniques for Protein Family Prediction. Protein J 2024; 43:171-186. [PMID: 38427271 DOI: 10.1007/s10930-024-10181-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/19/2024] [Indexed: 03/02/2024]
Abstract
Proteomics is a field dedicated to the analysis of proteins in cells, tissues, and organisms, aiming to gain insights into their structures, functions, and interactions. A crucial aspect within proteomics is protein family prediction, which involves identifying evolutionary relationships between proteins by examining similarities in their sequences or structures. This approach holds great potential for applications such as drug discovery and functional annotation of genomes. However, current methods for protein family prediction have certain limitations, including limited accuracy, high false positive rates, and challenges in handling large datasets. Some methods also rely on homologous sequences or protein structures, which introduce biases and restrict their applicability to specific protein families or structures. To overcome these limitations, researchers have turned to machine learning (ML) approaches that can identify connections between protein features and simplify complex high-dimensional datasets. This paper presents a comprehensive survey of articles that employ various ML techniques for predicting protein families. The primary objective is to explore and improve ML techniques specifically for protein family prediction, thus advancing future research in the field. Through qualitative and quantitative analyses of ML techniques, it is evident that multiple methods utilizing a range of classifiers have been applied for protein family prediction. However, there has been limited focus on developing novel classifiers for protein family classification, highlighting the urgent need for improved approaches in this area. By addressing these challenges, this research aims to enhance the accuracy and effectiveness of protein family prediction, ultimately facilitating advancements in proteomics and its diverse applications.
Affiliation(s)
- T Idhaya
- Department of Computer Science and Engineering, Manonmaniam Sundaranar University, Tirunelveli, TamilNadu, India.
- A Suruliandi
- Department of Computer Science and Engineering, Manonmaniam Sundaranar University, Tirunelveli, TamilNadu, India
- S P Raja
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, TamilNadu, India
2
Kabir MN, Wong L. EnsembleFam: towards more accurate protein family prediction in the twilight zone. BMC Bioinformatics 2022; 23:90. [PMID: 35287576 PMCID: PMC8919565 DOI: 10.1186/s12859-022-04626-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/11/2021] [Accepted: 03/02/2022] [Indexed: 11/30/2022] Open
Abstract
Background Current protein family modeling methods like profile Hidden Markov Model (pHMM), k-mer based methods, and deep learning-based methods do not provide very accurate protein function prediction for proteins in the twilight zone, due to low sequence similarity to reference proteins with known functions. Results We present a novel method EnsembleFam, aiming at better function prediction for proteins in the twilight zone. EnsembleFam extracts the core characteristics of a protein family using similarity and dissimilarity features calculated from sequence homology relations. EnsembleFam trains three separate Support Vector Machine (SVM) classifiers for each family using these features, and an ensemble prediction is made to classify novel proteins into these families. Extensive experiments are conducted using the Clusters of Orthologous Groups (COG) dataset and G Protein-Coupled Receptor (GPCR) dataset. EnsembleFam not only outperforms state-of-the-art methods on the overall dataset but also provides a much more accurate prediction for twilight zone proteins. Conclusions EnsembleFam, a machine learning method to model protein families, can be used to better identify members with very low sequence homology. Using EnsembleFam protein functions can be predicted using just sequence information with better accuracy than state-of-the-art methods.
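The per-family ensemble described in this abstract can be pictured with a minimal Python sketch. The abstract does not say how the three classifiers are combined, so the strict majority vote used here is an assumption, and the names `threshold_classifier`, `ensemble_predict`, and `MODELS` are hypothetical. The toy threshold rules merely stand in for trained SVMs; this is not the authors' implementation.

```python
from typing import Callable, Dict, List, Sequence

# A classifier maps a feature vector to True (member) or False (non-member).
Classifier = Callable[[Sequence[float]], bool]


def threshold_classifier(index: int, cutoff: float) -> Classifier:
    """Hypothetical stand-in for a trained SVM: thresholds a single feature."""
    def predict(features: Sequence[float]) -> bool:
        return features[index] >= cutoff
    return predict


def ensemble_predict(models: Dict[str, List[Classifier]],
                     features: Sequence[float]) -> List[str]:
    """Return the families for which a strict majority of classifiers vote yes.

    The majority rule is an assumption made for illustration only.
    """
    hits = []
    for family, classifiers in models.items():
        votes = sum(clf(features) for clf in classifiers)
        if 2 * votes > len(classifiers):
            hits.append(family)
    return hits


# Toy models: three classifiers per family, mirroring the count in the abstract.
MODELS = {
    "family_A": [threshold_classifier(i, 0.5) for i in range(3)],
    "family_B": [threshold_classifier(i, 0.9) for i in range(3)],
}

if __name__ == "__main__":
    print(ensemble_predict(MODELS, [0.7, 0.6, 0.2]))
```

With these toy thresholds, a feature vector needs at least two of its three values at or above a family's cutoff to be assigned to that family.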
Affiliation(s)
- Mohammad Neamul Kabir
- Department of Computer Science, National University of Singapore, 13 Computing Drive, 117417, Singapore, Singapore.
- Limsoon Wong
- Department of Computer Science, National University of Singapore, 13 Computing Drive, 117417, Singapore, Singapore
3
Pinto-Almeida A, Mendes TMF, Ferreira P, Abecasis AB, Belo S, Anibal FF, Allegretti SM, Galinaro CA, Carrilho E, Afonso A. A Comparative Proteomic Analysis of Praziquantel-Susceptible and Praziquantel-Resistant Schistosoma mansoni Reveals Distinct Response Between Male and Female Animals. Frontiers in Tropical Diseases 2021. [DOI: 10.3389/fitd.2021.664642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
Schistosomiasis is a chronic neglected tropical disease saddling millions of people in the world, mainly children living in poor rural areas. Praziquantel (PZQ) is currently the only drug used for the treatment and control of this disease. However, the extensive use of this drug has brought concern about the emergence of PZQ-resistance/tolerance by Schistosoma mansoni. Studies of Schistosoma spp. genome, transcriptome, and proteome are crucial to better understand this situation. In this in vitro study, we compare the proteomes of a S. mansoni variant strain stably resistant to PZQ and isogenic to its fully susceptible parental counterpart, identifying proteins from male and female adult parasites of PZQ-resistant and PZQ-susceptible strains, exposed and not exposed to PZQ. A total of 60 Schistosoma spp. proteins were identified, some of which present or absent in either strain, which may putatively be involved in the PZQ-resistance phenomenon. These proteins were present in adult parasites not exposed to PZQ, but some of them disappeared when these adult parasites were exposed to the drug. Understanding the development of PZQ-resistance in S. mansoni is crucial to prolong the efficacy of the current drug and develop markers for monitoring the potential emergence of drug resistance.
4
Chen J, Liu Z, Liu Y, Zhang X, Zeng J. Preliminary investigations on the pathogenesis-related protein expression profile of the medicinal herb Macleaya cordata and anti-bacterial properties of recombinant proteins. Phytochemistry 2021; 184:112667. [PMID: 33548769 DOI: 10.1016/j.phytochem.2021.112667] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/23/2020] [Revised: 01/07/2021] [Accepted: 01/09/2021] [Indexed: 06/12/2023]
Abstract
The plant pathogenesis-related (PR) proteins play a crucial role in the defense of plants against pathogens and orchestrate the innate immune system of plants. In this paper, a non-normalized cDNA library of the leaf was constructed to obtain a comprehensive view of PR proteins of Macleaya cordata. Specifically, 511 expressed sequence tags (ESTs) were generated using Sanger sequencing. All ESTs were assembled into 364 non-redundancy sequences, including 78 clusters and 286 singlets. The PR protein expression profile of the medicinal herb M. cordata has been investigated and is represented by defensin, lipid-transfer protein, (S)-norcoclaurine synthase, and major allergen protein, suggesting that the herb contains rich active proteins against pathogens. Furthermore, two defensins were selected for recombinant expression in yeast, and the antimicrobial activities were explored. Since they both present a broad antimicrobial spectrum, they are of particular importance for agricultural and medicinal applications. Our study describes defensins in Papaveraceae for the first time and provides novel insights into the effective components. In addition to the alkaloids, PR proteins (such as defensins, lipid transfer proteins, (S)-norcoclaurine synthase, major allergen protein, and Class IV chitinases) are involved in the antibacterial and anti-inflammatory activities of M. cordata.
Affiliation(s)
- Jinjun Chen
- College of Bioscience and Biotechnology, Hunan Agricultural University, Changsha, Hunan, 410128, China.
- Zihao Liu
- College of Bioscience and Biotechnology, Hunan Agricultural University, Changsha, Hunan, 410128, China
- Yisong Liu
- Hunan Key Laboratory of Traditional Chinese Veterinary Medicine, Hunan Agricultural University, Changsha, 410128, China
- Xuewen Zhang
- College of Bioscience and Biotechnology, Hunan Agricultural University, Changsha, Hunan, 410128, China
- Jianguo Zeng
- Hunan Key Laboratory of Traditional Chinese Veterinary Medicine, Hunan Agricultural University, Changsha, 410128, China.
5
ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network. Molecules 2017; 22:molecules22101732. [PMID: 29039790 PMCID: PMC6151571 DOI: 10.3390/molecules22101732] [Citation(s) in RCA: 116] [Impact Index Per Article: 16.6] [Received: 08/30/2017] [Revised: 10/11/2017] [Accepted: 10/11/2017] [Indexed: 11/25/2022] Open
Abstract
With the development of next generation sequencing techniques, it is fast and cheap to determine protein sequences but relatively slow and expensive to extract useful information from protein sequences because of limitations of traditional biological experimental techniques. Protein function prediction has been a long standing challenge to fill the gap between the huge amount of protein sequences and the known function. In this paper, we propose a novel method to convert the protein function problem into a language translation problem by the new proposed protein sequence language “ProLan” to the protein function language “GOLan”, and build a neural machine translation model based on recurrent neural networks to translate “ProLan” language to “GOLan” language. We blindly tested our method by attending the latest third Critical Assessment of Function Annotation (CAFA 3) in 2016, and also evaluate the performance of our methods on selected proteins whose function was released after CAFA competition. The good performance on the training and testing datasets demonstrates that our new proposed method is a promising direction for protein function prediction. In summary, we first time propose a method which converts the protein function prediction problem to a language translation problem and applies a neural machine translation model for protein function prediction.
6
Making sense of genomes of parasitic worms: Tackling bioinformatic challenges. Biotechnol Adv 2016; 34:663-686. [DOI: 10.1016/j.biotechadv.2016.03.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 11/05/2015] [Revised: 02/25/2016] [Accepted: 03/01/2016] [Indexed: 01/25/2023]
7
Olender T, Safran M, Edgar R, Stelzer G, Nativ N, Rosen N, Shtrichman R, Mazor Y, West MD, Keydar I, Rappaport N, Belinky F, Warshawsky D, Lancet D. An Overview of Synergistic Data Tools for Biological Scrutiny. Isr J Chem 2013. [DOI: 10.1002/ijch.201200094] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]
8
Jaramillo-Garzón JA, Gallardo-Chacón JJ, Castellanos-Domínguez CG, Perera-Lluna A. Predictability of gene ontology slim-terms from primary structure information in Embryophyta plant proteins. BMC Bioinformatics 2013; 14:68. [PMID: 23441934 PMCID: PMC3660269 DOI: 10.1186/1471-2105-14-68] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 03/08/2012] [Accepted: 02/19/2013] [Indexed: 11/25/2022] Open
Abstract
Background Proteins are the key elements on the path from genetic information to the development of life. The roles played by the different proteins are difficult to uncover experimentally as this process involves complex procedures such as genetic modifications, injection of fluorescent proteins, gene knock-out methods and others. The knowledge learned from each protein is usually annotated in databases through different methods such as the proposed by The Gene Ontology (GO) consortium. Different methods have been proposed in order to predict GO terms from primary structure information, but very few are available for large-scale functional annotation of plants, and reported success rates are much less than the reported by other non-plant predictors. This paper explores the predictability of GO annotations on proteins belonging to the Embryophyta group from a set of features extracted solely from their primary amino acid sequence. Results High predictability of several GO terms was found for Molecular Function and Cellular Component. As expected, a lower degree of predictability was found on Biological Process ontology annotations, although a few biological processes were easily predicted. Proteins related to transport and transcription were particularly well predicted from primary structure information. The most discriminant features for prediction were those related to electric charges of the amino-acid sequence and hydropathicity derived features. Conclusions An analysis of GO-slim terms predictability in plants was carried out, in order to determine single categories or groups of functions that are most related with primary structure information. For each highly predictable GO term, the responsible features of such successfulness were identified and discussed. 
In addition to most published studies, focused on few categories or single ontologies, results in this paper comprise a complete landscape of GO predictability from primary structure encompassing 75 GO terms at molecular, cellular and phenotypical level. Thus, it provides a valuable guide for researchers interested on further advances in protein function prediction on Embryophyta plants.
Affiliation(s)
- Jorge Alberto Jaramillo-Garzón
- Departamento de Ingeniería Eléctrica, Electrónica y Computación, Universidad Nacional de Colombia sede Manizales, Campus La Nubia, Km 7 Vía al Magdalena, Manizales-Caldas, Colombia.
9
Huang H, Wang Y, Wang S, Wu X, Yang K, Niu Y, Dai S. Transcriptome-wide survey and expression analysis of stress-responsive NAC genes in Chrysanthemum lavandulifolium. Plant Science: An International Journal of Experimental Plant Biology 2012; 193-194:18-27. [PMID: 22794915 DOI: 10.1016/j.plantsci.2012.05.004] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 01/16/2012] [Revised: 05/09/2012] [Accepted: 05/10/2012] [Indexed: 05/22/2023]
Abstract
The plant-specific NAC (NAM, ATAF, and CUC) transcription factor family plays a vital role in various plant growth and developmental processes as well as in stress resistance. Using RNA sequencing, we found that the ClNAC genes (ClNAC1-44) were the most strongly up-regulated transcription factor family in Chrysanthemum lavandulifolium leaves under salt treatment. We carried out reverse transcriptase polymerase chain reaction to monitor ClNAC genes response against multiple stresses and hormonal treatments including salt, drought, cold, heat, abscisic acid and salicylic acid treatments. The results showed that 35 ClNAC genes were differentially expressed in different organ, and 32 ClNAC genes could respond to at least 2 kinds of treatments. Quantitative real time polymerase chain reaction showed that 10 ClNAC genes belonging to 7 different subfamilies could respond to at least 5 kinds of treatments. Over 50-fold variation in transcriptional levels of ClNAC17 and ClNAC21 genes was observed under 6 different types of treatments. In the present study, high-level expression of ClNAC genes under abiotic stresses and hormonal treatments suggests that the NAC transcription factors play important roles in abiotic stress tolerance and adaptation.
Affiliation(s)
- He Huang
- College of Landscape Architecture, Beijing Forestry University, Beijing 100038, China
- Yi Wang
- College of Landscape Architecture, Beijing Forestry University, Beijing 100038, China
- Shunli Wang
- College of Landscape Architecture, Beijing Forestry University, Beijing 100038, China; College of Life Science, Beijing Forestry University, Beijing 100038, China
- Xuan Wu
- College of Forestry, Beijing Forestry University, Beijing 100038, China
- Ke Yang
- College of Landscape Architecture, Beijing Forestry University, Beijing 100038, China; School of Forestry and Environmental Studies, Yale University, New Haven, CT 06511, USA
- Yajing Niu
- College of Landscape Architecture, Beijing Forestry University, Beijing 100038, China
- Silan Dai
- College of Landscape Architecture, Beijing Forestry University, Beijing 100038, China.
10
Yalamanchili HK, Xiao QW, Wang J. A novel neural response algorithm for protein function prediction. BMC Systems Biology 2012; 6 Suppl 1:S19. [PMID: 23046521 PMCID: PMC3403322 DOI: 10.1186/1752-0509-6-s1-s19] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Background Large amounts of data are being generated by high-throughput genome sequencing methods. But the rate of the experimental functional characterization falls far behind. To fill the gap between the number of sequences and their annotations, fast and accurate automated annotation methods are required. Many methods, such as GOblet, GOFigure, and Gotcha, are designed based on the BLAST search. Unfortunately, the sequence coverage of these methods is low as they cannot detect the remote homologues. Adding to this, the lack of annotation specificity advocates the need to improve automated protein function prediction. Results We designed a novel automated protein functional assignment method based on the neural response algorithm, which simulates the neuronal behavior of the visual cortex in the human brain. Firstly, we predict the most similar target protein for a given query protein and thereby assign its GO term to the query sequence. When assessed on test set, our method ranked the actual leaf GO term among the top 5 probable GO terms with accuracy of 86.93%. Conclusions The proposed algorithm is the first instance of neural response algorithm being used in the biological domain. The use of HMM profiles along with the secondary structure information to define the neural response gives our method an edge over other available methods on annotation accuracy. Results of the 5-fold cross validation and the comparison with PFP and FFPred servers indicate the prominent performance by our method. The program, the dataset, and help files are available at http://www.jjwanglab.org/NRProF/.
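The evaluation quoted in this abstract, where the actual leaf GO term is ranked among the top 5 candidates, is a top-k accuracy. The following Python sketch shows how such a figure could be computed. The function name `top_k_accuracy` and the placeholder term labels are hypothetical, and the data are invented purely to exercise the code; they do not reproduce the reported 86.93%.

```python
from typing import Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   true_terms: Sequence[str],
                   k: int = 5) -> float:
    """Fraction of queries whose true term appears in the first k predictions."""
    if len(ranked_predictions) != len(true_terms):
        raise ValueError("one ranked list is required per true term")
    if not true_terms:
        return 0.0
    hits = sum(truth in ranked[:k]
               for ranked, truth in zip(ranked_predictions, true_terms))
    return hits / len(true_terms)


# Placeholder labels (not real GO identifiers), most probable term first.
RANKED = [
    ["T1", "T2", "T3", "T4", "T5", "T6"],
    ["T9", "T8", "T7", "T6", "T5", "T4"],
    ["T2", "T1", "T3", "T5", "T4", "T9"],
]
TRUTH = ["T3", "T4", "T9"]

if __name__ == "__main__":
    print(top_k_accuracy(RANKED, TRUTH, k=5))
```

In this toy data the first query is a hit, while the second and third have their true term at rank 6, which falls outside the top 5.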
Affiliation(s)
- Hari Krishna Yalamanchili
- Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
11
Pascovici D, Keighley T, Mirzaei M, Haynes PA, Cooke B. PloGO: Plotting gene ontology annotation and abundance in multi-condition proteomics experiments. Proteomics 2012; 12:406-10. [DOI: 10.1002/pmic.201100445] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 08/25/2011] [Revised: 10/26/2011] [Accepted: 11/22/2011] [Indexed: 12/26/2022]
12
Nguyen Cao, Mannino Michael, Gardiner Katheleen, Cios Krzysztof J. ClusFCM: An Algorithm for Predicting Protein Functions Using Homologies and Protein Interactions. J Bioinform Comput Biol 2011; 6:203-22. [DOI: 10.1142/s0219720008003333] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/24/2007] [Revised: 09/26/2007] [Accepted: 10/24/2007] [Indexed: 11/18/2022]
Abstract
We introduce a new algorithm, called ClusFCM, which combines techniques of clustering and fuzzy cognitive maps (FCM) for prediction of protein functions. ClusFCM takes advantage of protein homologies and protein interaction network topology to improve low recall predictions associated with existing prediction methods. ClusFCM exploits the fact that proteins of known function tend to cluster together and deduce functions not only through their direct interaction with other proteins, but also from other proteins in the network. We use ClusFCM to annotate protein functions for Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (worm), and Drosophila melanogaster (fly) using protein–protein interaction data from the General Repository for Interaction Datasets (GRID) database and functional labels from Gene Ontology (GO) terms. The algorithm's performance is compared with four state-of-the-art methods for function prediction — Majority, χ2 statistics, Markov random field (MRF), and FunctionalFlow — using measures of Matthews correlation coefficient, harmonic mean, and area under the receiver operating characteristic (ROC) curves. The results indicate that ClusFCM predicts protein functions with high recall while not lowering precision. Supplementary information is available at .
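Two of the comparison measures named in this abstract can be written directly from confusion-matrix counts. The Python sketch below uses the standard definition of the Matthews correlation coefficient and assumes that "harmonic mean" refers to the harmonic mean of precision and recall (the F-measure); the abstract does not spell this out. The function names are hypothetical.

```python
import math


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def harmonic_mean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed meaning of the term)."""
    total = precision + recall
    if total == 0:
        return 0.0
    return 2 * precision * recall / total


if __name__ == "__main__":
    print(matthews_corrcoef(tp=50, fp=0, tn=50, fn=0))
    print(harmonic_mean(0.5, 0.5))
```

The harmonic mean penalizes an imbalance between precision and recall, which fits the abstract's emphasis on raising recall without lowering precision.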
Affiliation(s)
- Cao Nguyen
- Virginia Commonwealth University, VA 23238, USA
- Krzysztof J. Cios
- Virginia Commonwealth University, VA 23238, USA
- University of Colorado Boulder, Boulder, CO 80309, USA
- Polish Academy of Sciences, Poland
13
Janga SC, Díaz-Mejía JJ, Moreno-Hagelsieb G. Network-based function prediction and interactomics: the case for metabolic enzymes. Metab Eng 2010; 13:1-10. [PMID: 20654726 DOI: 10.1016/j.ymben.2010.07.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 06/02/2010] [Revised: 07/15/2010] [Accepted: 07/16/2010] [Indexed: 12/19/2022]
Abstract
As sequencing technologies increase in power, determining the functions of unknown proteins encoded by the DNA sequences so produced becomes a major challenge. Functional annotation is commonly done on the basis of amino-acid sequence similarity alone. Long after sequence similarity becomes undetectable by pair-wise comparison, profile-based identification of homologs can often succeed due to the conservation of position-specific patterns, important for a protein's three dimensional folding and function. Nevertheless, prediction of protein function from homology-driven approaches is not without problems. Homologous proteins might evolve different functions and the power of homology detection has already started to reach its maximum. Computational methods for inferring protein function, which exploit the context of a protein in cellular networks, have come to be built on top of homology-based approaches. These network-based functional inference techniques provide both a first hand hint into a proteins' functional role and offer complementary insights to traditional methods for understanding the function of uncharacterized proteins. Most recent network-based approaches aim to integrate diverse kinds of functional interactions to boost both coverage and confidence level. These techniques not only promise to solve the moonlighting aspect of proteins by annotating proteins with multiple functions, but also increase our understanding on the interplay between different functional classes in a cell. In this article we review the state of the art in network-based function prediction and describe some of the underlying difficulties and successes. Given the volume of high-throughput data that is being reported the time is ripe to employ these network-based approaches, which can be used to unravel the functions of the uncharacterized proteins accumulating in the genomic databases.
Affiliation(s)
- S C Janga
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, United Kingdom.
14
Hori TS, Gamperl AK, Afonso LOB, Johnson SC, Hubert S, Kimball J, Bowman S, Rise ML. Heat-shock responsive genes identified and validated in Atlantic cod (Gadus morhua) liver, head kidney and skeletal muscle using genomic techniques. BMC Genomics 2010; 11:72. [PMID: 20109224 PMCID: PMC2830189 DOI: 10.1186/1471-2164-11-72] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.6] [Received: 03/23/2009] [Accepted: 01/28/2010] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Daily and seasonal changes in temperature are challenges that fish within aquaculture settings cannot completely avoid, and are known to elicit complex organismal and cellular stress responses. We conducted a large-scale gene discovery and transcript expression study in order to better understand the genes that are potentially involved in the physiological and cellular aspects of stress caused by heat-shock. We used suppression subtractive hybridization (SSH) cDNA library construction and characterization to identify transcripts that were dysregulated by heat-shock in liver, skeletal muscle and head kidney of Atlantic cod. These tissues were selected due to their roles in metabolic regulation, locomotion and growth, and immune function, respectively. Fish were exposed for 3 hours to an 8 degrees C elevation in temperature, and then allowed to recover for 24 hours at the original temperature (i.e. 10 degrees C). Tissue samples obtained before heat-shock (BHS), at the cessation of heat-shock (CS), and 3, 12, and 24 hours after the cessation of heat-shock (ACS), were used for reciprocal SSH library construction and quantitative reverse transcription - polymerase chain reaction (QPCR) analysis of gene expression using samples from a group that was transferred but not heat-shocked (CT) as controls. RESULTS We sequenced and characterized 4394 ESTs (1524 from liver, 1451 from head kidney and 1419 from skeletal muscle) from three "forward subtracted" libraries (enriched for genes up-regulated by heat-shock) and 1586 from the liver "reverse subtracted" library (enriched for genes down-regulated by heat-shock), for a total of 5980 ESTs. Several cDNAs encoding putative chaperones belonging to the heat-shock protein (HSP) family were found in these libraries, and "protein folding" was among the gene ontology (GO) terms with the highest proportion in the libraries. 
QPCR analysis of HSP90alpha and HSP70-1 (synonym: HSPA1A) mRNA expression showed significant up-regulation in all three tissues studied. These transcripts were more than 100-fold up-regulated in liver following heat-shock. We also identified HSP47, GRP78 and GRP94-like transcripts, which were significantly up-regulated in all 3 tissues studied. Toll-like receptor 22 (TLR22) transcript, found in the liver reverse SSH library, was shown by QPCR to be significantly down-regulated in the head kidney after heat-shock. CONCLUSION Chaperones are an important part of the cellular response to stress, and genes identified in this work may play important roles in resistance to thermal-stress. Moreover, the transcript for one key immune response gene (TLR22) was down-regulated by heat-shock, and this down-regulation may be a component of heat-induced immunosuppression.
Affiliation(s)
- Tiago S Hori
- Ocean Sciences Centre, Memorial University of Newfoundland, St. John's, NL, A1C 5S7, Canada
- A Kurt Gamperl
- Ocean Sciences Centre, Memorial University of Newfoundland, St. John's, NL, A1C 5S7, Canada
- Luis OB Afonso
- British Columbia Centre for Aquatic Health Sciences, Campbell River, BC, V9W 2C2, Canada
- Stewart C Johnson
- Pacific Biological Station, Department for Fisheries and Oceans, Nanaimo, BC, V9T 6N7, Canada
- Sophie Hubert
- The Atlantic Genome Centre, Halifax, NS, B3H 3Z1, Canada
- Jennifer Kimball
- Institute for Marine Biosciences, National Research Council of Canada, Halifax, NS, B3H 3Z1, Canada
- Sharen Bowman
- The Atlantic Genome Centre, Halifax, NS, B3H 3Z1, Canada
- Matthew L Rise
- Ocean Sciences Centre, Memorial University of Newfoundland, St. John's, NL, A1C 5S7, Canada
15
Li T, Brouwer M. Bioinformatic analysis of expressed sequence tags from grass shrimp Palaemonetes pugio exposed to environmental stressors. Comparative Biochemistry and Physiology D-Genomics & Proteomics 2009; 4:187-95. [DOI: 10.1016/j.cbd.2009.03.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 08/13/2008] [Revised: 03/01/2009] [Accepted: 03/02/2009] [Indexed: 11/26/2022]
16
Sant'Anna C, Nakayasu ES, Pereira MG, Lourenço D, de Souza W, Almeida IC, Cunha-E-Silva NL. Subcellular proteomics of Trypanosoma cruzi reservosomes. Proteomics 2009; 9:1782-94. [PMID: 19288526 DOI: 10.1002/pmic.200800730] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.7] [Indexed: 01/09/2023]
Abstract
Reservosomes are the endpoint of the endocytic pathway in Trypanosoma cruzi epimastigotes. These organelles have the particular ability to concentrate proteins and lipids obtained from medium together with the main proteolytic enzymes originated from the secretory pathway, being at the same time a storage organelle and the main site of protein degradation. Subcellular proteomics have been extensively used for profiling organelles in different cell types. Here, we combine cell fractionation and LC-MS/MS analysis to identify reservosome-resident proteins. Starting from a purified reservosome fraction, we established a protocol to isolate reservosome membranes. Transmission electron microscopy was applied to confirm the purity of the fractions. To achieve a better coverage of identified proteins we analyzed the fractions separately and combined the results. LC-MS/MS analysis identified in total 709 T. cruzi-specific proteins; of these, 456 had predicted function and 253 were classified as hypothetical proteins. We could confirm the presence of most of the proteins validated by previous work and identify new proteins from different classes such as enzymes, proton pumps, transport proteins, and others. The definition of the reservosome protein profile is a good tool to assess their molecular signature, identify molecular markers, and understand their relationship with different organelles.
Collapse
Affiliation(s)
- Celso Sant'Anna
- Laboratório de Ultraestrutura Celular Hertha Meyer, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Brazil
|
17
|
A 454 sequencing approach for large scale phylogenomic analysis of the common emperor scorpion (Pandinus imperator). Mol Phylogenet Evol 2009; 53:826-34. [PMID: 19695333 DOI: 10.1016/j.ympev.2009.08.014] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.9] [Received: 01/27/2009] [Revised: 08/02/2009] [Accepted: 08/04/2009] [Indexed: 10/20/2022]
Abstract
In recent years, phylogenetic tree reconstructions that rely on multiple gene alignments that had been deduced from expressed sequence tags (ESTs) have become a popular method in molecular systematics. Here, we present a 454 pyrosequencing approach to infer the transcriptome of the Emperor scorpion Pandinus imperator. We obtained 428,844 high-quality reads (mean length = 223 ± 50 b) from total cDNA, which were assembled into 8334 contigs (mean length 422 ± 313 bp) and 26,147 singletons. About 1200 contigs were successfully annotated by BLAST and orthology search. Specific analyses of eight distinct hemocyanin sequences provided further proof for the quality of the 454 reads and the assembly process. The P. imperator sequences were included in a concatenated alignment of 149 orthologous genes of 67 metazoan taxa that covers 39,842 amino acids. After removal of low-quality regions, 11,168 positions were employed for phylogenetic reconstructions. Using Bayesian and maximum likelihood methods, we obtained strongly supported monophyletic Ecdysozoa, Arthropoda (excluding Tardigrada), Euarthropoda, Pancrustacea and Hexapoda. We also recovered the Myriochelata (Chelicerata+Myriapoda). Within the chelicerates, Pycnogonida form the sister group of Euchelicerata. However, Arachnida were found paraphyletic because the Acari (mites and ticks) were recovered as sister group of a clade comprising Xiphosura, Scorpiones and Araneae. In summary, we have shown that 454 pyrosequencing is a cost-effective method that provides sufficient data and coverage depth for gene detection and multigene-based phylogenetic analyses.
|
18
|
Kim C, Lemke C, Paterson AH. Functional dissection of drought-responsive gene expression patterns in Cynodon dactylon L. PLANT MOLECULAR BIOLOGY 2009; 70:1-16. [PMID: 19152115 DOI: 10.1007/s11103-009-9453-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 08/11/2008] [Accepted: 01/05/2009] [Indexed: 05/08/2023]
Abstract
Water deficit is one of the main abiotic factors that affect plant productivity in subtropical regions. To identify genes induced during the water stress response in Bermudagrass (Cynodon dactylon), cDNA macroarrays were used. The macroarray analysis identified 189 drought-responsive candidate genes from C. dactylon, of which 120 were up-regulated and 69 were down-regulated. The candidate genes were classified into seven groups by cluster analysis of expression levels across two intensities and three durations of imposed stress. Annotation using BLASTX suggested that up-regulated genes may be involved in proline biosynthesis, signal transduction pathways, protein repair systems, and removal of toxins, while down-regulated genes were mostly related to basic plant metabolism such as photosynthesis and glycolysis. The functional classification of gene ontology (GO) was consistent with the BLASTX results, also suggesting some crosstalk between abiotic and biotic stress. Comparative analysis of cis-regulatory elements from the candidate genes implicated specific elements in drought response in Bermudagrass. Although only a subset of genes was studied, Bermudagrass shared many drought-responsive genes and cis-regulatory elements with other botanical models, supporting a strategy of cross-taxon application of drought-responsive genes, regulatory cues, and physiological-genetic information.
Affiliation(s)
- Changsoo Kim
- Plant Genome Mapping Laboratory, University of Georgia, 111 Riverbend Road, Athens, GA 30602, USA
|
19
|
Lin HC, Morcillo F, Dussert S, Tranchant-Dubreuil C, Tregear JW, Tranbarger TJ. Transcriptome analysis during somatic embryogenesis of the tropical monocot Elaeis guineensis: evidence for conserved gene functions in early development. PLANT MOLECULAR BIOLOGY 2009; 70:173-92. [PMID: 19199047 DOI: 10.1007/s11103-009-9464-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 09/26/2008] [Accepted: 01/21/2009] [Indexed: 05/08/2023]
Abstract
With the aim of understanding the molecular mechanisms underlying somatic embryogenesis (SE) in oil palm, we examined transcriptome changes that occur when embryogenic suspension cells are initiated to develop somatic embryos. Two reciprocal suppression subtractive hybridization (SSH) libraries were constructed from oil palm embryogenic cell suspensions: one in which embryo development was blocked by the presence of the synthetic auxin analogue 2,4-dichlorophenoxyacetic acid (2,4-D) in the medium (proliferation library); and another in which cells were stimulated to form embryos by the removal of 2,4-D from the medium (initiation library). A total of 1867 Expressed Sequence Tags (ESTs) consisting of 1567 potential unigenes were assembled from the two libraries. Functional annotation indicated that 928 of the ESTs correspond to proteins that have either no similarity to sequences in public databases or are of unknown function. Gene Ontology (GO) terms assigned to the two EST populations give clues to the underlying molecular functions, biological processes and cellular components involved in the initiation of embryo development. Macroarrays were used for transcript profiling the ESTs during SE. Hierarchical cluster analysis of differential transcript accumulation revealed 4 distinct profiles containing a total of 192 statistically significant developmentally regulated transcripts. Similarities and differences between the global results obtained with in vitro systems from dicots, monocots and gymnosperms will be discussed.
Affiliation(s)
- Hsiang-Chun Lin
- IRD, UMR DIAPC, IRD/CIRAD Palm Development Group, 911 Avenue Agropolis, BP 64501, 34394, Montpellier Cedex 5, France
|
20
|
Skolnick J, Brylinski M. FINDSITE: a combined evolution/structure-based approach to protein function prediction. Brief Bioinform 2009; 10:378-91. [PMID: 19324930 DOI: 10.1093/bib/bbp017] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.8] [Indexed: 11/13/2022] Open
Abstract
A key challenge of the post-genomic era is the identification of the function(s) of all the molecules in a given organism. Here, we review the status of sequence and structure-based approaches to protein function inference and ligand screening that can provide functional insights for a significant fraction of the approximately 50% of ORFs of unassigned function in an average proteome. We then describe FINDSITE, a recently developed algorithm for ligand binding site prediction, ligand screening and molecular function prediction, which is based on binding site conservation across evolutionary distant proteins identified by threading. Importantly, FINDSITE gives comparable results when high-resolution experimental structures as well as predicted protein models are used.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology 250 14th St NW, Atlanta, GA 30318, USA.
|
21
|
Fontana P, Cestaro A, Velasco R, Formentin E, Toppo S. Rapid annotation of anonymous sequences from genome projects using semantic similarities and a weighting scheme in gene ontology. PLoS One 2009; 4:e4619. [PMID: 19247487 PMCID: PMC2645684 DOI: 10.1371/journal.pone.0004619] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 11/09/2008] [Accepted: 01/09/2009] [Indexed: 11/22/2022] Open
Abstract
Background: Large-scale sequencing projects have now become routine lab practice and this has led to the development of a new generation of tools involving function prediction methods, bringing the latter back to the fore. The advent of Gene Ontology, with its structured vocabulary and paradigm, has provided computational biologists with an appropriate means for this task. Methodology: We present here a novel method called ARGOT (Annotation Retrieval of Gene Ontology Terms) that is able to process quickly thousands of sequences for functional inference. The tool exploits for the first time an integrated approach which combines clustering of GO terms, based on their semantic similarities, with a weighting scheme which assesses retrieved hits sharing a certain number of biological features with the sequence to be annotated. These hits may be obtained by different methods and in this work we have based ARGOT processing on BLAST results. Conclusions: The extensive benchmark involved 10,000 protein sequences, the complete S. cerevisiae genome and a small subset of proteins for purposes of comparison with other available tools. The algorithm was proven to outperform existing methods and to be suitable for function prediction of single proteins due to its high degree of sensitivity, specificity and coverage.
Affiliation(s)
- Paolo Fontana
- FEM-IASMA Research Center, San Michele all'Adige (TN), Italy
- Stefano Toppo
- Department of Biological Chemistry, University of Padova, Padova, Italy
|
22
|
Hawkins T, Chitale M, Luban S, Kihara D. PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data. Proteins 2009; 74:566-82. [PMID: 18655063 DOI: 10.1002/prot.22172] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.3] [Indexed: 11/12/2022]
Abstract
Protein function prediction is a central problem in bioinformatics, increasing in importance recently due to the rapid accumulation of biological data awaiting interpretation. Sequence data represents the bulk of this new stock and is the obvious target for consideration as input, as newly sequenced organisms often lack any other type of biological characterization. We have previously introduced PFP (Protein Function Prediction) as our sequence-based predictor of Gene Ontology (GO) functional terms. PFP interprets the results of a PSI-BLAST search by extracting and scoring individual functional attributes, searching a wide range of E-value sequence matches, and utilizing conventional data mining techniques to fill in missing information. We have shown it to be effective in predicting both specific and low-resolution functional attributes when sufficient data is unavailable. Here we describe (1) significant improvements to the PFP infrastructure, including the addition of prediction significance and confidence scores, (2) a thorough benchmark of performance and comparisons to other related prediction methods, and (3) applications of PFP predictions to genome-scale data. We applied PFP predictions to uncharacterized protein sequences from 15 organisms. Among these sequences, 60-90% could be annotated with a GO molecular function term at high confidence (≥80%). We also applied our predictions to the protein-protein interaction network of the Malaria plasmodium (Plasmodium falciparum). High confidence GO biological process predictions (≥90%) from PFP increased the number of fully enriched interactions in this dataset from 23% of interactions to 94%. Our benchmark comparison shows significant performance improvement of PFP relative to GOtcha, InterProScan, and PSI-BLAST predictions. This is consistent with the performance of PFP as the overall best predictor in both the AFP-SIG '05 and CASP7 function (FN) assessments.
PFP is available as a web service at http://dragon.bio.purdue.edu/pfp/.
Affiliation(s)
- Troy Hawkins
- Department of Biological Sciences, College of Science, Purdue University, West Lafayette, Indiana 47907, USA
|
23
|
Chen J, Zhao L, Jiang L, Meng E, Zhang Y, Xiong X, Liang S. Transcriptome analysis revealed novel possible venom components and cellular processes of the tarantula Chilobrachys jingzhao venom gland. Toxicon 2008; 52:794-806. [DOI: 10.1016/j.toxicon.2008.08.003] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Received: 05/12/2008] [Revised: 08/03/2008] [Accepted: 08/12/2008] [Indexed: 11/30/2022]
|
24
|
Use of genomic DNA as an indirect reference for identifying gender-associated transcripts in morphologically identical, but chromosomally distinct, Schistosoma mansoni cercariae. PLoS Negl Trop Dis 2008; 2:e323. [PMID: 18941520 PMCID: PMC2565838 DOI: 10.1371/journal.pntd.0000323] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Received: 06/23/2008] [Accepted: 09/24/2008] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND: The use of DNA microarray technology to study global Schistosoma gene expression has led to the rapid identification of novel biological processes, pathways or associations. Implementation of standardized DNA microarray protocols across laboratories would assist maximal interpretation of generated datasets and extend productive application of this technology. METHODOLOGY/PRINCIPAL FINDINGS: Utilizing a new Schistosoma mansoni oligonucleotide DNA microarray composed of 37,632 elements, we show that schistosome genomic DNA (gDNA) hybridizes with less variation compared to complex mixed pools of S. mansoni cDNA material (R = 0.993 for gDNA compared to R = 0.956 for cDNA during 'self versus self' hybridizations). Furthermore, these effects are species-specific, with S. japonicum or Mus musculus gDNA failing to bind significantly to S. mansoni oligonucleotide DNA microarrays (e.g., R = 0.350 when S. mansoni gDNA is co-hybridized with S. japonicum gDNA). Increased median fluorescent intensities (209.9) were also observed for DNA microarray elements hybridized with S. mansoni gDNA compared to complex mixed pools of S. mansoni cDNA (112.2). Exploiting these valuable characteristics, S. mansoni gDNA was used in two-channel DNA microarray hybridization experiments as a common reference for indirect identification of gender-associated transcripts in cercariae, a schistosome life-stage in which there is no overt sexual dimorphism. This led to the identification of 2,648 gender-associated transcripts. When compared to the 780 gender-associated transcripts identified by hybridization experiments utilizing a two-channel direct method (co-hybridization of male and female cercariae cDNA), indirect methods using gDNA were far superior in identifying greater quantities of differentially expressed transcripts. Interestingly, both methods identified a concordant subset of 188 male-associated and 156 female-associated cercarial transcripts, respectively.
Gene ontology classification of these differentially expressed transcripts revealed a greater diversity of categories in male cercariae. Quantitative real-time PCR analysis confirmed the DNA microarray results and supported the reliability of this platform for identifying gender-associated transcripts. CONCLUSIONS/SIGNIFICANCE: Schistosome gDNA displays characteristics highly suitable for the comparison of two-channel DNA microarray results obtained from experiments conducted independently across laboratories. The schistosome transcripts identified here demonstrate, for the first time, that gender-associated patterns of expression are already well established in the morphologically identical, but chromosomally distinct, cercariae stage.
|
25
|
Kim C, Jang CS, Kamps TL, Robertson JS, Feltus FA, Paterson AH. Transcriptome analysis of leaf tissue from Bermudagrass (Cynodon dactylon) using a normalised cDNA library. FUNCTIONAL PLANT BIOLOGY : FPB 2008; 35:585-594. [PMID: 32688814 DOI: 10.1071/fp08133] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 04/18/2008] [Accepted: 06/03/2008] [Indexed: 06/11/2023]
Abstract
A normalised cDNA library was constructed from Bermudagrass to gain insight into the transcriptome of Cynodon dactylon L. A total of 15 588 high-quality expressed sequence tags (ESTs) from the cDNA library were subjected to The Institute for Genomic Research Gene Indices clustering tools to produce a unigene set. A total of 9414 unigenes were obtained from the high-quality ESTs and only 39.6% of the high-quality ESTs were redundant, indicating that the normalisation procedure was effective. A large-scale comparative genomic analysis of the unigenes was carried out using publicly available tools, such as BLAST, InterProScan and Gene Ontology. The unigenes were also subjected to a search for EST-derived simple sequence repeats (EST-SSRs) and conserved-intron scanning primers (CISPs), which are useful as DNA markers. Although the candidate EST-SSRs and CISPs found in the present study need to be empirically tested, they are expected to be useful as DNA markers for many purposes, including comparative genomic studies of grass species, by virtue of their significant similarities to EST sequences from other grasses. Thus, knowledge of Cynodon ESTs will empower turfgrass research by providing homologues for genes that are thought to confer important functions in other plants.
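The 39.6% redundancy quoted in this abstract can be recomputed from the two counts it reports. The sketch below assumes redundancy means the fraction of ESTs that do not contribute a new unigene, i.e. 1 - unigenes/ESTs. The abstract does not state the formula, and the function name `redundancy_fraction` is hypothetical.

```python
def redundancy_fraction(total_ests: int, unigenes: int) -> float:
    """Fraction of ESTs that collapse onto an already-seen unigene.

    Assumed definition (not stated in the abstract): 1 - unigenes / ESTs.
    """
    if total_ests <= 0 or not 0 <= unigenes <= total_ests:
        raise ValueError("expected total_ests > 0 and 0 <= unigenes <= total_ests")
    return 1.0 - unigenes / total_ests


# Counts quoted in the abstract: 15 588 high-quality ESTs, 9414 unigenes.
redundancy = redundancy_fraction(15588, 9414)
print(f"{redundancy:.1%}")  # prints "39.6%"
```

Under this assumed definition the two counts reproduce the reported percentage, which suggests the abstract's figures are internally consistent.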
Affiliation(s)
- Changsoo Kim
- Center for Applied Genetic Technologies, University of Georgia, Athens, GA 30602, USA
- Cheol Seong Jang
- Center for Applied Genetic Technologies, University of Georgia, Athens, GA 30602, USA
- Terry L Kamps
- Center for Applied Genetic Technologies, University of Georgia, Athens, GA 30602, USA
- Jon S Robertson
- Center for Applied Genetic Technologies, University of Georgia, Athens, GA 30602, USA
- Frank A Feltus
- Center for Applied Genetic Technologies, University of Georgia, Athens, GA 30602, USA
- Andrew H Paterson
- Center for Applied Genetic Technologies, University of Georgia, Athens, GA 30602, USA
|
26
|
Götz S, García-Gómez JM, Terol J, Williams TD, Nagaraj SH, Nueda MJ, Robles M, Talón M, Dopazo J, Conesa A. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res 2008; 36:3420-35. [PMID: 18445632 PMCID: PMC2425479 DOI: 10.1093/nar/gkn176] [Citation(s) in RCA: 2965] [Impact Index Per Article: 185.3] [Indexed: 12/14/2022] Open
Abstract
Functional genomics technologies have been widely adopted in the biological research of both model and non-model species. An efficient functional annotation of DNA or protein sequences is a major requirement for the successful application of these approaches as functional information on gene products is often the key to the interpretation of experimental results. Therefore, there is an increasing need for bioinformatics resources which are able to cope with large amount of sequence data, produce valuable annotation results and are easily accessible to laboratories where functional genomics projects are being undertaken. We present the Blast2GO suite as an integrated and biologist-oriented solution for the high-throughput and automatic functional annotation of DNA or protein sequences based on the Gene Ontology vocabulary. The most outstanding Blast2GO features are: (i) the combination of various annotation strategies and tools controlling type and intensity of annotation, (ii) the numerous graphical features such as the interactive GO-graph visualization for gene-set function profiling or descriptive charts, (iii) the general sequence management features and (iv) high-throughput capabilities. We used the Blast2GO framework to carry out a detailed analysis of annotation behaviour through homology transfer and its impact in functional genomics research. Our aim is to offer biologists useful information to take into account when addressing the task of functionally characterizing their sequence data.
Affiliation(s)
- Stefan Götz
- Bioinformatics Department, Centro de Investigación Principe Felipe, Valencia, Spain
|
27
|
Lazzari B, Caprera A, Vecchietti A, Merelli I, Barale F, Milanesi L, Stella A, Pozzi C. Version VI of the ESTree db: an improved tool for peach transcriptome analysis. BMC Bioinformatics 2008; 9 Suppl 2:S9. [PMID: 18387211 PMCID: PMC2323672 DOI: 10.1186/1471-2105-9-s2-s9] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022] Open
Abstract
Background: The ESTree database (db) is a collection of Prunus persica and Prunus dulcis EST sequences that in its current version encompasses 75,404 sequences from 3 almond and 19 peach libraries. Nine peach genotypes and four peach tissues are represented, from four fruit developmental stages. The aim of this work was to implement the already existing ESTree db by adding new sequences and analysis programs. Particular care was given to the implementation of the web interface, which allows querying each of the database features. Results: A Perl modular pipeline is the backbone of sequence analysis in the ESTree db project. Outputs obtained during the pipeline steps are automatically arrayed into the fields of a MySQL database. Apart from standard clustering and annotation analyses, version VI of the ESTree db encompasses new tools for tandem repeat identification, annotation against genomic Rosaceae sequences, and positioning on the database of oligomer sequences that were used in a peach microarray study. Furthermore, known protein patterns and motifs were identified by comparison to PROSITE. Based on data retrieved from sequence annotation against the UniProtKB database, a script was prepared to track positions of homologous hits on the GO tree and build statistics on the ontologies distribution in GO functional categories. EST mapping data were also integrated in the database. The PHP-based web interface was upgraded and extended. The aim of the authors was to enable querying the database according to all the biological aspects that can be investigated from the analysis of data available in the ESTree db. This is achieved by allowing multiple searches on logical subsets of sequences that represent different biological situations or features. Conclusions: Version VI of the ESTree db offers a broad overview of peach gene expression.
The sequence analysis results contained in the database, extensively linked to external related resources, represent a large amount of information that can be queried via the tools offered in the web interface. Flexibility and modularity of the ESTree analysis pipeline and of the web interface allowed the authors to set up similar structures for different datasets, with limited manual intervention.
Affiliation(s)
- Barbara Lazzari
- Parco Tecnologico Padano, Via Einstein - Località Cascina Codazza, Lodi, 26900, Italy.
|
28
|
Kochetov AV, Ahmad S, Ivanisenko V, Volkova OA, Kolchanov NA, Sarai A. uORFs, reinitiation and alternative translation start sites in human mRNAs. FEBS Lett 2008; 582:1293-7. [PMID: 18358843 DOI: 10.1016/j.febslet.2008.03.014] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Received: 02/17/2008] [Revised: 03/11/2008] [Accepted: 03/12/2008] [Indexed: 11/15/2022]
Abstract
It is known that eukaryotic ribosomes are able to translate small ORFs and reinitiate translation at downstream start codons. However, this mechanism is widely considered to be inefficient and it is not commonly taken into account. We compiled a sample of human mRNAs containing small upstream ORFs overlapping with annotated protein coding sequences. Statistical analysis supported the hypothesis on reinitiation of translation at downstream AUG codons and functional significance of potential alternative ORFs. It may be assumed that some 5'UTR-located upstream ORFs can deliver ribosomes to alternative translation starts, and they should be taken into consideration in the prediction of human mRNA coding potential.
Affiliation(s)
- Alex V Kochetov
- Institute of Cytology and Genetics, Lavrentieva Avenue 10, Novosibirsk 630090, Russia.
|
29
|
|
30
|
Hawkins T, Chitale M, Kihara D. New paradigm in protein function prediction for large scale omics analysis. MOLECULAR BIOSYSTEMS 2008; 4:223-31. [PMID: 18437265 DOI: 10.1039/b718229e] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Indexed: 01/23/2023]
Abstract
Biological interpretation of large scale omics data, such as protein-protein interaction data and microarray gene expression data, requires that the function of many genes in a data set is annotated or predicted. Here the predicted function for a gene does not necessarily have to be a detailed biochemical function; a broad class of function, or low-resolution function, may be sufficient to understand why a set of genes shows the observed expression pattern or interaction pattern. In this Highlight, we focus on two recent approaches for function prediction which aim to provide large coverage in function prediction, namely omics data driven approaches and a thorough data mining approach on homology search results.
Affiliation(s)
- Troy Hawkins
- Department of Biological Sciences, College of Science, Purdue University, West Lafayette, IN 47907, USA
|
31
|
Conesa A, Götz S. Blast2GO: A Comprehensive Suite for Functional Analysis in Plant Genomics. INTERNATIONAL JOURNAL OF PLANT GENOMICS 2008; 2008:619832. [PMID: 18483572 PMCID: PMC2375974 DOI: 10.1155/2008/619832] [Citation(s) in RCA: 1370] [Impact Index Per Article: 85.6] [Received: 10/05/2007] [Accepted: 11/26/2007] [Indexed: 05/09/2023]
Abstract
Functional annotation of novel sequence data is a primary requirement for the utilization of functional genomics approaches in plant research. In this paper, we describe the Blast2GO suite as a comprehensive bioinformatics tool for functional annotation of sequences and data mining on the resulting annotations, primarily based on the gene ontology (GO) vocabulary. Blast2GO optimizes function transfer from homologous sequences through an elaborate algorithm that considers similarity, the extension of the homology, the database of choice, the GO hierarchy, and the quality of the original annotations. The tool includes numerous functions for the visualization, management, and statistical analysis of annotation results, including gene set enrichment analysis. The application supports InterPro, enzyme codes, KEGG pathways, GO directed acyclic graphs (DAGs), and GOSlim. Blast2GO is a suitable tool for plant genomics research because of its versatility, easy installation, and friendly use.
Affiliation(s)
- Ana Conesa
- Bioinformatics Department, Centro de Investigación Príncipe Felipe, 4012 Valencia, Spain
- Stefan Götz
- Bioinformatics Department, Centro de Investigación Príncipe Felipe, 4012 Valencia, Spain
|
32
|
A threading-based method (FINDSITE) for ligand-binding site prediction and functional annotation. Proc Natl Acad Sci U S A 2007; 105:129-34. [PMID: 18165317 DOI: 10.1073/pnas.0707684105] [Citation(s) in RCA: 240] [Impact Index Per Article: 14.1] [Indexed: 11/18/2022] Open
Abstract
The detection of ligand-binding sites is often the starting point for protein function identification and drug discovery. Because of inaccuracies in predicted protein structures, extant binding pocket-detection methods are limited to experimentally solved structures. Here, FINDSITE, a method for ligand-binding site prediction and functional annotation based on binding-site similarity across groups of weakly homologous template structures identified from threading, is described. For crystal structures, considering a cutoff distance of 4 Å as the hit criterion, the success rate is 70.9% for identifying the best of top five predicted ligand-binding sites with a ranking accuracy of 76.0%. Both high prediction accuracy and ability to correctly rank identified binding sites are sustained when approximate protein models (<35% sequence identity to the closest template structure) are used, showing a 67.3% success rate with 75.5% ranking accuracy. In practice, FINDSITE tolerates structural inaccuracies in protein models up to a rmsd from the crystal structure of 8-10 Å. This is because analysis of weakly homologous protein models reveals that about half have a rmsd from the native binding site <2 Å. Furthermore, the chemical properties of template-bound ligands can be used to select ligand templates associated with the binding site. In most cases, FINDSITE can accurately assign a molecular function to the protein model.
|
33
|
Brown DP, Krishnamurthy N, Sjölander K. Automated protein subfamily identification and classification. PLoS Comput Biol 2007; 3:e160. [PMID: 17708678 PMCID: PMC1950344 DOI: 10.1371/journal.pcbi.0030160] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.1] [Received: 02/17/2006] [Accepted: 06/25/2007] [Indexed: 11/22/2022] Open
Abstract
Function prediction by homology is widely used to provide preliminary functional annotations for genes for which experimental evidence of function is unavailable or limited. This approach has been shown to be prone to systematic error, including percolation of annotation errors through sequence databases. Phylogenomic analysis avoids these errors in function prediction but has been difficult to automate for high-throughput application. To address this limitation, we present a computationally efficient pipeline for phylogenomic classification of proteins. This pipeline uses the SCI-PHY (Subfamily Classification in Phylogenomics) algorithm for automatic subfamily identification, followed by subfamily hidden Markov model (HMM) construction. A simple and computationally efficient scoring scheme using family and subfamily HMMs enables classification of novel sequences to protein families and subfamilies. Sequences representing entirely novel subfamilies are differentiated from those that can be classified to subfamilies in the input training set using logistic regression. Subfamily HMM parameters are estimated using an information-sharing protocol, enabling subfamilies containing even a single sequence to benefit from conservation patterns defining the family as a whole or in related subfamilies. SCI-PHY subfamilies correspond closely to functional subtypes defined by experts and to conserved clades found by phylogenetic analysis. Extensive comparisons of subfamily and family HMM performances show that subfamily HMMs dramatically improve the separation between homologous and non-homologous proteins in sequence database searches. Subfamily HMMs also provide extremely high specificity of classification and can be used to predict entirely novel subtypes. The SCI-PHY Web server at http://phylogenomics.berkeley.edu/SCI-PHY/ allows users to upload a multiple sequence alignment for subfamily identification and subfamily HMM construction. 
Biologists wishing to provide their own subfamily definitions can do so. Source code is available on the Web page. The Berkeley Phylogenomics Group PhyloFacts resource contains pre-calculated subfamily predictions and subfamily HMMs for more than 40,000 protein families and domains at http://phylogenomics.berkeley.edu/phylofacts/.
Affiliation(s)
- Duncan P Brown
- Department of Bioengineering, University of California, Berkeley, California, United States of America
- Nandini Krishnamurthy
- Department of Bioengineering, University of California, Berkeley, California, United States of America
- Kimmen Sjölander
- Department of Bioengineering, University of California, Berkeley, California, United States of America
|
34
|
Othman RM, Deris S, Illias RM. A genetic similarity algorithm for searching the Gene Ontology terms and annotating anonymous protein sequences. J Biomed Inform 2007; 41:65-81. [PMID: 17681495 DOI: 10.1016/j.jbi.2007.05.010] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Received: 09/22/2006] [Revised: 05/28/2007] [Accepted: 05/29/2007] [Indexed: 11/19/2022]
Abstract
A genetic similarity algorithm is introduced in this study to find a group of semantically similar Gene Ontology terms. The genetic similarity algorithm combines a semantic similarity measure algorithm with a parallel genetic algorithm. The semantic similarity measure algorithm is used to compute the similitude strength between the Gene Ontology terms. Then, the parallel genetic algorithm is employed to perform batch retrieval and to accelerate the search in the large search space of the Gene Ontology graph. The genetic similarity algorithm is implemented in the Gene Ontology browser named basic UTMGO to overcome the weaknesses of the existing Gene Ontology browsers, which use a conventional approach based on keyword matching. To show the applicability of the basic UTMGO, we extend its structure to develop a Gene Ontology-based protein sequence annotation tool named extended UTMGO. The objective of developing the extended UTMGO is to provide a simple and practical tool that is capable of producing better results and requires a reasonable amount of running time with low computing cost, specifically for offline usage. The computational results and comparison with other related tools are presented to show the effectiveness of the proposed algorithm and tools.
Affiliation(s)
- Razib M Othman
- Department of Software Engineering, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, 81310 UTM Skudai, Malaysia.
|
35
|
Cass CL, Johnson JR, Califf LL, Xu T, Hernandez HJ, Stadecker MJ, Yates JR, Williams DL. Proteomic analysis of Schistosoma mansoni egg secretions. Mol Biochem Parasitol 2007; 155:84-93. [PMID: 17644200 PMCID: PMC2077830 DOI: 10.1016/j.molbiopara.2007.06.002] [Citation(s) in RCA: 144] [Impact Index Per Article: 8.5] [Received: 04/12/2007] [Revised: 06/08/2007] [Accepted: 06/11/2007] [Indexed: 01/06/2023]
Abstract
Schistosomiasis remains a largely neglected, global health problem. The morbid pathology of the disease stems from the host's inflammatory response to parasite eggs trapped in host tissues. Long term host/parasite survival is dependent upon the successful modulation of the acute pathological response, which is induced by egg antigens. In this study, using Multidimensional Protein Identification Technology, we identified the Schistosoma mansoni egg secretome consisting of 188 proteins. Notably we identified proteins involved in redox balance, molecular chaperoning and protein folding, development and signaling, scavenging and metabolic pathways, immune response modulation, and 32 novel, previously uncharacterized schistosome proteins. We localized a subset of previously characterized schistosome proteins identified in egg secretions in this study, to the surface of live S. mansoni eggs using the circumoval precipitin reaction. The identification of proteins actively secreted by live schistosome eggs provides important new information for understanding immune modulation and the pathology of schistosomiasis.
Affiliation(s)
- Cynthia L Cass
- Department of Biological Sciences, Illinois State University, Normal, IL 61790-4120, United States
|
36
|
Douglas SE, Knickle LC, Kimball J, Reith ME. Comprehensive EST analysis of Atlantic halibut (Hippoglossus hippoglossus), a commercially relevant aquaculture species. BMC Genomics 2007; 8:144. [PMID: 17547761 PMCID: PMC1924502 DOI: 10.1186/1471-2164-8-144] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.9] [Received: 03/07/2007] [Accepted: 06/04/2007] [Indexed: 11/16/2022] [Open Access]
Abstract
Background An essential first step in the genomic characterisation of a new species, in this case Atlantic halibut (Hippoglossus hippoglossus), is the generation of EST information. This forms the basis for subsequent microarray design, SNP detection and the placement of novel markers on genetic linkage maps. Results Normalised directional cDNA libraries were constructed from five different larval stages (hatching, mouth-opening, midway to metamorphosis, premetamorphosis, and post-metamorphosis) and eight different adult tissues (testis, ovary, liver, head kidney, spleen, skin, gill, and intestine). Recombination efficiency of the libraries ranged from 91–98% and insert size averaged 1.4 kb. Approximately 1000 clones were sequenced from the 5'-end of each library and after trimming, 12675 good sequences were obtained. Redundancy within each library was very low and assembly of the entire EST collection into contigs resulted in 7738 unique sequences of which 6722 (87%) had matches in Genbank. Removal of ESTs and contigs that originated from bacteria or food organisms resulted in a total of 7710 unique halibut sequences. Conclusion A Unigene collection of 7710 functionally annotated ESTs has been assembled from Atlantic halibut. These have been incorporated into a publicly available, searchable database and form the basis for an oligonucleotide microarray that can be used as a tool to study gene expression in this economically important aquacultured fish.
Affiliation(s)
- Susan E Douglas
- Institute for Marine Biosciences, 1411 Oxford Street, Halifax, Nova Scotia, B3H 3Z1, Canada
- Leah C Knickle
- Institute for Marine Biosciences, 1411 Oxford Street, Halifax, Nova Scotia, B3H 3Z1, Canada
- Jennifer Kimball
- Institute for Marine Biosciences, 1411 Oxford Street, Halifax, Nova Scotia, B3H 3Z1, Canada
- Michael E Reith
- Institute for Marine Biosciences, 1411 Oxford Street, Halifax, Nova Scotia, B3H 3Z1, Canada
|
37
|
Saini HK, Fischer D. FRalanyzer: a tool for functional analysis of fold-recognition sequence-structure alignments. Nucleic Acids Res 2007; 35:W499-502. [PMID: 17537819 PMCID: PMC1933221 DOI: 10.1093/nar/gkm367] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022] [Open Access]
Abstract
We describe FRalanyzer (Fold Recognition alignment analyzer), a new web tool to visually inspect sequence-structure alignments in order to predict functionally important residues in a query sequence of unknown function. This tool is aimed at helping to infer functional relationships between a query sequence and a template structure, and is particularly useful in analyzing fold recognition (FR) results. Because similar folds do not necessarily share the same function, it is not always straightforward to infer a function from an FR result alone. Manual inspection of the FR sequence-structure alignment is often required in order to search for conservation of functionally important residues. FRalanyzer automates parts of this time-consuming process. FRalanyzer takes as input a sequence-structure alignment, automatically searches annotated databases, displays functionally significant residues and highlights the functionally important positions that are identical in the alignment. FRalanyzer can also be used with sequence-structure alignments obtained by other methods, and with structure-structure alignments obtained from structural comparison of newly determined 3D-structures of unknown function. FRalanyzer is available at http://fralanyzer.cse.buffalo.edu/.
Affiliation(s)
- Harpreet Kaur Saini
- Computer Science and Engineering Department, 201 Bell Hall University at Buffalo, Buffalo, NY 14260, USA.
|
38
|
Jones CE, Brown AL, Baumann U. Estimating the annotation error rate of curated GO database sequence annotations. BMC Bioinformatics 2007; 8:170. [PMID: 17519041 PMCID: PMC1892569 DOI: 10.1186/1471-2105-8-170] [Citation(s) in RCA: 101] [Impact Index Per Article: 5.9] [Received: 12/06/2006] [Accepted: 05/22/2007] [Indexed: 11/10/2022] [Open Access]
Abstract
Background Annotations that describe the function of sequences are enormously important to researchers during laboratory investigations and when making computational inferences. However, there has been little investigation into the data quality of sequence function annotations. Here we have developed a new method of estimating the error rate of curated sequence annotations, and applied this to the Gene Ontology (GO) sequence database (GOSeqLite). This method involved artificially adding errors to sequence annotations at known rates, and used regression to model the impact on the precision of annotations based on BLAST matched sequences. Results We estimated the error rate of curated GO sequence annotations in the GOSeqLite database (March 2006) at between 28% and 30%. Annotations made without use of sequence similarity based methods (non-ISS) had an estimated error rate of between 13% and 18%. Annotations made with the use of sequence similarity methodology (ISS) had an estimated error rate of 49%. Conclusion While the overall error rate is reasonably low, it would be prudent to treat all ISS annotations with caution. Electronic annotators that use ISS annotations as the basis of predictions are likely to have higher false prediction rates, and for this reason designers of these systems should consider avoiding ISS annotations where possible. Electronic annotators that use ISS annotations to make predictions should be viewed sceptically. We recommend that curators thoroughly review ISS annotations before accepting them as valid. Overall, users of curated sequence annotations from the GO database should feel assured that they are using a comparatively high quality source of information.
Affiliation(s)
- Craig E Jones
- School of Computer Science, University of Adelaide, South Australia, 5001
- Australian Centre for Plant Functional Genomics, Waite Campus, University of Adelaide, South Australia, 5064
- Alfred L Brown
- School of Computer Science, University of Adelaide, South Australia, 5001
- Ute Baumann
- Australian Centre for Plant Functional Genomics, Waite Campus, University of Adelaide, South Australia, 5064
|
39
|
Transcriptome analysis of the venom gland of the Mexican scorpion Hadrurus gertschi (Arachnida: Scorpiones). BMC Genomics 2007; 8:119. [PMID: 17506894 PMCID: PMC1904202 DOI: 10.1186/1471-2164-8-119] [Citation(s) in RCA: 108] [Impact Index Per Article: 6.4] [Received: 03/17/2007] [Accepted: 05/16/2007] [Indexed: 11/19/2022] [Open Access]
Abstract
Background Scorpions, like other venomous animals, possess a highly specialized organ that produces, secretes and disposes the venom components. In these animals, the last postabdominal segment, named telson, contains a pair of venomous glands connected to the stinger. The isolation of numerous scorpion toxins, along with cDNA-based gene cloning and, more recently, proteomic analyses have provided us with a large collection of venom component sequences. However, all of them are secreted, or at least are predicted to be secretable gene products. Therefore very little is known about the cellular processes that normally take place inside the glands for production of the venom mixture. To gain insights into the scorpion venom gland biology, we have decided to perform a transcriptomic analysis by constructing a cDNA library and conducting a random sequencing screening of the transcripts. Results From the cDNA library prepared from a single venom gland of the scorpion Hadrurus gertschi, 160 expressed sequence tags (ESTs) were analyzed. These transcripts were further clustered into 68 unique sequences (20 contigs and 48 singlets), with an average length of 919 bp. Half of the ESTs can be confidently assigned as homologues of annotated gene products. Annotation of these ESTs, with the aid of Gene Ontology terms and homology to eukaryotic orthologous groups, reveals some cellular processes important for venom gland function, including high protein synthesis, tuned posttranslational processing and trafficking. Nonetheless, the main group of the identified gene products includes ESTs similar to known scorpion toxins or other previously characterized scorpion venom components, which account for nearly 60% of the identified proteins. Conclusion To the best of our knowledge this report contains the first transcriptome analysis of genes transcribed by the venomous gland of a scorpion. The data were obtained for the species Hadrurus gertschi, belonging to the family Caraboctonidae. One hundred and sixty ESTs were analyzed, showing enrichment in genes that encode products similar to known venom components, and also providing the first sketch of cellular components, molecular functions, biological processes and some unique sequences of the scorpion venom gland.
|
40
|
Williams DL, Sayed AA, Bernier J, Birkeland SR, Cipriano MJ, Papa AR, McArthur AG, Taft A, Vermeire JJ, Yoshino TP. Profiling Schistosoma mansoni development using serial analysis of gene expression (SAGE). Exp Parasitol 2007; 117:246-58. [PMID: 17577588 PMCID: PMC2121609 DOI: 10.1016/j.exppara.2007.05.001] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Received: 02/01/2007] [Revised: 05/02/2007] [Accepted: 05/04/2007] [Indexed: 01/11/2023]
Abstract
Despite the widespread use of chemotherapy and other control strategies over the past 50 years, transmission rates for schistosomiasis have changed little. Regardless of the approach used, future control efforts will require a more complete understanding of fundamental parasite biology. Schistosomes undergo complex development involving an alternation of parasite generations within a mammalian and a freshwater molluscan host in the completion of their lifecycle. Little is known about factors controlling schistosome development, but understanding these processes may facilitate the discovery of new control methods. Therefore, our goal in this study is to determine global developmentally regulated and stage-specific gene expression in Schistosoma mansoni using serial analysis of gene expression (SAGE). We present a preliminary analysis of genes expressed during development and sexual differentiation in the mammalian host and during early larval development in the snail host. A number of novel, differentially expressed genes have been identified, both within and between the different developmental stages found in the mammalian and snail hosts.
Affiliation(s)
- David L Williams
- Department of Biological Sciences, Illinois State University, Normal, IL, USA.
|
41
|
Strube C, Schnieder T, von Samson-Himmelstjerna G. Differential gene expression in hypobiosis-induced and non-induced third-stage larvae of the bovine lungworm Dictyocaulus viviparus. Int J Parasitol 2007; 37:221-31. [PMID: 17112525 DOI: 10.1016/j.ijpara.2006.09.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 06/28/2006] [Revised: 09/15/2006] [Accepted: 09/19/2006] [Indexed: 11/26/2022]
Abstract
Hypobiosis is of particular importance in overwintering of the bovine lungworm Dictyocaulus viviparus. However, in parasitic nematodes there is no information available on the genetic mechanisms of hypobiosis. Suppression subtractive hybridisation was performed to identify upregulated transcripts of hypobiosis-induced and non-induced third-stage D. viviparus larvae, respectively. Subtracted libraries containing 105 clones of the hypobiosis-induced and 104 clones of the non-induced larvae were generated. By differential screening and Southern dot blot, 26 clones of the hypobiosis-induced and 22 clones of the non-induced larvae were confirmed to be differentially expressed. Sequencing of rapid amplification of cDNA ends (RACE) and spliced-leader-1 PCR products was performed to further characterise a selection of the differentially regulated gene transcripts. The genes encoding an N-methyltransferase and a superoxide dismutase were upregulated in the hypobiosis-induced and non-induced larvae, respectively. The expression patterns of these genes were validated by quantitative real-time PCR. This revealed differential gene expression, particularly for the N-methyltransferase.
Affiliation(s)
- Christina Strube
- Institute for Parasitology, Centre for Infectious Diseases, University of Veterinary Medicine Hannover, Buenteweg 17, 30559 Hannover, Germany.
|
42
|
Wei Y, Ringe D, Wilson MA, Ondrechen MJ. Identification of functional subclasses in the DJ-1 superfamily proteins. PLoS Comput Biol 2007; 3:e10. [PMID: 17257049 PMCID: PMC1782040 DOI: 10.1371/journal.pcbi.0030010] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.3] [Received: 09/11/2006] [Accepted: 12/07/2006] [Indexed: 12/02/2022] [Open Access]
Abstract
Genomics has posed the challenge of determination of protein function from sequence and/or 3-D structure. Functional assignment from sequence relationships can be misleading, and structural similarity does not necessarily imply functional similarity. Proteins in the DJ-1 family, many of which are of unknown function, are examples of proteins with both sequence and fold similarity that span multiple functional classes. THEMATICS (theoretical microscopic titration curves), an electrostatics-based computational approach to functional site prediction, is used to sort proteins in the DJ-1 family into different functional classes. Active site residues are predicted for the eight distinct DJ-1 proteins with available 3-D structures. Placement of the predicted residues onto a structural alignment for six of these proteins reveals three distinct types of active sites. Each type overlaps only partially with the others, with only one residue in common across all six sets of predicted residues. Human DJ-1 and YajL from Escherichia coli have very similar predicted active sites and belong to the same probable functional group. Protease I, a known cysteine protease from Pyrococcus horikoshii, and PfpI/YhbO from E. coli, a hypothetical protein of unknown function, belong to a separate class. THEMATICS predicts a set of residues that is typical of a cysteine protease for Protease I; the prediction for PfpI/YhbO bears some similarity. YDR533Cp from Saccharomyces cerevisiae, of unknown function, and the known chaperone Hsp31 from E. coli constitute a third group with nearly identical predicted active sites. While the first four proteins have predicted active sites at dimer interfaces, YDR533Cp and Hsp31 both have predicted sites contained within each subunit. Although YDR533Cp and Hsp31 form different dimers with different orientations between the subunits, the predicted active sites are superimposable within the monomer structures. 
Thus, the three predicted functional classes form four different types of quaternary structures. The computational prediction of the functional sites for protein structures of unknown function provides valuable clues for functional classification.
Affiliation(s)
- Ying Wei
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, United States of America
- Institute for Complex Scientific Software, Northeastern University, Boston, Massachusetts, United States of America
- Dagmar Ringe
- Department of Biochemistry, Brandeis University, Waltham, Massachusetts, United States of America
- Department of Chemistry, Brandeis University, Waltham, Massachusetts, United States of America
- Rosenstiel Basic Medical Sciences Research Center, Brandeis University, Waltham, Massachusetts, United States of America
- Mark A Wilson
- Department of Biochemistry, Brandeis University, Waltham, Massachusetts, United States of America
- Department of Chemistry, Brandeis University, Waltham, Massachusetts, United States of America
- Rosenstiel Basic Medical Sciences Research Center, Brandeis University, Waltham, Massachusetts, United States of America
- Mary Jo Ondrechen
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, United States of America
- Institute for Complex Scientific Software, Northeastern University, Boston, Massachusetts, United States of America
|
43
|
Abstract
Understanding individual response to a drug (what determines its efficacy and tolerability) is the major bottleneck in current drug development and clinical trials. Intracellular response and metabolism, for example through cytochrome P-450 enzymes, may either enhance or decrease the effect of different drugs, dependent on the genetic variant. Microarrays offer the potential to screen the genetic composition of the individual patient. However, experiments are "noisy" and must be accompanied by solid and robust data analysis. Furthermore, recent research aims at the combination of high-throughput data with methods of mathematical modeling, enabling problem-oriented assistance in the drug discovery process. This article will discuss state-of-the-art DNA array technology platforms and the basic elements of data analysis and bioinformatics research in drug discovery. Enhancing single-gene analysis, we will present a new method for interpreting gene expression changes in the context of entire pathways. Furthermore, we will introduce the concept of systems biology as a new paradigm for drug development and highlight our recent research: the development of a modeling and simulation platform for biomedical applications. We discuss the potential of systems biology for modeling the drug response of the individual patient.
Affiliation(s)
- Ralf Herwig
- Max Planck Institute for Molecular Genetics, Department of Vertebrate Genomics, Berlin, Germany.
|
44
|
Ross C, Shen QJ. Computational prediction and experimental verification of HVA1-like abscisic acid responsive promoters in rice (Oryza sativa). Plant Mol Biol 2006; 62:233-46. [PMID: 16845480 DOI: 10.1007/s11103-006-9017-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 03/01/2006] [Accepted: 05/09/2006] [Indexed: 05/10/2023]
Abstract
Abscisic acid (ABA) is one of the central plant hormones, responsible for controlling both maturation and germination in seeds, as well as mediating adaptive responses to desiccation, injury, and pathogen infection in vegetative tissues. Thorough analyses of two barley genes, HVA1 and HVA22, indicate that their response to ABA relies on the interaction of two cis-acting elements in their promoters, an ABA response element (ABRE) and a coupling element (CE). Together, they form an ABA response promoter complex (ABRC). Comparison of the promoters of barley HVA1 and its rice orthologue indicates that the structures and sequences of their ABRCs are highly similar. Prediction of ABA responsive genes in the rice genome is therefore amenable to a bioinformatics approach based on the structures of the well-defined barley ABRCs. Here we describe a model developed based on the consensus, inter-element spacing and orientations of experimentally determined ABREs and CEs. Our search of the rice promoter database for promoters that fit the model has generated a partial list of genes in rice that have a high likelihood of being involved in the ABA signaling network. The ABA inducibility of some of the rice genes identified was validated with quantitative reverse transcription PCR (QPCR). By limiting our input data to known enhancer modules and experimentally derived rules, we have generated a high confidence subset of ABA-regulated genes. The results suggest that the pathways by which cereals respond to biotic and abiotic stresses overlap significantly, and that regulation is not confined to the level of transcription. The large fraction of putative regulatory genes carrying HVA1-like enhancer modules in their promoters suggests the ABA signal enters at multiple points into a complex regulatory network that remains largely unmapped.
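The kind of promoter scan this abstract describes (two cis-elements constrained by consensus, spacing and orientation) might be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' model: the two motif patterns are placeholders chosen for the example, and the function names, the `max_gap` parameter and the toy sequence are hypothetical.

```python
import re

# Placeholder consensus patterns (assumptions for illustration only;
# the abstract does not give the consensus sequences the authors used).
ABRE_PATTERN = "ACGTG[GT]C"
CE_PATTERN = "CGCGTG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq, pattern):
    """Return (start, end, strand) for every match on either strand.

    Coordinates are always reported on the forward strand. A lookahead
    is used so that overlapping matches are not skipped.
    """
    sites = []
    n = len(seq)
    for m in re.finditer(f"(?=({pattern}))", seq):
        sites.append((m.start(1), m.end(1), "+"))
    for m in re.finditer(f"(?=({pattern}))", revcomp(seq)):
        sites.append((n - m.end(1), n - m.start(1), "-"))
    return sorted(sites)


def find_abrc_candidates(promoter, max_gap=20):
    """Pair each ABRE-like site with each CE-like site within max_gap bases."""
    seq = promoter.upper()
    hits = []
    for abre in find_sites(seq, ABRE_PATTERN):
        for ce in find_sites(seq, CE_PATTERN):
            # Bases between the two elements; negative means they overlap.
            gap = max(abre[0], ce[0]) - min(abre[1], ce[1])
            if 0 <= gap <= max_gap:
                hits.append({"abre": abre, "ce": ce, "gap": gap})
    return hits


if __name__ == "__main__":
    # Toy sequence invented for the example.
    toy = "TTTACGTGGCAAAAACGCGTGTTT"
    for hit in find_abrc_candidates(toy):
        print(hit)
```

Each hit records the strand of both elements, so an orientation rule could be added as a filter on the returned list; which orientations count as valid is left open here because the abstract does not specify them.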
Affiliation(s)
- Christian Ross
- Bioinformatics Core, Department of Biological Sciences, University of Nevada, Las Vegas, NV 89154, USA
|
45
|
Ye J, Fang L, Zheng H, Zhang Y, Chen J, Zhang Z, Wang J, Li S, Li R, Bolund L, Wang J. WEGO: a web tool for plotting GO annotations. Nucleic Acids Res 2006; 34:W293-7. [PMID: 16845012 PMCID: PMC1538768 DOI: 10.1093/nar/gkl031] [Citation(s) in RCA: 2024] [Impact Index Per Article: 112.4] [Indexed: 11/14/2022] [Open Access]
Abstract
Unified, structured vocabularies and classifications freely provided by the Gene Ontology (GO) Consortium are widely accepted in most large-scale gene annotation projects. Consequently, many tools have been created for use with the GO ontologies. WEGO (Web Gene Ontology Annotation Plot) is a simple but useful tool for visualizing, comparing and plotting GO annotation results. Unlike commercial software for creating charts, WEGO is designed to deal with the directed acyclic graph structure of GO to facilitate histogram creation of GO annotation results. WEGO has been used widely in many important biological research projects, such as the rice genome project and the silkworm genome project. It has become one of the daily tools for downstream gene annotation analysis, especially when performing comparative genomics tasks. WEGO, along with two other tools, namely External to GO Query and GO Archive Query, is freely available to all users at http://wego.genomics.org.cn. There are two available mirror sites at http://wego2.genomics.org.cn and http://wego.genomics.com.cn. Any suggestions are welcome at wego@genomics.org.cn.
Affiliation(s)
- Jia Ye
- James D. Watson Institute of Genome Sciences of Zhejiang University, Hangzhou 310008, China
- Lin Fang
- Beijing Genomics Institute, Beijing 101300, China
- Yong Zhang
- Beijing Genomics Institute, Beijing 101300, China
- College of Life Sciences, Peking University, Beijing 100871, China
- Jie Chen
- Beijing Genomics Institute, Beijing 101300, China
- Jing Wang
- Beijing Genomics Institute, Beijing 101300, China
- Shengting Li
- Beijing Genomics Institute, Beijing 101300, China
- The Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- Ruiqiang Li
- Beijing Genomics Institute, Beijing 101300, China
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230, Odense M, Denmark
- Lars Bolund
- Beijing Genomics Institute, Beijing 101300, China
- The Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- Jun Wang
- James D. Watson Institute of Genome Sciences of Zhejiang University, Hangzhou 310008, China
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230, Odense M, Denmark
- To whom correspondence should be addressed. Tel: +86 10 80491664; Fax: +86 10 80498676
|
46
|
Abstract
With the high number of sequences and structures streaming in from genomic projects, there is a need for more powerful and sophisticated annotation tools. Most problematic of the annotation efforts is predicting gene and protein function. Over the past few years there has been considerable progress in automated protein function prediction, using a diverse set of methods. Nevertheless, no single method reports all the information possible, and molecular biologists resort to 'shopping around' using different methods: a cumbersome and time-consuming practice. Here we present the Joined Assembly of Function Annotations, or JAFA server. JAFA queries several function prediction servers with a protein sequence and assembles the returned predictions in a legible, non-redundant format. In this manner, JAFA combines the predictions of several servers to provide a comprehensive view of the predicted functions of the proteins. JAFA also offers its own output, and the individual programs' predictions for further processing. JAFA is available for use from http://jafa.burnham.org.
Affiliation(s)
- Iddo Friedberg
- Burnham Institute for Medical Research, Program in Bioinformatics and Systems Biology, 10901 North Torrey Pines Road, La Jolla, CA 92037, USA.
|
47
|
Cai Z, Mao X, Li S, Wei L. Genome comparison using Gene Ontology (GO) with statistical testing. BMC Bioinformatics 2006; 7:374. [PMID: 16901353 PMCID: PMC1569881 DOI: 10.1186/1471-2105-7-374] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 02/24/2006] [Accepted: 08/11/2006] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND Automated comparison of complete sets of genes encoded in two genomes can provide insight on the genetic basis of differences in biological traits between species. Gene ontology (GO) is used as a common vocabulary to annotate genes for comparison. Current approaches calculate the fold of unweighted or weighted differences between two species at the high-level GO functional categories. However, to ensure the reliability of the differences detected, it is important to evaluate their statistical significance. It is also useful to search for differences at all levels of GO. RESULTS We propose a statistical approach to find reliable differences between the complete sets of genes encoded in two genomes at all levels of GO. The genes are first assigned GO terms from BLAST searches against genes with known GO assignments, and for each GO term the abundance of genes in the two genomes is compared using a chi-squared test followed by false discovery rate (FDR) correction. We applied this method to find statistically significant differences between two cyanobacteria, Synechocystis sp. PCC6803 and Anabaena sp. PCC7120. We then studied how the set of identified differences varies when different BLAST cutoffs are used. We also studied how the results vary when only subsets of the genes were used in the comparison of human vs. mouse and that of Saccharomyces cerevisiae vs. Schizosaccharomyces pombe. CONCLUSION There is a surprising lack of statistical approaches for comparing complete genomes at all levels of GO. With the rapid increase in the number of sequenced genomes, we hope that the approach we proposed and tested can make a valuable contribution to comparative genomics.
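The per-term test described in this abstract (a chi-squared test for each GO term, followed by FDR correction) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 2x2 table layout, the choice of the Benjamini-Hochberg procedure for FDR control, the `alpha` threshold and the toy counts are all assumptions made for the example.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 d.o.f.) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero margin makes the test undefined; treat it as no evidence.
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-squared distribution with 1 d.o.f.
    return stat, math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def compare_go_terms(counts_a, counts_b, total_a, total_b, alpha=0.05):
    """Test each term for a difference in gene abundance between two genomes.

    counts_a / counts_b map a GO term to the number of genes annotated
    with it; total_a / total_b are the numbers of annotated genes.
    Returns {term: (p_value, adjusted_p_value, is_significant)}.
    """
    terms = sorted(set(counts_a) | set(counts_b))
    pvals = []
    for term in terms:
        a = counts_a.get(term, 0)
        c = counts_b.get(term, 0)
        _, p = chi2_2x2(a, total_a - a, c, total_b - c)
        pvals.append(p)
    qvals = benjamini_hochberg(pvals)
    return {t: (p, q, q <= alpha) for t, p, q in zip(terms, pvals, qvals)}


if __name__ == "__main__":
    # Toy counts invented for the example.
    result = compare_go_terms({"term_x": 50, "term_y": 10},
                              {"term_x": 10, "term_y": 10}, 100, 100)
    for term, (p, q, significant) in result.items():
        print(term, p, q, significant)
```

The sketch uses the uncorrected Pearson statistic; for terms with very small counts an exact test would usually be preferred, and the abstract does not say how such cases were handled.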
Affiliation(s)
- Zhaotao Cai
- Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing 100871, P.R. China
- Xizeng Mao
- Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing 100871, P.R. China
- Songgang Li
- Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing 100871, P.R. China
- Liping Wei
- Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing 100871, P.R. China
|
48
|
DeMarco R, Oliveira KC, Venancio TM, Verjovski-Almeida S. Gender biased differential alternative splicing patterns of the transcriptional cofactor CA150 gene in Schistosoma mansoni. Mol Biochem Parasitol 2006; 150:123-31. [PMID: 16904200 DOI: 10.1016/j.molbiopara.2006.07.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.7] [Received: 06/06/2006] [Revised: 07/05/2006] [Accepted: 07/06/2006] [Indexed: 11/19/2022]
Abstract
The complex molecular systems involved in the process of sex-differentiation and fertility in Schistosoma mansoni have not yet been completely described. Using a 4608-element cDNA microarray, we have now determined 90 and 139 genes with significantly (q-value ≤ 0.06) higher expression levels in adult males and females, respectively. Eight out of eleven (73%) selected transcripts had their differential expression levels validated by real-time RT-PCR. One of these transcripts was extended by RT-PCR and was shown to span the intronic region between exons 9 and 11 of the S. mansoni CA150 gene, a transcriptional cofactor known in humans to interact with both RNA polymerase II and the spliceosome complex. The longer transcript probably represents a novel isoform of S. mansoni CA150. Additionally, we obtained full-length sequences for three other isoforms of the SmCA150 gene, coding for proteins of different lengths and domain compositions. Semi-quantitative RT-PCR showed different expression ratios among these isoforms between male and female. Due to the role of CA150 in RNA transcription and processing, we hypothesize that these differential expression events may be important in the generation and maintenance of the different phenotypes between male and female.
Affiliation(s)
- Ricardo DeMarco
- Laboratory of Gene Expression in Eukaryotes, Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, Brazil
|
49
|
Bonnet A, Frappart PO, Dehais P, Tosser-Klopp G, Hatey F. Identification of differential gene expression in in vitro FSH treated pig granulosa cells using suppression subtractive hybridization. Reprod Biol Endocrinol 2006; 4:35. [PMID: 16827936 PMCID: PMC1533831 DOI: 10.1186/1477-7827-4-35] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 01/12/2006] [Accepted: 07/07/2006] [Indexed: 11/10/2022] Open
Abstract
FSH, which binds to specific receptors on granulosa cells in mammals, plays a key role in folliculogenesis. Its biological activity involves stimulation of intercellular communication and upregulation of steroidogenesis, but the entire spectrum of the genes regulated by FSH has yet to be fully characterized. In order to find new regulated transcripts, however rare, we have used a Suppression Subtractive Hybridization (SSH) approach on pig granulosa cells in primary culture, treated or not with FSH. Two SSH libraries were generated and 76 clones were sequenced after selection by differential screening. Sixty-four different sequences were identified, including 3 novel sequences. Experiments demonstrated the presence of 25 regulated transcripts. A gene ontology analysis of these 25 genes revealed (1) catalytic; (2) transport; (3) signal transducer; (4) binding; (5) anti-oxidant and (6) structural activities. These findings may deepen our understanding of FSH's effects. In particular, they suggest that FSH is involved in the modulation of peroxidase activity and remodelling of chromatin.
Affiliation(s)
- A Bonnet
- INRA laboratoire de Génétique cellulaire BP52627 chemin de borde rouge 31326 Castanet cedex, France
- PO Frappart
- Department of Genetics, St. Jude Children's Research Hospital, 332 N. Lauderdale Street, Memphis, TN 38105, USA
- P Dehais
- INRA laboratoire de Génétique cellulaire BP52627 chemin de borde rouge 31326 Castanet cedex, France
- G Tosser-Klopp
- INRA laboratoire de Génétique cellulaire BP52627 chemin de borde rouge 31326 Castanet cedex, France
- F Hatey
- INRA laboratoire de Génétique cellulaire BP52627 chemin de borde rouge 31326 Castanet cedex, France
|
50
|
Perco P, Rapberger R, Siehs C, Lukas A, Oberbauer R, Mayer G, Mayer B. Transforming omics data into context: Bioinformatics on genomics and proteomics raw data. Electrophoresis 2006; 27:2659-75. [PMID: 16739231 DOI: 10.1002/elps.200600064] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Indexed: 11/09/2022]
Abstract
Differential gene expression analysis and proteomics have exerted significant impact on the elucidation of concerted cellular processes, as simultaneous measurement of hundreds to thousands of individual objects on the level of RNA and protein ensembles became technically feasible. The availability of such data sets has promised a profound understanding of phenomena on an aggregate level, expressed as the phenotypic response (observables) of cells, e.g., in the presence of drugs, or characterization of cells and tissue displaying distinct patho-physiological states. However, the step of transforming these data into context, i.e., linking distinct expression or abundance patterns with phenotypic observables, and furthermore enabling a sound biological interpretation on the level of reaction networks and concerted pathways, is still a major shortcoming. This shortcoming certainly stems from the enormous complexity embedded in cellular reaction networks, but a variety of computational approaches have been developed over the last few years to overcome these issues. This review provides an overview of computational procedures for analysis of genomic and proteomic data, introducing a sequential analysis workflow: explorative statistics for deriving a first candidate gene/protein list that is relevant from a purely statistical viewpoint, followed by co-regulation and network analysis to biologically expand this core list toward functional networks and pathways. The review of these procedures is complemented by example applications tailored to the identification of disease-associated proteins. Optimization of the computational procedures involved, in conjunction with the continuous increase in additional biological data, clearly has the potential to boost our understanding of processes on a cell-wide level.
Affiliation(s)
- Paul Perco
- Department of Nephrology, Medical University of Vienna, Austria
|