51
Abstract
Multiple drug strategies for many cancer types are now readily available and there is a clear need for tools to inform decision making on therapy selection. Although there is still a long way to go before pharmacogenomics achieves the goal of individualized selection of cancer treatment, promising progress is being made. Genetic testing for thiopurine methyltransferase (TPMT) variant alleles in patients prior to mercaptopurine administration, and for UGT1A1*28 in patients prior to administration of irinotecan therapy, along with the instigation of genotype-guided clinical trials (e.g. TYMS) are important advances in cancer pharmacogenomics. Markers for the toxicity and efficacy of many oncology drugs remain unknown; however, the examples highlighted here suggest progress is being made towards the incorporation of pharmacogenomics into clinical practice in oncology.
Affiliation(s)
- Sharon Marsh
- Division of Oncology, Washington University School of Medicine, St Louis, Missouri 63110, USA.
52
Skowronek K, Boniecki MJ, Kluge B, Bujnicki JM. Rational engineering of sequence specificity in R.MwoI restriction endonuclease. Nucleic Acids Res 2012; 40:8579-92. [PMID: 22735699 PMCID: PMC3458533 DOI: 10.1093/nar/gks570] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/15/2023]
Abstract
R.MwoI is a Type II restriction endonuclease (REase), which specifically recognizes a palindromic interrupted DNA sequence 5′-GCNNNNNNNGC-3′ (where N indicates any nucleotide), and hydrolyzes the phosphodiester bond in the DNA between the 7th and 8th base in both strands. R.MwoI exhibits remote sequence similarity to R.BglI, a REase with known structure, which recognizes an interrupted palindromic target 5′-GCCNNNNNGGC-3′. A homology model of R.MwoI in complex with DNA was constructed and used to predict functionally important amino acid residues that were subsequently targeted by mutagenesis. The model, together with the supporting experimental data, revealed regions important for recognition of the common bases in DNA sequences recognized by R.BglI and R.MwoI. Based on the bioinformatics analysis, we designed substitutions of the S310 residue in R.MwoI to arginine or glutamic acid, which led to enzyme variants with altered sequence selectivity compared with the wild-type enzyme. The S310R variant of R.MwoI preferred the 5′-GCCNNNNNGGC-3′ sequence as a target, similarly to R.BglI, whereas the S310E variant preferentially cleaved a subset of the MwoI sites, depending on the identity of the 3rd and 9th nucleotide residues. Our results represent a case study of a REase sequence specificity alteration by a single amino acid substitution, based on a theoretical model in the absence of a crystal structure.
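The relationship between the two recognition sequences quoted in the abstract can be illustrated with a short pattern search. The following is a minimal sketch, not code from the paper: the function names and the example sequence are hypothetical, and it only locates sites in a DNA string without modelling cleavage or enzyme preference.

```python
import re

# Recognition sites as written in the abstract; N stands for any nucleotide.
MWOI_SITE = "GCNNNNNNNGC"
BGLI_SITE = "GCCNNNNNGGC"

def site_to_regex(site):
    """Compile a site with N wildcards into a regex.

    The pattern is wrapped in a lookahead so overlapping sites are reported.
    """
    body = "".join("[ACGT]" if base == "N" else base for base in site)
    return re.compile("(?=(" + body + "))")

def find_sites(sequence, site):
    """Return the 0-based start position of every occurrence of `site`."""
    return [m.start() for m in site_to_regex(site).finditer(sequence.upper())]

# Hypothetical test sequence: one BglI-type site followed by one MwoI-only site.
example = "TTGCCATATGGGCAAGCAAAAAAAGCTT"
mwoi_hits = find_sites(example, MWOI_SITE)
bgli_hits = find_sites(example, BGLI_SITE)
```

Because the fixed positions of both sites are palindromic, searching one strand is enough here. Every BglI site also matches the MwoI pattern, which is consistent with the abstract's description of the S310R variant preferring a BglI-like subset of targets.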
Affiliation(s)
- Krzysztof Skowronek
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw, Poland.
53
Jamil HM. A natural language interface plug-in for cooperative query answering in biological databases. BMC Genomics 2012; 13 Suppl 3:S4. [PMID: 22759613 PMCID: PMC3323828 DOI: 10.1186/1471-2164-13-s3-s4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
Background One of the many unique features of biological databases is that the mere existence of a ground data item is not always a precondition for a query response. It may be argued that from a biologist's standpoint, queries are not always best posed using a structured language. By this we mean that approximate and flexible responses to natural-language-like queries are well suited for this domain. This is partly due to biologists' tendency to seek simpler interfaces and partly due to the fact that questions in biology involve high-level concepts that are open to interpretations computed using sophisticated tools. In such highly interpretive environments, rigidly structured databases do not always perform well. In this paper, our goal is to propose a semantic correspondence plug-in to aid natural language query processing over arbitrary biological database schemas, with the aim of providing cooperative responses to queries tailored to users' interpretations. Results Natural language interfaces for databases are generally effective when they are tuned to the underlying database schema and its semantics. Therefore, changes in database schema become impossible to support, or a substantial reorganization cost must be absorbed to reflect any change. We leverage developments in natural language parsing, rule languages and ontologies, and data integration technologies to assemble a prototype query processor that is able to transform a natural language query into a semantically equivalent structured query over the database. We allow knowledge rules and their frequent modifications as part of the underlying database schema. The approach we adopt in our plug-in overcomes some of the serious limitations of many contemporary natural language interfaces, including support for schema modifications and independence from underlying database schema.
Conclusions The plug-in introduced in this paper is generic and facilitates connecting user selected natural language interfaces to arbitrary databases using a semantic description of the intended application. We demonstrate the feasibility of our approach with a practical example.
Affiliation(s)
- Hasan M Jamil
- Department of Computer Science, Wayne State University, Michigan, USA.
54
Abstract
Ultraconserved elements (UCEs) are DNA sequences that are 100% identical (no base substitutions, insertions, or deletions) and located in syntenic positions in at least two genomes. Although hundreds of UCEs have been found in animal genomes, little is known about the incidence of ultraconservation in plant genomes. Using an alignment-free information-retrieval approach, we have comprehensively identified all long identical multispecies elements (LIMEs), which include both syntenic and nonsyntenic regions, of at least 100 identical base pairs shared by at least two genomes. Among six animal genomes, we found the previously known syntenic UCEs as well as previously undescribed nonsyntenic elements. In contrast, among six plant genomes, we only found nonsyntenic LIMEs. LIMEs can also be classified as either simple (repetitive) or complex (nonrepetitive), they may occur in multiple copies in a genome, and they are often spread across multiple chromosomes. Although complex LIMEs were found in both animal and plant genomes, they differed significantly in their composition and copy number. Further analyses of plant LIMEs revealed their functional diversity, encompassing elements found near rRNA and enzyme-coding genes, as well as those found in transposons and noncoding DNA. We conclude that despite the common presence of LIMEs in both animal and plant lineages, the evolutionary processes involved in the creation and maintenance of these elements differ in the two groups and are likely attributable to several mechanisms, including transfer of genetic material from organellar to nuclear genomes, de novo sequence manufacturing, and purifying selection.
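The idea of searching for long identical stretches without computing an alignment can be sketched with exact window matching. This is an illustrative toy, not the information-retrieval method used by the authors: the function name is hypothetical, it compares only two sequences on one strand, and a merged run of shared windows is a candidate that would still need to be checked as one contiguous match in the second sequence.

```python
def shared_identical_segments(seq_a, seq_b, min_len=100):
    """Return (start, end) half-open intervals of seq_a in which every window
    of length min_len also occurs, exactly, somewhere in seq_b."""
    # Index every window of seq_b so membership tests are constant time.
    kmers_b = {seq_b[i:i + min_len] for i in range(len(seq_b) - min_len + 1)}
    segments = []
    run_start = None
    for i in range(len(seq_a) - min_len + 1):
        hit = seq_a[i:i + min_len] in kmers_b
        if hit and run_start is None:
            run_start = i  # a new run of shared windows begins
        elif not hit and run_start is not None:
            # The last shared window started at i - 1 and spans min_len bases.
            segments.append((run_start, i - 1 + min_len))
            run_start = None
    if run_start is not None:
        segments.append((run_start, len(seq_a)))
    return segments

# Tiny demonstration with a 4-base threshold instead of 100.
demo = shared_identical_segments("AAACGTACGGTTT", "CCCGTACGGCC", min_len=4)
```

In the demonstration the shared stretch is `CGTACGG`, reported as positions 3 to 10 of the first sequence. Storing every window in memory would not scale to whole genomes; it is used here only to keep the sketch short.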
55
Phan HTT, Sternberg MJE. PINALOG: a novel approach to align protein interaction networks--implications for complex detection and function prediction. Bioinformatics 2012; 28:1239-45. [PMID: 22419782 PMCID: PMC3338015 DOI: 10.1093/bioinformatics/bts119] [Citation(s) in RCA: 74] [Impact Index Per Article: 6.2] [Indexed: 11/29/2022]
Abstract
Motivation: Analysis of protein–protein interaction networks (PPINs) at the system level has become increasingly important in understanding biological processes. Comparison of the interactomes of different species not only provides a better understanding of species evolution but also helps with detecting conserved functional components and in function prediction. Method and Results: Here we report a PPIN alignment method, called PINALOG, which combines information from protein sequence, function and network topology. Alignment of human and yeast PPINs reveals several conserved subnetworks between them that participate in similar biological processes, notably the proteasome and transcription related processes. PINALOG has been tested for its power in protein complex prediction as well as function prediction. Comparison with PSI-BLAST in predicting protein function in the twilight zone also shows that PINALOG is valuable in predicting protein function. Availability and implementation: The PINALOG web-server is freely available from http://www.sbg.bio.ic.ac.uk/~pinalog. The PINALOG program and associated data are available from the Download section of the web-server. Contact: m.sternberg@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hang T T Phan
- Division of Molecular Biosciences, Faculty of Natural Sciences, Imperial College, London, UK
56
Meyer M, Schneckener S, Ludewig B, Kuepfer L, Lippert J. Using expression data for quantification of active processes in physiologically based pharmacokinetic modeling. Drug Metab Dispos 2012; 40:892-901. [PMID: 22293118 DOI: 10.1124/dmd.111.043174] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.7] [Indexed: 02/04/2023]
Abstract
Active processes involved in drug metabolization and distribution mediated by enzymes, transporters, or binding partners mostly occur simultaneously in various organs. However, a quantitative description of active processes is difficult because of limited experimental accessibility of tissue-specific protein activity in vivo. In this work, we present a novel approach to estimate in vivo activity of such enzymes or transporters that have an influence on drug pharmacokinetics. Tissue-specific mRNA expression is used as a surrogate for protein abundance and activity and is integrated into physiologically based pharmacokinetic (PBPK) models that already represent detailed anatomical and physiological information. The new approach was evaluated using three publicly available databases: whole-genome expression microarrays from ArrayExpress, reverse transcription-polymerase chain reaction-derived gene expression estimates collected from the literature, and expressed sequence tags from UniGene. Expression data were preprocessed and stored in a customized database that was then used to build PBPK models for pravastatin in humans. These models represented drug uptake by organic anion-transporting polypeptide 1B1 and organic anion transporter 3, active efflux by multidrug resistance protein 2, and metabolization by sulfotransferases in liver, kidney, and/or intestine. Benchmarking of PBPK models based on gene expression data against alternative models with either a less complex model structure or randomly assigned gene expression values clearly demonstrated the superior model performance of the former. Besides accurate prediction of drug pharmacokinetics, integration of relative gene expression data in PBPK models offers the unique possibility to simultaneously investigate drug-drug interactions in all relevant organs because of the physiological representation of protein-mediated processes.
Affiliation(s)
- Michaela Meyer
- Systems Biology and Computational Solutions, Bayer Technology Services GmbH, Building 9115, 51368 Leverkusen, Germany
57
Lee-Liu D, Almonacid LI, Faunes F, Melo F, Larrain J. Transcriptomics using next generation sequencing technologies. Methods Mol Biol 2012; 917:293-317. [PMID: 22956096 DOI: 10.1007/978-1-61779-992-1_18] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 01/17/2023]
Abstract
Next generation sequencing technologies may now be applied to the study of transcriptomics. RNA-Seq or RNA sequencing employs high-throughput sequencing of complementary DNA fragments delivering a transcriptional profile. In this chapter, we aim to provide a starting point for Xenopus researchers planning on starting an RNA-Seq transcriptomics study. We begin by providing a section on template isolation and library preparation. The next section comprises the main bioinformatics procedures that need to be performed for raw data processing, normalization, and differential gene expression. Finally, we have included a section on studying deep sequencing results in Xenopus, which offers general guidance as to what can be done in this model.
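The abstract lists normalization among the main bioinformatics steps without saying which scheme the chapter uses. As one simple and widely used possibility, counts per million (CPM) rescales each sample by its total read count. The sketch below assumes that scheme; the function name and the read counts are hypothetical.

```python
def counts_per_million(counts):
    """Scale one sample's raw read counts to counts per million mapped reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}

# Hypothetical raw counts for a single sample (10,000 mapped reads in total).
sample = {"geneA": 500, "geneB": 1500, "geneC": 8000}
cpm = counts_per_million(sample)
```

CPM corrects only for sequencing depth. It does not account for transcript length or for composition differences between samples, which dedicated differential expression tools handle with their own normalization methods.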
Affiliation(s)
- Dasfne Lee-Liu
- Center for Aging and Regeneration and Millennium Nucleus in Regenerative Biology, Pontificia Universidad Catolica de Chile, Santiago, Chile
58
Abstract
BACKGROUND A phylogenetic tree, showing ancestral relations among organisms, is commonly represented as a rooted tree with sets of bifurcating branches (dichotomies) for simplicity, although polytomies (multifurcating branches) may reflect more accurate evolutionary relationships. To represent the true evolutionary relationships, it is important to systematically identify the polytomies from a bifurcating tree and generate a taxonomy-compatible multifurcating tree. For this purpose we propose a novel approach, "PolyPhy", which classifies a set of bifurcating branches of a phylogenetic tree into a set of branches with dichotomies and polytomies by considering genome distances among genomes and tree topological properties. RESULTS PolyPhy employs a machine learning technique, a BLR (Bayesian logistic regression) classifier, to identify possible bifurcating subtrees as polytomies from the trees produced by ComPhy. Besides considering genome-scale distances between all pairs of species, PolyPhy also takes into account different properties of tree topology between dichotomy and polytomy, such as long-branch retraction and short-branch contraction, and quantifies these properties into comparable rates among different sub-branches. We extract three tree topological features, 'LR' (Leaf rate), 'IntraR' (Intra-subset branch rate) and 'InterR' (Inter-subset branch rate), all of which are calculated from bifurcating tree branch sets for classification. We have achieved an F-measure (balanced measure between precision and recall) of 81% with an area under the ROC curve (AUC) of about 0.9. CONCLUSIONS PolyPhy is a fast and robust method to identify polytomies from phylogenetic trees based on genome-wide inference of evolutionary relationships among genomes. The software package and test data can be downloaded from http://digbio.missouri.edu/ComPhy/phyloTreeBiNonBi-1.0.zip.
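The F-measure mentioned in the abstract is the harmonic mean of precision and recall. The sketch below shows the standard calculation; the confusion-matrix counts are hypothetical values chosen only so that the result equals 0.81, and they are not taken from the paper.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall (the F1 score)."""
    if true_pos == 0:
        return 0.0  # no correct positive calls: precision and recall are both 0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 81 polytomies called correctly, 19 false calls, 19 missed.
score = f_measure(true_pos=81, false_pos=19, false_neg=19)
```

When precision and recall are equal, as in this example, the F-measure equals both of them. When they differ, the harmonic mean sits closer to the lower of the two.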
Affiliation(s)
- Guan Ning Lin
- Department of Computer Science and C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Department of Psychiatry, University of California, San Diego, CA 92093, USA
- Chao Zhang
- Department of Computer Science and C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Dong Xu
- Department of Computer Science and C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
59
Oehmen CS, Straatsma TP, Anderson GA, Orr G, Webb-Robertson BJM, Taylor RC, Mooney RW, Baxter DJ, Jones DR, Dixon DA. New challenges facing integrative biological science in the post-genomic era. J Biol Syst 2011. [DOI: 10.1142/s0218339006001805] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
The future of biology will be increasingly driven by the fundamental paradigm shift from hypothesis-driven research to data-driven discovery research employing the growing volume of biological data coupled to experimental testing of new discoveries. But hardware and software limitations in the current workflow infrastructure make it impossible or intractable to use real data from disparate sources for large-scale biological research. We identify key technological developments needed to enable this paradigm shift involving (1) the ability to store and manage extremely large datasets which are dispersed over a wide geographical area, (2) development of novel analysis and visualization tools which are capable of operating on enormous data resources without overwhelming researchers with unusable information, and (3) formalisms for integrating mathematical models of biosystems from the molecular level to the organism population level. This will require the development of algorithms and tools which efficiently utilize high-performance compute power and large storage infrastructures. The end result will be the ability of a researcher to integrate complex data from many different sources with simulations to analyze a given system at a wide range of temporal and spatial scales in a single conceptual model.
Affiliation(s)
- Galya Orr
- Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Ryan W. Mooney
- Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Doug J. Baxter
- Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Donald R. Jones
- Pacific Northwest National Laboratory, Richland, WA 99352, USA
- David A. Dixon
- Department of Chemistry, University of Alabama, Tuscaloosa, AL 35487-0336, USA
60
Benmoyal-Segal L, Soreq L, Ben-Shaul Y, Ben-Ari S, Ben-Moshe T, Aviel S, Bergman H, Soreq H. Adaptive alternative splicing correlates with less environmental risk of parkinsonism. Neurodegener Dis 2011; 9:87-98. [PMID: 22042332 DOI: 10.1159/000331328] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/07/2011] [Accepted: 07/27/2011] [Indexed: 01/02/2023]
Abstract
BACKGROUND/OBJECTIVE Environmental exposure to anti-acetylcholinesterases (AChEs) aggravates the risk of parkinsonism through currently unclear mechanism(s). We explored the possibility that the brain's capacity to induce a widespread adaptive alternative splicing response to such exposure may be involved. METHODS Following exposure to the dopaminergic neurotoxin 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP), brain region transcriptome profiles were tested. RESULTS Changes in transcript profiles, alternative splicing patterns and splicing-related gene categories were identified. Engineered mice over-expressing the protective AChE-R splice variant showed fewer total changes but more splicing-related ones than hypersensitive AChE-S over-expressors with similarly increased hydrolytic activities. Following MPTP exposure, the substantia nigra and prefrontal cortex (PFC) of both strains showed a nuclear increase in the splicing factor ASF/SF2 protein. Furthermore, intravenous injection with highly purified recombinant human AChE-R changed transcript profiles in the striatum. CONCLUSIONS Our findings are compatible with the working hypothesis that inherited or acquired alternative splicing deficits may promote parkinsonism, and we propose adaptive alternative splicing as a strategy for attenuating its progression.
Affiliation(s)
- Liat Benmoyal-Segal
- Department of Biological Chemistry, Life Sciences Institute, Hebrew University of Jerusalem, Jerusalem, Israel
61
The roles and evolutionary patterns of intronless genes in deuterostomes. Comp Funct Genomics 2011; 2011:680673. [PMID: 21860604 PMCID: PMC3155783 DOI: 10.1155/2011/680673] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 07/23/2010] [Revised: 04/13/2011] [Accepted: 06/22/2011] [Indexed: 12/26/2022]
Abstract
Genes without introns are a characteristic feature of prokaryotes, but a number of intronless genes also exist in eukaryotes. Studying these eukaryotic genes with prokaryotic architecture could help to clarify the evolutionary patterns of related genes and genomes. Our analyses revealed a number of intronless genes in 6 deuterostomes (sea urchin, sea squirt, zebrafish, chicken, platypus, and human). We also determined the conservation of each intronless gene in archaea, bacteria, fungi, plants, metazoans, and other eukaryotes. The proportions of intronless genes inherited from the common ancestor of archaea, bacteria, and eukaryotes in these species were consistent with their phylogenetic positions, with higher proportions of ancient intronless genes residing in more primitive species. In these species, intronless genes span different cellular roles and gene ontology (GO) categories, and some of these functions are very basic. Some intronless genes are derived from other intronless genes or from multiexon genes in each species. In conclusion, we showed that a varying number and proportion of intronless genes reside in these 6 deuterostomes, and that some of them have important functions. These genes are good candidates for subsequent functional and evolutionary analyses.
62
Intra J, Perotti ME, Pasini ME. Cloning, sequence identification and expression profile analysis of α-L-fucosidase gene from the Mediterranean fruit fly Ceratitis capitata. J Insect Physiol 2011; 57:452-461. [PMID: 21272587 DOI: 10.1016/j.jinsphys.2011.01.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/23/2010] [Revised: 01/14/2011] [Accepted: 01/14/2011] [Indexed: 05/30/2023]
Abstract
The Mediterranean fruit fly Ceratitis capitata (Diptera: Tephritidae) is one of the most destructive agricultural pests, a polyphagous insect of considerable economic importance that is widespread in many regions around the world. It is the best-studied fruit fly pest at the genetic and molecular levels, and much has been learned about its ecology and behaviour. An α-L-fucosidase has recently been hypothesized to be involved in sperm-egg interactions in Drosophila melanogaster and in other Drosophila species. Here, a complete cDNA encoding a putative α-L-fucosidase of the medfly was amplified using the reverse transcription-polymerase chain reaction (RT-PCR) with degenerate primers based on the conserved coding sequence information of several insect α-L-fucosidases, then cloned and sequenced (GenBank accession no. FJ177429). The coding region consisted of 1482 bp, which encoded a 485-residue protein (named CcFUCA) with a predicted molecular mass of 56.1 kDa. The deduced protein sequence showed 75% amino acid identity to D. melanogaster α-L-fucosidase, and the phylogenetic tree analysis revealed that CcFUCA had closer relationships with the α-L-fucosidases of drosophilid species. The tissue expression analysis indicated that CcFuca was expressed as a single transcript in all tissues, suggesting a ubiquitous localization pattern of the encoded protein. Our findings provide novel insights into a gene encoding a protein potentially involved in primary gamete interactions in C. capitata.
Affiliation(s)
- Jari Intra
- Department of Biomolecular Sciences and Biotechnology, University of Milano, via Celoria 26, 20133 Milano, Italy
63
Evans MR, Fink RC, Vazquez-Torres A, Porwollik S, Jones-Carson J, McClelland M, Hassan HM. Analysis of the ArcA regulon in anaerobically grown Salmonella enterica sv. Typhimurium. BMC Microbiol 2011; 11:58. [PMID: 21418628 PMCID: PMC3075218 DOI: 10.1186/1471-2180-11-58] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Received: 11/01/2010] [Accepted: 03/21/2011] [Indexed: 12/18/2022]
Abstract
Background Salmonella enterica serovar Typhimurium (S. Typhimurium) is a Gram-negative pathogen that must successfully adapt to the broad fluctuations in the concentration of dissolved dioxygen encountered in the host. In Escherichia coli, ArcA (Aerobic Respiratory Control) helps the cells to sense and respond to the presence of dioxygen. The global role of ArcA in E. coli is well characterized; however, little is known about its role in anaerobically grown S. Typhimurium. Results We compared the transcriptional profiles of the virulent wild-type (WT) strain (ATCC 14028s) and its isogenic arcA mutant grown under anaerobic conditions. We found that ArcA directly or indirectly regulates 392 genes (8.5% of the genome); of these, 138 genes are poorly characterized. Regulation by ArcA in S. Typhimurium is similar to, but distinct from, that in E. coli. Thus, genes/operons involved in core metabolic pathways (e.g., succinyl-CoA, fatty acid degradation, cytochrome oxidase complexes, flagellar biosynthesis, motility, and chemotaxis) were regulated similarly in the two organisms. However, genes/operons present in both organisms, but regulated differently by ArcA in S. Typhimurium included those coding for ethanolamine utilization, lactate transport and metabolism, and succinate dehydrogenases. Salmonella-specific genes/operons regulated by ArcA included those required for propanediol utilization, flagellar genes (mcpAC, cheV), Gifsy-1 prophage genes, and three SPI-3 genes (mgtBC, slsA, STM3784). In agreement with our microarray data, the arcA mutant was non-motile, lacked flagella, and was as virulent in mice as the WT. Additionally, we identified a set of 120 genes whose regulation was shared with the anaerobic redox regulator, Fnr. Conclusion(s) We have identified the ArcA regulon in anaerobically grown S. Typhimurium. Our results demonstrated that in S. Typhimurium, ArcA serves as a transcriptional regulator coordinating cellular metabolism, flagella biosynthesis, and motility. Furthermore, ArcA and Fnr share in the regulation of 120 S. Typhimurium genes.
Affiliation(s)
- Matthew R Evans
- Department of Microbiology, North Carolina State University, Raleigh, North Carolina 27695-7615, USA
64
Immune-induced evolutionary selection focused on a single reading frame in overlapping hepatitis B virus proteins. J Virol 2011; 85:4558-66. [PMID: 21307195 DOI: 10.1128/jvi.02142-10] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 12/13/2022]
Abstract
Viruses employ various means to evade immune detection. Reduction of CD8(+) T cell epitopes is one of the common strategies used for this purpose. Hepatitis B virus (HBV), a member of the Hepadnaviridae family, has four open reading frames, with about 50% overlap between the genes they encode. We computed the CD8(+) T cell epitope density within HBV proteins and the mutations within the epitopes. Our results suggest that HBV accumulates escape mutations that reduce the number of epitopes. These mutations are not equally distributed among genes and reading frames. While the highly expressed core and X proteins are selected to have low epitope density, polymerase, which is expressed at low levels, does not undergo the same selection. In overlapping regions, mutations in one protein-coding sequence also affect the other protein-coding sequence. We show that mutations lead to the removal of epitopes in X and surface proteins even at the expense of the addition of epitopes in polymerase. The total escape mutation rate for overlapping regions is lower than that for nonoverlapping regions. The lower epitope replacement rate for overlapping regions slows the evolutionary escape rate of these regions but leads to the accumulation of mutations more robust in the transfer between hosts, such as mutations preventing proteasomal cleavage into epitopes.
65
Yang JO, Oh S, Ko G, Park SJ, Kim WY, Lee B, Lee S. VnD: a structure-centric database of disease-related SNPs and drugs. Nucleic Acids Res 2011; 39:D939-44. [PMID: 21051351 PMCID: PMC3013797 DOI: 10.1093/nar/gkq957] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 08/15/2010] [Accepted: 09/30/2010] [Indexed: 11/13/2022]
Abstract
Numerous genetic variations have been found to be related to human diseases. A significant portion of these also affect drug response by changing protein structure and function. Therefore, it is crucial to understand the trilateral relationship among genomic variations, diseases and drugs. We present the variations and drugs (VnD) database, a consolidated resource containing information on diseases, related genes and genetic variations, protein structures and drugs. VnD was built in three steps. First, we integrated various resources systematically to deduce catalogs of disease-related genes, single nucleotide polymorphisms (SNPs), protein mutations and relevant drugs. VnD contains 137,195 disease-related gene records (13,940 distinct genes) and 16,586 genetic variation records (1790 distinct variations). Next, we carried out structure modeling and docking simulation for wild-type and mutant proteins to examine the structural and functional consequences of non-synonymous SNPs in the drug-related genes. Conformational changes in 590 wild-type and 4437 mutant proteins from drug-related genes were included in our database. Finally, we investigated the structural and biochemical properties relevant to drug binding such as the distribution of SNPs in proximal protein pockets, thermo-chemical stability, interactions with drugs and physico-chemical properties. The VnD database, available at http://vnd.kobic.re.kr:8080/VnD/ or vandd.org, should be a useful platform for researchers studying the mechanisms underlying associations among genetic variations, diseases and drugs.
Affiliation(s)
- Jin Ok Yang
- Korean BioInformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), 111 Gwahangno, Yuseong-gu, Daejeon 305–806 and Ewha Research Center for Systems Biology, Division of Life and Pharmaceutical Sciences, Ewha Womans University, Seoul 120–750, Korea
- Sangho Oh
- Korean BioInformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), 111 Gwahangno, Yuseong-gu, Daejeon 305–806 and Ewha Research Center for Systems Biology, Division of Life and Pharmaceutical Sciences, Ewha Womans University, Seoul 120–750, Korea
- Gunhwan Ko
- Korean BioInformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), 111 Gwahangno, Yuseong-gu, Daejeon 305–806 and Ewha Research Center for Systems Biology, Division of Life and Pharmaceutical Sciences, Ewha Womans University, Seoul 120–750, Korea
- Seong-Jin Park
- Korean BioInformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), 111 Gwahangno, Yuseong-gu, Daejeon 305–806 and Ewha Research Center for Systems Biology, Division of Life and Pharmaceutical Sciences, Ewha Womans University, Seoul 120–750, Korea
| | - Woo-Yeon Kim
- Korean BioInformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), 111 Gwahangno, Yuseong-gu, Daejeon 305–806 and Ewha Research Center for Systems Biology, Division of Life and Pharmaceutical Sciences, Ewha Womans University, Seoul 120–750, Korea
| | - Byungwook Lee
- Korean BioInformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), 111 Gwahangno, Yuseong-gu, Daejeon 305–806 and Ewha Research Center for Systems Biology, Division of Life and Pharmaceutical Sciences, Ewha Womans University, Seoul 120–750, Korea
| | - Sanghyuk Lee
- Korean BioInformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), 111 Gwahangno, Yuseong-gu, Daejeon 305–806 and Ewha Research Center for Systems Biology, Division of Life and Pharmaceutical Sciences, Ewha Womans University, Seoul 120–750, Korea
| |
|
66
|
Abstract
The rich collection of known genetic information and the recent completion of the rice genome sequencing project have given cereal researchers a useful tool for investigating the roles of genes and genomic organization that contribute to numerous agronomic traits. Gramene (http://www.gramene.org) is a database in which users can query and explore genomic colinearity and comparative genomics for genetic and genomic studies on plant genomes. Gramene presents an integrated perspective by assimilating data from a broad range of publicly available data sources for cereals such as rice, sorghum, maize, wild rice, wheat, oats and barley, for other agronomically important crop plants such as poplar and grape, and for the model plant Arabidopsis. As part of the process, it preserves the original data but also reanalyzes them for integration into several knowledge domains: maps, markers, genes, proteins, pathways and phenotypes, including Quantitative Trait Loci (QTL) and genetic diversity/natural variation. This allows researchers to use this information resource to decipher the known and predicted interactions between the components of biological systems, and how these interactions regulate plant development. Using examples from rice, this article describes how the database can be helpful to researchers representing an array of knowledge domains, including plant biology, plant breeding, molecular biology, genomics, biochemistry, genetics, bioinformatics and phylogenomics.
Affiliation(s)
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
|
67
|
Kim JS, Kim SJ, Lee SY, Han J, An YR, Kim AR, Hwang SY. Array2GO: a simple web-based tool to search gene ontology for analysis of multi genes expression. BioChip Journal 2010. [DOI: 10.1007/s13206-010-4410-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/24/2023]
|
68
|
Pasini ME, Intra J, Gomulski LM, Calvenzani V, Petroni K, Briani F, Perotti ME. Identification and expression profiling of Ceratitis capitata genes coding for β-hexosaminidases. Gene 2010; 473:44-56. [PMID: 21094225 DOI: 10.1016/j.gene.2010.11.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 07/26/2010] [Revised: 11/05/2010] [Accepted: 11/08/2010] [Indexed: 10/18/2022]
Abstract
The goal of this study was to identify the genes coding for β-N-acetylhexosaminidases in the Mediterranean fruit fly (medfly) Ceratitis capitata, one of the most destructive agricultural pests, which belongs to the family Tephritidae, order Diptera. Two dimeric β-N-acetylhexosaminidases, HEXA and HEXB, have recently been identified on Drosophila sperm. These enzymes are involved in egg binding through interactions with complementary carbohydrates on the surface of the egg shell. Three genes, Hexosaminidase 1 (Hexo1), Hexosaminidase 2 (Hexo2) and fused lobes (fdl), encode the HEXA and HEXB subunits. The availability of C. capitata EST libraries derived from embryos and adult heads allowed us to identify three sequences homologous to the D. melanogaster Hexo1, Hexo2 and fdl genes. Here, we report the expression profile analysis of CcHexo1, CcHexo2 and Ccfdl in several tissues, organs and stages. Ccfdl expression was highest in heads of both sexes and in whole adult females. In the testis and ovary the three genes showed distinct spatial and temporal expression patterns. All the mRNAs were detectable in early stages of spermatogenesis; CcHexo2 and Ccfdl were also expressed in early elongating spermatid cysts. All three genes are expressed in the ovarian nurse cells. CcHexo1 and Ccfdl are stage specific, since they were observed in stages 12 and 13 during oocyte growth, when programmed cell death occurs in nurse cells. The expression pattern of the three genes in medfly gonads suggests that, like their Drosophila counterparts, they may encode proteins involved in gametogenesis and fertilization.
Affiliation(s)
- Maria E Pasini
- Department of Biomolecular Sciences and Biotechnology, University of Milano, Milano, Italy.
|
69
|
Siwek M, Slawinska A, Nieuwland M, Witkowski A, Zieba G, Minozzi G, Knol EF, Bednarczyk M. A quantitative trait locus for a primary antibody response to keyhole limpet hemocyanin on chicken chromosome 14--confirmation and candidate gene approach. Poult Sci 2010; 89:1850-7. [PMID: 20709969 DOI: 10.3382/ps.2010-00755] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022] Open
Abstract
A QTL involved in the primary antibody response toward keyhole limpet hemocyanin (KLH) was detected on chicken chromosome 14 in an experimental population created by crossing commercial White Leghorn and a Polish native chicken breed (green-legged partridgelike). The current QTL location validates previous experiments that pointed to the same genomic location for the QTL linked to a primary antibody response to KLH. The experimental population was typed with microsatellite markers distributed over chicken chromosome 14. Titers of antibodies binding KLH were measured for all individuals by ELISA. Statistical models applied in the Grid QTL Web-based software were used to analyze the data: a half-sib model, a line-cross model, and a combined linkage disequilibrium and linkage analysis model. The proposed candidate genes were genotyped using SNPs located in gene exons. Statistical analyses of single-SNP associations identified 2 SNPs in the axis inhibitor protein (AXIN1) gene as significantly associated with the trait of interest.
Affiliation(s)
- M Siwek
- Department of Animal Biotechnology, University of Technology and Life Sciences, Mazowiecka 28, 85-225 Bydgoszcz, Poland.
|
70
|
Tay DMM, Govindarajan KR, Khan AM, Ong TYR, Samad HM, Soh WW, Tong M, Zhang F, Tan TW. T3SEdb: data warehousing of virulence effectors secreted by the bacterial Type III Secretion System. BMC Bioinformatics 2010; 11 Suppl 7:S4. [PMID: 21106126 PMCID: PMC2957687 DOI: 10.1186/1471-2105-11-s7-s4] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 12/27/2022] Open
Abstract
Background: Effectors of the Type III Secretion System (T3SS) play a pivotal role in establishing and maintaining pathogenicity in the host, so identifying these effectors is important for understanding virulence. However, the effectors display a high level of sequence diversity, which makes identification difficult. There is a need to collate and annotate existing effector sequences in public databases to enable systematic analyses of these sequences for the development of models for screening and selecting putative novel effectors from bacterial genomes that can be validated by a smaller number of key experiments. Results: Herein, we present T3SEdb (http://effectors.bic.nus.edu.sg/T3SEdb), a specialized database of annotated T3SS effector (T3SE) sequences containing 1089 records from 46 bacterial species compiled from the literature and public protein databases. Procedures have been defined for i) comprehensive annotation of the experimental status of effectors, ii) submission and curation review of records by users of the database, and iii) regular updating of existing and new T3SEdb records. Fielded keyword searches and sequence searches (BLAST, regular expression) are supported for both experimentally verified and hypothetical T3SEs. More than 171 clusters of T3SEs were detected based on sequence identity comparisons (intra-cluster difference up to ~60%). Given this high level of sequence diversity, T3SEdb provides a large number of experimentally known effector sequences with wide species representation for the creation of effector predictors. We created a reliable effector prediction tool, integrated into the database, to demonstrate the application of the database for such endeavours. Conclusions: T3SEdb is the first specialised database reported for T3SS effectors, enriched with manual annotations that facilitated systematic construction of a reliable prediction model for identification of novel effectors. T3SEdb represents a platform for the inclusion of additional metadata annotations to support future development of sophisticated effector prediction models for screening and selecting putative novel effectors from bacterial genomes/proteomes that can be validated by a small number of key experiments.
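The regular expression sequence search mentioned in the abstract can be illustrated with a minimal sketch. This is not T3SEdb's implementation: the function name `regex_search`, the record identifiers and the toy sequences are all hypothetical, and the example only shows the general idea of scanning stored protein sequences for a motif.

```python
import re
from typing import Dict, List, Tuple


def regex_search(records: Dict[str, str], pattern: str) -> List[Tuple[str, int, str]]:
    """Return (record id, 0-based start, matched text) for every motif hit."""
    compiled = re.compile(pattern)
    hits = []
    for record_id, sequence in records.items():
        # finditer yields non-overlapping matches from left to right
        for match in compiled.finditer(sequence):
            hits.append((record_id, match.start(), match.group()))
    return hits


# Hypothetical toy records; real effector sequences are far longer.
toy_records = {"eff1": "MKLSRRTGAALLP", "eff2": "MSTPLLQ"}
motif_hits = regex_search(toy_records, r"LL[PQ]")
```

A production search would normally also report the pattern's position in a 1-based, residue-numbered form and restrict the alphabet to valid amino acid codes.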
Affiliation(s)
- Daniel Ming Ming Tay
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
|
71
|
Sims D, Bursteinas B, Jain E, Gao Q, Baum B, Zvelebil M. The FLIGHT Drosophila RNAi database: 2010 update. Fly (Austin) 2010; 4:344-8. [PMID: 20855970 PMCID: PMC3174485 DOI: 10.4161/fly.4.4.13303] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/07/2010] [Revised: 08/10/2010] [Accepted: 08/10/2010] [Indexed: 11/19/2022] Open
Abstract
FLIGHT (http://flight.icr.ac.uk/) is an online resource compiling data from high-throughput Drosophila in vivo and in vitro RNAi screens. FLIGHT includes details of RNAi reagents and their predicted off-target effects, alongside RNAi screen hits, scores and phenotypes, including images from high-content screens. The latest release of FLIGHT is designed to enable users to upload, analyze, integrate and share their own RNAi screens. Users can perform multiple normalizations, view quality control plots, detect and assign screen hits and compare hits from multiple screens using a variety of methods including hierarchical clustering. FLIGHT integrates RNAi screen data with microarray gene expression as well as genomic annotations and genetic/physical interaction datasets to provide a single interface for RNAi screen analysis and data-mining in Drosophila.
Affiliation(s)
- David Sims
- Breakthrough Breast Cancer Research Centre, The Institute of Cancer Research, London, UK.
|
72
|
Lin YP, Chen LR, Chen CF, Liou JF, Chen YL, Yang JR, Shiue YL. Identification of early transcripts related to male development in chicken embryos. Theriogenology 2010; 74:1161-1178.e1-8. [PMID: 20728927 DOI: 10.1016/j.theriogenology.2010.05.017] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 11/12/2009] [Revised: 05/08/2010] [Accepted: 05/15/2010] [Indexed: 01/21/2023]
Abstract
Early transcripts related to male development in chicken embryos and their expression profiles were examined. A total of 89 and 127 candidate male development transcripts, representing 83 known and 119 unknown non-redundant sequences, respectively, were characterized in an embryonic day 3 (E3; Hamburger and Hamilton Stage 20: HH20) male-subtract-female complementary DNA library. Of 35 selected transcripts, quantitative reverse transcription-polymerase chain reaction validated that the expression levels of 25 transcripts were higher in male E3 whole embryos than in females (P < 0.05). Twelve of these transcripts mapped to the Z chromosome. At 72 wk of age, 20 and 4 transcripts were expressed at higher levels in the testes and brains of males, respectively, than in the ovaries and brains of females (P < 0.05). Whole mount and frozen cross-section in situ hybridization, as well as Western blotting analysis, further corroborated that the riboflavin kinase (RFK), WD repeat domain 36 (WDR36), and EY505808 transcripts and the RFK and WDR36 protein products were predominantly expressed in E7 male gonads. Treatment with the aromatase inhibitor formestane at E4 affected the expression levels at E7 of the coatomer protein complex (subunit beta 1), solute carrier family 35 member F1, LOC427316 and EY505812 transcripts across both sexes (P < 0.05), similar to what was observed for the doublesex and mab-3 related transcription factor 1 gene. Interaction effects of sex by formestane treatment were observed in 15 candidate male development transcripts (P < 0.05). Taken together, we identified a panel of potential candidate male development transcripts during early chicken embryogenesis; some might be regulated by sex hormones.
Affiliation(s)
- Yuan-Ping Lin
- Institute of Biomedical Science, National Sun Yat-sen University, Kaohsiung, Taiwan
|
73
|
Rauwerda H, de Jong M, de Leeuw WC, Spaink HP, Breit TM. Integrating heterogeneous sequence information for transcriptome-wide microarray design; a Zebrafish example. BMC Res Notes 2010; 3:192. [PMID: 20626891 PMCID: PMC2913925 DOI: 10.1186/1756-0500-3-192] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 03/16/2010] [Accepted: 07/13/2010] [Indexed: 11/10/2022] Open
Abstract
Background: A complete gene-expression microarray should preferably detect all genomic sequences that can be expressed as RNA in an organism, i.e. the transcriptome. However, our knowledge of the transcriptome of any organism is still incomplete, and transcriptome information is continuously being updated. Here, we present a strategy to integrate heterogeneous sequence information that can be used as input for an up-to-date microarray design. Findings: Our algorithm consists of four steps. In the first step, transcripts from different resources are grouped into Transcription Clusters (TCs) by looking at the similarity of all transcripts. TCs are groups of transcripts with a similar length. If a transcript is much smaller than a TC to which it is highly similar, it is annotated as a subsequence of that TC and is used for probe design only if the probe designed for the TC does not query the subsequence. Secondly, all TCs are mapped to a genome assembly and gene information is added to the design. Thirdly, TC members are ranked according to their trustworthiness and the most reliable sequence is used for the probe design. The last step is the actual array design. We have used this strategy to build an up-to-date zebrafish microarray. Conclusions: With our strategy and the software developed, it is possible to use a set of heterogeneous transcript resources for microarray design, reduce the number of candidate target sequences on which the design is based and reduce redundancy. By changing the parameters in the procedure, it is possible to control the similarity within the TCs and thus the number of candidate sequences for the design. The annotation of the microarray is carried out simultaneously with the design.
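The first step described in the abstract (grouping transcripts into TCs and flagging much shorter, highly similar transcripts as subsequences) can be sketched as follows. This is an illustrative sketch only, not the authors' software: the greedy longest-first strategy, the thresholds `min_sim` and `min_len_ratio`, the use of `difflib.SequenceMatcher` as a stand-in for a real sequence aligner, and the exact-substring test for subsequences are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; a real pipeline would use an aligner."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def build_clusters(transcripts: Dict[str, str],
                   min_sim: float = 0.9,
                   min_len_ratio: float = 0.8) -> List[dict]:
    """Greedily group transcripts into clusters seeded by the longest member."""
    clusters: List[dict] = []
    # Longest first, so every cluster seed is its longest transcript.
    for tid in sorted(transcripts, key=lambda t: -len(transcripts[t])):
        seq = transcripts[tid]
        placed = False
        for cluster in clusters:
            seed = transcripts[cluster["seed"]]
            len_ratio = len(seq) / len(seed)
            if len_ratio >= min_len_ratio and similarity(seq, seed) >= min_sim:
                cluster["members"].append(tid)       # similar length, similar sequence
                placed = True
            elif len_ratio < min_len_ratio and seq in seed:
                cluster["subsequences"].append(tid)  # much shorter, contained in seed
                placed = True
            if placed:
                break
        if not placed:
            clusters.append({"seed": tid, "members": [tid], "subsequences": []})
    return clusters


# Hypothetical toy transcripts.
toy = {
    "t1": "ACGTACGTACGTACGTACGT",
    "t2": "ACGTACGTACGTACGTACGA",
    "t3": "ACGTACG",
    "t4": "TTTTGGGGCCCCAAAATTTT",
}
tcs = build_clusters(toy)
```

Raising `min_sim` in this sketch produces more, tighter clusters, which mirrors the abstract's remark that the procedure's parameters control the similarity within TCs and therefore the number of candidate sequences.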
Affiliation(s)
- Han Rauwerda
- Microarray Department & Integrative Bioinformatics Unit, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, Amsterdam, The Netherlands.
|
74
|
Bellott DW, Skaletsky H, Pyntikova T, Mardis ER, Graves T, Kremitzki C, Brown LG, Rozen S, Warren WC, Wilson RK, Page DC. Convergent evolution of chicken Z and human X chromosomes by expansion and gene acquisition. Nature 2010; 466:612-6. [PMID: 20622855 PMCID: PMC2943333 DOI: 10.1038/nature09172] [Citation(s) in RCA: 154] [Impact Index Per Article: 11.0] [Received: 02/16/2010] [Accepted: 05/13/2010] [Indexed: 11/09/2022]
Abstract
In birds, as in mammals, one pair of chromosomes differs between the sexes. In birds, males are ZZ and females ZW. In mammals, males are XY and females XX. Like the mammalian XY pair, the avian ZW pair is believed to have evolved from autosomes, with most change occurring in the chromosomes found in only one sex--the W and Y chromosomes. By contrast, the sex chromosomes found in both sexes--the Z and X chromosomes--are assumed to have diverged little from their autosomal progenitors. Here we report findings that challenge this assumption for both the chicken Z chromosome and the human X chromosome. The chicken Z chromosome, which we sequenced essentially to completion, is less gene-dense than chicken autosomes but contains a massive tandem array containing hundreds of duplicated genes expressed in testes. A comprehensive comparison of the chicken Z chromosome with the finished sequence of the human X chromosome demonstrates that each evolved independently from different portions of the ancestral genome. Despite this independence, the chicken Z and human X chromosomes share features that distinguish them from autosomes: the acquisition and amplification of testis-expressed genes, and a low gene density resulting from an expansion of intergenic regions. These features were not present on the autosomes from which the Z and X chromosomes originated but were instead acquired during the evolution of Z and X as sex chromosomes. We conclude that the avian Z and mammalian X chromosomes followed convergent evolutionary trajectories, despite their evolving with opposite (female versus male) systems of heterogamety. More broadly, in birds and mammals, sex chromosome evolution involved not only gene loss in sex-specific chromosomes, but also marked expansion and gene acquisition in sex chromosomes common to males and females.
Affiliation(s)
- Daniel W Bellott
- Department of Biology, Massachusetts Institute of Technology, 9 Cambridge Center, Cambridge, Massachusetts 02142, USA
|
75
|
Kockel L, Kerr KS, Melnick M, Brückner K, Hebrok M, Perrimon N. Dynamic switch of negative feedback regulation in Drosophila Akt-TOR signaling. PLoS Genet 2010; 6:e1000990. [PMID: 20585550 PMCID: PMC2887466 DOI: 10.1371/journal.pgen.1000990] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Received: 02/18/2009] [Accepted: 05/18/2010] [Indexed: 01/24/2023] Open
Abstract
Akt represents a nodal point between the Insulin receptor and TOR signaling, and its activation by phosphorylation controls cell proliferation, cell size, and metabolism. The activity of Akt must be carefully balanced, as increased Akt signaling is frequently associated with cancer and as insufficient Akt signaling is linked to metabolic disease and diabetes mellitus. Using a genome-wide RNAi screen in Drosophila cells in culture, and in vivo analyses in the third instar wing imaginal disc, we studied the regulatory circuitries that define dAkt activation. We provide evidence that negative feedback regulation of dAkt occurs during normal Drosophila development in vivo. Whereas in cell culture dAkt is regulated by S6 Kinase (S6K)–dependent negative feedback, this feedback inhibition only plays a minor role in vivo. In contrast, dAkt activation under wild-type conditions is defined by feedback inhibition that depends on TOR Complex 1 (TORC1), but is S6K–independent. This feedback inhibition is switched from TORC1 to S6K only in the context of enhanced TORC1 activity, as triggered by mutations in tsc2. These results illustrate how the Akt–TOR pathway dynamically adapts the routing of negative feedback in response to the activity load of its signaling circuit in vivo. The development of multi-cellular organisms depends on the precise choreography of a diverse array of signal transduction pathways. This requires balanced regulation by activating as well as repressing signals. Negative feedback, defined as a signaling response counteracting the stimulus, is a frequently used mechanism to dampen signaling pathway activity. Accordingly, loss of negative feedback is often observed during progression of cancer, while constitutive engagement of negative feedback contributes to chronic loss-of-function phenotypes. 
Ectopic activation of the Akt–TOR pathway is frequently associated with tumor susceptibility and cancer and contributes to obesity-induced metabolic disease and type II diabetes. Using Drosophila cell culture and the developing fly, we dissect the regulatory circuitry defining negative feedback regulation of dAkt. Our work shows that dAkt activity is regulated by two qualitatively different negative feedback mechanisms and that the activity level of the dAkt pathway dictates which feedback mechanism is used. Under normal physiological activity conditions, we observe a feedback mechanism that is dependent on TOR complex 1 but independent of S6K. Under conditions of pathologically high pathway activity, we observe an S6K–dependent negative feedback mechanism. Our identification of a quantitative-to-qualitative switch in dAkt–TOR negative feedback signaling might have important implications for the biology of cancer and metabolic diseases.
Affiliation(s)
- Lutz Kockel
- Department of Genetics and Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts, United States of America
- Diabetes Center, Department of Medicine, University of California San Francisco, San Francisco, California, United States of America
- Kimberly S. Kerr
- Diabetes Center, Department of Medicine, University of California San Francisco, San Francisco, California, United States of America
- Michael Melnick
- Cell Signaling Technology, Beverley, Massachusetts, United States of America
- Katja Brückner
- Department of Cell and Tissue Biology, University of California San Francisco, San Francisco, California, United States of America
- Matthias Hebrok
- Diabetes Center, Department of Medicine, University of California San Francisco, San Francisco, California, United States of America
- Norbert Perrimon
- Department of Genetics and Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts, United States of America
|
76
|
Groeneveld LF, Lenstra JA, Eding H, Toro MA, Scherf B, Pilling D, Negrini R, Finlay EK, Jianlin H, Groeneveld E, Weigend S. Genetic diversity in farm animals--a review. Anim Genet 2010; 41 Suppl 1:6-31. [PMID: 20500753 DOI: 10.1111/j.1365-2052.2010.02038.x] [Citation(s) in RCA: 290] [Impact Index Per Article: 20.7] [Indexed: 10/19/2022]
Abstract
Domestication of livestock species and a long history of migrations, selection and adaptation have created an enormous variety of breeds. Conservation of these genetic resources relies on demographic characterization, recording of production environments and effective data management. In addition, molecular genetic studies allow a comparison of genetic diversity within and across breeds and a reconstruction of the history of breeds and ancestral populations. This has been summarized for cattle, yak, water buffalo, sheep, goats, camelids, pigs, horses, and chickens. Further progress is expected to benefit from advances in molecular technology.
Affiliation(s)
- L F Groeneveld
- Institute of Farm Animal Genetics, Friedrich-Loeffler-Institut, Hoeltystr. 10, 31535 Neustadt, Germany
|
77
|
Jiang H, Wang F, Dyer NP, Wong WH. CisGenome Browser: a flexible tool for genomic data visualization. Bioinformatics 2010; 26:1781-2. [PMID: 20513664 DOI: 10.1093/bioinformatics/btq286] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Indexed: 11/14/2022] Open
Abstract
Summary: We present an open source, platform independent tool, called CisGenome Browser, which can work together with any other data analysis program to serve as a flexible component for genomic data visualization. It can also work by itself as a standalone genome browser. By working as a light-weight web server, CisGenome Browser is a convenient tool for data sharing between labs. It has features that are specifically designed for ultra high-throughput sequencing data visualization. Availability: http://biogibbs.stanford.edu/~jiangh/browser/
Affiliation(s)
- Hui Jiang
- Department of Statistics, Stanford University, Stanford, CA 94305, USA.
|
78
|
Calvo E, Sanchez-Vargas I, Kotsyfakis M, Favreau AJ, Barbian KD, Pham VM, Olson KE, Ribeiro JMC. The salivary gland transcriptome of the eastern tree hole mosquito, Ochlerotatus triseriatus. J Med Entomol 2010; 47:376-86. [PMID: 20496585 PMCID: PMC3394432 DOI: 10.1603/me09226] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 05/25/2023]
Abstract
Saliva of blood-sucking arthropods contains a complex mixture of peptides that affect their host's hemostasis, inflammation, and immunity. These activities can also modify the site of pathogen delivery and increase disease transmission. Saliva also induces hosts to mount an antisaliva immune response that can lead to skin allergies or even anaphylaxis. Accordingly, knowledge of the salivary repertoire, or sialome, of a mosquito is useful to provide a knowledge platform to mine for novel pharmacological activities, to develop novel vaccine targets for vector-borne diseases, and to develop epidemiological markers of vector exposure and candidate desensitization vaccines. The mosquito Ochlerotatus triseriatus is a vector of La Crosse virus and produces allergy in humans. In this work, a total of 1,575 clones randomly selected from an adult female O. triseriatus salivary gland cDNA library was sequenced and used to assemble a database that yielded 731 clusters of related sequences, 560 of which were singletons. Primer extension experiments were performed in selected clones to further extend sequence coverage, allowing for the identification of 159 protein sequences, 66 of which code for putative secreted proteins. Supplemental spreadsheets containing these data are available at http://exon.niaid.nih.gov/transcriptome/Ochlerotatus_triseriatus/S1/Ot-S1.xls and http://exon.niaid.nih.gov/transcriptome/Ochlerotatus_triseriatus/S2/Ot-S2.xls.
Affiliation(s)
- Eric Calvo
- Section of Vector Biology, Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD 20852
- Irma Sanchez-Vargas
- Department of Microbiology, Immunology, and Pathology, Colorado State University, Fort Collins, CO 80523
- Michalis Kotsyfakis
- Section of Vector Biology, Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD 20852
- Amanda J. Favreau
- Genomics Unit, Research Technologies Section, Rocky Mountain Laboratories, Hamilton, MT 59840
- Kent D. Barbian
- Genomics Unit, Research Technologies Section, Rocky Mountain Laboratories, Hamilton, MT 59840
- Van M. Pham
- Section of Vector Biology, Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD 20852
- Kenneth E. Olson
- Department of Microbiology, Immunology, and Pathology, Colorado State University, Fort Collins, CO 80523
- José M. C. Ribeiro
- Section of Vector Biology, Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD 20852
|
79
|
Li J, Yu L, Yang J, Dong L, Tian B, Yu Z, Liang L, Zhang Y, Wang X, Zhang K. New insights into the evolution of subtilisin-like serine protease genes in Pezizomycotina. BMC Evol Biol 2010; 10:68. [PMID: 20211028 PMCID: PMC2848655 DOI: 10.1186/1471-2148-10-68] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Received: 08/11/2009] [Accepted: 03/09/2010] [Indexed: 11/28/2022] Open
Abstract
Background: Subtilisin-like serine proteases play an important role in pathogenic fungi during the penetration and colonization of their hosts. In this study, we perform an evolutionary analysis of the subtilisin-like serine protease genes of subphylum Pezizomycotina to determine whether pathogenic fungi with different lifestyles that use subtilisin-like serine proteases as virulence factors share similar pathogenic mechanisms. Within Pezizomycotina, nematode-trapping fungi are unique because they capture soil nematodes using specialized trapping devices. Increasing evidence suggests that subtilisin-like serine proteases from nematode-trapping fungi are involved in the penetration and digestion of nematode cuticles. Here we also conduct positive selection analysis on the subtilisin-like serine protease genes from nematode-trapping fungi. Results: Phylogenetic analysis of 189 subtilisin-like serine protease genes from Pezizomycotina suggests five strongly supported monophyletic clades. The subtilisin-like serine protease genes previously identified or presumed as endocellular proteases clustered into one clade and diverged the earliest in the phylogeny. In addition, the cuticle-degrading protease genes from entomopathogenic and nematode-parasitic fungi clustered together, indicating that they might have overlapping pathogenic mechanisms against insects and nematodes. Our experimental bioassays supported this conclusion. Interestingly, although both function as cuticle-degrading proteases, the subtilisin-like serine protease genes from nematode-trapping fungi and nematode-parasitic fungi were not grouped together in the phylogenetic tree. Our evolutionary analysis revealed evidence for positive selection on the subtilisin-like serine protease genes of the nematode-trapping fungi. Conclusions: Our study provides new insights into the evolution of subtilisin-like serine protease genes in Pezizomycotina. Pezizomycotina subtilisins most likely evolved from endocellular to extracellular proteases. The entomopathogenic and nematode-parasitic fungi likely share similar properties in parasitism. In addition, our data provide a better understanding of the duplications and subsequent functional divergence of subtilisin-like serine protease genes in Pezizomycotina. The evidence of positive selection detected in the subtilisin-like serine protease genes of nematode-trapping fungi suggests that these proteases may have played important roles during the evolution of the pathogenicity of nematode-trapping fungi against nematodes.
Affiliation(s)
- Juan Li
- Laboratory for Conservation and Utilization of Bio-resources, and Key Laboratory for Microbial Resources of the Ministry of Education, Yunnan University, Kunming, 650091, PR China
|
80
|
A knowledge-driven probabilistic framework for the prediction of protein-protein interaction networks. Comput Biol Med 2010; 40:306-17. [PMID: 20138613 DOI: 10.1016/j.compbiomed.2010.01.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 12/11/2008] [Revised: 10/30/2009] [Accepted: 01/01/2010] [Indexed: 11/21/2022]
Abstract
This study applied a knowledge-driven data integration framework for the inference of protein-protein interactions (PPI). Evidence from diverse genomic features is integrated using a knowledge-driven Bayesian network (KD-BN). Receiver operating characteristic (ROC) curves may not be the optimal method for evaluating a classifier's performance in PPI prediction, as the majority of the area under the curve (AUC) may not represent biologically meaningful results. It may be of benefit to interpret the AUC of a partial ROC curve in which biologically interesting results are represented. Therefore, the partial ROC has been employed in this study as a novel assessment method for the predictive performance of PPI predictions, along with the true positive/false positive rate and the true positive/positive rate. By incorporating domain knowledge into the construction of the KD-BN, we demonstrate improvement in predictive performance compared with previous studies based upon the Naive Bayesian approach.
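The partial ROC idea in this abstract can be illustrated with a minimal sketch. The function names, the trapezoidal integration, and the false-positive-rate cutoff `max_fpr` are assumptions chosen for illustration; the abstract does not specify how the authors computed the partial area.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low.

    Assumes labels are 0/1 and that both classes are present.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Process tied scores together so they yield a single ROC point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auc(points, max_fpr):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR at the cutoff.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy predictions (hypothetical data, not from the study).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(partial_auc(pts, 1.0))   # full AUC
print(partial_auc(pts, 0.5))   # area restricted to FPR <= 0.5
```

Restricting the integral to low false positive rates focuses the comparison on the region where predicted interactions are few enough to be followed up experimentally.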
|
81
|
Zhou X, Chen S, Liu B, Zhang R, Wang Y, Li P, Guo Y, Zhang H, Gao Z, Yan X. Development of traditional Chinese medicine clinical data warehouse for medical knowledge discovery and decision support. Artif Intell Med 2010; 48:139-52. [PMID: 20122820 DOI: 10.1016/j.artmed.2009.07.012] [Citation(s) in RCA: 154] [Impact Index Per Article: 11.0] [Received: 08/16/2008] [Revised: 07/22/2009] [Accepted: 07/23/2009] [Indexed: 01/14/2023]
Abstract
OBJECTIVE Traditional Chinese medicine (TCM) is a scientific discipline that develops its theories from long-term clinical practice. Large-scale clinical data are the core empirical knowledge source for TCM research. This paper introduces a clinical data warehouse (CDW) system, which incorporates structured electronic medical record (SEMR) data for medical knowledge discovery and TCM clinical decision support (CDS). MATERIALS AND METHODS We have developed a clinical reference information model (RIM) and a physical data model to manage the various information entities and their relationships in TCM clinical data. An extraction-transformation-loading (ETL) tool is implemented to integrate and normalize the clinical data from different operational data sources. The CDW includes online analytical processing (OLAP) and complex network analysis (CNA) components to explore the various clinical relationships. Furthermore, data mining and CNA methods are used to discover valuable clinical knowledge from the data. RESULTS The CDW has integrated 20,000 TCM inpatient data and 20,000 outpatient data, which contain manifestations (e.g. symptoms, physical examinations and laboratory test results), diagnoses and prescriptions as the main information components. We propose a practical solution to accomplish the large-scale clinical data integration and preprocessing tasks. Meanwhile, we have developed over 400 OLAP reports to enable the multidimensional analysis of clinical data and case-based CDS. We have successfully conducted several interesting data mining applications. In particular, we use various classification methods, namely support vector machine, decision tree and Bayesian network, to discover knowledge of syndrome differentiation. Furthermore, we have applied association rules and CNA to extract useful acupuncture point and herb combination patterns from the clinical prescriptions. CONCLUSION A CDW system consisting of TCM clinical RIM, ETL, OLAP and data mining as the core components has been developed to facilitate the tasks of TCM knowledge discovery and CDS. We have conducted several OLAP and data mining tasks to explore the empirical knowledge from the TCM clinical data. The CDW platform would be a promising infrastructure to make full use of the TCM clinical data for scientific hypothesis generation, and promote the development of TCM from individualized empirical knowledge to large-scale evidence-based medicine.
Affiliation(s)
- Xuezhong Zhou
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
|
82
|
The IFITM proteins mediate cellular resistance to influenza A H1N1 virus, West Nile virus, and dengue virus. Cell 2010; 139:1243-54. [PMID: 20064371 PMCID: PMC2824905 DOI: 10.1016/j.cell.2009.12.017] [Citation(s) in RCA: 998] [Impact Index Per Article: 71.3] [Received: 11/05/2009] [Revised: 12/01/2009] [Accepted: 12/09/2009] [Indexed: 12/14/2022]
Abstract
Influenza viruses exploit host cell machinery to replicate, resulting in epidemics of respiratory illness. In turn, the host expresses antiviral restriction factors to defend against infection. To find host cell modifiers of influenza A H1N1 viral infection, we used a functional genomic screen and identified over 120 influenza A virus-dependency factors with roles in endosomal acidification, vesicular trafficking, mitochondrial metabolism, and RNA splicing. We discovered that the interferon-inducible transmembrane proteins IFITM1, 2, and 3 restrict an early step in influenza A viral replication. The IFITM proteins confer basal resistance to influenza A virus but are also inducible by interferons type I and II and are critical for interferon's virustatic actions. Further characterization revealed that the IFITM proteins inhibit the early replication of flaviviruses, including dengue virus and West Nile virus. Collectively this work identifies a family of antiviral restriction factors that mediate cellular innate immunity to at least three major human pathogens.
|
83
|
Calvo E, Sanchez-Vargas I, Favreau AJ, Barbian KD, Pham VM, Olson KE, Ribeiro JM. An insight into the sialotranscriptome of the West Nile mosquito vector, Culex tarsalis. BMC Genomics 2010; 11:51. [PMID: 20089177 PMCID: PMC2823692 DOI: 10.1186/1471-2164-11-51] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 08/25/2009] [Accepted: 01/20/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Saliva of adult female mosquitoes helps sugar and blood feeding by providing enzymes and polypeptides that help sugar digestion, control microbial growth and counteract their vertebrate host hemostasis and inflammation. Mosquito saliva also potentiates the transmission of vector-borne pathogens, including arboviruses. Culex tarsalis is a bird-feeding mosquito vector of West Nile virus closely related to C. quinquefasciatus, a mosquito relatively recently adapted to feed on humans, and the only mosquito of the genus Culex to have its sialotranscriptome described so far. RESULTS A total of 1,753 clones randomly selected from an adult female C. tarsalis salivary gland (SG) cDNA library were sequenced and used to assemble a database that yielded 809 clusters of related sequences, 675 of which were singletons. Primer extension experiments were performed on selected clones to further extend sequence coverage, allowing for the identification of 283 protein sequences, 80 of which code for putative secreted proteins. CONCLUSION Comparison of the C. tarsalis sialotranscriptome with that of C. quinquefasciatus reveals accelerated evolution of salivary proteins as compared to housekeeping proteins. The average amino acid identity among salivary proteins is 70.1%, while that for housekeeping proteins is 91.2% (P < 0.05), and the codon volatility of secreted proteins is significantly higher than that of housekeeping proteins. Several protein families previously found exclusively in mosquitoes, including some found only in the Aedes genus, have been identified in C. tarsalis. Interestingly, a protein family so far unique to C. quinquefasciatus, with 30 genes, is also found in C. tarsalis, indicating it was not a specific C. quinquefasciatus acquisition in its evolution to optimize mammal blood feeding.
Affiliation(s)
- Eric Calvo
- Section of Vector Biology, Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD 20852, USA
|
84
|
Martinez-Esteso M, Sellés-Marchart S, Vera-Urbina J, Pedreño M, Bru-Martinez R. Changes of defense proteins in the extracellular proteome of grapevine (Vitis vinifera cv. Gamay) cell cultures in response to elicitors. J Proteomics 2009; 73:331-41. [DOI: 10.1016/j.jprot.2009.10.001] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.6] [Received: 07/15/2009] [Revised: 09/23/2009] [Accepted: 10/01/2009] [Indexed: 10/20/2022]
|
85
|
Andreassen R, Lunner S, Høyheim B. Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar). BMC Genomics 2009; 10:502. [PMID: 19878547 PMCID: PMC2774873 DOI: 10.1186/1471-2164-10-502] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 12/10/2008] [Accepted: 10/30/2009] [Indexed: 01/08/2023] Open
Abstract
Background Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to helping distinguish between duplicate genome regions and determine correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue-specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences for which the full length of the cDNA insert has been determined has been small. Results High-quality full-length insert sequences from 560 pre-smolt white muscle tissue-specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, characterize salmon Kozak motifs and polyadenylation signal variation, and identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, some of which were identical to miRNA target sequences. Conclusion This paper describes the first Atlantic salmon FLIcs from a tissue- and developmental stage-specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS). This suggests that the remaining cDNA libraries generated by SGP represent a valuable cCDS FLIc source. The conservation of 7-mers in 3'UTRs indicates that these motifs are functionally important. Identity between some of these 7-mers and miRNA target sequences suggests that they are miRNA targets in Salmo salar transcripts as well.
Affiliation(s)
- Rune Andreassen
- BasAM-Genetics, Norwegian School of Veterinary Science, PO Box 8146 DEP, NO-0033 Oslo, Norway.
|
86
|
Co-evolution of KIR2DL3 with HLA-C in a human population retaining minimal essential diversity of KIR and HLA class I ligands. Proc Natl Acad Sci U S A 2009; 106:18692-7. [PMID: 19837691 DOI: 10.1073/pnas.0906051106] [Citation(s) in RCA: 95] [Impact Index Per Article: 6.3] [Indexed: 11/18/2022] Open
Abstract
Natural killer (NK) cells contribute to immunity and reproduction. Guiding these functions, and NK cell education, are killer cell Ig-like receptors (KIR), NK cell receptors that recognize HLA class I. In most human populations, these highly polymorphic receptors and ligands combine with extraordinary diversity. To assess how much of this diversity is necessary, we studied KIR and HLA class I at high resolution in the Yucpa, a small South Amerindian population that survived an approximately 15,000-year history of population bottleneck and epidemic infection, including recent viral hepatitis. The Yucpa retain the three major HLA epitopes recognized by KIR. Through balancing selection on a few divergent haplotypes, the Yucpa maintain much of the KIR variation found worldwide. HLA-C*07, the strongest educator of C1-specific NK cells, has reached unusually high frequency in the Yucpa. Concomitantly, weaker variants of the C1 receptor, KIR2DL3, were selected and have largely replaced the form of KIR2DL3 brought by the original migrants from Asia. HLA-C1 and KIR2DL3 homozygosity has previously been correlated with resistance to viral hepatitis. Selection of weaker forms of KIR2DL3 in the Yucpa can be seen as compensation for the high frequency of the potent HLA-C*07 ligand. This study provides an estimate of the minimal KIR-HLA system essential for long-term survival of a human population. That it contains all functional elements of KIR diversity worldwide attests to the competitive advantage it provides, not only for surviving epidemic infections, but also for rebuilding populations once infection has passed.
|
87
|
Prakash A, Tompa M. Assessing the discordance of multiple sequence alignments. IEEE/ACM Trans Comput Biol Bioinform 2009; 6:542-551. [PMID: 19875854 DOI: 10.1109/tcbb.2007.70271] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/28/2023]
Abstract
Multiple sequence alignments have wide applicability in many areas of computational biology, including comparative genomics, functional annotation of proteins, gene finding, and modeling evolutionary processes. Because of the computational difficulty of multiple sequence alignment and the availability of numerous tools, it is critical to be able to assess the reliability of multiple alignments. We present a tool called StatSigMA to assess whether multiple alignments of nucleotide or amino acid sequences are contaminated with one or more unrelated sequences. There are numerous applications for which StatSigMA can be used. Two such applications are to distinguish homologous sequences from nonhomologous ones and to compare alignments produced by various multiple alignment tools. We present examples of both types of applications.
Affiliation(s)
- Amol Prakash
- Biomarker Research Initiative in Mass Spectrometry Center, Thermo, 790 Memorial Drive, Suite 201, Cambridge, MA 02139, USA.
|
88
|
Zhao Y, Wei W, Lee IM, Shao J, Suo X, Davis RE. Construction of an interactive online phytoplasma classification tool, iPhyClassifier, and its application in analysis of the peach X-disease phytoplasma group (16SrIII). Int J Syst Evol Microbiol 2009; 59:2582-93. [PMID: 19622670 PMCID: PMC2884932 DOI: 10.1099/ijs.0.010249-0] [Citation(s) in RCA: 122] [Impact Index Per Article: 8.1] [Indexed: 11/18/2022] Open
Abstract
Phytoplasmas, the causal agents of numerous plant diseases, are insect-vector-transmitted, cell-wall-less bacteria descended from ancestral low-G+C-content Gram-positive bacteria in the Bacillus-Clostridium group. Despite their monophyletic origin, widely divergent phytoplasma lineages have evolved in adaptation to specific ecological niches. Classification and taxonomic assignment of phytoplasmas have been based primarily on molecular analysis of 16S rRNA gene sequences because of the inaccessibility of measurable phenotypic characters suitable for conventional microbial characterization. In the present study, an interactive online tool, iPhyClassifier, was developed to expand the efficacy and capacity of the current 16S rRNA gene sequence-based phytoplasma classification system. iPhyClassifier performs sequence similarity analysis, simulates laboratory restriction enzyme digestions and subsequent gel electrophoresis and generates virtual restriction fragment length polymorphism (RFLP) profiles. Based on calculated RFLP pattern similarity coefficients and overall sequence similarity scores, iPhyClassifier makes instant suggestions on tentative phytoplasma 16Sr group/subgroup classification status and 'Candidatus Phytoplasma' species assignment. Using iPhyClassifier, we revised and updated the classification of strains affiliated with the peach X-disease phytoplasma group. The online tool can be accessed at http://www.ba.ars.usda.gov/data/mppl/iPhyClassifier.html.
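The virtual RFLP step described in this abstract can be sketched in a few lines. This is an illustration only, not iPhyClassifier's implementation: the helper names are hypothetical, the sequences are toy strings rather than 16S rRNA genes, and the Dice-style coefficient 2·N_shared / (N_x + N_y) is an assumed form of the pattern similarity score. The EcoRI site (G^AATTC) is used as the example enzyme.

```python
from collections import Counter


def digest(seq, site, cut_offset):
    """Fragment lengths from cutting linear `seq` at every occurrence of `site`.

    `cut_offset` is the cut position within the site (1 for EcoRI, G^AATTC).
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def pattern_similarity(frags_x, frags_y):
    """Dice-style coefficient on fragment lengths (assumed form, see lead-in)."""
    shared = sum((Counter(frags_x) & Counter(frags_y)).values())
    return 2 * shared / (len(frags_x) + len(frags_y))


# Toy sequences: strain B has lost the second EcoRI site through one mutation.
strain_a = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TT"
strain_b = "AAAA" + "GAATTC" + "CCCCCC" + "GATTTC" + "TT"

frags_a = digest(strain_a, "GAATTC", 1)   # [5, 12, 7]
frags_b = digest(strain_b, "GAATTC", 1)   # [5, 19]
print(pattern_similarity(frags_a, frags_b))  # 0.4
```

A real classification would combine the patterns from a whole panel of enzymes and compare them against reference strains; the sketch covers a single enzyme to keep the mechanics visible.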
Affiliation(s)
- Yan Zhao
- Molecular Plant Pathology Laboratory, USDA-Agricultural Research Service, Beltsville, MD 20705, USA.
|
89
|
Systematic and single cell analysis of Xenopus Piwi-interacting RNAs and Xiwi. EMBO J 2009; 28:2945-58. [PMID: 19713941 DOI: 10.1038/emboj.2009.237] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.9] [Received: 04/08/2009] [Accepted: 07/22/2009] [Indexed: 12/25/2022] Open
Abstract
Piwi proteins and Piwi-interacting RNAs (piRNAs) are essential for germ cell development, but analysis of the molecular mechanisms of these ribonucleoproteins remains challenging in most animal germ cells. To address this challenge, we systematically characterized Xiwi, a Xenopus Piwi homologue, and piRNAs from Xenopus eggs and oocytes. We used the large size of Xenopus eggs to analyze small RNAs at the single cell level, and find abundant piRNAs and large piRNA clusters in the Xenopus tropicalis genome, some of which resemble the Drosophila piRNA-generating flamenco locus. Although most piRNA clusters are expressed simultaneously in an egg, individual frogs show distinct profiles of cluster expression. Xiwi is associated with microtubules and the meiotic spindle, and is localized to the germ plasm--a cytoplasmic determinant of germ cell formation. Xiwi associates with translational regulators in an RNA-dependent manner, but Xenopus tudor interacts with Xiwi independently of RNA. Our study adds insight to piRNA transcription regulation by showing that individual animals can have differential piRNA expression profiles. We suggest that in addition to regulating transposable elements, Xiwi may function in specifying RNA localization in vertebrate oocytes.
|
90
|
Abstract
In the present study we have examined human-mouse homologous intronless disease and non-disease genes with respect to their sequence conservation, tissue expression, domain composition and gene ontology composition to gain insight into their evolutionary and functional attributes. We show that selection has significantly discriminated between the two groups: the disease-associated genes in particular exhibit lower Ka and Ka/Ks, while Ks, although smaller, is not significantly different. Our analyses suggest that the majority of disease-related intronless human genes have homology limited to eukaryotic genomes and that their expression is localized. We also observed that different classes of intronless disease-related genes have experienced diverse selective pressures and are enriched for higher-level functionality that is essential for developmental processes in complex organisms. It is expected that these insights will enhance our understanding of the nature of these genes and also improve our ability to identify disease-related intronless genes.
Affiliation(s)
- Subhash Mohan Agarwal
- Center for Computational Biology and Bioinformatics, School of Information Technology, Jawaharlal Nehru University, New Delhi 110067, India.
|
91
|
Minervini G, Evangelista G, Villanova L, Slanzi D, De Lucrezia D, Poli I, Luisi PL, Polticelli F. Massive non-natural proteins structure prediction using grid technologies. BMC Bioinformatics 2009; 10 Suppl 6:S22. [PMID: 19534748 PMCID: PMC2697646 DOI: 10.1186/1471-2105-10-s6-s22] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The number of natural proteins represents a small fraction of all the possible protein sequences, and there is an enormous number of proteins never sampled by nature, the so-called "never born proteins" (NBPs). A fundamental question in this regard is whether the ensemble of natural proteins possesses peculiar chemical and physical properties or is just the product of contingency coupled to functional selection. A key feature of natural proteins is their ability to form a well-defined three-dimensional structure. Thus, the structural study of NBPs can help to understand whether natural protein sequences were selected for their peculiar properties or are just one of the possible stable and functional ensembles. METHODS The structural characterization of a huge number of random proteins cannot be approached experimentally, so the problem has been tackled using a computational approach. A large library of random protein sequences (2 × 10^4 sequences) was generated, discarding amino acid sequences with significant similarity to natural proteins, and the corresponding structures were predicted using Rosetta. Because the problem is highly computationally demanding, Rosetta was ported to a grid environment and a user-friendly job submission environment was developed within the GENIUS Grid Portal. The protein structures generated were analysed in terms of net charge, secondary structure content, surface/volume ratio, hydrophobic core composition, etc. RESULTS The vast majority of NBPs, according to the Rosetta model, are characterized by a compact three-dimensional structure with a high secondary structure content. Structure compactness and surface polarity are comparable to those of natural proteins, suggesting similar stability and solubility. Deviations are observed in the relative content of alpha helices and beta strands and in hydrophobic core composition, as NBPs appear to be richer in helical structure and aromatic amino acids than natural proteins. CONCLUSION The results obtained suggest that the ability to form a compact, ordered and water-soluble structure is an intrinsic property of polypeptides. The tendency of random sequences to adopt alpha helical folds indicates that all-alpha proteins may have emerged early in pre-biotic evolution. Further, the lower percentage of aromatic residues observed in natural proteins has important evolutionary implications as far as tolerance to mutations is concerned.
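As a rough illustration of the first steps of such a pipeline, the sketch below generates a small random sequence library and computes two of the sequence-level descriptors mentioned in the abstract. Everything here is an assumption made for illustration: the uniform sampling over the 20 standard amino acids, the sequence length of 70, the crude charge model, and the function names. The similarity filter against natural proteins and the Rosetta structure prediction are outside the scope of this sketch.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def random_library(n, length, seed=0):
    """Generate n random sequences, sampling each residue uniformly."""
    rng = random.Random(seed)
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
            for _ in range(n)]


def net_charge(seq):
    """Crude net charge near neutral pH: +1 per K/R, -1 per D/E (His ignored)."""
    positive = sum(seq.count(a) for a in "KR")
    negative = sum(seq.count(a) for a in "DE")
    return positive - negative


def aromatic_fraction(seq):
    """Fraction of residues that are aromatic (F, W, Y)."""
    return sum(seq.count(a) for a in "FWY") / len(seq)


# Hypothetical parameters: 5 sequences of 70 residues each.
library = random_library(5, 70, seed=42)
for seq in library:
    print(seq[:20] + "...", net_charge(seq), round(aromatic_fraction(seq), 3))
```

Fixing the seed keeps the library reproducible, which matters when the same sequences are later distributed across many grid jobs.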
Affiliation(s)
- Giovanni Minervini
- Department of Biology, University Roma Tre, Viale G. Marconi 446, Rome, I-00146, Italy.
|
92
|
Benita Y, Kikuchi H, Smith AD, Zhang MQ, Chung DC, Xavier RJ. An integrative genomics approach identifies Hypoxia Inducible Factor-1 (HIF-1)-target genes that form the core response to hypoxia. Nucleic Acids Res 2009; 37:4587-602. [PMID: 19491311 PMCID: PMC2724271 DOI: 10.1093/nar/gkp425] [Citation(s) in RCA: 351] [Impact Index Per Article: 23.4] [Indexed: 02/06/2023] Open
Abstract
The transcription factor Hypoxia-inducible factor 1 (HIF-1) plays a central role in the transcriptional response to oxygen flux. To gain insight into the molecular pathways regulated by HIF-1, it is essential to identify the downstream target genes. We report here a strategy to identify HIF-1-target genes based on an integrative genomic approach combining computational strategies and experimental validation. To identify HIF-1-target genes, microarray data sets were used to rank genes based on their differential response to hypoxia. The proximal promoters of these genes were then analyzed for the presence of conserved HIF-1-binding sites. Genes were scored and ranked based on their response to hypoxia and their HIF-binding site score. Using this strategy we recovered 41% of the previously confirmed HIF-1-target genes that responded to hypoxia in the microarrays and provide a catalogue of predicted HIF-1 targets. We present experimental validation for ANKRD37 as a novel HIF-1-target gene. Together these analyses demonstrate the potential to recover novel HIF-1-target genes and to discover mammalian regulatory elements operative in the context of microarray data sets.
Affiliation(s)
- Yair Benita
- Center for Computational and Integrative Biology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
|
93
|
Antezana E, Egaña M, Blondé W, Illarramendi A, Bilbao I, De Baets B, Stevens R, Mironov V, Kuiper M. The Cell Cycle Ontology: an application ontology for the representation and integrated analysis of the cell cycle process. Genome Biol 2009; 10:R58. [PMID: 19480664 PMCID: PMC2718524 DOI: 10.1186/gb-2009-10-5-r58] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 12/20/2008] [Revised: 04/17/2009] [Accepted: 05/29/2009] [Indexed: 01/26/2023] Open
Abstract
A software resource for the analysis of cell cycle related molecular networks. The Cell Cycle Ontology is an application ontology that automatically captures and integrates detailed knowledge on the cell cycle process. Cell Cycle Ontology is enabled by semantic web technologies, and is accessible via the web for browsing, visualizing, advanced querying, and computational reasoning. Cell Cycle Ontology facilitates a detailed analysis of cell cycle-related molecular network components. Through querying and automated reasoning, it may provide new hypotheses to help steer a systems biology approach to biological network building.
Affiliation(s)
- Erick Antezana
- Department of Plant Systems Biology, VIB, Technologiepark 927, B-9052 Gent, Belgium.
|
94
|
Forsythe IJ, Wishart DS. Exploring human metabolites using the human metabolome database. Curr Protoc Bioinformatics 2009; Chapter 14:14.8.1-14.8.45. [PMID: 19274632 DOI: 10.1002/0471250953.bi1408s25] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Indexed: 11/08/2022]
Abstract
The Human Metabolome Database (HMDB) is a Web-based bioinformatic/cheminformatic resource with detailed information about human metabolites and metabolic enzymes. It can be used for fields of study including metabolomics, biochemistry, clinical chemistry, biomarker discovery, medicine, nutrition, and general education. In addition to its comprehensive literature-derived data, the HMDB contains an extensive collection of experimental metabolite concentration data for plasma, urine, CSF, and/or other biofluids. The HMDB is fully searchable, with many tools for viewing, sorting and extracting metabolite names, chemical structures, biofluid concentrations, enzymes, genes, NMR or MS spectra, and disease information. Each metabolite entry in the HMDB contains an average of 90 separate data fields including a comprehensive compound description, names and synonyms, chemical structure information, physico-chemical data, reference NMR and MS spectra, normal and abnormal biofluid concentrations, tissue locations, disease associations, pathway information, enzyme data, gene sequence data, and SNP and mutation data, as well as extensive links to images, references and other public databases.
Affiliation(s)
- Ian J Forsythe
- Genome Alberta, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
|
95
|
Koo QY, Khan AM, Jung KO, Ramdas S, Miotto O, Tan TW, Brusic V, Salmon J, August JT. Conservation and variability of West Nile virus proteins. PLoS One 2009; 4:e5352. [PMID: 19401763 PMCID: PMC2670515 DOI: 10.1371/journal.pone.0005352] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 02/02/2009] [Accepted: 03/10/2009] [Indexed: 12/02/2022] Open
Abstract
West Nile virus (WNV) has emerged globally as an increasingly important pathogen for humans and domestic animals. Studies of the evolutionary diversity of the virus over its known history will help to elucidate conserved sites, and characterize their correspondence to other pathogens and their relevance to the immune system. We describe a large-scale analysis of the entire WNV proteome, aimed at identifying and characterizing evolutionarily conserved amino acid sequences. This study, which used 2,746 WNV protein sequences collected from the NCBI GenPept database, focused on analysis of peptides of length 9 amino acids or more, which are immunologically relevant as potential T-cell epitopes. Entropy-based analysis of the diversity of WNV sequences, revealed the presence of numerous evolutionarily stable nonamer positions across the proteome (entropy value of ≤1). The representation (frequency) of nonamers variant to the predominant peptide at these stable positions was, generally, low (≤10% of the WNV sequences analyzed). Eighty-eight fragments of length 9–29 amino acids, representing ∼34% of the WNV polyprotein length, were identified to be identical and evolutionarily stable in all analyzed WNV sequences. Of the 88 completely conserved sequences, 67 are also present in other flaviviruses, and several have been associated with the functional and structural properties of viral proteins. Immunoinformatic analysis revealed that the majority (78/88) of conserved sequences are potentially immunogenic, while 44 contained experimentally confirmed human T-cell epitopes. This study identified a comprehensive catalogue of completely conserved WNV sequences, many of which are shared by other flaviviruses, and majority are potential epitopes. 
The complete conservation of these immunologically relevant sequences through the entire recorded WNV history suggests they will be valuable as components of peptide-specific vaccines or other therapeutic applications, for sequence-specific diagnosis of a wide-range of Flavivivirus infections, and for studies of homologous sequences among other flaviviruses.
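The entropy-based nonamer analysis described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the sequences are taken to be pre-aligned and of equal length, entropy is computed in bits (log base 2), and the function names and toy peptides are hypothetical rather than taken from the study.

```python
import math
from collections import Counter


def nonamer_entropy(aligned_seqs, start, k=9):
    """Shannon entropy (bits) of the k-mers starting at `start` across sequences."""
    counts = Counter(s[start:start + k] for s in aligned_seqs)
    total = len(aligned_seqs)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def variant_fraction(aligned_seqs, start, k=9):
    """Fraction of sequences whose k-mer differs from the predominant k-mer."""
    counts = Counter(s[start:start + k] for s in aligned_seqs)
    predominant = counts.most_common(1)[0][1]
    return 1 - predominant / len(aligned_seqs)


# Toy alignment (hypothetical): three identical sequences and one variant
# that differs only at the last residue.
seqs = ["MKTAYIAKQR"] * 3 + ["MKTAYIAKQL"]

print(nonamer_entropy(seqs, 0))   # 0.0, the first nonamer is fully conserved
print(nonamer_entropy(seqs, 1))   # about 0.811 bits
print(variant_fraction(seqs, 1))  # 0.25
```

Sliding `start` across the alignment gives one entropy value per nonamer position; positions with zero entropy correspond to the completely conserved nonamers the abstract describes.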
Affiliation(s)
- Qi Ying Koo
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Asif M. Khan
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Keun-Ok Jung
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Shweta Ramdas
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Olivo Miotto
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- MRC Centre for Genomics and Global Health, University of Oxford, Oxford, United Kingdom
- Mahidol-Oxford Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Tin Wee Tan
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Vladimir Brusic
- Cancer Vaccine Center, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Jerome Salmon
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- J. Thomas August
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
96
Bordoli L, Kiefer F, Arnold K, Benkert P, Battey J, Schwede T. Protein structure homology modeling using SWISS-MODEL workspace. Nat Protoc 2009; 4:1-13. [PMID: 19131951 DOI: 10.1038/nprot.2008.197] [Citation(s) in RCA: 912] [Impact Index Per Article: 60.8] [Indexed: 11/09/2022]
Abstract
Homology modeling aims to build three-dimensional protein structure models using experimentally determined structures of related family members as templates. SWISS-MODEL workspace is an integrated Web-based modeling expert system. For a given target protein, a library of experimental protein structures is searched to identify suitable templates. On the basis of a sequence alignment between the target protein and the template structure, a three-dimensional model for the target protein is generated. Model quality assessment tools are used to estimate the reliability of the resulting models. Homology modeling is currently the most accurate computational method to generate reliable structural models and is routinely used in many biological applications. Typically, the computational effort for a modeling project is less than 2 h. However, this does not include the time required for visualization and interpretation of the model, which may vary depending on personal experience working with protein structures.
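The step of choosing a template from a target-template sequence alignment can be sketched roughly as follows. This is an illustration only and not SWISS-MODEL's template search or ranking: it assumes pre-computed pairwise alignments given as gapped strings, and it ranks hypothetical templates by percent identity over positions where neither sequence has a gap (one common convention among several). The names `percent_identity` and `rank_templates` are made up for this example.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")

    aligned = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":
            continue  # skip gap columns
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def rank_templates(alignments):
    """Rank templates by percent identity, highest first.

    alignments: dict mapping a template id to a tuple
    (aligned_target, aligned_template). Hypothetical input format.
    """
    scored = [
        (percent_identity(target, template), template_id)
        for template_id, (target, template) in alignments.items()
    ]
    return sorted(scored, reverse=True)
```

In practice, template choice also weighs alignment coverage and the quality of the experimental structure, which is why the abstract stresses model quality assessment after building.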
Affiliation(s)
- Lorenza Bordoli
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH 4056 Basel, Switzerland
97
Nikolaou E, Agrafioti I, Stumpf M, Quinn J, Stansfield I, Brown AJP. Phylogenetic diversity of stress signalling pathways in fungi. BMC Evol Biol 2009; 9:44. [PMID: 19232129 PMCID: PMC2666651 DOI: 10.1186/1471-2148-9-44] [Citation(s) in RCA: 143] [Impact Index Per Article: 9.5] [Received: 09/21/2008] [Accepted: 02/21/2009] [Indexed: 01/05/2023] Open
Abstract
Background: Microbes must sense environmental stresses, transduce these signals and mount protective responses to survive in hostile environments. In this study we have tested the hypothesis that fungal stress signalling pathways have evolved rapidly in a niche-specific fashion that is independent of phylogeny. To test this hypothesis we have compared the conservation of stress signalling molecules in diverse fungal species with their stress resistance. These fungi, which include ascomycetes, basidiomycetes and microsporidia, occupy highly divergent niches from saline environments to plant or mammalian hosts.
Results: The fungi displayed significant variation in their resistance to osmotic (NaCl and sorbitol), oxidative (H2O2 and menadione) and cell wall stresses (Calcofluor White and Congo Red). There was no strict correlation between fungal phylogeny and stress resistance. Rather, the human pathogens tended to be more resistant to all three types of stress, an exception being the sensitivity of Candida albicans to the cell wall stress, Calcofluor White. In contrast, the plant pathogens were relatively sensitive to oxidative stress. The degree of conservation of osmotic, oxidative and cell wall stress signalling pathways amongst the eighteen fungal species was examined. Putative orthologues of functionally defined signalling components in Saccharomyces cerevisiae were identified by performing reciprocal BLASTP searches, and the percent amino acid identities of these orthologues recorded. This revealed that in general, central components of the osmotic, oxidative and cell wall stress signalling pathways are relatively well conserved, whereas the sensors lying upstream and transcriptional regulators lying downstream of these modules have diverged significantly. There was no obvious correlation between the degree of conservation of stress signalling pathways and the resistance of a particular fungus to the corresponding stress.
Conclusion: Our data are consistent with the hypothesis that fungal stress signalling components have undergone rapid recent evolution to tune the stress responses in a niche-specific fashion.
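The reciprocal BLASTP step used to call putative orthologues can be sketched as a reciprocal-best-hit filter. The sketch assumes the BLASTP output has already been parsed into dictionaries mapping each query to a list of (subject, bit score) pairs; that input format, the function names and the identifiers in the usage note are hypothetical, and the authors' actual thresholds are not stated in the abstract.

```python
def best_hit(hits):
    """Return the subject id with the highest score, or None if there are no hits."""
    if not hits:
        return None
    return max(hits, key=lambda hit: hit[1])[0]


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit.

    a_vs_b, b_vs_a: dicts mapping a query id to a list of
    (subject id, bit score) tuples from the two search directions.
    """
    pairs = []
    for query, hits in a_vs_b.items():
        subject = best_hit(hits)
        if subject is None:
            continue
        # Keep the pair only if the reverse search points back to the query.
        if best_hit(b_vs_a.get(subject, [])) == query:
            pairs.append((query, subject))
    return sorted(pairs)
```

For example, if `sc1` hits `ca1` best and `ca1` hits `sc1` best, the pair is kept; if `sc2` hits `ca2` best but `ca2` prefers `sc1`, the pair is discarded as a likely paralogue match.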
Affiliation(s)
- Elissavet Nikolaou
- Aberdeen Fungal Group, School of Medical Sciences, University of Aberdeen, Institute of Medical Sciences, Foresterhill, Aberdeen, AB25 2ZD, UK.
98
Faunes F, Sánchez N, Castellanos J, Vergara IA, Melo F, Larraín J. Identification of novel transcripts with differential dorso-ventral expression in Xenopus gastrula using serial analysis of gene expression. Genome Biol 2009; 10:R15. [PMID: 19210784 PMCID: PMC2688288 DOI: 10.1186/gb-2009-10-2-r15] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 10/03/2008] [Revised: 11/25/2008] [Accepted: 02/11/2009] [Indexed: 11/12/2022] Open
Abstract
Comparison of dorsal and ventral transcriptomes of Xenopus tropicalis gastrulae using serial analysis of gene expression provides at least 86 novel differentially expressed transcripts.
Background: Recent evidence from global studies of gene expression indicates that transcriptomes are more complex than expected. Xenopus has been typically used as a model organism to study early embryonic development, particularly dorso-ventral patterning. In order to identify novel transcripts involved in dorso-ventral patterning, we compared dorsal and ventral transcriptomes of Xenopus tropicalis at the gastrula stage using serial analysis of gene expression (SAGE).
Results: Of the experimental tags, 54.5% were confidently mapped to transcripts and 125 showed a significant difference in their frequency of occurrence between dorsal and ventral libraries. We selected 20 differentially expressed tags and assigned them to specific transcripts using bioinformatics and reverse SAGE. Five mapped to transcripts with known dorso-ventral expression and the frequency of appearance for these tags in each library is in agreement with the expression described by other methods. The other 15 tags mapped to transcripts with no previously described asymmetric expression along the dorso-ventral axis. The differential expression of ten of these novel transcripts was validated by in situ hybridization and/or RT-PCR. We can estimate that this SAGE experiment provides a list of at least 86 novel transcripts with differential expression along the dorso-ventral axis. Interestingly, the expression of some novel transcripts was independent of β-catenin.
Conclusions: Our SAGE analysis provides a list of novel transcripts with differential expression in the dorso-ventral axis and a large number of orphan tags that can be used to identify novel transcripts and to improve the current annotation of the X. tropicalis genome.
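Comparing a tag's frequency between two SAGE libraries amounts to testing a 2x2 table of counts. The abstract does not say which statistical test the authors used, so the sketch below substitutes a two-sided Fisher's exact test purely as an illustration. The function name and argument layout are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def fisher_exact_two_sided(tag_a, total_a, tag_b, total_b):
    """Two-sided Fisher's exact p-value for one tag counted in two libraries.

    tag_a, tag_b: occurrences of the tag in library A and library B.
    total_a, total_b: total number of tags sequenced in each library.
    Assumption: this test is a stand-in; the source does not name its test.
    """
    k_total = tag_a + tag_b          # occurrences of this tag overall
    n_total = total_a + total_b      # all tags in both libraries
    denom = comb(n_total, total_a)

    def prob(x):
        # Hypergeometric probability that x occurrences fall in library A.
        return comb(k_total, x) * comb(n_total - k_total, total_a - x) / denom

    lo = max(0, total_a - (n_total - k_total))
    hi = min(k_total, total_a)
    p_obs = prob(tag_a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p_value = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                  if p <= p_obs * (1 + 1e-9))
    return min(1.0, p_value)
```

A tag would then be called differentially represented when its p-value falls below a chosen cutoff, ideally after adjusting for the number of tags tested.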
Affiliation(s)
- Fernando Faunes
- Center for Cell Regulation and Pathology and Center for Aging and Regeneration, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, 8331150, Chile
99
Bernthaler A, Mühlberger I, Fechete R, Perco P, Lukas A, Mayer B. A dependency graph approach for the analysis of differential gene expression profiles. Mol Biosyst 2009; 5:1720-31. [PMID: 19585005 DOI: 10.1039/b903109j] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Indexed: 11/21/2022]
Affiliation(s)
- Andreas Bernthaler
- Theory and Logics Group, Institute of Computer Languages, Vienna University of Technology, Favoritenstrasse 9-11, A-1040 Vienna, Austria.
100
Roca AI, Almada AE, Abajian AC. ProfileGrids as a new visual representation of large multiple sequence alignments: a case study of the RecA protein family. BMC Bioinformatics 2008; 9:554. [PMID: 19102758 PMCID: PMC2663765 DOI: 10.1186/1471-2105-9-554] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 07/30/2008] [Accepted: 12/22/2008] [Indexed: 01/12/2023] Open
Abstract
Background: Multiple sequence alignments are a fundamental tool for the comparative analysis of proteins and nucleic acids. However, large data sets are no longer manageable for visualization and investigation using the traditional stacked sequence alignment representation.
Results: We introduce ProfileGrids that represent a multiple sequence alignment as a matrix color-coded according to the residue frequency occurring at each column position. JProfileGrid is a Java application for computing and analyzing ProfileGrids. A dynamic interaction with the alignment information is achieved by changing the ProfileGrid color scheme, by extracting sequence subsets at selected residues of interest, and by relating alignment information to residue physical properties. Conserved family motifs can be identified by the overlay of similarity plot calculations on a ProfileGrid. Figures suitable for publication can be generated from the saved spreadsheet output of the colored matrices as well as by the export of conservation information for use in the PyMOL molecular visualization program. We demonstrate the utility of ProfileGrids on 300 bacterial homologs of the RecA family – a universally conserved protein involved in DNA recombination and repair. Careful attention was paid to curating the collected RecA sequences since ProfileGrids allow the easy identification of rare residues in an alignment. We relate the RecA alignment sequence conservation to the following three topics: the recently identified DNA binding residues, the unexplored MAW motif, and a unique Bacillus subtilis RecA homolog sequence feature.
Conclusion: ProfileGrids allow large protein families to be visualized more effectively than the traditional stacked sequence alignment form. This new graphical representation facilitates the determination of the sequence conservation at residue positions of interest, enables the examination of structural patterns by using residue physical properties, and permits the display of rare sequence features within the context of an entire alignment. JProfileGrid is free for non-commercial use and is available from . Furthermore, we present a curated RecA protein collection that is more diverse than previous data sets; therefore, this RecA ProfileGrid is a rich source of information for nanoanatomy analysis.
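The core idea of a ProfileGrid, a matrix of residue frequencies per alignment column, can be sketched in a few lines. This is not JProfileGrid's implementation (which is a Java application); it is a minimal illustration that assumes an alignment given as equal-length strings, with the hypothetical function names `profile_grid` and `rare_residues`.

```python
from collections import Counter


def profile_grid(alignment):
    """Return {residue: [frequency in column 0, column 1, ...]}.

    Assumption: alignment is a list of equal-length strings; gap
    characters are counted like any other symbol.
    """
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    residues = sorted(set("".join(alignment)))
    grid = {residue: [0.0] * n_cols for residue in residues}
    for col in range(n_cols):
        counts = Counter(seq[col] for seq in alignment)
        for residue, n in counts.items():
            grid[residue][col] = n / n_seqs
    return grid


def rare_residues(grid, col, max_freq=0.05):
    """Residues present in a column at a frequency no higher than max_freq."""
    return sorted(r for r, freqs in grid.items() if 0.0 < freqs[col] <= max_freq)
```

Each row of the resulting grid corresponds to one residue type and each column to one alignment position, so the matrix size depends on alignment length and alphabet size rather than on the number of sequences. Color-coding the frequencies would then give the compact view the abstract describes.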
Affiliation(s)
- Alberto I Roca
- Department of Molecular Biology and Biochemistry, 560 Steinhaus Hall, University of California, Irvine, California 92697-3900, USA.