1
The circadian clock and darkness control natural competence in cyanobacteria. Nat Commun 2020; 11:1688. [PMID: 32245943 PMCID: PMC7125226 DOI: 10.1038/s41467-020-15384-9] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 11/09/2019] [Accepted: 03/05/2020] [Indexed: 11/15/2022] Open
Abstract
The cyanobacterium Synechococcus elongatus is a model organism for the study of circadian rhythms. It is naturally competent for transformation (that is, it takes up DNA from the environment), but the underlying mechanisms are unclear. Here, we use a genome-wide screen to identify genes required for natural transformation in S. elongatus, including genes encoding a conserved Type IV pilus, genes known to be associated with competence in other bacteria, and others. Pilus biogenesis occurs daily in the morning, while natural transformation is maximal when the onset of darkness coincides with the dusk circadian peak. Thus, the competence state in cyanobacteria is regulated by the circadian clock and can adapt to seasonal changes of day length.
The cyanobacterium Synechococcus elongatus is a model organism for the study of circadian rhythms, and is naturally competent for transformation. Here, Taton et al. identify genes required for natural transformation in this organism, and show that the coincidence of circadian dusk and darkness regulates the competence state in different day lengths.
2
Elhai J, Khudyakov I. Ancient association of cyanobacterial multicellularity with the regulator HetR and an RGSGR pentapeptide-containing protein (PatX). Mol Microbiol 2018; 110:931-954. [PMID: 29885033 DOI: 10.1111/mmi.14003] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Accepted: 06/04/2018] [Indexed: 12/14/2022]
Abstract
One simple model to explain biological pattern postulates the existence of a stationary regulator of differentiation that positively affects its own expression, coupled with a diffusible suppressor of differentiation that inhibits the regulator's expression. The first has been identified in the filamentous, heterocyst-forming cyanobacterium, Anabaena PCC 7120 as the transcriptional regulator, HetR and the second as the small protein, PatS, which contains a critical RGSGR motif that binds to HetR. HetR is present in almost all filamentous cyanobacteria, but only a subset of heterocyst-forming strains carry proteins similar to PatS. We identified a third protein, PatX that also carries the RGSGR motif and is coextensive with HetR. Amino acid sequences of PatX contain two conserved regions: the RGSGR motif and a hydrophobic N-terminus. Within 69 nt upstream from all instances of the gene is a DIF1 motif correlated in Anabaena with promoter induction in developing heterocysts, preceded in heterocyst-forming strains by an apparent NtcA-binding site, associated with regulation by nitrogen-status. Consistent with a role in the simple model, PatX is expressed dependent on HetR and acts to inhibit differentiation. The acquisition of the PatX/HetR pair preceded the appearance of both PatS and heterocysts, dating back to the beginnings of multicellularity.
Affiliation(s)
- Jeff Elhai
- Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA, 23284, USA
- Ivan Khudyakov
- All-Russia Research Institute for Agricultural Microbiology, Saint-Petersburg, 196608, Russia
3
Khomtchouk BB, Weitz E, Karp PD, Wahlestedt C. How the strengths of Lisp-family languages facilitate building complex and flexible bioinformatics applications. Brief Bioinform 2018; 19:537-543. [PMID: 28040748 PMCID: PMC5952920 DOI: 10.1093/bib/bbw130] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/18/2016] [Revised: 11/16/2016] [Indexed: 11/14/2022] Open
Abstract
We present a rationale for expanding the presence of the Lisp family of programming languages in bioinformatics and computational biology research. Put simply, Lisp-family languages enable programmers to more quickly write programs that run faster than in other languages. Languages such as Common Lisp, Scheme and Clojure facilitate the creation of powerful and flexible software that is required for complex and rapidly evolving domains like biology. We will point out several important key features that distinguish languages of the Lisp family from other programming languages, and we will explain how these features can aid researchers in becoming more productive and creating better code. We will also show how these features make these languages ideal tools for artificial intelligence and machine learning applications. We will specifically stress the advantages of domain-specific languages (DSLs): languages that are specialized to a particular area, and thus not only facilitate easier research problem formulation, but also aid in the establishment of standards and best programming practices as applied to the specific research field at hand. DSLs are particularly easy to build in Common Lisp, the most comprehensive Lisp dialect, which is commonly referred to as the 'programmable programming language'. We are convinced that Lisp grants programmers unprecedented power to build increasingly sophisticated artificial intelligence systems that may ultimately transform machine learning and artificial intelligence research in bioinformatics and computational biology.
Affiliation(s)
- Bohdan B Khomtchouk
- Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, USA
- Edmund Weitz
- Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, USA
- Peter D Karp
- Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, USA
- Claes Wahlestedt
- Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, USA
4
5
Peter AP, Lakshmanan K, Mohandass S, Varadharaj S, Thilagar S, Abdul Kareem KA, Dharmar P, Gopalakrishnan S, Lakshmanan U. Cyanobacterial KnowledgeBase (CKB), a Compendium of Cyanobacterial Genomes and Proteomes. PLoS One 2015; 10:e0136262. [PMID: 26305368 PMCID: PMC4549288 DOI: 10.1371/journal.pone.0136262] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 11/05/2014] [Accepted: 08/03/2015] [Indexed: 12/18/2022] Open
Abstract
Cyanobacterial KnowledgeBase (CKB) is a free-access database that contains the genomic and proteomic information of 74 fully sequenced cyanobacterial genomes belonging to seven orders. The database also contains tools for sequence analysis. The species report and the gene report provide details about each species and gene (including sequence features and gene ontology annotations), respectively. The database also includes cyanoBLAST, an advanced tool that facilitates comparative analysis among cyanobacterial genomes and the genomes of E. coli (prokaryote) and Arabidopsis (eukaryote). The database is developed and maintained by the Sub-Distributed Informatics Centre (sponsored by the Department of Biotechnology, Govt. of India) of the National Facility for Marine Cyanobacteria, a facility dedicated to marine cyanobacterial research. CKB is freely available at http://nfmc.res.in/ckb/index.html.
Affiliation(s)
- Arul Prakasam Peter
- National Facility for Marine Cyanobacteria, Sub-Distributed Bioinformatics Centre (sponsored by Department of Biotechnology, Govt. of India), Department of Marine Biotechnology, School of Marine Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India
- Karthick Lakshmanan
- National Facility for Marine Cyanobacteria, Sub-Distributed Bioinformatics Centre (sponsored by Department of Biotechnology, Govt. of India), Department of Marine Biotechnology, School of Marine Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India
- Shylajanaciyar Mohandass
- National Facility for Marine Cyanobacteria, Sub-Distributed Bioinformatics Centre (sponsored by Department of Biotechnology, Govt. of India), Department of Marine Biotechnology, School of Marine Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India
- Sangeetha Varadharaj
- National Facility for Marine Cyanobacteria, Sub-Distributed Bioinformatics Centre (sponsored by Department of Biotechnology, Govt. of India), Department of Marine Biotechnology, School of Marine Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India
- Sivasudha Thilagar
- Department of Environmental Biotechnology, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India
- Prabaharan Dharmar
- National Facility for Marine Cyanobacteria, Sub-Distributed Bioinformatics Centre (sponsored by Department of Biotechnology, Govt. of India), Department of Marine Biotechnology, School of Marine Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India
- Subramanian Gopalakrishnan
- National Facility for Marine Cyanobacteria, Sub-Distributed Bioinformatics Centre (sponsored by Department of Biotechnology, Govt. of India), Department of Marine Biotechnology, School of Marine Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India
- Uma Lakshmanan
- National Facility for Marine Cyanobacteria, Sub-Distributed Bioinformatics Centre (sponsored by Department of Biotechnology, Govt. of India), Department of Marine Biotechnology, School of Marine Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India
6
Elhai J. Highly Iterated Palindromic Sequences (HIPs) and Their Relationship to DNA Methyltransferases. Life (Basel) 2015; 5:921-48. [PMID: 25789551 PMCID: PMC4390886 DOI: 10.3390/life5010921] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 01/01/2015] [Revised: 02/24/2015] [Accepted: 03/09/2015] [Indexed: 11/16/2022] Open
Abstract
The sequence GCGATCGC (Highly Iterated Palindrome, HIP1) is commonly found at high frequency in cyanobacterial genomes. An important clue to its function may be the presence of two orphan DNA methyltransferases that recognize the internal sequences GATC and CGATCG. An examination of genomes from 97 cyanobacteria, both free-living and obligate symbionts, showed that there are exceptional cases in which HIP1 is at a low frequency or nearly absent. In some of these cases, it appears to have been replaced by a different GC-rich palindromic sequence, an alternate HIP. When HIP1 is at a high frequency, GATC- and CGATCG-specific methyltransferases are generally present in the genome. When an alternate HIP is at high frequency, a methyltransferase specific for that sequence is present. The pattern of 1-nt deviations from HIP1 sequences is biased towards the first and last nucleotides, i.e., those that distinguish CGATCG from HIP1. Taken together, the results point to a role of DNA methylation in the creation or functioning of HIP sites. A model is presented that postulates the existence of a GmeC-dependent mismatch repair system whose activity creates and maintains HIP sequences.
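The notion of a palindromic site and its frequency in a genome can be made concrete with a minimal Python sketch. This is only an illustration, not the analysis used in the paper: the function names (`reverse_complement`, `is_palindromic`, `count_sites`) and the toy sequence are hypothetical, and a DNA "palindrome" is taken here to mean a site equal to its own reverse complement.

```python
# Illustrative sketch only; names and the toy sequence are assumptions,
# not taken from the paper.
HIP1 = "GCGATCGC"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """A DNA palindrome reads the same as its own reverse complement."""
    return site.upper() == reverse_complement(site)


def count_sites(genome: str, site: str = HIP1) -> int:
    """Count occurrences of `site`, including overlapping ones."""
    genome, site = genome.upper(), site.upper()
    count = 0
    start = genome.find(site)
    while start != -1:
        count += 1
        # Advance by one base so overlapping copies are also found.
        start = genome.find(site, start + 1)
    return count


# Toy sequence (made up): one isolated HIP1 and two overlapping copies.
toy = "AAGCGATCGCTTGCGATCGCGATCGCAA"
hip1_count = count_sites(toy)
hip1_per_kb = 1000 * hip1_count / len(toy)
```

Overlapping matches are counted deliberately, because adjacent HIP1 copies can share the terminal GC; Python's built-in `str.count` would skip the second of two overlapping copies.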
Affiliation(s)
- Jeff Elhai
- Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA 23284, USA.
7
Taton A, Unglaub F, Wright NE, Zeng WY, Paz-Yepes J, Brahamsha B, Palenik B, Peterson TC, Haerizadeh F, Golden SS, Golden JW. Broad-host-range vector system for synthetic biology and biotechnology in cyanobacteria. Nucleic Acids Res 2014; 42:e136. [PMID: 25074377 PMCID: PMC4176158 DOI: 10.1093/nar/gku673] [Citation(s) in RCA: 104] [Impact Index Per Article: 9.5] [Indexed: 12/02/2022] Open
Abstract
Inspired by the developments of synthetic biology and the need for improved genetic tools to exploit cyanobacteria for the production of renewable bioproducts, we developed a versatile platform for the construction of broad-host-range vector systems. This platform includes the following features: (i) an efficient assembly strategy in which modules released from 3 to 4 donor plasmids or produced by polymerase chain reaction are assembled by isothermal assembly guided by short GC-rich overlap sequences. (ii) A growing library of molecular devices categorized in three major groups: (a) replication and chromosomal integration; (b) antibiotic resistance; (c) functional modules. These modules can be assembled in different combinations to construct a variety of autonomously replicating plasmids and suicide plasmids for gene knockout and knockin. (iii) A web service, the CYANO-VECTOR assembly portal, which was built to organize the various modules, facilitate the in silico construction of plasmids, and encourage the use of this system. This work also resulted in the construction of an improved broad-host-range replicon derived from RSF1010, which replicates in several phylogenetically distinct strains including a new experimental model strain Synechocystis sp. WHSyn, and the characterization of nine antibiotic cassettes, four reporter genes, four promoters, and a ribozyme-based insulator in several diverse cyanobacterial strains.
Affiliation(s)
- Arnaud Taton
- Division of Biological Sciences, University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA
- Federico Unglaub
- Division of Biological Sciences, University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA
- Nicole E Wright
- Division of Biological Sciences, University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA
- Wei Yue Zeng
- Division of Biological Sciences, University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA
- Javier Paz-Yepes
- Scripps Institution of Oceanography, University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA; Institut de Biologie de l'Ecole Normale Supérieure, CNRS, UMR 8197, 46 rue d'Ulm, 75230 Paris, France
- Bianca Brahamsha
- Scripps Institution of Oceanography, University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA
- Brian Palenik
- Scripps Institution of Oceanography, University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA
- Todd C Peterson
- Synthetic Biology Division, Life Technologies Corporation, 5791 Van Allen Way, Carlsbad, CA 92008, USA
- Farzad Haerizadeh
- Synthetic Biology Division, Life Technologies Corporation, 5791 Van Allen Way, Carlsbad, CA 92008, USA
- Susan S Golden
- Division of Biological Sciences, University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA
- James W Golden
- Division of Biological Sciences, University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA
8
Subcellular localization and clues for the function of the HetN factor influencing heterocyst distribution in Anabaena sp. strain PCC 7120. J Bacteriol 2014; 196:3452-60. [PMID: 25049089 DOI: 10.1128/jb.01922-14] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Indexed: 11/20/2022] Open
Abstract
In the filamentous cyanobacterium Anabaena sp. strain PCC 7120, heterocysts are formed in the absence of combined nitrogen, following a specific distribution pattern along the filament. The PatS and HetN factors contribute to the heterocyst pattern by inhibiting the formation of consecutive heterocysts. Thus, inactivation of any of these factors produces the multiple contiguous heterocyst (Mch) phenotype. Upon N stepdown, a HetN protein with its C terminus fused to a superfolder version of green fluorescent protein (sf-GFP) or to GFP-mut2 was observed, localized first throughout the whole area of differentiating cells and later specifically on the peripheries and in the polar regions of mature heterocysts, coinciding with the location of the thylakoids. Polar localization required an N-terminal stretch comprising residues 2 to 27 that may represent an unconventional signal peptide. Anabaena strains expressing a version of HetN lacking this fragment from a mutant gene placed at the native hetN locus exhibited a mild Mch phenotype. In agreement with previous results, deletion of an internal ERGSGR sequence, which is identical to the C-terminal sequence of PatS, also led to the Mch phenotype. The subcellular localization in heterocysts of fluorescence resulting from the fusion of GFP to the C terminus of HetN suggests that a full HetN protein is present in these cells. Furthermore, the full HetN protein is more conserved among cyanobacteria than the internal ERGSGR sequence. These observations suggest that HetN anchored to thylakoid membranes in heterocysts may serve a function besides that of generating a regulatory (ERGSGR) peptide.
9
Hernández-Prieto MA, Semeniuk TA, Futschik ME. Toward a systems-level understanding of gene regulatory, protein interaction, and metabolic networks in cyanobacteria. Front Genet 2014; 5:191. [PMID: 25071821 PMCID: PMC4079066 DOI: 10.3389/fgene.2014.00191] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 04/29/2014] [Accepted: 06/11/2014] [Indexed: 12/21/2022] Open
Abstract
Cyanobacteria are essential primary producers in marine ecosystems, playing an important role in both carbon and nitrogen cycles. In the last decade, various genome sequencing and metagenomic projects have generated large amounts of genetic data for cyanobacteria. This wealth of data provides researchers with a new basis for the study of molecular adaptation, ecology and evolution of cyanobacteria, as well as for developing biotechnological applications. It also facilitates the use of multiplex techniques, i.e., expression profiling by high-throughput technologies such as microarrays, RNA-seq, and proteomics. However, exploration and analysis of these data are challenging and often require advanced computational methods. The data also need to be integrated into our existing framework of knowledge before they can be used to draw reliable biological conclusions. Here, systems biology provides important tools. In particular, the construction and analysis of molecular networks has emerged as a powerful systems-level framework with which to integrate such data and to better understand biologically relevant processes in these organisms. In this review, we provide an overview of the advances and experimental approaches undertaken using multiplex data from genomic, transcriptomic, proteomic, and metabolomic studies in cyanobacteria. Furthermore, we summarize currently available web-based tools dedicated to cyanobacteria, i.e., CyanoBase, CyanoEXpress, ProPortal, Cyanorak, CyanoBIKE, and CINPER. Finally, we present a case study for the freshwater model cyanobacterium, Synechocystis sp. PCC6803, to show the power of meta-analysis, and the potential to extrapolate acquired knowledge to the ecologically important marine cyanobacterial genus, Prochlorococcus.
Affiliation(s)
- Trudi A Semeniuk
- Systems Biology and Bioinformatics Laboratory, IBB-CBME, University of Algarve, Faro, Portugal
- Matthias E Futschik
- Systems Biology and Bioinformatics Laboratory, IBB-CBME, University of Algarve, Faro, Portugal; Centre of Marine Sciences, University of Algarve, Faro, Portugal
10
Altman T, Travers M, Kothari A, Caspi R, Karp PD. A systematic comparison of the MetaCyc and KEGG pathway databases. BMC Bioinformatics 2013; 14:112. [PMID: 23530693 PMCID: PMC3665663 DOI: 10.1186/1471-2105-14-112] [Citation(s) in RCA: 101] [Impact Index Per Article: 8.4] [Received: 07/06/2012] [Accepted: 03/04/2013] [Indexed: 01/06/2023] Open
Abstract
BACKGROUND The MetaCyc and KEGG projects have developed large metabolic pathway databases that are used for a variety of applications including genome analysis and metabolic engineering. We present a comparison of the compound, reaction, and pathway content of MetaCyc version 16.0 and a KEGG version downloaded on Feb-27-2012 to increase understanding of their relative sizes, their degree of overlap, and their scope. To assess their overlap, we must know the correspondences between compounds, reactions, and pathways in MetaCyc, and those in KEGG. We devoted significant effort to computational and manual matching of these entities, and we evaluated the accuracy of the correspondences. RESULTS KEGG contains 179 module pathways versus 1,846 base pathways in MetaCyc; KEGG contains 237 map pathways versus 296 super pathways in MetaCyc. KEGG pathways contain 3.3 times as many reactions on average as do MetaCyc pathways, and the databases employ different conceptualizations of metabolic pathways. KEGG contains 8,692 reactions versus 10,262 for MetaCyc. 6,174 KEGG reactions are components of KEGG pathways versus 6,348 for MetaCyc. KEGG contains 16,586 compounds versus 11,991 for MetaCyc. 6,912 KEGG compounds act as substrates in KEGG reactions versus 8,891 for MetaCyc. MetaCyc contains a broader set of database attributes than does KEGG, such as relationships from a compound to enzymes that it regulates, identification of spontaneous reactions, and the expected taxonomic range of metabolic pathways. MetaCyc contains many pathways not found in KEGG, from plants, fungi, metazoa, and actinobacteria; KEGG contains pathways not found in MetaCyc, for xenobiotic degradation, glycan metabolism, and metabolism of terpenoids and polyketides. MetaCyc contains fewer unbalanced reactions, which facilitates metabolic modeling such as using flux-balance analysis. MetaCyc includes generic reactions that may be instantiated computationally. 
CONCLUSIONS KEGG contains significantly more compounds than does MetaCyc, whereas MetaCyc contains significantly more reactions and pathways than does KEGG; in particular, KEGG modules are quite incomplete. The number of reactions occurring in pathways in the two DBs is quite similar.
Affiliation(s)
- Tomer Altman
- Bioinformatics Research Group, SRI International, Menlo Park, USA
11
Elhai J, Liu H, Taton A. Detection of horizontal transfer of individual genes by anomalous oligomer frequencies. BMC Genomics 2012; 13:245. [PMID: 22702893 PMCID: PMC3497702 DOI: 10.1186/1471-2164-13-245] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/29/2011] [Accepted: 05/18/2012] [Indexed: 11/10/2022] Open
Abstract
Background Understanding the history of life requires that we understand the transfer of genetic material across phylogenetic boundaries. Detecting genes that were acquired by means other than vertical descent is a basic step in that process. Detection by discordant phylogenies is computationally expensive and not always definitive. Many have used easily computed compositional features as an alternative procedure. However, different compositional methods produce different predictions, and the effectiveness of any method is not well established. Results The ability of octamer frequency comparisons to detect genes artificially seeded in cyanobacterial genomes was markedly increased by using as a training set those genes that are highly conserved over all bacteria. Using a subset of octamer frequencies in such tests also increased effectiveness, but this depended on the specific target genome and the source of the contaminating genes. The presence of high frequency octamers and the GC content of the contaminating genes were important considerations. A method comprising best practices from these tests was devised, the Core Gene Similarity (CGS) method, and it performed better than simple octamer frequency analysis, codon bias, or GC contrasts in detecting seeded genes or naturally occurring transposons. From a comparison of predictions with phylogenetic trees, it appears that the effectiveness of the method is confined to horizontal transfer events that have occurred recently in evolutionary time. Conclusions The CGS method may be an improvement over existing surrogate methods to detect genes of foreign origin.
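The general idea of scoring a gene by how typical its oligomers are of a trusted training set can be sketched in a few lines of Python. This is a generic compositional score under stated assumptions, not the CGS method itself, whose details are not given in the abstract: the function names, the pseudocount smoothing, the mean log-frequency score, and the toy sequences are all hypothetical, and `k` is reduced from 8 to 4 so that the toy data are meaningful.

```python
import math
from collections import Counter


def kmer_counts(seqs, k=8):
    """Count all overlapping k-mers across a collection of sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def mean_log_freq(gene, ref_counts, k=8, pseudocount=1.0):
    """Average log-frequency of a gene's k-mers in the reference profile.

    Lower scores mean the gene is built from k-mers that are rare in the
    reference set, a possible hint of foreign origin. The pseudocount
    keeps unseen k-mers from producing log(0).
    """
    gene = gene.upper()
    n = len(gene) - k + 1
    if n <= 0:
        raise ValueError("gene is shorter than k")
    total = sum(ref_counts.values()) + pseudocount * 4 ** k
    return sum(
        math.log((ref_counts[gene[i:i + k]] + pseudocount) / total)
        for i in range(n)
    ) / n


# Toy data (made up). The training set stands in for conserved core genes.
core_genes = ["GCGATCGCGCGATCGC" * 3]
profile = kmer_counts(core_genes, k=4)
native_like = mean_log_freq("GCGATCGCGATCGC", profile, k=4)
foreign_like = mean_log_freq("ATATATTTAAATAT", profile, k=4)
```

In this toy case the AT-rich sequence scores lower than the sequence that resembles the training set. A real analysis would need a score threshold and some treatment of gene length and GC content, which the abstract flags as important considerations.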
Affiliation(s)
- Jeff Elhai
- Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA 23284, USA.
12
Jamil HM. A natural language interface plug-in for cooperative query answering in biological databases. BMC Genomics 2012; 13 Suppl 3:S4. [PMID: 22759613 PMCID: PMC3323828 DOI: 10.1186/1471-2164-13-s3-s4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022] Open
Abstract
Background One of the many unique features of biological databases is that the mere existence of a ground data item is not always a precondition for a query response. It may be argued that from a biologist's standpoint, queries are not always best posed using a structured language. By this we mean that approximate and flexible responses to natural language like queries are well suited for this domain. This is partly due to biologists' tendency to seek simpler interfaces and partly due to the fact that questions in biology involve high level concepts that are open to interpretations computed using sophisticated tools. In such highly interpretive environments, rigidly structured databases do not always perform well. In this paper, our goal is to propose a semantic correspondence plug-in to aid natural language query processing over arbitrary biological database schema with an aim to providing cooperative responses to queries tailored to users' interpretations. Results Natural language interfaces for databases are generally effective when they are tuned to the underlying database schema and its semantics. Therefore, changes in database schema become impossible to support, or a substantial reorganization cost must be absorbed to reflect any change. We leverage developments in natural language parsing, rule languages and ontologies, and data integration technologies to assemble a prototype query processor that is able to transform a natural language query into a semantically equivalent structured query over the database. We allow knowledge rules and their frequent modifications as part of the underlying database schema. The approach we adopt in our plug-in overcomes some of the serious limitations of many contemporary natural language interfaces, including support for schema modifications and independence from underlying database schema. 
Conclusions The plug-in introduced in this paper is generic and facilitates connecting user selected natural language interfaces to arbitrary databases using a semantic description of the intended application. We demonstrate the feasibility of our approach with a practical example.
Affiliation(s)
- Hasan M Jamil
- Department of Computer Science, Wayne State University, Michigan, USA.
13
Selection of suitable reference genes for RT-qPCR analyses in cyanobacteria. PLoS One 2012; 7:e34983. [PMID: 22496882 PMCID: PMC3319621 DOI: 10.1371/journal.pone.0034983] [Citation(s) in RCA: 108] [Impact Index Per Article: 8.3] [Received: 02/02/2012] [Accepted: 03/12/2012] [Indexed: 02/02/2023] Open
Abstract
Cyanobacteria are a group of photosynthetic prokaryotes that have a diverse morphology, minimal nutritional requirements and metabolic plasticity that has made them attractive organisms to use in biotechnological applications. The use of these organisms as cell factories requires the knowledge of their physiology and metabolism at a systems level. For the quantification of gene transcripts real-time quantitative polymerase chain reaction (RT-qPCR) is the standard technique. However, to obtain reliable RT-qPCR results the use and validation of reference genes is mandatory. Towards this goal we have selected and analyzed twelve candidate reference genes from three morphologically distinct cyanobacteria grown under routinely used laboratory conditions. The six genes exhibiting less variation in each organism were evaluated in terms of their expression stability using geNorm, NormFinder and BestKeeper. In addition, the minimum number of reference genes required for normalization was determined. Based on the three algorithms, we provide a list of genes for cyanobacterial RT-qPCR data normalization. To our knowledge, this is the first work on the validation of reference genes for cyanobacteria constituting a valuable starting point for future works.
14
Holford ME, McCusker JP, Cheung KH, Krauthammer M. A semantic web framework to integrate cancer omics data with biological knowledge. BMC Bioinformatics 2012; 13 Suppl 1:S10. [PMID: 22373303 PMCID: PMC3471346 DOI: 10.1186/1471-2105-13-s1-s10] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND The RDF triple provides a simple linguistic means of describing limitless types of information. Triples can be flexibly combined into a unified data source we call a semantic model. Semantic models open new possibilities for the integration of variegated biological data. We use Semantic Web technology to explicate high throughput clinical data in the context of fundamental biological knowledge. We have extended Corvus, a data warehouse which provides a uniform interface to various forms of Omics data, by providing a SPARQL endpoint. With the querying and reasoning tools made possible by the Semantic Web, we were able to explore quantitative semantic models retrieved from Corvus in the light of systematic biological knowledge. RESULTS For this paper, we merged semantic models containing genomic, transcriptomic and epigenomic data from melanoma samples with two semantic models of functional data - one containing Gene Ontology (GO) data, the other, regulatory networks constructed from transcription factor binding information. These two semantic models were created in an ad hoc manner but support a common interface for integration with the quantitative semantic models. Such combined semantic models allow us to pose significant translational medicine questions. Here, we study the interplay between a cell's molecular state and its response to anti-cancer therapy by exploring the resistance of cancer cells to Decitabine, a demethylating agent. CONCLUSIONS We were able to generate a testable hypothesis to explain how Decitabine fights cancer - namely, that it targets apoptosis-related gene promoters predominantly in Decitabine-sensitive cell lines, thus conveying its cytotoxic effect by activating the apoptosis pathway. Our research provides a framework whereby similar hypotheses can be developed easily.
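How triples from separate semantic models combine into one queryable source can be sketched in plain Python. This is a hedged illustration only: the authors worked with RDF and a SPARQL endpoint, whereas the code below uses ordinary tuples and a hypothetical `match` helper, and every gene name, predicate, and value is invented for the example.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Two toy "semantic models" as sets of (subject, predicate, object).
# All identifiers below are made up for illustration.
quantitative = {
    ("geneA", "hasExpressionChange", "up"),
    ("geneB", "hasExpressionChange", "down"),
}
functional = {
    ("geneA", "annotatedWith", "apoptosis"),
    ("geneB", "annotatedWith", "cell_cycle"),
}

# Because both models use the same triple shape, merging is a set union.
merged = quantitative | functional

# Join across the two models: apoptosis-annotated genes that went up.
apoptosis_genes = {s for s, _, _ in match(merged, p="annotatedWith", o="apoptosis")}
up_apoptosis = sorted(
    g for g in apoptosis_genes
    if match(merged, s=g, p="hasExpressionChange", o="up")
)
```

The point of the sketch is that a uniform triple shape makes integration trivial, while the question being asked lives entirely in the query pattern. A SPARQL engine generalizes the same wildcard matching with variables and joins.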
|
15
|
Chen Y, Holtman CK, Taton A, Golden SS. Functional Analysis of the Synechococcus elongatus PCC 7942 Genome. Functional Genomics and Evolution of Photosynthetic Systems 2012. [DOI: 10.1007/978-94-007-1533-2_5] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 12/12/2022]
|
16
|
Denning EJ, Priyakumar UD, Nilsson L, MacKerell AD. Impact of 2'-hydroxyl sampling on the conformational properties of RNA: update of the CHARMM all-atom additive force field for RNA. J Comput Chem 2011; 32:1929-43. [PMID: 21469161 PMCID: PMC3082605 DOI: 10.1002/jcc.21777] [Citation(s) in RCA: 313] [Impact Index Per Article: 22.4] [Received: 12/16/2010] [Revised: 01/24/2011] [Accepted: 01/30/2011] [Indexed: 01/02/2023]
Abstract
Here, we present an update of the CHARMM27 all-atom additive force field for nucleic acids that improves the treatment of RNA molecules. The original CHARMM27 force field parameters exhibit enhanced Watson-Crick base pair opening which is not consistent with experiment, whereas analysis of molecular dynamics (MD) simulations shows the 2'-hydroxyl moiety to almost exclusively sample the O3' orientation. Quantum mechanical (QM) studies of RNA-related model compounds indicate the energy minimum associated with the O3' orientation to be too favorable, consistent with the MD results. Optimization of the dihedral parameters dictating the energy of the 2'-hydroxyl proton targeting the QM data yielded several parameter sets, which sample both the base and O3' orientations of the 2'-hydroxyl to varying degrees. Selection of the final dihedral parameters was based on reproduction of hydration behavior as related to a survey of crystallographic data and better agreement with experimental NMR J-coupling values. Application of the model, designated CHARMM36, to a collection of canonical and noncanonical RNA molecules reveals overall improved agreement with a range of experimental observables as compared to CHARMM27. The results also indicate the sensitivity of the conformational heterogeneity of RNA to the orientation of the 2'-hydroxyl moiety and support a model whereby the 2'-hydroxyl can enhance the probability of conformational transitions in RNA.
Affiliation(s)
- Elizabeth J. Denning
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Baltimore, MD 21201
- U. Deva Priyakumar
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Baltimore, MD 21201
- Lennart Nilsson
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Baltimore, MD 21201
- Alexander D. MacKerell
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Baltimore, MD 21201
|
17
|
Elhai J. Humans, Computers, and the Route to Biological Insights: Regaining Our Capacity for Surprise. J Comput Biol 2011; 18:867-78. [DOI: 10.1089/cmb.2010.0194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Affiliation(s)
- Jeff Elhai
- Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, Virginia
|
18
|
Holmquist PC, Holmquist GP, Summers ML. Comparing binding site information to binding affinity reveals that Crp/DNA complexes have several distinct binding conformers. Nucleic Acids Res 2011; 39:6813-24. [PMID: 21586590 PMCID: PMC3159480 DOI: 10.1093/nar/gkr369] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/24/2023] Open
Abstract
We show that the cAMP receptor protein (Crp) binds to DNA as several different conformers. This situation has precluded discovering a high correlation between any sequence property and binding affinity for proteins that bend DNA. Experimentally quantified affinities of Synechocystis sp. PCC 6803 cAMP receptor protein (SyCrp1), the Escherichia coli Crp (EcCrp, also CAP) and DNA were analyzed to mathematically describe, and make human-readable, the relationship of DNA sequence and binding affinity in a given system. Here, sequence logos and weight matrices were built to model SyCrp1 binding sequences. Comparing the weight matrix model to binding affinity revealed several distinct binding conformations. These Crp/DNA conformations were asymmetrical (non-palindromic).
Affiliation(s)
- Peter C Holmquist
- Department of Biology, California State University Northridge, 18111 Nordhoff St. Northridge, CA 91330, USA.
|
19
|
An overview of the BioExtract Server: a distributed, Web-based system for genomic analysis. Advances in Experimental Medicine and Biology 2011. [PMID: 20865520 DOI: 10.1007/978-1-4419-5913-3_41] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3]
Abstract
Genome research is becoming increasingly dependent on access to multiple, distributed data sources, and bioinformatic tools. The importance of integration across distributed databases and Web services will continue to grow as the number of requisite resources expands. Use of bioinformatic workflows has seen considerable growth in recent years as scientific research becomes increasingly dependent on the analysis of large sets of data and the use of distributed resources. The BioExtract Server (http://bioextract.org) is a Web-based system designed to aid researchers in the analysis of distributed genomic data by providing a platform to facilitate the creation of bioinformatic workflows. Scientific workflows are created within the system by recording the analytic tasks performed by researchers. These steps may include querying multiple data sources, saving query results as searchable data extracts, and executing local and Web-accessible analytic tools. The series of recorded tasks can be saved as a computational workflow simply by providing a name and description.
|
20
|
Liu HL, Zhu J. Analysis of the 3' ends of tRNA as the cause of insertion sites of foreign DNA in Prochlorococcus. J Zhejiang Univ Sci B 2011; 11:708-18. [PMID: 20803775 DOI: 10.1631/jzus.b0900417] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
Abstract
The purpose of this study was to investigate the characteristics of transfer RNA (tRNA) responsible for the association between tRNA genes and genes of apparently foreign origin (genomic islands) in five high-light adapted Prochlorococcus strains. Both bidirectional best BLASTP (basic local alignment search tool for proteins) search and the conservation of gene order against each other were utilized to identify genomic islands, and 7 genomic islands were found to be immediately adjacent to tRNAs in Prochlorococcus marinus AS9601, 11 in P. marinus MIT9515, 8 in P. marinus MED4, 6 in P. marinus MIT9301, and 6 in P. marinus MIT9312. Monte Carlo simulation showed that tRNA genes are hotspots for the integration of genomic islands in Prochlorococcus strains. The tRNA genes associated with genomic islands showed the following characteristics: (1) the association was biased towards a specific subset of all iso-accepting tRNA genes; (2) the codon usages of genes within genomic islands appear to be unrelated to the codons recognized by associated tRNAs; and, (3) the majority of the 3' ends of associated tRNAs lack CCA ends. These findings contradict previous hypotheses concerning the molecular basis for the frequent use of tRNA as the insertion site for foreign genetic materials. The analysis of a genomic island associated with a tRNA-Asn gene in P. marinus MIT9301 suggests that foreign genetic material is inserted into the host genomes by means of site-specific recombination, with the 3' end of the tRNA as the target, and during the process, a direct repeat of the 3' end sequence of a boundary tRNA (namely, a scar from the process of insertion) is formed elsewhere in the genomic island. Through the analysis of the sequences of these targets, it can be concluded that a region characterized by both high GC content and a palindromic structure is the preferred insertion site.
Affiliation(s)
- Hai-Lan Liu
- Institute of Bioinformatics, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310029, China
|
21
|
|