1
Deciphering microbial gene function using natural language processing. Nat Commun 2022; 13:5731. [PMID: 36175448 PMCID: PMC9523054 DOI: 10.1038/s41467-022-33397-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 02/18/2022] [Accepted: 09/16/2022] [Indexed: 11/08/2022] Open
Abstract
Revealing the function of uncharacterized genes is a fundamental challenge in an era of ever-increasing volumes of sequencing data. Here, we present a concept for tackling this challenge using deep learning methodologies adopted from natural language processing (NLP). We repurpose NLP algorithms to model "gene semantics" based on a biological corpus of more than 360 million microbial genes within their genomic context. We use the language models to predict functional categories for 56,617 genes and find that out of 1369 genes associated with recently discovered defense systems, 98% are inferred correctly. We then systematically evaluate the "discovery potential" of different functional categories, pinpointing those with the most genes yet to be characterized. Finally, we demonstrate our method's ability to discover systems associated with microbial interaction and defense. Our results highlight that combining microbial genomics and language models is a promising avenue for revealing gene functions in microbes.
2
Furuta Y, Miura F, Ichise T, Nakayama SMM, Ikenaka Y, Zorigt T, Tsujinouchi M, Ishizuka M, Ito T, Higashi H. A GCDGC-specific DNA (cytosine-5) methyltransferase that methylates the GCWGC sequence on both strands and the GCSGC sequence on one strand. PLoS One 2022; 17:e0265225. [PMID: 35312710 PMCID: PMC8936443 DOI: 10.1371/journal.pone.0265225] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/24/2021] [Accepted: 02/24/2022] [Indexed: 11/18/2022] Open
Abstract
5-Methylcytosine is one of the major epigenetic marks of DNA in living organisms. Some bacterial species possess DNA methyltransferases that modify cytosines on both strands to produce fully-methylated sites or on either strand to produce hemi-methylated sites. In this study, we characterized a DNA methyltransferase that produces two sequences with different methylation patterns: one methylated on both strands and another on one strand. M.BatI is the orphan DNA methyltransferase of Bacillus anthracis coded in one of the prophages on the chromosome. Analysis of M.BatI modified DNA by bisulfite sequencing revealed that the enzyme methylates the first cytosine in sequences of 5ʹ-GCAGC-3ʹ, 5ʹ-GCTGC-3ʹ, and 5ʹ-GCGGC-3ʹ, but not of 5ʹ-GCCGC-3ʹ. This resulted in the production of fully-methylated 5ʹ-GCWGC-3ʹ and hemi-methylated 5ʹ-GCSGC-3ʹ. M.BatI also showed toxicity when expressed in E. coli, which was caused by a mechanism other than DNA modification activity. Homologs of M.BatI were found in other Bacillus species on different prophage like regions, suggesting the spread of the gene by several different phages. The discovery of the DNA methyltransferase with unique modification target specificity suggested unrevealed diversity of target sequences of bacterial cytosine DNA methyltransferase.
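The strand logic behind the fully methylated GCWGC and hemi-methylated GCSGC outcome can be checked with a short sketch. The Python below is illustrative only and is not the authors' analysis: the function names are hypothetical, and the set of methylated pentamers is copied from the abstract. It assumes a site counts as fully methylated when both the top-strand pentamer and its reverse complement are substrates, and as hemi-methylated when only one of them is.

```python
# Illustrative sketch (not from the paper): classify GCNGC sites by whether
# the top strand, the bottom strand, or both carry a methylatable pentamer.

# Pentamers whose first cytosine M.BatI methylates, as stated in the abstract.
METHYLATED = {"GCAGC", "GCTGC", "GCGGC"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence read 5'->3' on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def methylation_state(site: str) -> str:
    """Hypothetical helper: 'full', 'hemi', or 'none' for a 5-mer site."""
    top = site in METHYLATED
    bottom = reverse_complement(site) in METHYLATED
    if top and bottom:
        return "full"
    if top or bottom:
        return "hemi"
    return "none"


if __name__ == "__main__":
    for n in "ATGC":
        site = f"GC{n}GC"
        print(site, reverse_complement(site), methylation_state(site))
```

GCAGC and GCTGC are reverse complements of each other, so W (A or T) sites are methylated on both strands. GCGGC pairs with the unmethylated GCCGC, so S (G or C) sites are methylated on one strand only.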
Affiliation(s)
- Yoshikazu Furuta
  - Division of Infection and Immunity, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Japan
- Fumihito Miura
  - Department of Biochemistry, Kyushu University Graduate School of Medical Sciences, Fukuoka, Japan
- Takahiro Ichise
  - Laboratory of Toxicology, Department of Environmental Veterinary Sciences, School of Veterinary Medicine, Hokkaido University, Sapporo, Japan
- Shouta M. M. Nakayama
  - Laboratory of Toxicology, Department of Environmental Veterinary Sciences, School of Veterinary Medicine, Hokkaido University, Sapporo, Japan
- Yoshinori Ikenaka
  - Laboratory of Toxicology, Department of Environmental Veterinary Sciences, School of Veterinary Medicine, Hokkaido University, Sapporo, Japan
  - Water Research Group, Unit for Environmental Sciences and Management, North-West University, Potchefstroom, South Africa
- Tuvshinzaya Zorigt
  - Division of Infection and Immunity, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Japan
- Mai Tsujinouchi
  - Division of Infection and Immunity, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Japan
- Mayumi Ishizuka
  - Laboratory of Toxicology, Department of Environmental Veterinary Sciences, School of Veterinary Medicine, Hokkaido University, Sapporo, Japan
- Takashi Ito
  - Department of Biochemistry, Kyushu University Graduate School of Medical Sciences, Fukuoka, Japan
- Hideaki Higashi
  - Division of Infection and Immunity, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Japan
3
Structural and functional diversity among Type III restriction-modification systems that confer host DNA protection via methylation of the N4 atom of cytosine. PLoS One 2021; 16:e0253267. [PMID: 34228724 PMCID: PMC8259958 DOI: 10.1371/journal.pone.0253267] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/01/2021] [Accepted: 06/01/2021] [Indexed: 11/19/2022] Open
Abstract
We report a new subgroup of Type III Restriction-Modification systems that use m4C methylation for host protection. Recognition specificities for six such systems, each recognizing a novel motif, have been determined using single molecule real-time DNA sequencing. In contrast to all previously characterized Type III systems which modify adenine to m6A, protective methylation of the host genome in these new systems is achieved by the N4-methylation of a cytosine base in one strand of an asymmetric 4 to 6 base pair recognition motif. Type III systems are heterotrimeric enzyme complexes containing a single copy of an ATP-dependent restriction endonuclease-helicase (Res) and a dimeric DNA methyltransferase (Mod). The Type III Mods are beta-class amino-methyltransferases, examples of which form either N6-methyl adenine or N4-methyl cytosine in Type II RM systems. The Type III m4C Mod and Res proteins are diverged, suggesting ancient origin or that m4C modification has arisen from m6A MTases multiple times in diverged lineages. Two of the systems, from thermophilic organisms, required expression of both Mod and Res to efficiently methylate an E. coli host, unlike previous findings that Mod alone is proficient at modification, suggesting that the division of labor between protective methylation and restriction activities is atypical in these systems. Two of the characterized systems, and many homologous putative systems, appear to include a third protein; a conserved putative helicase/ATPase subunit of unknown function and located 5’ of the mod gene. The function of this additional ATPase is not yet known, but close homologs co-localize with the typical Mod and Res genes in hundreds of putative Type III systems. Our findings demonstrate a rich diversity within Type III RM systems.
4
An investigation of Burkholderia cepacia complex methylomes via SMRT sequencing and mutant analysis. J Bacteriol 2021; 203:e0068320. [PMID: 33753468 DOI: 10.1128/jb.00683-20] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/16/2022] Open
Abstract
Bacterial genomes can be methylated at particular motifs by methyltransferases (M). This DNA modification allows restriction endonucleases (R) to discriminate between self and foreign DNA. While the accepted primary function of such restriction modification (RM) systems is to degrade incoming foreign DNA, other roles of RM systems and lone R or M components have been found in genome protection, stability and the regulation of various phenotypes. The Burkholderia cepacia complex (Bcc) is a group of closely related opportunistic pathogens with biotechnological potential. Here, we constructed and analysed mutants lacking various RM components in the clinical Bcc isolate Burkholderia cenocepacia H111 and used SMRT sequencing of single mutants to assign the B. cenocepacia H111 Ms to their cognate motifs. DNA methylation is shown to affect biofilm formation, cell shape, motility, siderophore production and membrane vesicle production. Moreover, DNA methylation had a large effect on the maintenance of the Bcc virulence megaplasmid pC3. Our data also suggest that the gp51 M-encoding gene, which is essential in H111 and is located within a prophage, is required for maintaining the bacteriophage in a lysogenic state, thereby ensuring a constant, low level of phage production within the bacterial population.
IMPORTANCE While genome sequence determines an organism's proteins, methylation of the nucleotides themselves can confer additional properties. In bacteria, Ms modify specific nucleotide motifs to allow discrimination of 'self' from 'non-self' DNA, e.g. from bacteriophages. Restriction enzymes detect 'non-self' methylation patterns and cut foreign DNA. Furthermore, methylation of promoter regions can influence gene expression and hence affect various phenotypes. In this study, we determined the methylated motifs of four strains from the Burkholderia cepacia complex of opportunistic pathogens. We deleted all genes encoding the restriction and modification components in one of these strains, Burkholderia cenocepacia H111. It is shown that DNA methylation affects various phenotypic traits, the most noteworthy being lysogenicity of a bacteriophage and maintenance of a virulence megaplasmid.
5
Flodman K, Corrêa IR, Dai N, Weigele P, Xu SY. In vitro Type II Restriction of Bacteriophage DNA With Modified Pyrimidines. Front Microbiol 2020; 11:604618. [PMID: 33193286 PMCID: PMC7653180 DOI: 10.3389/fmicb.2020.604618] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/10/2020] [Accepted: 10/05/2020] [Indexed: 01/09/2023] Open
Abstract
To counteract host-encoded restriction systems, bacteriophages (phages) incorporate modified bases in their genomes. For example, phages carry in their genomes modified pyrimidines such as 5-hydroxymethyl-cytosine (5hmC) in T4gt deficient in α- and β-glycosyltransferases, glucosylated-5-hydroxymethylcytosine (5gmC) in T4, 5-methylcytosine (5mC) in Xp12, and 5-hydroxymethyldeoxyuridine (5hmdU) in SP8. In this work we sequenced phage Xp12 and SP8 genomes and examined Type II restriction of T4gt, T4, Xp12, and SP8 phage DNAs. T4gt, T4, and Xp12 genomes showed resistance to 81.9% (186 out of 227 enzymes tested), 94.3% (214 out of 227 enzymes tested), and 89.9% (196 out of 218 enzymes tested), respectively, commercially available Type II restriction endonucleases (REases). The SP8 genome, however, was resistant to only ∼8.3% of these enzymes (17 out of 204 enzymes tested). SP8 DNA could be further modified by adenine DNA methyltransferases (MTases) such as M.Dam and M.EcoGII as well as a number of cytosine DNA MTases, such as CpG methylase. The 5hmdU base in SP8 DNA was phosphorylated by treatment with a 5hmdU DNA kinase to achieve ∼20% phosphorylated 5hmdU, resulting resistance or partially resistant to more Type II restriction. This work provides a convenient reference for molecular biologists working with modified pyrimidines and using REases. The genomic sequences of phage Xp12 and SP8 lay the foundation for further studies on genetic pathways for 5mC and 5hmdU DNA base modifications and for comparative phage genomics.
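The resistance percentages quoted above follow directly from the enzyme counts given in parentheses. As a quick arithmetic check, here is a minimal Python sketch; the counts are copied from the abstract, while the dictionary and function names are hypothetical and chosen only for illustration.

```python
# Illustrative check of the resistance percentages quoted in the abstract.
# Each entry is (enzymes the genome resisted, enzymes tested), from the text.
COUNTS = {
    "T4gt": (186, 227),
    "T4": (214, 227),
    "Xp12": (196, 218),
    "SP8": (17, 204),
}


def percent_resistant(resistant: int, tested: int) -> float:
    """Share of tested enzymes that did not cut, as a percentage to 1 dp."""
    return round(100 * resistant / tested, 1)


if __name__ == "__main__":
    for phage, (resistant, tested) in COUNTS.items():
        print(f"{phage}: {percent_resistant(resistant, tested)}%")
```

Note that the denominators differ between genomes (227, 218, and 204 enzymes), so the percentages compare slightly different enzyme panels.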
Affiliation(s)
- Ivan R Corrêa
  - New England Biolabs, Inc., Ipswich, MA, United States
- Nan Dai
  - New England Biolabs, Inc., Ipswich, MA, United States
- Peter Weigele
  - New England Biolabs, Inc., Ipswich, MA, United States
6
Flodman K, Tsai R, Xu MY, Corrêa IR, Copelas A, Lee YJ, Xu MQ, Weigele P, Xu SY. Type II Restriction of Bacteriophage DNA With 5hmdU-Derived Base Modifications. Front Microbiol 2019; 10:584. [PMID: 30984133 PMCID: PMC6449724 DOI: 10.3389/fmicb.2019.00584] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/27/2019] [Accepted: 03/07/2019] [Indexed: 11/13/2022] Open
Abstract
To counteract bacterial defense systems, bacteriophages (phages) make extensive base modifications (substitutions) to block endonuclease restriction. Here we evaluated Type II restriction of three thymidine (T or 5-methyldeoxyuridine, 5mdU) modified phage genomes: Pseudomonas phage M6 with 5-(2-aminoethyl)deoxyuridine (5-NedU), Salmonella phage ViI (Vi1) with 5-(2-aminoethoxy)methyldeoxyuridine (5-NeOmdU) and Delftia phage phi W-14 (a.k.a. ΦW-14) with α-putrescinylthymidine (putT). Among >200 commercially available restriction endonucleases (REases) tested, phage M6, ViI, and phi W-14 genomic DNAs (gDNA) show resistance against 48.4, 71.0, and 68.8% of Type II restrictions, respectively. Inspection of the resistant sites indicates the presence of conserved dinucleotide TG or TC (TS, S=C, or G), implicating the specificity of TS sequence as the target that is converted to modified base in the genomes. We also tested a number of DNA methyltransferases (MTases) on these phage DNAs and found some MTases can fully or partially modify the DNA to confer more resistance to cleavage by REases. Phage M6 restriction fragments can be efficiently ligated by T4 DNA ligase. Phi W-14 restriction fragments show apparent reduced rate in E. coli exonuclease III degradation. This work extends previous studies that hypermodified T derived from 5hmdU provides additional resistance to host-encoded restrictions, in parallel to modified cytosines, guanine, and adenine in phage genomes. The results reported here provide a general guidance to use REases to map and clone phage DNA with hypermodified thymidine.
Affiliation(s)
- Rebecca Tsai
  - New England Biolabs, Inc., Ipswich, MA, United States
- Michael Y Xu
  - New England Biolabs, Inc., Ipswich, MA, United States
- Ivan R Corrêa
  - New England Biolabs, Inc., Ipswich, MA, United States
- Yan-Jiun Lee
  - New England Biolabs, Inc., Ipswich, MA, United States
- Ming-Qun Xu
  - New England Biolabs, Inc., Ipswich, MA, United States
- Peter Weigele
  - New England Biolabs, Inc., Ipswich, MA, United States
7
Makart L, Gillis A, Hinnekens P, Mahillon J. A novel T4SS-mediated DNA transfer used by pXO16, a conjugative plasmid from Bacillus thuringiensis serovar israelensis. Environ Microbiol 2018; 20:1550-1561. [PMID: 29488309 DOI: 10.1111/1462-2920.14084] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 01/18/2018] [Revised: 02/19/2018] [Accepted: 02/23/2018] [Indexed: 12/25/2022]
Abstract
The entomopathogenic Bacillus thuringiensis serovar israelensis displays peculiar conjugative transfer capabilities, accounted for by the large conjugative plasmid pXO16 (350 kb). The efficient and fast conjugative transfers are accompanied by a macroscopic aggregation of bacterial partners. Moreover, pXO16 has proven capable of effective mobilization and the retro-transfer of both mobilizable and 'non-mobilizable' plasmids. In this work, the aggregation phenomenon is shown to promote pXO16 transfer while not being mandatory for transfer. Transfer of pXO16 to B. thuringiensis recipient strains that do not display aggregation is observed as well, hence enlarging the previously defined host range. The use of variant calling analysis of transconjugants allowed for observation of up to 791 kb chromosomal regions mobilization. Previous analysis of pXO16 did not reveal any Type IV Secretion System (T4SS) homologs, which suggested the presence of an unusual conjugative system. A FtsK/SpoIIIE ATPase gene proved here to be necessary for conjugative transfer. Additionally, the analysis of natural restriction-modification systems in both conjugative partners gave credit to a ssDNA transfer mechanism. A 'transfer israelensis plasmid' (tip) region containing this ATPase gene was shown to code for other potential T4SS proteins, illustrating a conjugative system distantly related to the other known Gram-positive T4SSs.
Affiliation(s)
- Lionel Makart
  - Laboratory of Food and Environmental Microbiology, Earth and Life Institute, Université catholique de Louvain, Louvain-la-Neuve, B-1348, Belgium
- Annika Gillis
  - Laboratory of Food and Environmental Microbiology, Earth and Life Institute, Université catholique de Louvain, Louvain-la-Neuve, B-1348, Belgium
- Pauline Hinnekens
  - Laboratory of Food and Environmental Microbiology, Earth and Life Institute, Université catholique de Louvain, Louvain-la-Neuve, B-1348, Belgium
- Jacques Mahillon
  - Laboratory of Food and Environmental Microbiology, Earth and Life Institute, Université catholique de Louvain, Louvain-la-Neuve, B-1348, Belgium
8
Bai H, Deng A, Liu S, Cui D, Qiu Q, Wang L, Yang Z, Wu J, Shang X, Zhang Y, Wen T. A Novel Tool for Microbial Genome Editing Using the Restriction-Modification System. ACS Synth Biol 2018; 7:98-106. [PMID: 28968490 DOI: 10.1021/acssynbio.7b00254] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 01/21/2023]
Abstract
Scarless genetic manipulation of genomes is an essential tool for biological research. The restriction-modification (R-M) system is a defense system in bacteria that protects against invading genomes on the basis of its ability to distinguish foreign DNA from self DNA. Here, we designed an R-M system-mediated genome editing (RMGE) technique for scarless genetic manipulation in different microorganisms. For bacteria with Type IV REase, an RMGE technique using the inducible DNA methyltransferase gene, bceSIIM (RMGE-bceSIIM), as the counter-selection cassette was developed to edit the genome of Escherichia coli. For bacteria without Type IV REase, an RMGE technique based on a restriction endonuclease (RMGE-mcrA) was established in Bacillus subtilis. These techniques were successfully used for gene deletion and replacement with nearly 100% counter-selection efficiencies, which were higher and more stable compared to conventional methods. Furthermore, precise point mutation without limiting sites was achieved in E. coli using RMGE-bceSIIM to introduce a single base mutation of A128C into the rpsL gene. In addition, the RMGE-mcrA technique was applied to delete the CAN1 gene in Saccharomyces cerevisiae DAY414 with 100% counter-selection efficiency. The effectiveness of the RMGE technique in E. coli, B. subtilis, and S. cerevisiae suggests the potential universal usefulness of this technique for microbial genome manipulation.
Affiliation(s)
- Hua Bai
  - CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Aihua Deng
  - CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
- Shuwen Liu
  - CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
- Di Cui
  - CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Qidi Qiu
  - CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Laiyou Wang
  - CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Zhao Yang
  - CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
- Jie Wu
  - CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
- Xiuling Shang
  - CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
- Yun Zhang
  - CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
- Tingyi Wen
  - CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
  - Savaid Medical School, University of Chinese Academy of Sciences, Beijing 100049, China
9
Toliusis P, Zaremba M, Silanskas A, Szczelkun MD, Siksnys V. CglI cleaves DNA using a mechanism distinct from other ATP-dependent restriction endonucleases. Nucleic Acids Res 2017; 45:8435-8447. [PMID: 28854738 PMCID: PMC5737866 DOI: 10.1093/nar/gkx580] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/06/2017] [Accepted: 06/28/2017] [Indexed: 01/10/2023] Open
Abstract
The restriction endonuclease CglI from Corynebacterium glutamicum recognizes an asymmetric 5′-GCCGC-3′ site and cleaves the DNA 7 and 6/7 nucleotides downstream on the top and bottom DNA strands, respectively, in an NTP-hydrolysis dependent reaction. CglI is composed of two different proteins: an endonuclease (R.CglI) and a DEAD-family helicase-like ATPase (H.CglI). These subunits form a heterotetrameric complex with R2H2 stoichiometry. However, the R2H2·CglI complex has only one nuclease active site sufficient to cut one DNA strand suggesting that two complexes are required to introduce a double strand break. Here, we report studies to evaluate the DNA cleavage mechanism of CglI. Using one- and two-site circular DNA substrates we show that CglI does not require two sites on the same DNA for optimal catalytic activity. However, one-site linear DNA is a poor substrate, supporting a mechanism where CglI complexes must communicate along the one-dimensional DNA contour before cleavage is activated. Based on experimental data, we propose that adenosine triphosphate (ATP) hydrolysis by CglI produces translocation on DNA preferentially in a downstream direction from the target, although upstream translocation is also possible. Our results are consistent with a mechanism of CglI action that is distinct from that of other ATP-dependent restriction-modification enzymes.
Affiliation(s)
- Paulius Toliusis
  - Department of Protein-DNA Interactions, Institute of Biotechnology, Vilnius University, Sauletekio al. 7, LT-10257, Vilnius, Lithuania
- Mindaugas Zaremba
  - Department of Protein-DNA Interactions, Institute of Biotechnology, Vilnius University, Sauletekio al. 7, LT-10257, Vilnius, Lithuania
- Arunas Silanskas
  - Department of Protein-DNA Interactions, Institute of Biotechnology, Vilnius University, Sauletekio al. 7, LT-10257, Vilnius, Lithuania
- Mark D Szczelkun
  - DNA-Protein Interactions Unit, School of Biochemistry, Biomedical Sciences Building, University of Bristol, Bristol, BS8 1TD, UK
- Virginijus Siksnys
  - Department of Protein-DNA Interactions, Institute of Biotechnology, Vilnius University, Sauletekio al. 7, LT-10257, Vilnius, Lithuania
10
Polyvalent Proteins, a Pervasive Theme in the Intergenomic Biological Conflicts of Bacteriophages and Conjugative Elements. J Bacteriol 2017; 199:JB.00245-17. [PMID: 28559295 PMCID: PMC5512222 DOI: 10.1128/jb.00245-17] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 04/04/2017] [Accepted: 05/17/2017] [Indexed: 12/29/2022] Open
Abstract
Intense biological conflicts between prokaryotic genomes and their genomic parasites have resulted in an arms race in terms of the molecular “weaponry” deployed on both sides. Using a recursive computational approach, we uncovered a remarkable class of multidomain proteins with 2 to 15 domains in the same polypeptide deployed by viruses and plasmids in such conflicts. Domain architectures and genomic contexts indicate that they are part of a widespread conflict strategy involving proteins injected into the host cell along with parasite DNA during the earliest phase of infection. Their unique feature is the combination of domains with highly disparate biochemical activities in the same polypeptide; accordingly, we term them polyvalent proteins. Of the 131 domains in polyvalent proteins, a large fraction are enzymatic domains predicted to modify proteins, target nucleic acids, alter nucleotide signaling/metabolism, and attack peptidoglycan or cytoskeletal components. They further contain nucleic acid-binding domains, virion structural domains, and 40 novel uncharacterized domains. Analysis of their architectural network reveals both pervasive common themes and specialized strategies for conjugative elements and plasmids or (pro)phages. The themes include likely processing of multidomain polypeptides by zincin-like metallopeptidases and mechanisms to counter restriction or CRISPR/Cas systems and jump-start transcription or replication. DNA-binding domains acquired by eukaryotes from such systems have been reused in XPC/RAD4-dependent DNA repair and mitochondrial genome replication in kinetoplastids. Characterization of the novel domains discovered here, such as RNases and peptidases, are likely to aid in the development of new reagents and elucidation of the spread of antibiotic resistance. 
IMPORTANCE This is the first report of the widespread presence of large proteins, termed polyvalent proteins, predicted to be transmitted by genomic parasites such as conjugative elements, plasmids, and phages during the initial phase of infection along with their DNA. They are typified by the presence of multiple domains with disparate activities combined in the same protein. While some of these domains are predicted to assist the invasive element in replication, transcription, or protection of their DNA, several are likely to target various host defense systems or modify the host to favor the parasite's life cycle. Notably, DNA-binding domains from these systems have been transferred to eukaryotes, where they have been incorporated into DNA repair and mitochondrial genome replication systems.
11
Patel S. Drivers of bacterial genomes plasticity and roles they play in pathogen virulence, persistence and drug resistance. Infect Genet Evol 2016; 45:151-164. [DOI: 10.1016/j.meegid.2016.08.030] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 03/28/2016] [Revised: 08/26/2016] [Accepted: 08/27/2016] [Indexed: 12/11/2022]
12
Weigele P, Raleigh EA. Biosynthesis and Function of Modified Bases in Bacteria and Their Viruses. Chem Rev 2016; 116:12655-12687. [PMID: 27319741 DOI: 10.1021/acs.chemrev.6b00114] [Citation(s) in RCA: 120] [Impact Index Per Article: 15.0] [Indexed: 01/16/2023]
Abstract
Naturally occurring modification of the canonical A, G, C, and T bases can be found in the DNA of cellular organisms and viruses from all domains of life. Bacterial viruses (bacteriophages) are a particularly rich but still underexploited source of such modified variant nucleotides. The modifications conserve the coding and base-pairing functions of DNA, but add regulatory and protective functions. In prokaryotes, modified bases appear primarily to be part of an arms race between bacteriophages (and other genomic parasites) and their hosts, although, as in eukaryotes, some modifications have been adapted to convey epigenetic information. The first half of this review catalogs the identification and diversity of DNA modifications found in bacteria and bacteriophages. What is known about the biogenesis, context, and function of these modifications are also described. The second part of the review places these DNA modifications in the context of the arms race between bacteria and bacteriophages. It focuses particularly on the defense and counter-defense strategies that turn on direct recognition of the presence of a modified base. Where modification has been shown to affect other DNA transactions, such as expression and chromosome segregation, that is summarized, with reference to recent reviews.
Affiliation(s)
- Peter Weigele
  - Chemical Biology, New England Biolabs, Ipswich, Massachusetts 01938, United States
13
He X, Hull V, Thomas JA, Fu X, Gidwani S, Gupta YK, Black LW, Xu SY. Expression and purification of a single-chain Type IV restriction enzyme Eco94GmrSD and determination of its substrate preference. Sci Rep 2015; 5:9747. [PMID: 25988532 PMCID: PMC4437046 DOI: 10.1038/srep09747] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 09/11/2014] [Accepted: 03/06/2015] [Indexed: 01/30/2023] Open
Abstract
The first reported Type IV restriction endonuclease (REase) GmrSD consists of GmrS and GmrD subunits. In most bacteria, however, the gmrS and gmrD genes are fused together to encode a single-chain protein. The fused coding sequence for ECSTEC94C_1402 from E. coli strain STEC_94C was expressed in T7 Express. The protein designated as Eco94GmrSD displays modification-dependent ATP-stimulated REase activity on T4 DNA with glucosyl-5-hydroxymethyl-cytosines (glc-5hmC) and T4gt DNA with 5-hydroxymethyl-cytosines (5hmC). A C-terminal 6xHis-tagged protein was purified by two-column chromatography. The enzyme is active in Mg²⁺ and Mn²⁺ buffer. It prefers to cleave large glc-5hmC- or 5hmC-modified DNA. In phage restriction assays, Eco94GmrSD weakly restricted T4 and T4gt, whereas T4 IPI*-deficient phage (Δip1) were restricted more than 10⁶-fold, consistent with IPI* protection of E. coli DH10B from lethal expression of the closely homologous E. coli CT596 GmrSD. Eco94GmrSD is proposed to belong to the His-Asn-His (HNH)-nuclease family by the identification of a putative C-terminal REase catalytic site D507-H508-N522. Supporting this, GmrSD variants D507A, H508A, and N522A displayed no endonuclease activity. The presence of a large number of fused GmrSD homologs suggests that GmrSD is an effective phage exclusion protein that provides a mechanism to thwart T-even phage infection.
Collapse
Affiliation(s)
- Xinyi He
- 1] New England Biolabs, Inc., 240 County Road, Ipswich, MA 01938, USA [2] State Key Laboratory of Microbial Metabolism, and School of Life Sciences &Biotechnology Shanghai Jiao Tong University, 1954 Huashan Road, Shanghai, 200030, China
| | - Victoria Hull
- New England Biolabs, Inc., 240 County Road, Ipswich, MA 01938, USA
| | - Julie A Thomas
- Department of Biochemistry and Molecular Biology, University of Maryland School of Medicine, 108 North Green St, Baltimore, MD 21201-1503, USA
| | - Xiaoqing Fu
- New England Biolabs, Inc., 240 County Road, Ipswich, MA 01938, USA
| | - Sonal Gidwani
- New England Biolabs, Inc., 240 County Road, Ipswich, MA 01938, USA
| | - Yogesh K Gupta
- Department of Structural and Chemical Biology, Icahn School of Medicine at Mount Sinai, Box 1677, 1425 Madison Avenue, New York, NY 10029, USA
| | - Lindsay W Black
- Department of Biochemistry and Molecular Biology, University of Maryland School of Medicine, 108 North Green St, Baltimore, MD 21201-1503, USA
| | - Shuang-yong Xu
- New England Biolabs, Inc., 240 County Road, Ipswich, MA 01938, USA
| |
|
14
|
Mills CL, Beuning PJ, Ondrechen MJ. Biochemical functional predictions for protein structures of unknown or uncertain function. Comput Struct Biotechnol J 2015; 13:182-91. [PMID: 25848497 PMCID: PMC4372640 DOI: 10.1016/j.csbj.2015.02.003] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Received: 12/03/2014] [Revised: 02/06/2015] [Accepted: 02/11/2015] [Indexed: 01/07/2023] Open
Abstract
With the exponential growth in the determination of protein sequences and structures via genome sequencing and structural genomics efforts, there is a growing need for reliable computational methods to determine the biochemical function of these proteins. This paper reviews the efforts to address the challenge of annotating the function at the molecular level of uncharacterized proteins. While sequence- and three-dimensional-structure-based methods for protein function prediction have been reviewed previously, the recent trends in local structure-based methods have received less attention. These local structure-based methods are the primary focus of this review. Computational methods have been developed to predict the residues important for catalysis and the local spatial arrangements of these residues can be used to identify protein function. In addition, the combination of different types of methods can help obtain more information and better predictions of function for proteins of unknown function. Global initiatives, including the Enzyme Function Initiative (EFI), COMputational BRidges to EXperiments (COMBREX), and the Critical Assessment of Function Annotation (CAFA), are evaluating and testing the different approaches to predicting the function of proteins of unknown function. These initiatives and global collaborations will increase the capability and reliability of methods to predict biochemical function computationally and will add substantial value to the current volume of structural genomics data by reducing the number of absent or inaccurate functional annotations.
Affiliation(s)
- Caitlyn L Mills
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, United States
| | - Penny J Beuning
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, United States
| | - Mary Jo Ondrechen
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, United States
| |
|
15
|
Lundin S, Jemt A, Terje-Hegge F, Foam N, Pettersson E, Käller M, Wirta V, Lexow P, Lundeberg J. Endonuclease specificity and sequence dependence of type IIS restriction enzymes. PLoS One 2015; 10:e0117059. [PMID: 25629514 PMCID: PMC4309577 DOI: 10.1371/journal.pone.0117059] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/08/2014] [Accepted: 12/17/2014] [Indexed: 11/23/2022] Open
Abstract
Restriction enzymes that recognize specific sequences but cleave unknown sequence outside the recognition site are extensively utilized tools in molecular biology. Despite this, systematic functional categorization of cleavage performance has largely been lacking. We established a simple and automatable model system to assay cleavage distance variation (termed slippage) and the sequence dependence thereof. We coupled this to massively parallel sequencing in order to provide sensitive and accurate measurement. With this system 14 enzymes were assayed (AcuI, BbvI, BpmI, BpuEI, BseRI, BsgI, Eco57I, Eco57MI, EcoP15I, FauI, FokI, GsuI, MmeI and SmuI). We report significant variation of slippage ranging from 1–54%, variations in sequence context dependence, as well as variation between isoschizomers. We believe this largely overlooked property of enzymes with shifted cleavage would benefit from further large scale classification and engineering efforts seeking to improve performance. The gained insights of in-vitro performance may also aid the in-vivo understanding of these enzymes.
Affiliation(s)
- Sverker Lundin
- Science for Life Laboratory, KTH, Gene Technology, Solna, 171 65, Sweden
| | - Anders Jemt
- Science for Life Laboratory, KTH, Gene Technology, Solna, 171 65, Sweden
| | - Joakim Lundeberg
- Science for Life Laboratory, KTH, Gene Technology, Solna, 171 65, Sweden
- * E-mail:
| |
|
16
|
Zaremba M, Toliusis P, Grigaitis R, Manakova E, Silanskas A, Tamulaitiene G, Szczelkun MD, Siksnys V. DNA cleavage by CglI and NgoAVII requires interaction between N- and R-proteins and extensive nucleotide hydrolysis. Nucleic Acids Res 2014; 42:13887-96. [PMID: 25429977 PMCID: PMC4267653 DOI: 10.1093/nar/gku1236] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 09/25/2014] [Revised: 10/31/2014] [Accepted: 11/10/2014] [Indexed: 01/07/2023] Open
Abstract
The stress-sensitive restriction-modification (RM) system CglI from Corynebacterium glutamicum and the homologous NgoAVII RM system from Neisseria gonorrhoeae FA1090 are composed of three genes: a DNA methyltransferase (M.CglI and M.NgoAVII), a putative restriction endonuclease (R.CglI and R.NgoAVII, or R-proteins) and a predicted DEAD-family helicase/ATPase (N.CglI and N.NgoAVII or N-proteins). Here we report a biochemical characterization of the R- and N-proteins. Size-exclusion chromatography and SAXS experiments reveal that the isolated R.CglI, R.NgoAVII and N.CglI proteins form homodimers, while N.NgoAVII is a monomer in solution. Moreover, the R.CglI and N.CglI proteins assemble in a complex with R2N2 stoichiometry. Next, we show that N-proteins have ATPase activity that is dependent on double-stranded DNA and is stimulated by the R-proteins. Functional ATPase activity and extensive ATP hydrolysis (∼170 ATP/s/monomer) are required for site-specific DNA cleavage by R-proteins. We show that ATP-dependent DNA cleavage by R-proteins occurs at fixed positions (6-7 nucleotides) downstream of the asymmetric recognition sequence 5'-GCCGC-3'. Despite similarities to both Type I and II restriction endonucleases, the CglI and NgoAVII enzymes may employ a unique catalytic mechanism for DNA cleavage.
Affiliation(s)
- Mindaugas Zaremba
- Department of Protein-DNA Interactions, Institute of Biotechnology, Vilnius University, Graiciuno 8, LT-02241 Vilnius, Lithuania
| | - Paulius Toliusis
- Department of Protein-DNA Interactions, Institute of Biotechnology, Vilnius University, Graiciuno 8, LT-02241 Vilnius, Lithuania
| | - Rokas Grigaitis
- Department of Protein-DNA Interactions, Institute of Biotechnology, Vilnius University, Graiciuno 8, LT-02241 Vilnius, Lithuania
| | - Elena Manakova
- Department of Protein-DNA Interactions, Institute of Biotechnology, Vilnius University, Graiciuno 8, LT-02241 Vilnius, Lithuania
| | - Arunas Silanskas
- Department of Protein-DNA Interactions, Institute of Biotechnology, Vilnius University, Graiciuno 8, LT-02241 Vilnius, Lithuania
| | - Giedre Tamulaitiene
- Department of Protein-DNA Interactions, Institute of Biotechnology, Vilnius University, Graiciuno 8, LT-02241 Vilnius, Lithuania
| | - Mark D Szczelkun
- DNA-Protein Interactions Unit, School of Biochemistry, Medical Sciences Building, University of Bristol, Bristol BS8 1TD, UK
| | - Virginijus Siksnys
- Department of Protein-DNA Interactions, Institute of Biotechnology, Vilnius University, Graiciuno 8, LT-02241 Vilnius, Lithuania
| |
|
17
|
Lu S, Le S, Tan Y, Li M, Liu C, Zhang K, Huang J, Chen H, Rao X, Zhu J, Zou L, Ni Q, Li S, Wang J, Jin X, Hu Q, Yao X, Zhao X, Zhang L, Huang G, Hu F. Unlocking the mystery of the hard-to-sequence phage genome: PaP1 methylome and bacterial immunity. BMC Genomics 2014; 15:803. [PMID: 25233860 PMCID: PMC4177049 DOI: 10.1186/1471-2164-15-803] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 01/26/2014] [Accepted: 09/16/2014] [Indexed: 12/02/2022] Open
Abstract
Background: Whole-genome sequencing is an important method to understand the genetic information, gene function, biological characteristics and survival mechanisms of organisms. Sequencing large genomes is very simple at present. However, we encountered a hard-to-sequence genome of Pseudomonas aeruginosa phage PaP1. The shotgun sequencing method failed to complete the sequence of this genome. Results: After persevering for 10 years and going through three generations of sequencing techniques, we successfully completed the sequence of the PaP1 genome with a length of 91,715 bp. Single-molecule real-time sequencing results revealed that this genome contains 51 N-6-methyladenines and 152 N-4-methylcytosines. Three significant modified sequence motifs were predicted, but not all of the sites found in the genome were methylated in these motifs. Further investigations revealed a novel immune mechanism of bacteria, in which host bacteria can recognise and repel inserts containing modified bases on a large scale. This mechanism could account for the failure of the shotgun method in PaP1 genome sequencing. This problem was resolved using the nfi- mutant of Escherichia coli DH5α as a host bacterium to construct a shotgun library. Conclusions: This work provided insights into the hard-to-sequence phage PaP1 genome and discovered a new mechanism of bacterial immunity. The methylome of phage PaP1 is responsible for the failure of shotgun sequencing and for bacterial immunity mediated by enzyme Endo V activity; this methylome also provides a valuable resource for future studies on PaP1 genome replication and modification, as well as on gene regulation and host interaction.
Affiliation(s)
| | - Fuquan Hu
- Department of Microbiology, College of Basic Medical Science, Third Military Medical University, Chongqing 400038, P. R. China.
| |
|
18
|
Anton BP, Kasif S, Roberts RJ, Steffen M. Objective: biochemical function. Front Genet 2014; 5:210. [PMID: 25071837 PMCID: PMC4085566 DOI: 10.3389/fgene.2014.00210] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 06/02/2014] [Accepted: 06/19/2014] [Indexed: 11/13/2022] Open
Affiliation(s)
| | - Simon Kasif
- Bioinformatics Program, Boston University, Boston, MA, USA; Department of Biomedical Engineering, Boston University, Boston, MA, USA
| | - Martin Steffen
- Department of Biomedical Engineering, Boston University, Boston, MA, USA; Department of Pathology and Laboratory Medicine, Boston University School of Medicine, Boston, MA, USA
| |
|
19
|
Anton BP, Chang YC, Brown P, Choi HP, Faller LL, Guleria J, Hu Z, Klitgord N, Levy-Moonshine A, Maksad A, Mazumdar V, McGettrick M, Osmani L, Pokrzywa R, Rachlin J, Swaminathan R, Allen B, Housman G, Monahan C, Rochussen K, Tao K, Bhagwat AS, Brenner SE, Columbus L, de Crécy-Lagard V, Ferguson D, Fomenkov A, Gadda G, Morgan RD, Osterman AL, Rodionov DA, Rodionova IA, Rudd KE, Söll D, Spain J, Xu SY, Bateman A, Blumenthal RM, Bollinger JM, Chang WS, Ferrer M, Friedberg I, Galperin MY, Gobeill J, Haft D, Hunt J, Karp P, Klimke W, Krebs C, Macelis D, Madupu R, Martin MJ, Miller JH, O'Donovan C, Palsson B, Ruch P, Setterdahl A, Sutton G, Tate J, Yakunin A, Tchigvintsev D, Plata G, Hu J, Greiner R, Horn D, Sjölander K, Salzberg SL, Vitkup D, Letovsky S, Segrè D, DeLisi C, Roberts RJ, Steffen M, Kasif S. The COMBREX project: design, methodology, and initial results. PLoS Biol 2013; 11:e1001638. [PMID: 24013487 PMCID: PMC3754883 DOI: 10.1371/journal.pbio.1001638] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Indexed: 11/18/2022] Open
Affiliation(s)
- Brian P. Anton
- New England Biolabs, Ipswich, Massachusetts, United States of America
- * E-mail: (BPA); (SK)
| | - Yi-Chien Chang
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - Peter Brown
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Han-Pil Choi
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Lina L. Faller
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - Jyotsna Guleria
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Zhenjun Hu
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - Niels Klitgord
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - Ami Levy-Moonshine
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Almaz Maksad
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Varun Mazumdar
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - Mark McGettrick
- Diatom Software LLC, Holliston, Massachusetts, United States of America
| | - Lais Osmani
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Revonda Pokrzywa
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - John Rachlin
- Diatom Software LLC, Holliston, Massachusetts, United States of America
| | - Rajeswari Swaminathan
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Benjamin Allen
- Program for Evolutionary Dynamics, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Mathematics, Emmanuel College, Boston, Massachusetts, United States of America
| | - Genevieve Housman
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Caitlin Monahan
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Krista Rochussen
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Kevin Tao
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Ashok S. Bhagwat
- Department of Chemistry, Wayne State University, Detroit, Michigan, United States of America
| | - Steven E. Brenner
- Department of Plant and Microbial Biology, University of California, Berkeley, California, United States of America
| | - Linda Columbus
- Department of Chemistry, University of Virginia, Charlottesville, Virginia, United States of America
| | - Valérie de Crécy-Lagard
- Department of Microbiology and Cell Science, University of Florida, Gainesville, Florida, United States of America
| | - Donald Ferguson
- Department of Microbiology, Miami University, Oxford, Ohio, United States of America
| | - Alexey Fomenkov
- New England Biolabs, Ipswich, Massachusetts, United States of America
| | - Giovanni Gadda
- Department of Chemistry, Georgia State University, Atlanta, Georgia, United States of America
| | - Richard D. Morgan
- New England Biolabs, Ipswich, Massachusetts, United States of America
| | - Andrei L. Osterman
- Bioinformatics and Systems Biology, Sanford Burnham Medical Research Institute, La Jolla, California, United States of America
| | - Dmitry A. Rodionov
- Bioinformatics and Systems Biology, Sanford Burnham Medical Research Institute, La Jolla, California, United States of America
| | - Irina A. Rodionova
- Bioinformatics and Systems Biology, Sanford Burnham Medical Research Institute, La Jolla, California, United States of America
| | - Kenneth E. Rudd
- Department of Biochemistry and Molecular Biology, University of Miami, Miami, Florida, United States of America
| | - Dieter Söll
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
| | - James Spain
- School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia, United States of America
| | - Shuang-yong Xu
- New England Biolabs, Ipswich, Massachusetts, United States of America
| | - Alex Bateman
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
| | - Robert M. Blumenthal
- Department of Medical Microbiology and Immunology, and Program in Bioinformatics, University of Toledo, Toledo, Ohio, United States of America
| | - J. Martin Bollinger
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Woo-Suk Chang
- Department of Biology, University of Texas-Arlington, Arlington, Texas, United States of America
| | - Manuel Ferrer
- Spanish National Research Council (CSIC), Institute of Catalysis, Madrid, Spain
| | - Iddo Friedberg
- Department of Microbiology, Miami University, Oxford, Ohio, United States of America
| | - Michael Y. Galperin
- National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Julien Gobeill
- Department of Library and Information Sciences, University of Applied Sciences Western Switzerland, Geneva, Switzerland
- Bibliomics and Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Daniel Haft
- J. Craig Venter Institute, Rockville, Maryland, United States of America
| | - John Hunt
- Biological Sciences, Columbia University, New York, New York, United States of America
| | - Peter Karp
- Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, California, United States of America
| | - William Klimke
- National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Carsten Krebs
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Dana Macelis
- New England Biolabs, Ipswich, Massachusetts, United States of America
| | - Ramana Madupu
- J. Craig Venter Institute, Rockville, Maryland, United States of America
| | - Maria J. Martin
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
| | - Jeffrey H. Miller
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, California, United States of America
| | - Claire O'Donovan
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
| | - Bernhard Palsson
- Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
| | - Patrick Ruch
- Department of Library and Information Sciences, University of Applied Sciences Western Switzerland, Geneva, Switzerland
- Bibliomics and Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Aaron Setterdahl
- Department of Chemistry, Indiana University Southeast, New Albany, Indiana, United States of America
| | - Granger Sutton
- J. Craig Venter Institute, Rockville, Maryland, United States of America
| | - John Tate
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
| | - Alexander Yakunin
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada
| | - Dmitri Tchigvintsev
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada
| | - Germán Plata
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Integrated Program in Cellular, Molecular, Structural, and Genetic Studies, Columbia University, New York, New York, United States of America
| | - Jie Hu
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
| | - Russell Greiner
- Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
| | - David Horn
- School of Physics and Astronomy, Tel Aviv University, Tel Aviv, Israel
| | - Kimmen Sjölander
- Berkeley Phylogenomics Group, University of California, Berkeley, California, United States of America
| | - Steven L. Salzberg
- Departments of Medicine and Biostatistics, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
| | - Dennis Vitkup
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
| | - Stanley Letovsky
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - Daniel Segrè
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - Charles DeLisi
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - Richard J. Roberts
- New England Biolabs, Ipswich, Massachusetts, United States of America
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - Martin Steffen
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Simon Kasif
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
- * E-mail: (BPA); (SK)
| |
|
20
|
Anantharaman V, Makarova KS, Burroughs AM, Koonin EV, Aravind L. Comprehensive analysis of the HEPN superfamily: identification of novel roles in intra-genomic conflicts, defense, pathogenesis and RNA processing. Biol Direct 2013; 8:15. [PMID: 23768067 PMCID: PMC3710099 DOI: 10.1186/1745-6150-8-15] [Citation(s) in RCA: 177] [Impact Index Per Article: 16.1] [Received: 02/07/2013] [Accepted: 05/09/2013] [Indexed: 12/20/2022] Open
Abstract
Background: The major role of enzymatic toxins that target nucleic acids in biological conflicts at all levels has become increasingly apparent thanks in large part to the advances of comparative genomics. Typically, toxins evolve rapidly, hampering the identification of these proteins by sequence analysis. Here we analyze an unexpectedly widespread superfamily of toxin domains most of which possess RNase activity. Results: The HEPN superfamily is comprised of all α-helical domains that were first identified as being associated with DNA polymerase β-type nucleotidyltransferases in prokaryotes and animal Sacsin proteins. Using sensitive sequence and structure comparison methods, we vastly extend the HEPN superfamily by identifying numerous novel families and by detecting diverged HEPN domains in several known protein families. The new HEPN families include the RNase LS and LsoA catalytic domains, KEN domains (e.g. RNaseL and Ire1) and the RNase domains of RloC and PrrC. The majority of HEPN domains contain conserved motifs that constitute a metal-independent endoRNase active site. Some HEPN domains lacking this motif probably function as non-catalytic RNA-binding domains, such as in the case of the mannitol repressor MtlR. Our analysis shows that HEPN domains function as toxins that are shared by numerous systems implicated in intra-genomic, inter-genomic and intra-organismal conflicts across the three domains of cellular life. In prokaryotes HEPN domains are essential components of numerous toxin-antitoxin (TA) and abortive infection (Abi) systems and in addition are tightly associated with many restriction-modification (R-M) and CRISPR-Cas systems, and occasionally with other defense systems such as Pgl and Ter. We present evidence of multiple modes of action of HEPN domains in these systems, which include direct attack on viral RNAs (e.g. LsoA and RNase LS) in conjunction with other RNase domains (e.g. a novel RNase H fold domain, NamA), suicidal or dormancy-inducing attack on self RNAs (RM systems and possibly CRISPR-Cas systems), and suicidal attack coupled with direct interaction with phage components (Abi systems). These findings are compatible with the hypothesis on coupling of pathogen-targeting (immunity) and self-directed (programmed cell death and dormancy induction) responses in the evolution of robust antiviral strategies. We propose that altruistic cell suicide mediated by HEPN domains and other functionally similar RNases was essential for the evolution of kin and group selection and cell cooperation. HEPN domains were repeatedly acquired by eukaryotes and incorporated into several core functions such as endonucleolytic processing of the 5.8S-25S/28S rRNA precursor (Las1), a novel ER membrane-associated RNA degradation system (C6orf70), and sensing of unprocessed transcripts at the nuclear periphery (Swt1). Multiple lines of evidence suggest that, similar to prokaryotes, HEPN proteins were recruited to antiviral, antitransposon, apoptotic systems or RNA-level response to unfolded proteins (Sacsin and KEN domains) in several groups of eukaryotes. Conclusions: Extensive sequence and structure comparisons reveal unexpectedly broad presence of the HEPN domain in an enormous variety of defense and stress response systems across the tree of life. In addition, HEPN domains have been recruited to perform essential functions, in particular in eukaryotic rRNA processing. These findings are expected to stimulate experiments that could shed light on diverse cellular processes across the three domains of life. Reviewers: This article was reviewed by Martijn Huynen, Igor Zhulin and Nick Grishin.
Affiliation(s)
- Vivek Anantharaman
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| |
|
21
|
Xu SY, Gupta YK. Natural zinc ribbon HNH endonucleases and engineered zinc finger nicking endonuclease. Nucleic Acids Res 2012; 41:378-90. [PMID: 23125367 PMCID: PMC3592412 DOI: 10.1093/nar/gks1043] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 01/25/2023] Open
Abstract
Many bacteriophage and prophage genomes encode an HNH endonuclease (HNHE) next to their cohesive end site and terminase genes. The HNH catalytic domain contains the conserved catalytic residues His-Asn-His and a zinc-binding site [CxxC]2. An additional zinc ribbon (ZR) domain with one to two zinc-binding sites ([CxxxxC], [CxxxxH], [CxxxC], [HxxxH], [CxxC] or [CxxH]) is frequently found at the N-terminus or C-terminus of the HNHE or a ZR domain protein (ZRP) located adjacent to the HNHE. We expressed and purified 10 such HNHEs and characterized their cleavage sites. These HNHEs are site-specific and strand-specific nicking endonucleases (NEase or nickase) with 3- to 7-bp specificities. A minimal HNH nicking domain of 76 amino acid residues was identified from Bacillus phage γ HNHE and subsequently fused to a zinc finger protein to generate a chimeric NEase with a new specificity (12–13 bp). The identification of a large pool of previously unknown natural NEases and engineered NEases provides more ‘tools’ for DNA manipulation and molecular diagnostics. The small modular HNH nicking domain can be used to generate rare NEases applicable to targeted genome editing. In addition, the engineered ZF nickase is useful for evaluation of off-target sites in vitro before performing cell-based gene modification.
Affiliation(s)
- Shuang-yong Xu
- New England Biolabs, Inc, Research Department, 240 County Road, Ipswich, MA 01938, USA.
| |
|
22
|
Murray IA, Clark TA, Morgan RD, Boitano M, Anton BP, Luong K, Fomenkov A, Turner SW, Korlach J, Roberts RJ. The methylomes of six bacteria. Nucleic Acids Res 2012; 40:11450-62. [PMID: 23034806 PMCID: PMC3526280 DOI: 10.1093/nar/gks891] [Citation(s) in RCA: 195] [Impact Index Per Article: 16.3] [Indexed: 11/13/2022] Open
Abstract
Six bacterial genomes, Geobacter metallireducens GS-15, Chromohalobacter salexigens, Vibrio breoganii 1C-10, Bacillus cereus ATCC 10987, Campylobacter jejuni subsp. jejuni 81-176 and C. jejuni NCTC 11168, all of which had previously been sequenced using other platforms, were re-sequenced using single-molecule, real-time (SMRT) sequencing specifically to analyze their methylomes. In every case a number of new N6-methyladenine (m6A) and N4-methylcytosine (m4C) methylation patterns were discovered and the DNA methyltransferases (MTases) responsible for those methylation patterns were assigned. In 15 cases, it was possible to match MTase genes with MTase recognition sequences without further sub-cloning. Two Type I restriction systems required sub-cloning to differentiate their recognition sequences, while four MTase genes that were not expressed in the native organism were sub-cloned to test for viability and recognition sequences. Two of these proved active. No attempt was made to detect 5-methylcytosine (m5C) recognition motifs from the SMRT® sequencing data because this modification produces weaker signals using current methods. However, all predicted m6A and m4C MTases were detected unambiguously. This study shows that the addition of SMRT sequencing to traditional sequencing approaches gives a wealth of useful functional information about a genome, showing not only which MTase genes are active but also revealing their recognition sequences.
Affiliation(s)
- Iain A Murray
- New England Biolabs, 240 County Road, Ipswich, MA 01938, USA
| |
|
23
|
Zhang G, Wang W, Deng A, Sun Z, Zhang Y, Liang Y, Che Y, Wen T. A mimicking-of-DNA-methylation-patterns pipeline for overcoming the restriction barrier of bacteria. PLoS Genet 2012; 8:e1002987. [PMID: 23028379 PMCID: PMC3459991 DOI: 10.1371/journal.pgen.1002987] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.1] [Received: 07/20/2012] [Accepted: 08/10/2012] [Indexed: 12/20/2022] Open
Abstract
Genetic transformation of bacteria harboring multiple Restriction-Modification (R-M) systems is often difficult using conventional methods. Here, we describe a mimicking-of-DNA-methylation-patterns (MoDMP) pipeline to address this problem in three difficult-to-transform bacterial strains. Twenty-four putative DNA methyltransferases (MTases) from these difficult-to-transform strains were cloned and expressed in an Escherichia coli strain lacking all of the known R-M systems and orphan MTases. Thirteen of these MTases exhibited DNA modification activity in Southwestern dot blot or Liquid Chromatography–Mass Spectrometry (LC–MS) assays. The active MTase genes were assembled into three operons using the Saccharomyces cerevisiae DNA assembler and were co-expressed in the E. coli strain lacking known R-M systems and orphan MTases. Thereafter, results from the dot blot and restriction enzyme digestion assays indicated that the DNA methylation patterns of the difficult-to-transform strains are mimicked in these E. coli hosts. The transformation of the Gram-positive Bacillus amyloliquefaciens TA208 and B. cereus ATCC 10987 strains with the shuttle plasmids prepared from MoDMP hosts showed increased efficiencies (up to four orders of magnitude) compared to those using the plasmids prepared from the E. coli strain lacking known R-M systems and orphan MTases or its parental strain. Additionally, the gene coding for uracil phosphoribosyltransferase (upp) was directly inactivated using non-replicative plasmids prepared from the MoDMP host in B. amyloliquefaciens TA208. Moreover, the Gram-negative chemoautotrophic Nitrobacter hamburgensis strain X14 was transformed and expressed Green Fluorescent Protein (GFP). Finally, the sequence specificities of active MTases were identified by restriction enzyme digestion, making the MoDMP system potentially useful for other strains. The effectiveness of the MoDMP pipeline in different bacterial groups suggests a universal potential. This pipeline could facilitate the functional genomics of the strains that are difficult to transform.

Approximately 95% of the genome-sequenced bacteria harbor Restriction-Modification (R-M) systems. R-M systems usually occur in pairs, i.e., DNA methyltransferases (MTases) and restriction endonucleases (REases). REases can degrade invading DNA to protect the cell from infection by phages. This protective machinery has also become a barrier to experimental genetic manipulation, because newly introduced DNA is degraded by the REases of the transformed bacteria. In this study we have developed a pipeline to protect DNA by methylation from cleavage by host REases. Multiple DNA MTases were cloned from three difficult-to-transform bacterial strains and co-expressed in an E. coli strain lacking all of the known endogenous R-M systems and orphan MTases. Thus, the DNA methylation patterns of these E. coli strains have become similar to those of the difficult-to-transform strains. Ultimately, the DNA prepared from these E. coli strains can overcome the R-M barrier of the bacterial strains that are difficult to transform and achieve genetic manipulation. The effectiveness of this pipeline in different bacterial groups suggests a universal potential. This pipeline could facilitate functional genomics of bacterial strains that are difficult to transform.
Affiliation(s)
- Guoqiang Zhang
- Department of Industrial Microbiology and Biotechnology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Wenzhao Wang
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Aihua Deng
- Department of Industrial Microbiology and Biotechnology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Zhaopeng Sun
- Department of Industrial Microbiology and Biotechnology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Yun Zhang
- Department of Industrial Microbiology and Biotechnology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Yong Liang
- Department of Industrial Microbiology and Biotechnology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Yongsheng Che
- Department of Natural Products Chemistry, Beijing Institute of Pharmacology and Toxicology, Beijing, China
- Tingyi Wen
- Department of Industrial Microbiology and Biotechnology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- * E-mail:
|