51
Genome-wide mapping of mutations at single-nucleotide resolution for protein, metabolic and genome engineering. Nat Biotechnol 2016; 35:48-55. [PMID: 27941803 DOI: 10.1038/nbt.3718] [Citation(s) in RCA: 243] [Impact Index Per Article: 30.4] [Received: 01/11/2016] [Accepted: 10/05/2016] [Indexed: 01/20/2023]
Abstract
Improvements in DNA synthesis and sequencing have underpinned comprehensive assessment of gene function in bacteria and eukaryotes. Genome-wide analyses require high-throughput methods to generate mutations and analyze their phenotypes, but approaches to date have been unable to efficiently link the effects of mutations in coding regions or promoter elements in a highly parallel fashion. We report that CRISPR-Cas9 gene editing in combination with massively parallel oligomer synthesis can enable trackable editing on a genome-wide scale. Our method, CRISPR-enabled trackable genome engineering (CREATE), links each guide RNA to homologous repair cassettes that both edit loci and function as barcodes to track genotype-phenotype relationships. We apply CREATE to site saturation mutagenesis for protein engineering, reconstruction of adaptive laboratory evolution experiments, and identification of stress tolerance and antibiotic resistance genes in bacteria. We provide preliminary evidence that CREATE will work in yeast. We also provide a webtool to design multiplex CREATE libraries.

52
Halweg-Edwards AL, Pines G, Winkler JD, Pines A, Gill RT. A Web Interface for Codon Compression. ACS Synth Biol 2016; 5:1021-3. [PMID: 27169595 DOI: 10.1021/acssynbio.6b00026] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
Saturation mutagenesis is widely used in protein engineering and other experiments. A common practice is to utilize the single degenerate codon NNK. However, this approach suffers from amino acid bias and the presence of a stop codon and of the wild type amino acid. These extra features needlessly increase library size and consequently downstream screening load. Recently, we developed the DYNAMCC algorithms for codon compression that find the minimal set of degenerate codons, covering any defined set of amino acids, with no off-target codons and with redundancy control. Additionally, we experimentally demonstrated the advantages of this approach over the standard NNK method. While the code is freely available from our Web site, we have now made this method more accessible to a broader audience without any computational background by building a user-friendly web-based interface for those algorithms. The Web site can be accessed through: www.dynamcc.com.
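The amino acid bias and stop codon that the abstract attributes to NNK can be checked by enumerating the 32 codons it encodes. The following is a minimal sketch under the standard genetic code; the variable names are illustrative and it is not part of the DYNAMCC code.

```python
from collections import Counter
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: N is any base, K is G or T.
nnk_codons = [a + b + c for a in "ACGT" for b in "ACGT" for c in "GT"]

# Count how many NNK codons encode each amino acid ("*" marks a stop codon).
nnk_counts = Counter(CODON_TABLE[codon] for codon in nnk_codons)

for aa, n in sorted(nnk_counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(aa, n)
```

In this enumeration leucine, serine and arginine are each encoded by three codons while twelve amino acids have only one, and a single stop codon (TAG) remains, which is the redundancy that codon compression aims to remove.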
Affiliation(s)
- Andrea L. Halweg-Edwards
  - Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Gur Pines
  - Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
- James D. Winkler
  - Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Ryan T. Gill
  - Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States

53
Soo VWC, Yosaatmadja Y, Squire CJ, Patrick WM. Mechanistic and Evolutionary Insights from the Reciprocal Promiscuity of Two Pyridoxal Phosphate-dependent Enzymes. J Biol Chem 2016; 291:19873-87. [PMID: 27474741 PMCID: PMC5025676 DOI: 10.1074/jbc.m116.739557] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 05/22/2016] [Indexed: 11/06/2022]
Abstract
Enzymes that utilize the cofactor pyridoxal 5′-phosphate play essential roles in amino acid metabolism in all organisms. The cofactor is used by proteins that adopt at least five different folds, which raises questions about the evolutionary processes that might explain the observed distribution of functions among folds. In this study, we show that a representative of fold type III, the Escherichia coli alanine racemase (ALR), is a promiscuous cystathionine β-lyase (CBL). Furthermore, E. coli CBL (fold type I) is a promiscuous alanine racemase. A single round of error-prone PCR and selection yielded variant ALR(Y274F), which catalyzes cystathionine β-elimination with a near-native Michaelis constant (Km = 3.3 mM) but a poor turnover number (kcat ≈ 10 h⁻¹). In contrast, directed evolution also yielded CBL(P113S), which catalyzes L-alanine racemization with a poor Km (58 mM) but a high kcat (22 s⁻¹). The structures of both variants were solved in the presence and absence of the L-alanine analogue, (R)-1-aminoethylphosphonic acid. As expected, the ALR active site was enlarged by the Y274F substitution, allowing better access for cystathionine. More surprisingly, the favorable kinetic parameters of CBL(P113S) appear to result from optimizing the pKa of Tyr-111, which acts as the catalytic acid during L-alanine racemization. Our data emphasize the short mutational routes between the functions of pyridoxal 5′-phosphate-dependent enzymes, regardless of whether or not they share the same fold. Thus, they confound the prevailing model of enzyme evolution, which predicts that overlapping patterns of promiscuity result from sharing a common multifunctional ancestor.
Affiliation(s)
- Valerie W C Soo
  - Institute of Natural and Mathematical Sciences, Massey University, Auckland 0632
- Yuliana Yosaatmadja
  - School of Biological Sciences, University of Auckland, Auckland 1142
- Wayne M Patrick
  - Department of Biochemistry, University of Otago, Dunedin 9054, New Zealand

54
Reetz MT. What are the Limitations of Enzymes in Synthetic Organic Chemistry? Chem Rec 2016; 16:2449-2459. [DOI: 10.1002/tcr.201600040] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Received: 03/04/2016] [Indexed: 12/31/2022]
Affiliation(s)
- Manfred T. Reetz
  - Fachbereich Chemie (15), Philipps-Universität Marburg, Hans-Meerwein Straße, 35032 Marburg, Germany
  - Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470 Mülheim an der Ruhr, Germany

55
Ferla MP. Mutanalyst, an online tool for assessing the mutational spectrum of epPCR libraries with poor sampling. BMC Bioinformatics 2016; 17:152. [PMID: 27044645 PMCID: PMC4820924 DOI: 10.1186/s12859-016-0996-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/25/2016] [Accepted: 03/22/2016] [Indexed: 01/03/2023]
Abstract
Background: Assessing library diversity is an important control step in a directed evolution experiment. To do this, a limited number of colonies from a test library are sequenced and tested. In the case of an error-prone PCR library, the spectrum of the identified mutations (the proportions of mutations of a specific nucleobase to another) is calculated, enabling the user to make more informed predictions on library diversity and coverage. However, the calculations of the mutational spectrum are severely affected by the limited sample sizes. Results: Here an online program, called Mutanalyst, is presented, which not only automates the calculations but also estimates the errors involved. Specifically, the errors are calculated thanks to the complementarity of DNA, which means that a mutation has a complementary mutation on the other strand. Additionally, to determine the mean number of mutations per sequence, it fits the data to a Poisson distribution, which is more robust than calculating the average given the small sample size. Conclusion: As a result of the added measures that take small sample size into account, the user can better assess whether the library is satisfactory or whether error-prone PCR conditions should be adjusted. The program is available at www.mutanalyst.com.
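To illustrate why a distribution fit can differ from a plain average on a small sample, the sketch below fits a Poisson mean to the observed frequencies of mutations per clone by least squares over a grid of candidate values. This is an assumption made for illustration, not Mutanalyst's documented procedure (the maximum-likelihood Poisson estimate is simply the sample mean), and the ten mutation counts are made-up data containing one heavily mutated clone.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """Probability of k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def fit_poisson_least_squares(mutation_counts, lam_max=20.0, step=0.01):
    """Grid-search the Poisson mean that minimises the squared difference
    between observed frequencies and the Poisson probabilities."""
    n = len(mutation_counts)
    observed = Counter(mutation_counts)
    k_max = max(mutation_counts)
    best_lam, best_err = step, float("inf")
    for i in range(1, int(round(lam_max / step)) + 1):
        lam = i * step
        err = sum(
            (observed.get(k, 0) / n - poisson_pmf(k, lam)) ** 2
            for k in range(k_max + 1)
        )
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam


# Hypothetical data: mutations found in each of ten sequenced clones.
sample = [1, 1, 2, 2, 2, 2, 3, 3, 3, 12]
sample_mean = sum(sample) / len(sample)
fitted = fit_poisson_least_squares(sample)
print(f"mean = {sample_mean:.2f}, least-squares Poisson fit = {fitted:.2f}")
```

With this sample the single outlier pulls the arithmetic mean upward, whereas the fitted value stays closer to the bulk of the clones.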
Affiliation(s)
- Matteo Paolo Ferla
  - Formerly: Department of Biochemistry, University of Otago, Dunedin, New Zealand
  - Present address: Biosyntia, DTU Centre for Biosustainability, Hørsholm, Denmark

56
Wilson RH, Alonso H, Whitney SM. Evolving Methanococcoides burtonii archaeal Rubisco for improved photosynthesis and plant growth. Sci Rep 2016; 6:22284. [PMID: 26926260 PMCID: PMC4772096 DOI: 10.1038/srep22284] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Received: 01/14/2016] [Accepted: 02/10/2016] [Indexed: 11/28/2022]
Abstract
In photosynthesis Ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) catalyses the often rate limiting CO2-fixation step in the Calvin cycle. This makes Rubisco both the gatekeeper for carbon entry into the biosphere and a target for functional improvement to enhance photosynthesis and plant growth. Encumbering the catalytic performance of Rubisco is its highly conserved, complex catalytic chemistry. Accordingly, traditional efforts to enhance Rubisco catalysis using protracted "trial and error" protein engineering approaches have met with limited success. Here we demonstrate the versatility of high throughput directed (laboratory) protein evolution for improving the carboxylation properties of a non-photosynthetic Rubisco from the archaea Methanococcoides burtonii. Using chloroplast transformation in the model plant Nicotiana tabacum (tobacco) we confirm the improved forms of M. burtonii Rubisco increased photosynthesis and growth relative to tobacco controls producing wild-type M. burtonii Rubisco. Our findings indicate continued directed evolution of archaeal Rubisco offers new potential for enhancing leaf photosynthesis and plant growth.
Affiliation(s)
- Robert H. Wilson
  - Research School of Biology, The Australian National University, Acton, Australian Capital Territory 2601, Australia
- Hernan Alonso
  - Research School of Biology, The Australian National University, Acton, Australian Capital Territory 2601, Australia
- Spencer M. Whitney
  - Research School of Biology, The Australian National University, Acton, Australian Capital Territory 2601, Australia

57
Sun Z, Wikmark Y, Bäckvall JE, Reetz MT. New Concepts for Increasing the Efficiency in Directed Evolution of Stereoselective Enzymes. Chemistry 2016; 22:5046-54. [DOI: 10.1002/chem.201504406] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Received: 11/02/2015] [Indexed: 01/28/2023]
Affiliation(s)
- Zhoutong Sun
  - Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470 Mülheim an der Ruhr, Germany
  - Fachbereich Chemie, Philipps-Universität Marburg, Hans-Meerwein-Strasse 4, 35032 Marburg, Germany
- Ylva Wikmark
  - Department of Organic Chemistry, Arrhenius Laboratory, Stockholm University, 106 91 Stockholm, Sweden
- Jan-E. Bäckvall
  - Department of Organic Chemistry, Arrhenius Laboratory, Stockholm University, 106 91 Stockholm, Sweden
- Manfred T. Reetz
  - Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470 Mülheim an der Ruhr, Germany
  - Fachbereich Chemie, Philipps-Universität Marburg, Hans-Meerwein-Strasse 4, 35032 Marburg, Germany

58
Engqvist MKM, Nielsen J. ANT: Software for Generating and Evaluating Degenerate Codons for Natural and Expanded Genetic Codes. ACS Synth Biol 2015; 4:935-8. [PMID: 25901796 DOI: 10.1021/acssynbio.5b00018] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/22/2022]
Abstract
The Ambiguous Nucleotide Tool (ANT) is a desktop application that generates and evaluates degenerate codons. Degenerate codons are used to represent DNA positions that have multiple possible nucleotide alternatives. This is useful for protein engineering and directed evolution, where primers specified with degenerate codons are used as a basis for generating libraries of protein sequences. ANT is intuitive and can be used in a graphical user interface or by interacting with the code through a defined application programming interface. ANT comes with full support for nonstandard, user-defined, or expanded genetic codes (translation tables), which is important because synthetic biology is being applied to an ever widening range of natural and engineered organisms. The Python source code for ANT is freely distributed so that it may be used without restriction, modified, and incorporated in other software or custom data pipelines.
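As an illustration of what evaluating a degenerate codon under different translation tables involves, the sketch below expands IUPAC ambiguity codes into concrete codons and reports the residues they encode. It does not use ANT's API; the function names and the `TGA_AS_TRP` table (the standard code with TGA reassigned to tryptophan, as in NCBI translation table 4) are chosen here for illustration.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Illustrative alternative table: TGA read as tryptophan instead of stop.
TGA_AS_TRP = dict(STANDARD, TGA="W")


def expand(degenerate_codon):
    """Return every concrete codon matched by a degenerate codon."""
    options = (IUPAC[base] for base in degenerate_codon.upper())
    return ["".join(codon) for codon in product(*options)]


def encoded_residues(degenerate_codon, table):
    """Return the sorted set of residues ('*' = stop) encoded under a table."""
    return sorted({table[codon] for codon in expand(degenerate_codon)})


print(encoded_residues("NDT", STANDARD))
print(encoded_residues("TGR", STANDARD))
print(encoded_residues("TGR", TGA_AS_TRP))
```

Passing the translation table as an argument keeps the expansion step independent of the genetic code, which is the property that matters when working with nonstandard or expanded codes.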
Affiliation(s)
- Martin K. M. Engqvist
  - Department of Chemical & Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96 Göteborg, Sweden
- Jens Nielsen
  - Department of Chemical & Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96 Göteborg, Sweden

59
Hu CC, Gan P, Zhang RY, Xue JX, Ran LK. Identification of prostate cancer LncRNAs by RNA-Seq. Asian Pac J Cancer Prev 2015; 15:9439-44. [PMID: 25422238 DOI: 10.7314/apjcp.2014.15.21.9439] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022]
Abstract
PURPOSE To identify prostate cancer lncRNAs using a pipeline proposed in this study, which is applicable for the identification of lncRNAs that are differentially expressed in prostate cancer tissues but have a negligible potential to encode proteins. MATERIALS AND METHODS We used two publicly available RNA-Seq datasets from normal prostate tissue and prostate cancer. Putative lncRNAs were predicted using the biological technology, then specific lncRNAs of prostate cancer were found by differential expression analysis and co-expression network was constructed by the weighted gene co-expression network analysis. RESULTS A total of 1,080 lncRNA transcripts were obtained in the RNA-Seq datasets. Three genes (PCA3, C20orf166-AS1 and RP11-267A15.1) showed a significant differential expression in the prostate cancer tissues, and were thus identified as prostate cancer specific lncRNAs. Brown and black modules had significant negative and positive correlations with prostate cancer, respectively. CONCLUSIONS The pipeline proposed in this study is useful for the prediction of prostate cancer specific lncRNAs. Three genes (PCA3, C20orf166-AS1, and RP11-267A15.1) were identified to have a significant differential expression in prostate cancer tissues. However, there have been no published studies to demonstrate the specificity of RP11-267A15.1 in prostate cancer tissues. Thus, the results of this study can provide a new theoretic insight into the identification of prostate cancer specific genes.
Affiliation(s)
- Cheng-Cheng Hu
  - Laboratory of Biomedical Engineering, Chongqing Medical University, Chongqing, China

60
Sieber T, Hare E, Hofmann H, Trepel M. Biomathematical description of synthetic peptide libraries. PLoS One 2015; 10:e0129200. [PMID: 26042419 PMCID: PMC4456392 DOI: 10.1371/journal.pone.0129200] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/03/2014] [Accepted: 05/07/2015] [Indexed: 12/28/2022]
Abstract
Libraries of randomised peptides displayed on phages or viral particles are essential tools in a wide spectrum of applications. However, there is only limited understanding of a library's fundamental dynamics and the influences of encoding schemes and sizes on their quality. Numeric properties of libraries, such as the expected number of different peptides and the library's coverage, have long been in use as measures of a library's quality. Here, we present a graphical framework of these measures together with a library's relative efficiency to help to describe libraries in enough detail for researchers to plan new experiments in a more informed manner. In particular, these values allow us to answer, in a probabilistic fashion, the question of whether a specific library does indeed contain one of the "best" possible peptides. The framework is implemented in a web-interface based on two packages, discreteRV and peptider, for the statistical software environment R. We further provide a user-friendly web-interface called PeLiCa (Peptide Library Calculator, http://www.pelica.org), allowing scientists to plan and analyse their peptide libraries.
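The two numeric properties named in the abstract, the expected number of different peptides and the coverage, have a simple closed form when every variant is equally likely. The sketch below uses that textbook simplification; it is not the peptider implementation, and real encoding schemes such as NNK make some peptides more probable than others, so these values should be read as optimistic bounds.

```python
import math


def expected_distinct(num_variants, library_size):
    """Expected number of different variants among library_size independent,
    uniformly random draws from num_variants possibilities:
    V * (1 - (1 - 1/V)**N)."""
    if num_variants < 1 or library_size < 0:
        raise ValueError("need num_variants >= 1 and library_size >= 0")
    if num_variants == 1:
        return 1.0 if library_size > 0 else 0.0
    # log1p and expm1 keep the result accurate when V is very large.
    return num_variants * -math.expm1(library_size * math.log1p(-1.0 / num_variants))


def coverage(num_variants, library_size):
    """Expected fraction of all possible variants present in the library."""
    return expected_distinct(num_variants, library_size) / num_variants


# Example: all heptapeptides (20**7 sequences) in a library of 1e8 clones.
heptapeptides = 20 ** 7
print(f"expected distinct peptides: {expected_distinct(heptapeptides, 10 ** 8):.3e}")
print(f"coverage: {coverage(heptapeptides, 10 ** 8):.3f}")
```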
Affiliation(s)
- Timo Sieber
  - Department of Oncology and Hematology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Eric Hare
  - Department of Statistics, Iowa State University, Ames, IA, USA
- Heike Hofmann
  - Department of Statistics, Iowa State University, Ames, IA, USA
- Martin Trepel
  - Department of Hematology and Oncology, Augsburg Medical Center, Interdisciplinary Cancer Center, Augsburg, Germany

61
Maddock DJ, Patrick WM, Gerth ML. Substitutions at the cofactor phosphate-binding site of a clostridial alcohol dehydrogenase lead to unexpected changes in substrate specificity. Protein Eng Des Sel 2015; 28:251-8. [PMID: 26034298 PMCID: PMC4498498 DOI: 10.1093/protein/gzv028] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 03/07/2015] [Accepted: 05/05/2015] [Indexed: 12/22/2022]
Abstract
Changing the cofactor specificity of an enzyme from nicotinamide adenine dinucleotide 2′-phosphate (NADPH) to the more abundant NADH is a common strategy for increasing overall enzyme efficiency in microbial metabolic engineering. The aim of this study was to switch the cofactor specificity of the primary–secondary alcohol dehydrogenase from Clostridium autoethanogenum, a bacterium with considerable promise for the bio-manufacturing of fuels and other petrochemicals, from strictly NADPH-dependent to NADH-dependent. We used insights from a homology model to build a site-saturation library focussed on residue S199, the position deemed most likely to disrupt binding of the 2′-phosphate of NADPH. Although the CaADH(S199X) library did not yield any NADH-dependent enzymes, it did reveal that substitutions at the cofactor phosphate-binding site can cause unanticipated changes in the substrate specificity of the enzyme. Using consensus-guided site-directed mutagenesis, we were able to create an enzyme that was stringently NADH-dependent, albeit with a concomitant reduction in activity. This study highlights the role that distal residues play in substrate specificity and the complexity of enzyme–cofactor interactions.
Affiliation(s)
- Danielle J Maddock
  - Department of Biochemistry, University of Otago, Dunedin 9010, New Zealand
- Wayne M Patrick
  - Department of Biochemistry, University of Otago, Dunedin 9010, New Zealand
- Monica L Gerth
  - Department of Biochemistry, University of Otago, Dunedin 9010, New Zealand

62
Sun Z, Lonsdale R, Kong XD, Xu JH, Zhou J, Reetz MT. Reshaping an Enzyme Binding Pocket for Enhanced and Inverted Stereoselectivity: Use of Smallest Amino Acid Alphabets in Directed Evolution. Angew Chem Int Ed Engl 2015. [DOI: 10.1002/ange.201501809] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 11/11/2022]

63
Sun Z, Lonsdale R, Kong XD, Xu JH, Zhou J, Reetz MT. Reshaping an Enzyme Binding Pocket for Enhanced and Inverted Stereoselectivity: Use of Smallest Amino Acid Alphabets in Directed Evolution. Angew Chem Int Ed Engl 2015; 54:12410-5. [DOI: 10.1002/anie.201501809] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Received: 02/27/2015] [Indexed: 01/06/2023]

64
Hoebenreich S, Zilly FE, Acevedo-Rocha CG, Zilly M, Reetz MT. Speeding up directed evolution: Combining the advantages of solid-phase combinatorial gene synthesis with statistically guided reduction of screening effort. ACS Synth Biol 2015; 4:317-31. [PMID: 24921161 DOI: 10.1021/sb5002399] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Indexed: 01/05/2023]
Abstract
Efficient and economic methods in directed evolution at the protein, metabolic, and genome level are needed for biocatalyst development and the success of synthetic biology. In contrast to random strategies, semirational approaches such as saturation mutagenesis explore the sequence space in a focused manner. Although several combinatorial libraries based on saturation mutagenesis have been reported using solid-phase gene synthesis, direct comparison with traditional PCR-based methods is currently lacking. In this work, we compare combinatorial protein libraries created in-house via PCR versus those generated by commercial solid-phase gene synthesis. Using descriptive statistics and probabilistic distributions on amino acid occurrence frequencies, the quality of the libraries was assessed and compared, revealing that the outsourced libraries are characterized by less bias and outliers than the PCR-based ones. Afterward, we screened all libraries following a traditional algorithm for almost complete library coverage and compared this approach with an emergent statistical concept suggesting screening a lower portion of the protein sequence space. Upon analyzing the biocatalytic landscapes and best hits of all combinatorial libraries, we show that the screening effort could have been reduced in all cases by more than 50%, while still finding at least one of the best mutants.
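The screening effort needed for near-complete coverage is commonly estimated by assuming that every variant is equally likely and asking how many clones must be sampled for the expected coverage to reach a target. The sketch below implements that simplified estimate to show how quickly the required effort falls as the target is relaxed; it is not the statistical concept evaluated in the paper, and the function name and example library are illustrative.

```python
import math


def clones_for_coverage(num_variants, completeness):
    """Smallest number of clones N for which the expected fractional coverage,
    1 - (1 - 1/V)**N, reaches the requested completeness, assuming all V
    variants are equally likely."""
    if num_variants < 2:
        raise ValueError("num_variants must be at least 2")
    if not 0.0 < completeness < 1.0:
        raise ValueError("completeness must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - completeness) / math.log1p(-1.0 / num_variants))


# Example: two positions randomised with NNK give 32**2 codon combinations.
library_variants = 32 ** 2
for target in (0.95, 0.80, 0.50):
    clones = clones_for_coverage(library_variants, target)
    print(f"{target:.0%} coverage: {clones} clones")
```

Under this model roughly three clones per variant are needed for 95% coverage, so any approach that accepts a lower target, or reduces the number of variants, cuts the screening load substantially.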
Affiliation(s)
- Sabrina Hoebenreich
  - Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470 Mülheim an der Ruhr, Germany
  - Fachbereich Chemie, Philipps-Universität Marburg, Hans-Meerwein-Straße, 35032 Marburg, Germany
- Felipe E. Zilly
  - Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470 Mülheim an der Ruhr, Germany
- Carlos G. Acevedo-Rocha
  - Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470 Mülheim an der Ruhr, Germany
  - Fachbereich Chemie, Philipps-Universität Marburg, Hans-Meerwein-Straße, 35032 Marburg, Germany
- Matías Zilly
  - Fakultät für Physik, Universität Duisburg-Essen, Lotharstraße 1, 47048 Duisburg, Germany
- Manfred T. Reetz
  - Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470 Mülheim an der Ruhr, Germany
  - Fachbereich Chemie, Philipps-Universität Marburg, Hans-Meerwein-Straße, 35032 Marburg, Germany

65
Currin A, Swainston N, Day PJ, Kell DB. Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently. Chem Soc Rev 2015; 44:1172-239. [PMID: 25503938 PMCID: PMC4349129 DOI: 10.1039/c4cs00351a] [Citation(s) in RCA: 256] [Impact Index Per Article: 28.4] [Received: 10/20/2014] [Indexed: 12/21/2022]
Abstract
The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way, opens up many opportunities, both scientifically and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biology, whereby increasingly large sequences of DNA can be synthesised de novo, allow an unprecedented ability to engineer proteins with novel functions. However, the number of possible proteins is far too large to test individually, so we need means for navigating the 'search space' of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (Kd) and catalytic (kcat) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modelling) with the more empirical methods of classical directed evolution (DE) for improving kcat (where natural evolution rarely seeks the highest values), especially with regard to residues distant from the active site and where the functional linkages underpinning enzyme dynamics are both unknown and hard to predict. Epistasis (where the 'best' amino acid at one site depends on that or those at others) is a notable feature of directed evolution. The aim of this review is to highlight some of the approaches that are being developed to allow us to use directed evolution to improve enzyme properties, often dramatically. We note that directed evolution differs in a number of ways from natural evolution, including in particular the available mechanisms and the likely selection pressures. Thus, we stress the opportunities afforded by techniques that enable one to map sequence to (structure and) activity in silico, as an effective means of modelling and exploring protein landscapes. 
Because known landscapes may be assessed and reasoned about as a whole, simultaneously, this offers opportunities for protein improvement not readily available to natural evolution on rapid timescales. Intelligent landscape navigation, informed by sequence-activity relationships and coupled to the emerging methods of synthetic biology, offers scope for the development of novel biocatalysts that are both highly active and robust.
Affiliation(s)
- Andrew Currin
  - Manchester Institute of Biotechnology, The University of Manchester, 131, Princess St, Manchester M1 7DN, UK; http://dbkgroup.org/; @dbkell; Tel: +44 (0)161 306 4492
  - School of Chemistry, The University of Manchester, Manchester M13 9PL, UK
  - Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131, Princess St, Manchester M1 7DN, UK
- Neil Swainston
  - Manchester Institute of Biotechnology, The University of Manchester, 131, Princess St, Manchester M1 7DN, UK
  - Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131, Princess St, Manchester M1 7DN, UK
  - School of Computer Science, The University of Manchester, Manchester M13 9PL, UK
- Philip J. Day
  - Manchester Institute of Biotechnology, The University of Manchester, 131, Princess St, Manchester M1 7DN, UK
  - Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131, Princess St, Manchester M1 7DN, UK
  - Faculty of Medical and Human Sciences, The University of Manchester, Manchester M13 9PT, UK
- Douglas B. Kell
  - Manchester Institute of Biotechnology, The University of Manchester, 131, Princess St, Manchester M1 7DN, UK
  - School of Chemistry, The University of Manchester, Manchester M13 9PL, UK
  - Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131, Princess St, Manchester M1 7DN, UK

66
Cheng F, Zhu L, Schwaneberg U. Directed evolution 2.0: improving and deciphering enzyme properties. Chem Commun (Camb) 2015; 51:9760-72. [DOI: 10.1039/c5cc01594d] [Citation(s) in RCA: 100] [Impact Index Per Article: 11.1] [Indexed: 01/06/2023]
Abstract
This feature article proposes KnowVolution, a knowledge-gaining directed evolution strategy comprising four phases, which generates improved enzyme variants and molecular understanding.
Affiliation(s)
- Feng Cheng
  - Lehrstuhl für Biotechnologie, RWTH Aachen University, 52074 Aachen, Germany
- Leilei Zhu
  - Lehrstuhl für Biotechnologie, RWTH Aachen University, 52074 Aachen, Germany
- Ulrich Schwaneberg
  - Lehrstuhl für Biotechnologie, RWTH Aachen University, 52074 Aachen, Germany
  - DWI-Leibniz Institute for Interactive Materials

67
Jacobs TM, Yumerefendi H, Kuhlman B, Leaver-Fay A. SwiftLib: rapid degenerate-codon-library optimization through dynamic programming. Nucleic Acids Res 2014; 43:e34. [PMID: 25539925 PMCID: PMC4357694 DOI: 10.1093/nar/gku1323] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Indexed: 11/13/2022]
Abstract
Degenerate codon (DC) libraries efficiently address the experimental library-size limitations of directed evolution by focusing diversity toward the positions and toward the amino acids (AAs) that are most likely to generate hits; however, manually constructing DC libraries is challenging, error prone and time consuming. This paper provides a dynamic programming solution to the task of finding the best DCs while keeping the size of the library beneath some given limit, improving on the existing integer-linear programming formulation. It then extends the algorithm to consider multiple DCs at each position, a heretofore unsolved problem, while adhering to a constraint on the number of primers needed to synthesize the library. In the two library-design problems examined here, the use of multiple DCs produces libraries that very nearly cover the set of desired AAs while still staying within the experimental size limits. Surprisingly, the algorithm is able to find near-perfect libraries where the ratio of amino-acid sequences to nucleic-acid sequences approaches 1; it effectively side-steps the degeneracy of the genetic code. Our algorithm is freely available through our web server and solves most design problems in about a second.
Affiliation(s)
- Timothy M Jacobs
  - Department of Biochemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Hayretin Yumerefendi
  - Department of Biochemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Brian Kuhlman
  - Department of Biochemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Andrew Leaver-Fay
  - Department of Biochemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA

68
Lin L, Hu S, Yu K, Huang J, Yao S, Lei Y, Hu G, Mei L. Enhancing the Activity of Glutamate Decarboxylase from Lactobacillus brevis by Directed Evolution. Chin J Chem Eng 2014. [DOI: 10.1016/j.cjche.2014.09.025] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 12/28/2022]
|
69
|
Zhao J, Kardashliev T, Ruff AJ, Bocola M, Schwaneberg U. Lessons from diversity of directed evolution experiments by an analysis of 3,000 mutations. Biotechnol Bioeng 2014; 111:2380-9. [DOI: 10.1002/bit.25302] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2014] [Revised: 05/14/2014] [Accepted: 05/27/2014] [Indexed: 12/20/2022]
Affiliation(s)
- Jing Zhao
- Lehrstuhl für Biotechnologie; RWTH Aachen University; Worringerweg 3 52074 Aachen Germany
- Tsvetan Kardashliev
- Lehrstuhl für Biotechnologie; RWTH Aachen University; Worringerweg 3 52074 Aachen Germany
- Anna Joëlle Ruff
- Lehrstuhl für Biotechnologie; RWTH Aachen University; Worringerweg 3 52074 Aachen Germany
- Marco Bocola
- Lehrstuhl für Biotechnologie; RWTH Aachen University; Worringerweg 3 52074 Aachen Germany
- Ulrich Schwaneberg
- Lehrstuhl für Biotechnologie; RWTH Aachen University; Worringerweg 3 52074 Aachen Germany
|
70
|
Polissi A, Sperandeo P. The lipopolysaccharide export pathway in Escherichia coli: structure, organization and regulated assembly of the Lpt machinery. Mar Drugs 2014; 12:1023-42. [PMID: 24549203 PMCID: PMC3944529 DOI: 10.3390/md12021023] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2014] [Revised: 01/22/2014] [Accepted: 01/28/2014] [Indexed: 01/12/2023] Open
Abstract
The bacterial outer membrane (OM) is a peculiar biological structure with a unique composition that contributes significantly to the fitness of Gram-negative bacteria in hostile environments. OM components are all synthesized in the cytosol and must, then, be transported efficiently across three compartments to the cell surface. Lipopolysaccharide (LPS) is a unique glycolipid that paves the outer leaflet of the OM. Transport of this complex molecule poses several problems to the cells due to its amphipathic nature. In this review, the multiprotein machinery devoted to LPS transport to the OM is discussed together with the challenges associated with this process and the solutions that cells have evolved to address the problem of LPS biogenesis.
Affiliation(s)
- Alessandra Polissi
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza 2, 20126 Milan, Italy.
- Paola Sperandeo
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza 2, 20126 Milan, Italy.
|
71
|
Sebestova E, Bendl J, Brezovsky J, Damborsky J. Computational tools for designing smart libraries. Methods Mol Biol 2014; 1179:291-314. [PMID: 25055786 DOI: 10.1007/978-1-4939-1053-3_20] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
Traditional directed evolution experiments are often time-, labor- and cost-intensive because they involve repeated rounds of random mutagenesis and the selection or screening of large mutant libraries. The efficiency of directed evolution experiments can be significantly improved by targeting mutagenesis to a limited number of hot-spot positions and/or selecting a limited set of substitutions. The design of such "smart" libraries can be greatly facilitated by in silico analyses and predictions. Here we provide an overview of computational tools applicable for (a) the identification of hot-spots for engineering enzyme properties, and (b) the evaluation of predicted hot-spots and selection of suitable amino acids for substitutions. The selected tools do not require any specific expertise and can easily be implemented by the wider scientific community.
Affiliation(s)
- Eva Sebestova
- Loschmidt Laboratories, Masaryk University, Kamenice 5/A13, 625 00, Brno, Czech Republic
|
72
|
Copp JN, Hanson-Manful P, Ackerley DF, Patrick WM. Error-prone PCR and effective generation of gene variant libraries for directed evolution. Methods Mol Biol 2014; 1179:3-22. [PMID: 25055767 DOI: 10.1007/978-1-4939-1053-3_1] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Abstract
Any single-enzyme directed evolution strategy has two fundamental requirements: the need to efficiently introduce variation into a gene of interest and the need to create an effective library from those variants. Generation of a maximally diverse gene library is particularly important when employing nontargeted mutagenesis strategies such as error-prone PCR (epPCR), which seek to explore very large areas of sequence space. Here we present comprehensive protocols and tips for using epPCR to generate gene variants that exhibit a relatively balanced spectrum of mutations and for capturing as much diversity as possible through effective cloning of those variants. The detailed library preparation methods that we describe are generally applicable to any directed evolution strategy that uses restriction enzymes to clone gene variants into an expression plasmid.
Affiliation(s)
- Janine N Copp
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
|
73
|
Probabilistic methods in directed evolution: library size, mutation rate, and diversity. Methods Mol Biol 2014; 1179:261-78. [PMID: 25055784 DOI: 10.1007/978-1-4939-1053-3_18] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]
Abstract
Directed evolution has emerged as an important tool for engineering proteins with improved or novel properties. Because of their inherent reliance on randomness, directed evolution protocols are amenable to probabilistic modeling and analysis. This chapter summarizes and reviews in a nonmathematical way some of the probabilistic works related to directed evolution, with particular focus on three of the most widely used methods: saturation mutagenesis, error-prone PCR, and in vitro recombination. The ultimate aim is to provide the reader with practical information to guide the planning and design of directed evolution studies. Importantly, the applications and locations of freely available computational resources to assist with this process are described in detail.
|
74
|
Tabei Y, Yamanishi Y. Scalable prediction of compound-protein interactions using minwise hashing. BMC SYSTEMS BIOLOGY 2013; 7 Suppl 6:S3. [PMID: 24564870 PMCID: PMC4029277 DOI: 10.1186/1752-0509-7-s6-s3] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
The identification of compound-protein interactions plays key roles in the drug development toward discovery of new drug leads and new therapeutic protein targets. There is therefore a strong incentive to develop new efficient methods for predicting compound-protein interactions on a genome-wide scale. In this paper we develop a novel chemogenomic method to make a scalable prediction of compound-protein interactions from heterogeneous biological data using minwise hashing. The proposed method mainly consists of two steps: 1) construction of new compact fingerprints for compound-protein pairs by an improved minwise hashing algorithm, and 2) application of a sparsity-induced classifier to the compact fingerprints. We test the proposed method on its ability to make a large-scale prediction of compound-protein interactions from compound substructure fingerprints and protein domain fingerprints, and show superior performance of the proposed method compared with the previous chemogenomic methods in terms of prediction accuracy, computational efficiency, and interpretability of the predictive model. All the previously developed methods are not computationally feasible for the full dataset consisting of about 200 millions of compound-protein pairs. The proposed method is expected to be useful for virtual screening of a huge number of compounds against many protein targets.
|
75
|
Accelerated protein engineering for chemical biotechnology via homologous recombination. Curr Opin Biotechnol 2013; 24:1017-22. [DOI: 10.1016/j.copbio.2013.03.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2012] [Revised: 02/28/2013] [Accepted: 03/05/2013] [Indexed: 12/22/2022]
|
76
|
Parra LP, Agudo R, Reetz MT. Directed Evolution by Using Iterative Saturation Mutagenesis Based on Multiresidue Sites. Chembiochem 2013; 14:2301-9. [DOI: 10.1002/cbic.201300486] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2013] [Indexed: 12/18/2022]
|
77
|
Tirosh Y, Ofer D, Eliyahu T, Linial M. Short toxin-like proteins attack the defense line of innate immunity. Toxins (Basel) 2013; 5:1314-31. [PMID: 23881252 PMCID: PMC3737499 DOI: 10.3390/toxins5071314] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2013] [Revised: 07/16/2013] [Accepted: 07/16/2013] [Indexed: 01/30/2023] Open
Abstract
ClanTox (classifier of animal toxins) was developed for identifying toxin-like candidates from complete proteomes. Searching mammalian proteomes for short toxin-like proteins (coined TOLIPs) revealed a number of overlooked secreted short proteins with an abundance of cysteines throughout their sequences. We applied bioinformatics and data-mining methods to infer the function of several top predicted candidates. We focused on cysteine-rich peptides that adopt the fold of the three-finger proteins (TFPs). We identified a cluster of duplicated genes that share a structural similarity with elapid neurotoxins, such as α-bungarotoxin. In the murine proteome, there are about 60 such proteins that belong to the Ly6/uPAR family. These proteins are secreted or anchored to the cell membrane. Ly6/uPAR proteins are associated with a rich repertoire of functions, including binding to receptors and adhesion. Ly6/uPAR proteins modulate cell signaling in the context of brain functions and cells of the innate immune system. We postulate that TOLIPs, as modulators of cell signaling, may be associated with pathologies and cellular imbalance. We show that proteins of the Ly6/uPAR family are associated with cancer diagnosis and malfunction of the immune system.
Affiliation(s)
- Yitshak Tirosh
- Department of Biological Chemistry, Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel.
|
78
|
Nov Y. Fitness loss and library size determination in saturation mutagenesis. PLoS One 2013; 8:e68069. [PMID: 23844158 PMCID: PMC3700877 DOI: 10.1371/journal.pone.0068069] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2013] [Accepted: 05/24/2013] [Indexed: 01/31/2023] Open
Abstract
Saturation mutagenesis is a widely used directed evolution technique, in which a large number of protein variants, each having random amino acids in certain predetermined positions, are screened in order to discover high-fitness variants among them. Several metrics for determining the library size (the number of variants screened) have been suggested in the literature, but none of them incorporates the actual fitness of the variants discovered in the experiment. We present the results of an extensive simulation study, which is based on probabilistic models for protein fitness landscape, and which investigates how the result of a saturation mutagenesis experiment – the fitness of the best variant discovered – varies as a function of the library size. In particular, we study the loss of fitness in the experiment: the difference between the fitness of the best variant discovered, and the fitness of the best variant in variant space. Our results are that the existing criteria for determining the library size are conservative, so smaller libraries are often satisfactory. Reducing the library size can save labor, time, and expenses in the laboratory.
Affiliation(s)
- Yuval Nov
- Department of Statistics, University of Haifa, Haifa, Israel.
|
79
|
Optimal codon randomization via mathematical programming. J Theor Biol 2013; 335:147-52. [PMID: 23792109 DOI: 10.1016/j.jtbi.2013.05.034] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2013] [Accepted: 05/28/2013] [Indexed: 01/21/2023]
Abstract
Codon randomization via degenerate oligonucleotides is a widely used approach for generating protein libraries. We use integer programming methodology to model and solve the problem of computing the minimal mixture of oligonucleotides required to induce an arbitrary target probability over the 20 standard amino acids. We consider both randomization via conventional degenerate oligonucleotides, which incorporate at each position of the randomized codon certain nucleotides in equal probabilities, and randomization via spiked oligonucleotides, which admit arbitrary nucleotide distribution at each of the codon's positions. Existing methods for computing such mixtures rely on various heuristics.
|
80
|
Ruff AJ, Dennig A, Schwaneberg U. To get what we aim for - progress in diversity generation methods. FEBS J 2013; 280:2961-78. [DOI: 10.1111/febs.12325] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2013] [Revised: 04/23/2013] [Accepted: 04/25/2013] [Indexed: 01/06/2023]
Affiliation(s)
- Anna J. Ruff
- Lehrstuhl für Biotechnologie; RWTH Aachen University; Germany
|
81
|
Müller CA, Akkapurathu B, Winkler T, Staudt S, Hummel W, Gröger H, Schwaneberg U. In Vitro Double Oxidation of n-Heptane with Direct Cofactor Regeneration. Adv Synth Catal 2013. [DOI: 10.1002/adsc.201300143] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023]
|
82
|
Barakat M, Ortet P, Whitworth DE. P2RP: a Web-based framework for the identification and analysis of regulatory proteins in prokaryotic genomes. BMC Genomics 2013; 14:269. [PMID: 23601859 PMCID: PMC3637814 DOI: 10.1186/1471-2164-14-269] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2013] [Accepted: 03/19/2013] [Indexed: 11/10/2022] Open
Abstract
Background Regulatory proteins (RPs) such as transcription factors (TFs) and two-component system (TCS) proteins control how prokaryotic cells respond to changes in their external and/or internal state. Identification and annotation of TFs and TCSs is non-trivial, and between-genome comparisons are often confounded by different standards in annotation. There is a need for user-friendly, fast and convenient tools to allow researchers to overcome the inherent variability in annotation between genome sequences. Results We have developed the web-server P2RP (Predicted Prokaryotic Regulatory Proteins), which enables users to identify and annotate TFs and TCS proteins within their sequences of interest. Users can input amino acid or genomic DNA sequences, and predicted proteins therein are scanned for the possession of DNA-binding domains and/or TCS domains. RPs identified in this manner are categorised into families, unambiguously annotated, and a detailed description of their features generated, using an integrated software pipeline. P2RP results can then be outputted in user-specified formats. Conclusion Biologists have an increasing need for fast and intuitively usable tools, which is why P2RP has been developed as an interactive system. As well as assisting experimental biologists to interrogate novel sequence data, it is hoped that P2RP will be built into genome annotation pipelines and re-annotation processes, to increase the consistency of RP annotation in public genomic sequences. P2RP is the first publicly available tool for predicting and analysing RP proteins in users’ sequences. The server is freely available and can be accessed along with documentation at http://www.p2rp.org.
|
83
|
Kisand V, Lettieri T. Genome sequencing of bacteria: sequencing, de novo assembly and rapid analysis using open source tools. BMC Genomics 2013; 14:211. [PMID: 23547799 PMCID: PMC3618134 DOI: 10.1186/1471-2164-14-211] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2012] [Accepted: 03/22/2013] [Indexed: 11/18/2022] Open
Abstract
Background De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (<450 bps), which are presumed to aid in the analysis of uncharacterized genomes. The array of tools for processing NGS data are increasingly free and open source and are often adopted for both their high quality and role in promoting academic freedom. Results The error rate of pyrosequencing the Alcanivorax borkumensis genome was such that thousands of insertions and deletions were artificially introduced into the finished genome. Despite a high coverage (~30 fold), it did not allow the reference genome to be fully mapped. Reads from regions with errors had low quality, low coverage, or were missing. The main defect of the reference mapping was the introduction of artificial indels into contigs through lower than 100% consensus and distracting gene calling due to artificial stop codons. No assembler was able to perform de novo assembly comparable to reference mapping. Automated annotation tools performed similarly on reference mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes. Conclusions Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not high priority and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium short reads into long contigs (>97-98% genome coverage). A notable gap in pyrosequencing technology is the quality of base pair calling and conflicting base pairs between single reads at the same nucleotide position. Regardless, using draft whole genomes that are not finished and remain fragmented into tens of contigs allows one to characterize unknown bacteria with modest effort.
Affiliation(s)
- Veljo Kisand
- Institute of Technology, Tartu University, Nooruse 1, Tartu 50411, Estonia.
|
84
|
Sullivan B, Walton AZ, Stewart JD. Library construction and evaluation for site saturation mutagenesis. Enzyme Microb Technol 2013; 53:70-7. [PMID: 23683706 DOI: 10.1016/j.enzmictec.2013.02.012] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2012] [Revised: 02/15/2013] [Accepted: 02/20/2013] [Indexed: 01/24/2023]
Abstract
We developed a method for creating and evaluating site-saturation libraries that consistently yields an average of 27.4±3.0 codons of the 32 possible within a pool of 95 transformants. This was verified by sequencing 95 members from 11 independent libraries within the gene encoding alkene reductase OYE 2.6 from Pichia stipitis. Correct PCR primer design as well as a variety of factors that increase transformation efficiency were critical contributors to the method's overall success. We also developed a quantitative analysis of library quality (Q-values) that defines library degeneracy. Q-values can be calculated from standard fluorescence sequencing data (capillary electropherograms) and the degeneracy predicted from an early stage of library construction (pooled plasmids from the initial transformation) closely matched that observed after ca. 1000 library members were sequenced. Based on this experience, we suggest that this analysis can be a useful guide when applying our optimized protocol to new systems, allowing one to focus only on good-quality libraries and reject substandard libraries at an early stage. This advantage is particularly important when lower-throughput screening techniques such as chiral-phase GC must be employed to identify protein variants with desirable properties, e.g., altered stereoselectivities or when multiple codons are targeted for simultaneous randomization.
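For context on the figure of 27.4 of 32 codons in 95 transformants, a textbook occupancy estimate gives the number of distinct codons one would expect in an ideal library. The sketch below is not the paper's Q-value analysis; it assumes that all 32 codons are equally likely and that transformants are independent draws, and the function name `expected_distinct` is hypothetical.

```python
def expected_distinct(n_variants, n_sampled):
    """Expected number of distinct variants seen after n_sampled draws.

    Assumes every variant is equally likely and draws are independent:
    E = V * (1 - (1 - 1/V) ** n).
    """
    return n_variants * (1.0 - (1.0 - 1.0 / n_variants) ** n_sampled)


if __name__ == "__main__":
    # 32 possible codons, 95 sequenced transformants (numbers from the abstract).
    print(round(expected_distinct(32, 95), 1))  # prints 30.4
```

Under these idealized assumptions about 30.4 codons would be expected, so the reported 27.4±3.0 sits somewhat below the unbiased optimum, as real libraries with unequal codon frequencies generally do.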
Affiliation(s)
- Bradford Sullivan
- Department of Chemistry, University of Florida, 126 Sisler Hall, Gainesville, FL 32611, USA
|
85
|
Reetz MT. The Importance of Additive and Non-Additive Mutational Effects in Protein Engineering. Angew Chem Int Ed Engl 2013; 52:2658-66. [DOI: 10.1002/anie.201207842] [Citation(s) in RCA: 132] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2012] [Revised: 12/19/2012] [Indexed: 01/01/2023]
|
86
|
Die Bedeutung von additiven und nicht-additiven Mutationseffekten beim Protein-Engineering. Angew Chem Int Ed Engl 2013. [DOI: 10.1002/ange.201207842] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
|
87
|
Jakoblinnert A, Wachtmeister J, Schukur L, Shivange AV, Bocola M, Ansorge-Schumacher MB, Schwaneberg U. Reengineered carbonyl reductase for reducing methyl-substituted cyclohexanones. Protein Eng Des Sel 2013; 26:291-8. [DOI: 10.1093/protein/gzt001] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023] Open
|
88
|
Duplouy A, Iturbe-Ormaetxe I, Beatson SA, Szubert JM, Brownlie JC, McMeniman CJ, McGraw EA, Hurst GDD, Charlat S, O'Neill SL, Woolfit M. Draft genome sequence of the male-killing Wolbachia strain wBol1 reveals recent horizontal gene transfers from diverse sources. BMC Genomics 2013; 14:20. [PMID: 23324387 PMCID: PMC3639933 DOI: 10.1186/1471-2164-14-20] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2012] [Accepted: 01/02/2013] [Indexed: 02/06/2023] Open
Abstract
Background The endosymbiont Wolbachia pipientis causes diverse and sometimes dramatic phenotypes in its invertebrate hosts. Four Wolbachia strains sequenced to date indicate that the constitution of the genome is dynamic, but these strains are quite divergent and do not allow resolution of genome diversification over shorter time periods. We have sequenced the genome of the strain wBol1-b, found in the butterfly Hypolimnas bolina, which kills the male offspring of infected hosts during embryonic development and is closely related to the non-male-killing strain wPip from Culex pipiens. Results The genomes of wBol1-b and wPip are similar in genomic organisation, sequence and gene content, but show substantial differences at some rapidly evolving regions of the genome, primarily associated with prophage and repetitive elements. We identified 44 genes in wBol1-b that do not have homologs in any previously sequenced strains, indicating that Wolbachia’s non-core genome diversifies rapidly. These wBol1-b specific genes include a number that have been recently horizontally transferred from phylogenetically distant bacterial taxa. We further report a second possible case of horizontal gene transfer from a eukaryote into Wolbachia. Conclusions Our analyses support the developing view that many endosymbiotic genomes are highly dynamic, and are exposed and receptive to exogenous genetic material from a wide range of sources. These data also suggest either that this bacterial species is particularly permissive for eukaryote-to-prokaryote gene transfers, or that these transfers may be more common than previously believed. The wBol1-b-specific genes we have identified provide candidates for further investigations of the genomic bases of phenotypic differences between closely-related Wolbachia strains.
Affiliation(s)
- Anne Duplouy
- School of Biological Sciences, University of Queensland, 4072, Brisbane, QLD, Australia.
|
89
|
Construction and analysis of randomized protein-encoding libraries using error-prone PCR. Methods Mol Biol 2013; 996:251-67. [PMID: 23504429 DOI: 10.1007/978-1-62703-354-1_15] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023]
Abstract
In contrast to site-directed mutagenesis and rational design, directed evolution harnesses Darwinian principles to identify proteins with new or improved properties. The critical first steps in a directed evolution experiment are as follows: (a) to introduce random diversity into the gene of interest and (b) to capture that diversity by cloning the resulting population of molecules into a suitable expression vector, en bloc. Error-prone PCR (epPCR) is a common method for introducing random mutations into a gene. In this chapter, we describe detailed protocols for epPCR and for the construction of large, maximally diverse libraries of cloned variants. We also describe the utility of an online program, PEDEL-AA, for analyzing the compositions of epPCR libraries. The methods described here were used to construct several libraries in our laboratory. A side-by-side comparison of the results is used to show that, ultimately, epPCR is a highly stochastic process.
|
90
|
Ruff AJ, Marienhagen J, Verma R, Roccatano D, Genieser HG, Niemann P, Shivange AV, Schwaneberg U. dRTP and dPTP a complementary nucleotide couple for the Sequence Saturation Mutagenesis (SeSaM) method. J Mol Catal B Enzym 2012. [DOI: 10.1016/j.molcatb.2012.04.018] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
|
91
|
Ruff AJ, Dennig A, Wirtz G, Blanusa M, Schwaneberg U. Flow Cytometer-Based High-Throughput Screening System for Accelerated Directed Evolution of P450 Monooxygenases. ACS Catal 2012. [DOI: 10.1021/cs300115d] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Affiliation(s)
- Anna Joëlle Ruff
- Lehrstuhl für Biotechnologie, RWTH Aachen University, Worringerweg 1, 52074 Aachen, Germany
- Alexander Dennig
- Lehrstuhl für Biotechnologie, RWTH Aachen University, Worringerweg 1, 52074 Aachen, Germany
- Georgette Wirtz
- Lehrstuhl für Biotechnologie, RWTH Aachen University, Worringerweg 1, 52074 Aachen, Germany
- Milan Blanusa
- School of Engineering and Science, Jacobs University Bremen, Campus Ring 1, 28759 Bremen, Germany
- Ulrich Schwaneberg
- Lehrstuhl für Biotechnologie, RWTH Aachen University, Worringerweg 1, 52074 Aachen, Germany
|
92
|
Verma R, Schwaneberg U, Roccatano D. Computer-Aided Protein Directed Evolution: a Review of Web Servers, Databases and other Computational Tools for Protein Engineering. Comput Struct Biotechnol J 2012; 2:e201209008. [PMID: 24688649 PMCID: PMC3962222 DOI: 10.5936/csbj.201209008] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2012] [Revised: 10/07/2012] [Accepted: 10/12/2012] [Indexed: 12/01/2022] Open
Abstract
The combination of computational and directed evolution methods has proven a winning strategy for protein engineering. We refer to this approach as computer-aided protein directed evolution (CAPDE) and the review summarizes the recent developments in this rapidly growing field. We will restrict ourselves to overview the availability, usability and limitations of web servers, databases and other computational tools proposed in the last five years. The goal of this review is to provide concise information about currently available computational resources to assist the design of directed evolution based protein engineering experiment.
Affiliation(s)
- Rajni Verma
- School of Engineering and Science, Jacobs University Bremen, Campus Ring 1, 28759 Bremen, Germany; Department of Biotechnology, RWTH Aachen University, Worringer Weg 1, 52074 Aachen, Germany
- Ulrich Schwaneberg
- Department of Biotechnology, RWTH Aachen University, Worringer Weg 1, 52074 Aachen, Germany
- Danilo Roccatano
- School of Engineering and Science, Jacobs University Bremen, Campus Ring 1, 28759 Bremen, Germany
|
93
|
Katiyar A, Smita S, Lenka SK, Rajwanshi R, Chinnusamy V, Bansal KC. Genome-wide classification and expression analysis of MYB transcription factor families in rice and Arabidopsis. BMC Genomics 2012. [PMID: 23050870 DOI: 10.1186/1471-2164-13] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/14/2023] Open
Abstract
BACKGROUND The MYB gene family comprises one of the richest groups of transcription factors in plants. Plant MYB proteins are characterized by a highly conserved MYB DNA-binding domain. MYB proteins are classified into four major groups namely, 1R-MYB, 2R-MYB, 3R-MYB and 4R-MYB based on the number and position of MYB repeats. MYB transcription factors are involved in plant development, secondary metabolism, hormone signal transduction, disease resistance and abiotic stress tolerance. A comparative analysis of MYB family genes in rice and Arabidopsis will help reveal the evolution and function of MYB genes in plants. RESULTS A genome-wide analysis identified at least 155 and 197 MYB genes in rice and Arabidopsis, respectively. Gene structure analysis revealed that MYB family genes possess relatively more number of introns in the middle as compared with C- and N-terminal regions of the predicted genes. Intronless MYB-genes are highly conserved both in rice and Arabidopsis. MYB genes encoding R2R3 repeat MYB proteins retained conserved gene structure with three exons and two introns, whereas genes encoding R1R2R3 repeat containing proteins consist of six exons and five introns. The splicing pattern is similar among R1R2R3 MYB genes in Arabidopsis. In contrast, variation in splicing pattern was observed among R1R2R3 MYB members of rice. Consensus motif analysis of 1kb upstream region (5' to translation initiation codon) of MYB gene ORFs led to the identification of conserved and over-represented cis-motifs in both rice and Arabidopsis. Real-time quantitative RT-PCR analysis showed that several members of MYBs are up-regulated by various abiotic stresses both in rice and Arabidopsis. CONCLUSION A comprehensive genome-wide analysis of chromosomal distribution, tandem repeats and phylogenetic relationship of MYB family genes in rice and Arabidopsis suggested their evolution via duplication. Genome-wide comparative analysis of MYB genes and their expression analysis identified several MYBs with potential role in development and stress response of plants.
Affiliation(s)
- Amit Katiyar
- National Research Centre on Plant Biotechnology, Indian Agricultural Research Institute, New Delhi, 110012, India
|
94
|
Yusuf D, Butland SL, Swanson MI, Bolotin E, Ticoll A, Cheung WA, Zhang XYC, Dickman CTD, Fulton DL, Lim JS, Schnabl JM, Ramos OHP, Vasseur-Cognet M, de Leeuw CN, Simpson EM, Ryffel GU, Lam EWF, Kist R, Wilson MSC, Marco-Ferreres R, Brosens JJ, Beccari LL, Bovolenta P, Benayoun BA, Monteiro LJ, Schwenen HDC, Grontved L, Wederell E, Mandrup S, Veitia RA, Chakravarthy H, Hoodless PA, Mancarelli MM, Torbett BE, Banham AH, Reddy SP, Cullum RL, Liedtke M, Tschan MP, Vaz M, Rizzino A, Zannini M, Frietze S, Farnham PJ, Eijkelenboom A, Brown PJ, Laperrière D, Leprince D, de Cristofaro T, Prince KL, Putker M, del Peso L, Camenisch G, Wenger RH, Mikula M, Rozendaal M, Mader S, Ostrowski J, Rhodes SJ, Van Rechem C, Boulay G, Olechnowicz SWZ, Breslin MB, Lan MS, Nanan KK, Wegner M, Hou J, Mullen RD, Colvin SC, Noy PJ, Webb CF, Witek ME, Ferrell S, Daniel JM, Park J, Waldman SA, Peet DJ, Taggart M, Jayaraman PS, Karrich JJ, Blom B, Vesuna F, O'Geen H, Sun Y, Gronostajski RM, Woodcroft MW, Hough MR, Chen E, Europe-Finner GN, Karolczak-Bayatti M, Bailey J, Hankinson O, Raman V, LeBrun DP, Biswal S, Harvey CJ, DeBruyne JP, Hogenesch JB, Hevner RF, Héligon C, Luo XM, Blank MC, Millen KJ, Sharlin DS, Forrest D, Dahlman-Wright K, Zhao C, Mishima Y, Sinha S, Chakrabarti R, Portales-Casamar E, Sladek FM, Bradley PH, Wasserman WW. The transcription factor encyclopedia. Genome Biol 2012; 13:R24. [PMID: 22458515 PMCID: PMC3439975 DOI: 10.1186/gb-2012-13-3-r24] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2012] [Revised: 03/19/2012] [Accepted: 03/29/2012] [Indexed: 12/20/2022] Open
Abstract
Here we present the Transcription Factor Encyclopedia (TFe), a new web-based compendium of mini review articles on transcription factors (TFs) that is founded on the principles of open access and collaboration. Our consortium of over 100 researchers has collectively contributed over 130 mini review articles on pertinent human, mouse and rat TFs. Notable features of the TFe website include a high-quality PDF generator and web API for programmatic data retrieval. TFe aims to rapidly educate scientists about the TFs they encounter through the delivery of succinct summaries written and vetted by experts in the field. TFe is available at http://www.cisreg.ca/tfe.
Affiliation(s)
- Dimas Yusuf
- Department of Medical Genetics, Faculty of Medicine, Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, Vancouver, Canada
|
95
|
Construction of "small-intelligent" focused mutagenesis libraries using well-designed combinatorial degenerate primers. Biotechniques 2012; 52:149-58. [PMID: 22401547 DOI: 10.2144/000113820] [Citation(s) in RCA: 114] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2011] [Accepted: 01/13/2012] [Indexed: 11/23/2022] Open
Abstract
Site-saturation mutagenesis is a powerful tool for protein optimization due to its efficiency and simplicity. A degenerate codon NNN or NNS (K) is often used to encode the 20 standard amino acids, but this will produce redundant codons and cause uneven distribution of amino acids in the constructed library. Here we present a novel "small-intelligent" strategy to construct mutagenesis libraries that have a minimal gene library size without inherent amino acid biases, stop codons, or rare codons of Escherichia coli by coupling well-designed combinatorial degenerate primers with suitable PCR-based mutagenesis methods. The designed primer mixture contains exactly one codon per amino acid and thus allows the construction of small-intelligent mutagenesis libraries with one gene per protein. In addition, the software tool DC-Analyzer was developed to assist in primer design according to the user-defined randomization scheme for library construction. This small-intelligent strategy was successfully applied to the randomization of halohydrin dehalogenases with one or two randomized sites. With the help of DC-Analyzer, the strategy was proven to be as simple as NNS randomization and could serve as a general tool to efficiently randomize target genes at positions of interest.
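The redundancy and bias that this abstract attributes to NNN- and NNS/NNK-style randomization can be checked by expanding each degenerate codon against the standard genetic code. The following is a minimal sketch, not the DC-Analyzer tool; the function names `expand` and `summarize` are hypothetical and introduced only for illustration.

```python
from collections import Counter
from itertools import product

# IUPAC nucleotide codes needed for the degenerate codons named in these abstracts.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "S": "CG", "B": "CGT"}

# Standard genetic code with bases ordered T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def expand(degenerate):
    """List every concrete codon encoded by a degenerate codon (hypothetical helper)."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate))]


def summarize(degenerate):
    """Count codons, distinct amino acids, stops, and the most redundant amino acid."""
    counts = Counter(CODON_TABLE[c] for c in expand(degenerate))
    stops = counts.pop("*", 0)
    return {
        "codons": sum(counts.values()) + stops,
        "amino_acids": len(counts),
        "stops": stops,
        "max_redundancy": max(counts.values()),
    }


if __name__ == "__main__":
    for scheme in ("NNN", "NNB", "NNK", "NNS"):
        print(scheme, summarize(scheme))
```

A primer mixture with exactly one codon per amino acid, as described above, would correspond to 20 codons, no stops, and a maximum redundancy of 1.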
|
96
|
Leucine-Rich Repeat (LRR) Domains Containing Intervening Motifs in Plants. Biomolecules 2012; 2:288-311. [PMID: 24970139 PMCID: PMC4030839 DOI: 10.3390/biom2020288] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2012] [Revised: 06/13/2012] [Accepted: 06/13/2012] [Indexed: 01/05/2023] Open
Abstract
Leucine-rich repeats (LRRs) are present in over 14,000 proteins. Non-LRR island regions (IRs) interrupting LRRs are widely distributed. The present article reviews 19 families of LRR proteins having non-LRR IRs (LRR@IR proteins) from various plant species. The LRR@IR proteins are LRR-containing receptor-like kinases (LRR-RLKs), LRR-containing receptor-like proteins (LRR-RLPs), TONSOKU/BRUSHY1, and MJK13.7; the LRR-RLKs are homologs of TMK1/Rhg4, BRI1, PSKR, PSYR1, Arabidopsis At1g74360, and RPK2, while the LRR-RLPs are those of Cf-9/Cf-4, Cf-2/Cf-5, Ve, HcrVf, RPP27, EIX1, clavata 2, fasciated ear2, RLP2, rice Os10g0479700, and a putative soybean disease resistance protein. The LRRs are intersected by single, non-LRR IRs; only the RPK2 homologs have two IRs. In most of the LRR-RLKs and LRR-RLPs, the number of repeat units in the preceding LRR block (N1) is greater than the number in the following block (N2); that is, N1 » N2, where N1 is variable in the homologs of individual families, while N2 is highly conserved. The five families of the LRR-RLKs except for the RPK2 family show N1 = 8 − 18 and N2 = 3 − 5. The nine families of the LRR-RLPs show N1 = 12 − 33 and N2 = 4, while N1 = 6 and N2 = 4 for the rice Os10g0479700 family and N1 = 4 − 28 and N2 = 4 for the soybean protein family. The rule of N1 » N2 might play a common, significant role in ligand interaction, dimerization, and/or signal transduction of the LRR-RLKs and the LRR-RLPs. The structure and evolution of the LRR domains with non-LRR IRs and their proteins are also discussed.
|
97
|
Büchel K, McDowell E, Nelson W, Descour A, Gershenzon J, Hilker M, Soderlund C, Gang DR, Fenning T, Meiners T. An elm EST database for identifying leaf beetle egg-induced defense genes. BMC Genomics 2012; 13:242. [PMID: 22702658 PMCID: PMC3439254 DOI: 10.1186/1471-2164-13-242] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2011] [Accepted: 05/15/2012] [Indexed: 01/07/2023] Open
Abstract
Background Plants can defend themselves against herbivorous insects prior to the onset of larval feeding by responding to the eggs laid on their leaves. In the European field elm (Ulmus minor), egg laying by the elm leaf beetle (Xanthogaleruca luteola) activates the emission of volatiles that attract specialised egg parasitoids, which in turn kill the eggs. Little is known about the transcriptional changes that insect eggs trigger in plants and how such indirect defense mechanisms are orchestrated in the context of other biological processes. Results Here we present the first large-scale study of egg-induced changes in the transcriptional profile of a tree. Five cDNA libraries were generated from leaves of (i) untreated control elms, and elms treated with (ii) egg laying and feeding by elm leaf beetles, (iii) feeding, (iv) artificial transfer of egg clutches, and (v) methyl jasmonate. A total of 361,196 expressed sequence tags (ESTs) were identified, which clustered into 52,823 unique transcripts (Unitrans) and were stored in a database with a public web interface. Among the analyzed Unitrans, 73% could be annotated by homology to known genes in the UniProt (Plant) database, particularly to those from Vitis, Ricinus, Populus and Arabidopsis. Comparative in silico analysis among the different treatments revealed differences in Gene Ontology term abundances. Defense- and stress-related gene transcripts were present in high abundance in leaves after herbivore egg laying, but transcripts involved in photosynthesis showed decreased abundance. Many pathogen-related genes and genes involved in phytohormone signaling were expressed, indicative of jasmonic acid biosynthesis and activation of jasmonic acid responsive genes. Cross-comparisons between different libraries based on expression profiles allowed the identification of genes with a potential relevance in egg-induced defenses, as well as other biological processes, including signal transduction, transport and primary metabolism. Conclusion Here we present a dataset for a large-scale study of the mechanisms of plant defense against insect eggs in a co-evolved, natural ecological plant–insect system. The EST database analysis provided here is a first step in elucidating the transcriptional responses of elm to elm leaf beetle infestation, and adds further to our knowledge on insect egg-induced transcriptomic changes in plants. The sequences identified in our comparative analysis give many hints about novel defense mechanisms directed towards eggs.
Affiliation(s)
- Kerstin Büchel
- Freie Universität Berlin, Applied Zoology / Animal Ecology, Berlin, Germany
|
98
|
McGuire AM, Weiner B, Park ST, Wapinski I, Raman S, Dolganov G, Peterson M, Riley R, Zucker J, Abeel T, White J, Sisk P, Stolte C, Koehrsen M, Yamamoto RT, Iacobelli-Martinez M, Kidd MJ, Maer AM, Schoolnik GK, Regev A, Galagan J. Comparative analysis of Mycobacterium and related Actinomycetes yields insight into the evolution of Mycobacterium tuberculosis pathogenesis. BMC Genomics 2012; 13:120. [PMID: 22452820 PMCID: PMC3388012 DOI: 10.1186/1471-2164-13-120] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2011] [Accepted: 03/28/2012] [Indexed: 01/24/2023] Open
Abstract
BACKGROUND The sequence of the pathogen Mycobacterium tuberculosis (Mtb) strain H37Rv has been available for over a decade, but the biology of the pathogen remains poorly understood. Genome sequences from other Mtb strains and closely related bacteria present an opportunity to apply the power of comparative genomics to understand the evolution of Mtb pathogenesis. We conducted a comparative analysis using 31 genomes from the Tuberculosis Database (TBDB.org), including 8 strains of Mtb and M. bovis, 11 additional Mycobacteria, 4 Corynebacteria, 2 Streptomyces, Rhodococcus jostii RHA1, Nocardia farcinica, Acidothermus cellulolyticus, Rhodobacter sphaeroides, Propionibacterium acnes, and Bifidobacterium longum. RESULTS Our results highlight the functional importance of lipid metabolism and its regulation, and reveal variation between the evolutionary profiles of genes implicated in saturated and unsaturated fatty acid metabolism. They also suggest that DNA repair and molybdopterin cofactors are important in pathogenic Mycobacteria. By analyzing sequence conservation and gene expression data, we identify nearly 400 conserved noncoding regions. These include 37 predicted promoter regulatory motifs, of which 14 correspond to previously validated motifs, as well as 50 potential noncoding RNAs, of which we experimentally confirm the expression of four. CONCLUSIONS Our analysis of protein evolution highlights gene families that are associated with the adaptation of environmental Mycobacteria to obligate pathogenesis. These families include fatty acid metabolism, DNA repair, and molybdopterin biosynthesis. Our analysis reinforces recent findings suggesting that small noncoding RNAs are more common in Mycobacteria than previously expected. Our data provide a foundation for understanding the genome and biology of Mtb in a comparative context, and are available online and through TBDB.org.
|
99
|
Szklarczyk R, Wanschers BF, Cuypers TD, Esseling JJ, Riemersma M, van den Brand MA, Gloerich J, Lasonder E, van den Heuvel LP, Nijtmans LG, Huynen MA. Iterative orthology prediction uncovers new mitochondrial proteins and identifies C12orf62 as the human ortholog of COX14, a protein involved in the assembly of cytochrome c oxidase. Genome Biol 2012; 13:R12. [PMID: 22356826 PMCID: PMC3334569 DOI: 10.1186/gb-2012-13-2-r12] [Citation(s) in RCA: 80] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2011] [Revised: 02/03/2012] [Accepted: 02/22/2012] [Indexed: 11/10/2022] Open
Abstract
Background Orthology is a central tenet of comparative genomics and ortholog identification is instrumental to protein function prediction. Major advances have been made to determine orthology relations among a set of homologous proteins. However, they depend on the comparison of individual sequences and do not take into account divergent orthologs. Results We have developed an iterative orthology prediction method, Ortho-Profile, that uses reciprocal best hits at the level of sequence profiles to infer orthology. It increases ortholog detection by 20% compared to sequence-to-sequence comparisons. Ortho-Profile predicts 598 human orthologs of mitochondrial proteins from Saccharomyces cerevisiae and Schizosaccharomyces pombe with 94% accuracy. Of these, 181 were not known to localize to mitochondria in mammals. Among the predictions of the Ortho-Profile method are 11 human cytochrome c oxidase (COX) assembly proteins that are implicated in mitochondrial function and disease. Their co-expression patterns, experimentally verified subcellular localization, and co-purification with human COX-associated proteins support these predictions. For the human gene C12orf62, the ortholog of S. cerevisiae COX14, we specifically confirm its role in negative regulation of the translation of cytochrome c oxidase. Conclusions Divergent homologs can often only be detected by comparing sequence profiles and profile-based hidden Markov models. The Ortho-Profile method takes advantage of these techniques in the quest for orthologs.
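The reciprocal-best-hit rule that this abstract builds on can be sketched in a few lines. The example below is an illustration only and not the Ortho-Profile implementation: it assumes similarity scores have already been computed (by sequence or profile comparison, which is not shown), and the identifiers `y1`, `h1`, and so on are hypothetical placeholders rather than real genes.

```python
def best_hits(scores):
    """Map each query to its single highest-scoring target.

    `scores` maps (query, target) pairs to a similarity score (higher is better).
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Toy scores for two hypothetical proteomes (species A: y*, species B: h*).
    a_vs_b = {("y1", "h1"): 90.0, ("y1", "h2"): 40.0,
              ("y2", "h2"): 55.0, ("y3", "h2"): 60.0}
    b_vs_a = {("h1", "y1"): 88.0, ("h2", "y3"): 61.0, ("h2", "y2"): 50.0}
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy data `y2` has `h2` as its best hit, but `h2` prefers `y3`, so only the mutually best pairs are reported. An iterative, profile-level method such as the one described above would repeat this kind of test with more sensitive comparisons.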
Affiliation(s)
- Radek Szklarczyk
- Centre for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Centre, Nijmegen, 6500 HB, The Netherlands.
|
100
|
When second best is good enough: another probabilistic look at saturation mutagenesis. Appl Environ Microbiol 2011; 78:258-62. [PMID: 22038607 DOI: 10.1128/aem.06265-11] [Citation(s) in RCA: 89] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open
Abstract
We developed new criteria for determining the library size in a saturation mutagenesis experiment. When the number of all possible distinct variants is large, any of the top-performing variants (e.g., any of the top three) is likely to meet the design requirements, so the probability that the library contains at least one of them is a sensible criterion for determining the library size. By using a criterion of this type, one may significantly reduce the library size and thus save costs and labor while minimally compromising the quality of the best variant discovered. We present the probabilistic tools underlying these criteria and use them to compare the efficiencies of four randomization schemes: NNN, which uses all 64 codons; NNB, which uses 48 codons; NNK, which uses 32 codons; and MAX, which assigns equal probabilities to each of the 20 amino acids. MAX was found to be the most efficient randomization scheme and NNN the least efficient. TopLib, a computer program for carrying out the related calculations, is available through a user-friendly Web server.
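The "at least one of the top variants" criterion can be illustrated for the simplest case, where every distinct variant is equally likely (as with the MAX scheme described above). This is a simplified sketch under that assumption and the further assumption that library members are independent draws; it is not the TopLib program, and the function names are hypothetical. With V distinct variants, the chance that a library of size L contains at least one of the top k is 1 - (1 - k/V)^L.

```python
import math


def hit_probability(library_size, n_variants, top_k):
    """Probability that a library holds at least one of the top_k variants.

    Assumes each library member is an independent, uniform draw from n_variants.
    """
    return 1.0 - (1.0 - top_k / n_variants) ** library_size


def required_library_size(n_variants, top_k, target_probability):
    """Smallest library size whose hit probability reaches target_probability."""
    if top_k >= n_variants:
        return 1  # every draw is a top variant
    miss = 1.0 - top_k / n_variants
    return math.ceil(math.log(1.0 - target_probability) / math.log(miss))


if __name__ == "__main__":
    n_variants = 20 ** 3  # three randomized sites, 20 amino acids each
    for top_k in (1, 3):
        size = required_library_size(n_variants, top_k, 0.95)
        print(top_k, size, round(hit_probability(size, n_variants, top_k), 4))
```

Under these assumptions, accepting any of the top three variants instead of only the single best reduces the required library size by roughly a factor of three. Schemes with unequal codon usage, such as NNK or NNN, would need per-variant probabilities rather than this uniform formula.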
|