1
Klukowski P, Riek R, Güntert P. Time-optimized protein NMR assignment with an integrative deep learning approach using AlphaFold and chemical shift prediction. SCIENCE ADVANCES 2023; 9:eadi9323. [PMID: 37992167 PMCID: PMC10664993 DOI: 10.1126/sciadv.adi9323] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/29/2023] [Accepted: 10/20/2023] [Indexed: 11/24/2023]
Abstract
Chemical shift assignment is vital for nuclear magnetic resonance (NMR)-based studies of protein structures, dynamics, and interactions, providing crucial atomic-level insight. However, obtaining chemical shift assignments is labor intensive and requires extensive measurement time. To address this limitation, we previously proposed ARTINA, a deep learning method for automatic assignment of two-dimensional (2D)-4D NMR spectra. Here, we present an integrative approach that combines ARTINA with AlphaFold and UCBShift, enabling chemical shift assignment with reduced experimental data, increased accuracy, and enhanced robustness for larger systems, as presented in a comprehensive study with more than 5000 automated assignment calculations on 89 proteins. We demonstrate that five 3D spectra yield more accurate assignments (92.59%) than pure ARTINA runs using all experimentally available NMR data (on average 10 3D spectra per protein, 91.37%), considerably reducing the required measurement time. We also showcase automated assignments of only 15N-labeled samples, and report improved assignment accuracy in larger synthetic systems of up to 500 residues.
Affiliation(s)
- Piotr Klukowski
  - Institute of Molecular Physical Science, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Roland Riek
  - Institute of Molecular Physical Science, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Peter Güntert
  - Institute of Molecular Physical Science, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
  - Institute of Biophysical Chemistry, Goethe University Frankfurt, Max-von-Laue-Str. 9, 60438 Frankfurt am Main, Germany
  - Department of Chemistry, Tokyo Metropolitan University, 1-1 Minami-Osawa, Hachioji, 192-0397 Tokyo, Japan
2
NMR Structure Determinations of Small Proteins Using only One Fractionally 20% 13C- and Uniformly 100% 15N-Labeled Sample. Molecules 2021; 26:molecules26030747. [PMID: 33535444 PMCID: PMC7867066 DOI: 10.3390/molecules26030747] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/08/2021] [Revised: 01/26/2021] [Accepted: 01/27/2021] [Indexed: 11/17/2022] Open
Abstract
Uniformly 13C- and 15N-labeled samples ensure fast and reliable nuclear magnetic resonance (NMR) assignments of proteins and are commonly used for structure elucidation by NMR. However, the preparation of uniformly labeled samples is a labor-intensive and expensive step. Reducing the portion of 13C-labeled glucose by a factor of five using a fractional 20% 13C- and 100% 15N-labeling scheme could lower the total chemical costs while, owing to the improved sensitivity of NMR instruments, still retaining sufficient structural information relative to a uniformly [13C, 15N]-labeled sample. Moreover, fractional 13C-labeling can facilitate reliable resonance assignments of sidechains because of the biosynthetic pathways of each amino acid. Preparation of only one [20% 13C, 100% 15N]-labeled sample for small proteins (<15 kDa) could also eliminate redundant sample preparations of 100% 15N-labeled and uniformly 100% [13C, 15N]-labeled samples of proteins. We determined the NMR structures of a small alpha-helical protein, the C domain of IgG-binding protein A from Staphylococcus aureus (SpaC), and a small beta-sheet protein, the CBM64 module, using a [20% 13C, 100% 15N]-labeled sample and compared them with the crystal structures and the NMR structures derived from the 100% [13C, 15N]-labeled sample. Our results suggest that one [20% 13C, 100% 15N]-labeled sample of small proteins could be routinely used as an alternative to conventional 100% [13C, 15N]-labeling for backbone resonance assignments, NMR structure determination, 15N-relaxation analysis, and ligand–protein interaction studies.
3
Monneau YR, Rossi P, Bhaumik A, Huang C, Jiang Y, Saleh T, Xie T, Xing Q, Kalodimos CG. Automatic methyl assignment in large proteins by the MAGIC algorithm. JOURNAL OF BIOMOLECULAR NMR 2017; 69:215-227. [PMID: 29098507 PMCID: PMC5764113 DOI: 10.1007/s10858-017-0149-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 07/14/2017] [Accepted: 10/23/2017] [Indexed: 05/03/2023]
Abstract
Selective methyl labeling is an extremely powerful approach to study the structure, dynamics and function of biomolecules by NMR. Despite spectacular progress in the field, such studies remain rather limited in number. One of the main obstacles remains the assignment of the methyl resonances, which is labor intensive and error prone. Typically, NOESY crosspeak patterns are manually correlated to the available crystal structure or an in silico template model of the protein. Here, we propose methyl assignment by graphing inference construct (MAGIC), an exhaustive search algorithm with no peak network definition requirement. In order to overcome the combinatorial problem, the exhaustive search is performed locally, i.e. for a small number of methyls connected through-space according to experimental 3D methyl NOESY data. The local network approach drastically reduces the search space. Only the best local assignments are combined to provide the final output. Assignments that match the data with comparable scores are made available to the user for cross-validation by additional experiments such as methyl-amide NOEs. Several NMR datasets for proteins in the 25-50 kDa range were used during development and for performance evaluation against the manually assigned data. We show that the algorithm is robust, reliable and greatly speeds up the methyl assignment task.
Affiliation(s)
- Yoan R Monneau
  - Université Grenoble Alpes, CEA, CNRS, IBS, 38000, Grenoble, France
- Paolo Rossi
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Anusarka Bhaumik
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Chengdong Huang
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Yajun Jiang
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Tamjeed Saleh
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Tao Xie
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Qiong Xing
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Charalampos G Kalodimos
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
4
Trautwein M, Fredriksson K, Möller HM, Exner TE. Automated assignment of NMR chemical shifts based on a known structure and 4D spectra. JOURNAL OF BIOMOLECULAR NMR 2016; 65:217-236. [PMID: 27484442 DOI: 10.1007/s10858-016-0050-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/08/2016] [Accepted: 07/28/2016] [Indexed: 06/06/2023]
Abstract
Apart from its central role in 3D structure determination of proteins, backbone chemical shift assignment is the basis for a number of applications, such as chemical shift perturbation mapping and studies of protein dynamics. This assignment is not a trivial task even if a 3D protein structure is known and, if performed manually, needs almost as much effort as the assignment for structure prediction. We present here a new algorithm based solely on 4D [(1)H,(15)N]-HSQC-NOESY-[(1)H,(15)N]-HSQC spectra which is able to assign a large percentage of chemical shifts (73-82 %) unambiguously, demonstrated with proteins up to a size of 250 residues. For the remaining residues, a small number of possible assignments is filtered out. This is done by comparing distances in the 3D structure to restraints obtained from the peak volumes in the 4D spectrum. Using dead-end elimination, assignments are removed in which at least one of the restraints is violated. Including additional information from chemical shift predictions, a complete unambiguous assignment was obtained for ubiquitin and 95 % of the residues were correctly assigned in the 251 residue-long N-terminal domain of enzyme I. The program including source code is available at https://github.com/thomasexner/4Dassign.
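The restraint-based filtering described in this abstract can be illustrated with a small sketch. It is not the 4Dassign implementation: the function name, the data layout (candidate residues per peak, one proton coordinate per residue, upper distance bounds per peak pair) and the toy numbers are all assumptions chosen for illustration, and the pruning rule is a simplified pairwise form of dead-end elimination.

```python
from math import dist


def eliminate_dead_ends(candidates, coords, restraints):
    """Iteratively drop candidate residues that cannot satisfy a restraint.

    candidates: dict peak -> set of candidate residue ids (hypothetical layout)
    coords:     dict residue id -> (x, y, z) of the amide proton in the structure
    restraints: list of (peak_a, peak_b, upper_bound) derived from peak volumes
    """
    candidates = {peak: set(res) for peak, res in candidates.items()}
    changed = True
    while changed:
        changed = False
        for peak_a, peak_b, upper in restraints:
            for a, b in ((peak_a, peak_b), (peak_b, peak_a)):
                for res in list(candidates[a]):
                    # res is a dead end if no candidate of the partner peak
                    # lies within the upper distance bound
                    satisfiable = any(
                        other != res and dist(coords[res], coords[other]) <= upper
                        for other in candidates[b]
                    )
                    if not satisfiable:
                        candidates[a].discard(res)
                        changed = True
    return candidates


# Toy data (assumed): four residues on a line, two peaks, one 5 Å restraint.
coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 0.0, 0.0), 3: (10.0, 0.0, 0.0), 4: (13.0, 0.0, 0.0)}
candidates = {"P": {1, 3}, "Q": {2}}
restraints = [("P", "Q", 5.0)]
pruned = eliminate_dead_ends(candidates, coords, restraints)
```

In this toy case residue 3 is removed from the candidates of peak P because it is 7 Å from the only candidate of peak Q, which leaves a single candidate per peak.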
Affiliation(s)
- Matthias Trautwein
  - Institute of Pharmacy, Eberhard Karls Universität Tübingen, Auf der Morgenstelle 8, 72076, Tübingen, Germany
- Kai Fredriksson
  - Institute of Pharmacy, Eberhard Karls Universität Tübingen, Auf der Morgenstelle 8, 72076, Tübingen, Germany
- Heiko M Möller
  - Institute of Chemistry, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam OT Golm, Germany
- Thomas E Exner
  - Institute of Pharmacy, Eberhard Karls Universität Tübingen, Auf der Morgenstelle 8, 72076, Tübingen, Germany
5
Xiao Y, Warner LR, Latham MP, Ahn NG, Pardi A. Structure-Based Assignment of Ile, Leu, and Val Methyl Groups in the Active and Inactive Forms of the Mitogen-Activated Protein Kinase Extracellular Signal-Regulated Kinase 2. Biochemistry 2015; 54:4307-19. [PMID: 26132046 DOI: 10.1021/acs.biochem.5b00506] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 02/07/2023]
Abstract
Resonance assignments are the first step in most NMR studies of protein structure, function, and dynamics. Standard protein assignment methods employ through-bond backbone experiments on uniformly (13)C/(15)N-labeled proteins. For larger proteins, this through-bond assignment procedure often breaks down due to rapid relaxation and spectral overlap. The challenges involved in studies of larger proteins led to efficient methods for (13)C labeling of side chain methyl groups, which have favorable relaxation properties and high signal-to-noise. These methyls are often still assigned by linking them to the previously assigned backbone, thus limiting the applications for larger proteins. Here, a structure-based procedure is described for assignment of (13)C(1)H3-labeled methyls by comparing distance information obtained from three-dimensional methyl-methyl nuclear Overhauser effect (NOE) spectroscopy with the X-ray structure. The Ile, Leu, or Val (ILV) methyl type is determined by through-bond experiments, and the methyl-methyl NOE data are analyzed in combination with the known structure. A hierarchical approach was employed that maps the largest observed "NOE-methyl cluster" onto the structure. The combination of identification of ILV methyl type with mapping of the NOE-methyl clusters greatly simplifies the assignment process. This method was applied to the inactive and active forms of the 42-kDa ILV (13)C(1)H3-methyl labeled extracellular signal-regulated kinase 2 (ERK2), leading to assignment of 60% of the methyls, including 90% of Ile residues. A series of ILV to Ala mutants were analyzed, which helped confirm the assignments. These assignments were used to probe the local and long-range effects of ligand binding to inactive and active ERK2.
Affiliation(s)
- Yao Xiao
  - Department of Chemistry and Biochemistry and BioFrontiers Institute, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Lisa R Warner
  - Department of Chemistry and Biochemistry and BioFrontiers Institute, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Michael P Latham
  - Department of Chemistry and Biochemistry and BioFrontiers Institute, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Natalie G Ahn
  - Department of Chemistry and Biochemistry and BioFrontiers Institute, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Arthur Pardi
  - Department of Chemistry and Biochemistry and BioFrontiers Institute, University of Colorado Boulder, Boulder, Colorado 80309, United States
6
Li DW, Meng D, Brüschweiler R. Reliable resonance assignments of selected residues of proteins with known structure based on empirical NMR chemical shift prediction. JOURNAL OF MAGNETIC RESONANCE (SAN DIEGO, CALIF. : 1997) 2015; 254:93-97. [PMID: 25863893 PMCID: PMC4467894 DOI: 10.1016/j.jmr.2015.02.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2014] [Revised: 02/12/2015] [Accepted: 02/14/2015] [Indexed: 06/04/2023]
Abstract
A robust NMR resonance assignment method is introduced for proteins whose 3D structure has previously been determined by X-ray crystallography. The goal of the method is to obtain a subset of correct assignments from a parsimonious set of 3D NMR experiments of (15)N, (13)C labeled proteins. Chemical shifts of sequential residue pairs are predicted from static protein structures using PPM_One, which are then compared with the corresponding experimental shifts. Globally optimized weighted matching identifies the assignments that are robust with respect to small changes in NMR cross-peak positions. The method, termed PASSPORT, is demonstrated for 4 proteins with 100-250 amino acids using a 3D NHCA and a 3D CBCA(CO)NH experiment as input, producing correct assignments with high reliability for 22% of the residues. The method, which works best for Gly, Ala, Ser, and Thr residues, provides assignments that serve as anchor points for additional assignments by both manual and semi-automated methods, or they can be used directly for further studies, e.g. on ligand binding, protein dynamics, or post-translational modification, such as phosphorylation.
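The idea of a globally optimized weighted matching that is then checked for robustness against small peak shifts can be sketched as follows. This is not the PASSPORT code: the brute-force search over permutations, the squared-difference cost, the perturbation size `delta` and all shift values are assumptions made only to keep the example small and self-contained.

```python
from itertools import permutations


def best_matching(predicted, observed, weights):
    """Return (cost, mapping) minimizing weighted squared shift differences.

    predicted: dict residue -> tuple of predicted shifts in ppm
    observed:  dict peak -> tuple of observed shifts in ppm, same nucleus order
    weights:   per-nucleus weights (hypothetical values)
    """
    residues = sorted(predicted)
    peaks = sorted(observed)
    best = None
    for perm in permutations(peaks, len(residues)):
        cost = sum(
            w * (p - o) ** 2
            for res, peak in zip(residues, perm)
            for w, p, o in zip(weights, predicted[res], observed[peak])
        )
        if best is None or cost < best[0]:
            best = (cost, dict(zip(residues, perm)))
    return best


def robust_assignments(predicted, observed, weights, delta=0.3):
    """Keep residues whose matched peak is unchanged when any single
    peak coordinate is moved by +/- delta ppm."""
    _, reference = best_matching(predicted, observed, weights)
    stable = set(reference)
    for peak, shifts in observed.items():
        for i in range(len(shifts)):
            for sign in (-1.0, 1.0):
                moved = list(shifts)
                moved[i] += sign * delta
                trial = dict(observed)
                trial[peak] = tuple(moved)
                _, mapping = best_matching(predicted, trial, weights)
                stable = {r for r in stable if mapping[r] == reference[r]}
    return reference, stable


# Toy (CA, CB) shifts in ppm; residue and peak labels are made up.
predicted = {"A5": (52.5, 19.0), "S6": (58.3, 63.8), "T7": (61.8, 69.8)}
observed = {"p1": (58.1, 64.0), "p2": (52.9, 18.7), "p3": (62.0, 69.5)}
reference, stable = robust_assignments(predicted, observed, (1.0, 1.0))
```

For realistic protein sizes the permutation search would have to be replaced by a polynomial-time assignment solver; the exhaustive loop is used here only because it needs no third-party library.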
Affiliation(s)
- Da-Wei Li
  - Campus Chemical Instrumentation Center, The Ohio State University, Columbus, OH 43210, United States
  - Department of Chemistry & Biochemistry, The Ohio State University, Columbus, OH 43210, United States
  - Chemical Sciences Laboratory, Department of Chemistry and Biochemistry and National High Magnetic Field Laboratory, Florida State University, Tallahassee, FL 32306, United States
- Dan Meng
  - Chemical Sciences Laboratory, Department of Chemistry and Biochemistry and National High Magnetic Field Laboratory, Florida State University, Tallahassee, FL 32306, United States
- Rafael Brüschweiler
  - Campus Chemical Instrumentation Center, The Ohio State University, Columbus, OH 43210, United States
  - Department of Chemistry & Biochemistry, The Ohio State University, Columbus, OH 43210, United States
  - Chemical Sciences Laboratory, Department of Chemistry and Biochemistry and National High Magnetic Field Laboratory, Florida State University, Tallahassee, FL 32306, United States
7
MacRaild CA, Norton RS. RASP: rapid and robust backbone chemical shift assignments from protein structure. JOURNAL OF BIOMOLECULAR NMR 2014; 58:155-63. [PMID: 24445369 DOI: 10.1007/s10858-014-9813-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 01/09/2014] [Accepted: 01/15/2014] [Indexed: 05/05/2023]
Abstract
Chemical shift prediction has an unappreciated power to guide backbone resonance assignment in cases where protein structure is known. Here we describe Resonance Assignment by chemical Shift Prediction (RASP), a method that exploits this power to derive protein backbone resonance assignments from chemical shift predictions. Robust assignments can be obtained from a minimal set of only the most sensitive triple-resonance experiments, even for spectroscopically challenging proteins. Over a test set of 154 proteins RASP assigns 88 % of residues with an accuracy of 99.7 %, using only information available from HNCO and HNCA spectra. Applied to experimental data from a challenging 34 kDa protein, RASP assigns 90 % of manually assigned residues using only 40 % of the experimental data required for the manual assignment. RASP has the potential to significantly accelerate the backbone assignment process for a wide range of proteins for which structural information is available, including those for which conventional assignment strategies are not feasible.
Affiliation(s)
- Christopher A MacRaild
  - Medicinal Chemistry, Monash Institute of Pharmaceutical Sciences, Monash University, 381 Royal Parade, Parkville, 3052, Australia
8
Cavuşlar G, Çatay B, Apaydın MS. A tabu search approach for the NMR protein structure-based assignment problem. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012; 9:1621-1628. [PMID: 23221084 DOI: 10.1109/tcbb.2012.122] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/01/2023]
Abstract
Nuclear magnetic resonance (NMR) spectroscopy is an experimental technique which exploits the magnetic properties of specific nuclei and enables the study of proteins in solution. The key bottleneck of NMR studies is to map the NMR peaks to corresponding nuclei, also known as the assignment problem. Structure-Based Assignment (SBA) is an approach to solve this computationally challenging problem by using prior information about the protein obtained from a homologous structure. NVR-BIP used the Nuclear Vector Replacement (NVR) framework to model SBA as a binary integer programming problem. In this paper, we prove that this problem is NP-hard and propose a tabu search (TS) algorithm (NVR-TS) equipped with a guided perturbation mechanism to efficiently solve it. NVR-TS uses a quadratic penalty relaxation of NVR-BIP where the violations in the Nuclear Overhauser Effect constraints are penalized in the objective function. Experimental results indicate that our algorithm finds the optimal solution on NVR-BIP’s data set which consists of seven proteins with 25 templates (31 to 126 residues). Furthermore, it achieves relatively high assignment accuracies on two additional large proteins, MBP and EIN (348 and 243 residues, respectively), which NVR-BIP failed to solve. The executable and the input files are available for download at http://people.sabanciuniv.edu/catay/NVR-TS/NVR-TS.html.
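A tabu search over peak-to-residue assignments with a quadratic penalty on violated NOE distance limits can be sketched in a few lines. This is a generic illustration, not NVR-TS: the swap neighbourhood, the tenure, the aspiration rule, the penalty weight `mu` and the toy score and distance matrices are all assumptions, and the guided perturbation mechanism mentioned in the abstract is not reproduced.

```python
from itertools import combinations


def cost(perm, score, noes, dist, limit, mu):
    """Matching cost plus quadratic penalty for NOE pairs beyond the limit.

    perm[i] is the residue assigned to peak i.
    """
    base = sum(score[i][r] for i, r in enumerate(perm))
    penalty = sum(max(0.0, dist[perm[i]][perm[j]] - limit) ** 2 for i, j in noes)
    return base + mu * penalty


def tabu_search(score, noes, dist, limit=5.0, mu=1.0, iters=50, tenure=3):
    n = len(score)
    current = list(range(n))
    best, best_cost = current[:], cost(current, score, noes, dist, limit, mu)
    tabu = {}  # swap (i, j) -> last iteration at which it is still forbidden
    for it in range(iters):
        move, move_cost = None, None
        for i, j in combinations(range(n), 2):
            trial = current[:]
            trial[i], trial[j] = trial[j], trial[i]
            c = cost(trial, score, noes, dist, limit, mu)
            is_tabu = tabu.get((i, j), -1) >= it
            # aspiration: a tabu swap is allowed only if it beats the best so far
            if is_tabu and c >= best_cost:
                continue
            if move_cost is None or c < move_cost:
                move, move_cost = (i, j), c
        if move is None:
            break  # every neighbour is tabu and none improves the best
        i, j = move
        current[i], current[j] = current[j], current[i]
        tabu[move] = it + tenure
        if move_cost < best_cost:
            best, best_cost = current[:], move_cost
    return best, best_cost


# Toy problem (assumed): three peaks, three residues, one NOE between peaks 0 and 1.
score = [[3, 3, 0], [0, 3, 3], [3, 0, 3]]      # score[peak][residue], lower is better
dist = [[0, 8, 4], [8, 0, 8], [4, 8, 0]]       # inter-residue distances in Å
best, best_cost = tabu_search(score, noes=[(0, 1)], dist=dist)
```

Because NOE violations are penalized rather than forbidden, the search may pass through assignments that break a restraint on its way to one that satisfies all of them, which is the point of the relaxation.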
Affiliation(s)
- Gizem Cavuşlar
  - University of Wisconsin-Madison, 1513 University Avenue, Madison, WI 53706, USA
9
Schmidt E, Güntert P. A new algorithm for reliable and general NMR resonance assignment. J Am Chem Soc 2012; 134:12817-29. [PMID: 22794163 DOI: 10.1021/ja305091n] [Citation(s) in RCA: 133] [Impact Index Per Article: 11.1] [Indexed: 11/29/2022]
Abstract
The new FLYA automated resonance assignment algorithm determines NMR chemical shift assignments on the basis of peak lists from any combination of multidimensional through-bond or through-space NMR experiments for proteins. Backbone and side-chain assignments can be determined. All experimental data are used simultaneously, thereby exploiting optimally the redundancy present in the input peak lists and circumventing potential pitfalls of assignment strategies in which results obtained in a given step remain fixed input data for subsequent steps. Instead of prescribing a specific assignment strategy, the FLYA resonance assignment algorithm requires only experimental peak lists and the primary structure of the protein, from which the peaks expected in a given spectrum can be generated by applying a set of rules, defined in a straightforward way by specifying through-bond or through-space magnetization transfer pathways. The algorithm determines the resonance assignment by finding an optimal mapping between the set of expected peaks that are assigned by definition but have unknown positions and the set of measured peaks in the input peak lists that are initially unassigned but have a known position in the spectrum. Using peak lists obtained by purely automated peak picking from the experimental spectra of three proteins, FLYA assigned correctly 96-99% of the backbone and 90-91% of all resonances that could be assigned manually. Systematic studies quantified the impact of various factors on the assignment accuracy, namely the extent of missing real peaks and the amount of additional artifact peaks in the input peak lists, as well as the accuracy of the peak positions. Comparing the resonance assignments from FLYA with those obtained from two other existing algorithms showed that, using identical experimental input data, these other algorithms yielded significantly (40-142%) more erroneous assignments than FLYA. The FLYA resonance assignment algorithm thus has the reliability and flexibility to replace most manual and semi-automatic assignment procedures for NMR studies of proteins.
Affiliation(s)
- Elena Schmidt
  - Institute of Biophysical Chemistry, Center for Biomolecular Magnetic Resonance, Goethe University Frankfurt am Main, Frankfurt am Main, Germany
10
Bahrami A, Clos LJ, Markley JL, Butcher SE, Eghbalnia HR. RNA-PAIRS: RNA probabilistic assignment of imino resonance shifts. JOURNAL OF BIOMOLECULAR NMR 2012; 52:289-302. [PMID: 22359049 PMCID: PMC3480180 DOI: 10.1007/s10858-012-9603-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 11/06/2011] [Accepted: 01/08/2012] [Indexed: 05/13/2023]
Abstract
The significant biological role of RNA has further highlighted the need for improving the accuracy, efficiency and the reach of methods for investigating RNA structure and function. Nuclear magnetic resonance (NMR) spectroscopy is vital to furthering the goals of RNA structural biology because of its distinctive capabilities. However, the dispersion pattern in the NMR spectra of RNA makes automated resonance assignment, a key step in NMR investigation of biomolecules, remarkably challenging. Herein we present RNA Probabilistic Assignment of Imino Resonance Shifts (RNA-PAIRS), a method for the automated assignment of RNA imino resonances with synchronized verification and correction of predicted secondary structure. RNA-PAIRS represents an advance in modeling the assignment paradigm because it seeds the probabilistic network for assignment with experimental NMR data, and predicted RNA secondary structure, simultaneously and from the start. Subsequently, RNA-PAIRS sets in motion a dynamic network that reverberates between predictions and experimental evidence in order to reconcile and rectify resonance assignments and secondary structure information. The procedure is halted when assignments and base pairings are deemed to be most consistent with observed crosspeaks. The current implementation of RNA-PAIRS uses an initial peak list derived from proton-nitrogen heteronuclear multiple quantum correlation ((1)H-(15)N 2D HMQC) and proton-proton nuclear Overhauser enhancement spectroscopy ((1)H-(1)H 2D NOESY) experiments. We have evaluated the performance of RNA-PAIRS by using it to analyze NMR datasets from 26 previously studied RNAs, including a 111-nucleotide complex. For moderately sized RNA molecules, and over a range of comparatively complex structural motifs, the average assignment accuracy exceeds 90%, while the average base pair prediction accuracy exceeds 93%. RNA-PAIRS yielded accurate assignments and base pairings consistent with imino resonances for a majority of the NMR resonances, even when the initial predictions are only modestly accurate. RNA-PAIRS is available as a public web-server at http://pine.nmrfam.wisc.edu/RNA/.
Affiliation(s)
- Arash Bahrami
  - National Magnetic Resonance Facility at Madison, Madison, WI, USA
- Lawrence J. Clos
  - National Magnetic Resonance Facility at Madison, Madison, WI, USA
- John L. Markley
  - National Magnetic Resonance Facility at Madison, Madison, WI, USA
  - Biochemistry Department, University of Wisconsin-Madison, Madison, WI 53706, USA
- Samuel E. Butcher
  - National Magnetic Resonance Facility at Madison, Madison, WI, USA
  - Biochemistry Department, University of Wisconsin-Madison, Madison, WI 53706, USA
- Hamid R. Eghbalnia
  - Department of Molecular and Cellular Physiology, University of Cincinnati, P.O. Box 670576, Cincinnati, OH 45267-0576, USA
11
Jang R, Gao X, Li M. Combining automated peak tracking in SAR by NMR with structure-based backbone assignment from 15N-NOESY. BMC Bioinformatics 2012; 13 Suppl 3:S4. [PMID: 22536902 PMCID: PMC3402924 DOI: 10.1186/1471-2105-13-s3-s4] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022] Open
Abstract
Background: Chemical shift mapping is an important technique in NMR-based drug screening for identifying the atoms of a target protein that potentially bind to a drug molecule upon the molecule's introduction in increasing concentrations. The goal is to obtain a mapping of peaks with known residue assignment from the reference spectrum of the unbound protein to peaks with unknown assignment in the target spectrum of the bound protein. Although a series of perturbed spectra help to trace a path from reference peaks to target peaks, a one-to-one mapping generally is not possible, especially for large proteins, due to errors such as noise peaks, missing peaks, peaks that go missing and then reappear, overlapped peaks, and new peaks not associated with any peaks in the reference. Due to these difficulties, the mapping is typically done manually or semi-automatically, which is not efficient for high-throughput drug screening. Results: We present PeakWalker, a novel peak walking algorithm for fast-exchange systems that models the errors explicitly and performs many-to-one mapping. On the proteins hBclXL, UbcH5B, and histone H1, it achieves an average accuracy of over 95% with less than 1.5 residues predicted per target peak. Given these mappings as input, we present PeakAssigner, a novel combined structure-based backbone resonance and NOE assignment algorithm that uses just 15N-NOESY, while avoiding TOCSY experiments and 13C-labeling, to resolve the ambiguities for a one-to-one mapping. On the three proteins, it achieves an average accuracy of 94% or better. Conclusions: Our mathematical programming approach for modeling chemical shift mapping as a graph problem, while modeling the errors directly, is potentially a time- and cost-effective first step for high-throughput drug screening based on limited NMR data and homologous 3D structures.
Affiliation(s)
- Richard Jang
  - David R Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
12
Jang R, Gao X, Li M. Towards fully automated structure-based NMR resonance assignment of ¹⁵N-labeled proteins from automatically picked peaks. J Comput Biol 2011; 18:347-63. [PMID: 21385039 DOI: 10.1089/cmb.2010.0251] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022] Open
Abstract
In NMR resonance assignment, an indispensable step in NMR protein studies, manually processed peaks from both N-labeled and C-labeled spectra are typically used as inputs. However, the use of homologous structures can allow one to use only N-labeled NMR data and avoid the added expense of using C-labeled data. We propose a novel integer programming framework for structure-based backbone resonance assignment using N-labeled data. The core consists of a pair of integer programming models: one for spin system forming and amino acid typing, and the other for backbone resonance assignment. The goal is to perform the assignment directly from spectra without any manual intervention via automatically picked peaks, which are much noisier than manually picked peaks, so methods must be error-tolerant. In the case of semi-automated/manually processed peak data, we compare our system with the Xiong-Pandurangan-Bailey-Kellogg's contact replacement (CR) method, which is the most error-tolerant method for structure-based resonance assignment. Our system, on average, reduces the error rate of the CR method fivefold on their data set. In addition, by using an iterative algorithm, our system has the added capability of using the NOESY data to correct assignment errors due to errors in predicting the amino acid and secondary structure type of each spin system. On a publicly available data set for human ubiquitin, where the typing accuracy is 83%, we achieve 91% accuracy, compared to the 59% accuracy obtained without correcting for such errors. In the case of automatically picked peaks, using assignment information from yeast ubiquitin, we achieve a fully automatic assignment with 97% accuracy. To our knowledge, this is the first system that can achieve fully automatic structure-based assignment directly from spectra. This has implications in NMR protein mutant studies, where the assignment step is repeated for each mutant.
Affiliation(s)
- Richard Jang
  - David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
13
Abstract
Around half of all protein structures solved nowadays using solution-state nuclear magnetic resonance (NMR) spectroscopy have been solved with the help of automated data analysis. The pervasiveness of computational approaches in general hides, however, a more nuanced view in which the full variety and richness of the field appears. This review is structured around a comparison of methods associated with three NMR observables: classical nuclear Overhauser effect (NOE) constraint gathering in contrast with more recent chemical shift and residual dipolar coupling (RDC) based protocols. In each case, the emphasis is placed on the latest research, covering mainly the past 5 years. By describing both general concepts and representative programs, the objective is to map out a field in which, through the very profusion of approaches, it is all too easy to lose one's bearings.
14
Wang X, Tash B, Flanagan JM, Tian F. RDC derived protein backbone resonance assignment using fragment assembly. JOURNAL OF BIOMOLECULAR NMR 2011; 49:85-98. [PMID: 21191805 PMCID: PMC6936109 DOI: 10.1007/s10858-010-9467-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/23/2010] [Accepted: 12/15/2010] [Indexed: 05/23/2023]
Abstract
Experimental residual dipolar couplings (RDCs) in combination with structural models have the potential for accelerating the protein backbone resonance assignment process because RDCs can be measured accurately and interpreted quantitatively. However, this application has been limited due to the need for very high-resolution structural templates. Here, we introduce a new approach to resonance assignment based on optimal agreement between the experimental and calculated RDCs from a structural template that contains all assignable residues. To overcome the inherent computational complexity of such a global search, we have adopted an efficient two-stage search algorithm and included connectivity data from conventional assignment experiments. In the first stage, a list of strings of resonances (CA-links) is generated via exhaustive searches for short segments of sequentially connected residues in a protein (local templates), and then ranked by the agreement of the experimental (13)C(α) chemical shifts and (15)N-(1)H RDCs to the predicted values for each local template. In the second stage, the top CA-links for different local templates in stage I are combinatorially connected to produce CA-links for all assignable residues. The resulting CA-links are ranked for resonance assignment according to their measured RDCs and predicted values from a tertiary structure. Since the final RDC ranking of CA-links includes all assignable residues and the assignment is derived from a "global minimum", our approach is far less reliant on the quality of experimental data and structural templates. The present approach is validated with the assignments of several proteins, including a 42 kDa maltose binding protein (MBP) using RDCs and structural templates of varying quality. Since backbone resonance assignment is an essential first step for most biomolecular NMR applications and is often a bottleneck for large systems, we expect that this new approach will improve the efficiency of the assignment process for small and medium size proteins and will extend the size limits assignable by current methods for proteins with structural models.
Affiliation(s)
- Xingsheng Wang
  - Department of Biochemistry and Molecular Biology, College of Medicine, Pennsylvania State University, Hershey, PA 17033, USA
15
Crippen GM, Rousaki A, Revington M, Zhang Y, Zuiderweg ERP. SAGA: rapid automatic mainchain NMR assignment for large proteins. JOURNAL OF BIOMOLECULAR NMR 2010; 46:281-298. [PMID: 20232231 DOI: 10.1007/s10858-010-9403-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 12/07/2009] [Accepted: 02/23/2010] [Indexed: 05/26/2023]
Abstract
Here we describe a new algorithm for automatically determining the mainchain sequential assignment of NMR spectra for proteins. Using only the customary triple resonance experiments, assignments can be quickly found for not only small proteins having rather complete data, but also for large proteins, even when only half the residues can be assigned. The result of the calculation is not the single best assignment according to some criterion, but rather a large number of satisfactory assignments that are summarized in such a way as to help the user identify portions of the sequence that are assigned with confidence, vs. other portions where the assignment has some correlated alternatives. Thus very imperfect initial data can be used to suggest future experiments.
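Summarizing a large set of satisfactory assignments, so that confidently assigned residues stand out from those with alternatives, can be illustrated with a simple frequency count. This is only a sketch of the idea and not the SAGA summary procedure: the data layout (one dict per assignment, residue number to spin-system label), the 0.9 confidence threshold and the toy ensemble are assumptions.

```python
from collections import Counter


def summarize(assignments):
    """Return residue -> (most frequent spin system, fraction of assignments).

    assignments: list of dicts mapping residue number to a spin-system label;
    a residue missing from a dict is counted as unassigned (None).
    """
    residues = sorted({res for a in assignments for res in a})
    summary = {}
    for res in residues:
        counts = Counter(a.get(res) for a in assignments)
        label, n = counts.most_common(1)[0]
        summary[res] = (label, n / len(assignments))
    return summary


# Toy ensemble (assumed): residue 10 is always s1, residues 11 and 12 swap once.
ensemble = [
    {10: "s1", 11: "s2", 12: "s3"},
    {10: "s1", 11: "s3", 12: "s2"},
    {10: "s1", 11: "s2", 12: "s3"},
]
summary = summarize(ensemble)
confident = [res for res, (_, frac) in summary.items() if frac >= 0.9]
```

Residues 11 and 12 fall below the threshold together, which is the kind of correlated alternative that a per-residue frequency alone flags but does not explain; a fuller summary would also report which residues change jointly.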
Affiliation(s)
- Gordon M Crippen
  - College of Pharmacy, University of Michigan, Ann Arbor, MI 48109, USA