1
Gao Y, Lyu Q, Luo P, Li M, Zhou R, Zhang J, Lyu Q. Applications of Machine Learning to Predict Cisplatin Resistance in Lung Cancer. Int J Gen Med 2021; 14:5911-5925. [PMID: 34588799 PMCID: PMC8473573 DOI: 10.2147/ijgm.s329644] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/21/2021] [Accepted: 09/03/2021] [Indexed: 12/25/2022] Open
Abstract
Purpose: Lung cancer, mainly lung adenocarcinoma, lung squamous cell carcinoma and small cell lung cancer, has the highest incidence and cancer-related mortality worldwide. Platinum-based chemotherapy plays an important role in the treatment of various lung cancer subtypes, but not all patients benefit from this treatment regimen; thus, it is worth identifying lung cancer patients who are resistant or sensitive to platinum-based therapy. Methods: The drug response and sequencing data of 170 lung cancer cell lines were downloaded from the Genomics of Drug Sensitivity in Cancer (GDSC) database, and support vector machines (SVMs) and beam search were used to select an optimal gene panel that can predict the sensitivity of cell lines to cisplatin. Then, we used available cell line data to explore the potential mechanisms. Results: In this work, the drug response and sequencing data of 170 lung cancer cell lines were downloaded from the GDSC database, and SVMs and beam search were used to screen a panel of genes related to lung cancer cell line resistance to cisplatin. A final panel of nine genes (PLXNC1, KIAA0649, SPTBN4, SLC14A2, F13A1, COL5A1, SCN2A, PLEC, and ALMS1) was identified, and achieved an area under the curve (AUC) of 0.873 ± 0.004. The natural logarithm of the half maximal inhibitory concentration (lnIC50) values of the mutant-type (panel-MT) group was significantly higher than that of the wild-type (panel-WT) group, regardless of the lung cancer subtype. The differentially expressed pathways between the two groups may explain this difference. Conclusion: In this study, we found that a panel of nine genes can accurately predict sensitivity to cisplatin, which may provide individualized treatment recommendations to improve the prognosis of patients with lung cancer.
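The abstract names beam search as the way the gene panel was selected but gives no details. The following is a minimal sketch of a generic beam search over feature subsets, not the authors' implementation. The function `beam_search`, its parameters, the gene names and the toy scoring function are all hypothetical; in a real setting the score would presumably be something like the cross-validated AUC of an SVM trained on the candidate panel.

```python
def beam_search(features, score, beam_width=3, max_size=3):
    """Grow feature subsets one feature at a time, keeping only the
    `beam_width` best-scoring subsets at each size (hypothetical helper)."""
    beam = [frozenset()]
    best_score, best_subset = float("-inf"), frozenset()
    for _ in range(max_size):
        # Extend every subset in the beam by one feature it does not contain.
        candidates = {s | {f} for s in beam for f in features if f not in s}
        if not candidates:
            break
        # Sort by score (descending), then by name for deterministic ties.
        ranked = sorted(candidates, key=lambda s: (-score(s), sorted(s)))
        beam = ranked[:beam_width]
        top_score = score(beam[0])
        if top_score > best_score:
            best_score, best_subset = top_score, beam[0]
    return best_subset, best_score


# Toy stand-in for a model-based score: reward per gene minus a size penalty.
# These weights are invented for illustration only.
toy_weights = {"geneA": 3.0, "geneB": 2.0, "geneC": 0.5, "geneD": -1.0}


def toy_score(subset):
    return sum(toy_weights[g] for g in subset) - len(subset)


panel, panel_score = beam_search(list(toy_weights), toy_score, beam_width=2)
```

Keeping several subsets per step, rather than only the single best one, is what distinguishes beam search from greedy forward selection: a gene that looks weak alone can still enter the final panel through a surviving runner-up subset.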
Affiliation(s)
- Yanan Gao: Department of Radiotherapy, Affiliated Cancer Hospital, Zhengzhou University, Zhengzhou, People's Republic of China
- Qiong Lyu: Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, Guangdong, People's Republic of China
- Peng Luo: Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, Guangdong, People's Republic of China
- Mujiao Li: School of Biomedical Engineering, Southern Medical University, Guangzhou, People's Republic of China
- Rui Zhou: School of Biomedical Engineering, Southern Medical University, Guangzhou, People's Republic of China
- Jian Zhang: Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, Guangdong, People's Republic of China
- Qingwen Lyu: Department of Information, Zhujiang Hospital, Southern Medical University, Guangzhou, People's Republic of China
2
Keich U, Tamura K, Noble WS. Averaging Strategy To Reduce Variability in Target-Decoy Estimates of False Discovery Rate. J Proteome Res 2019; 18:585-593. [PMID: 30560673 DOI: 10.1021/acs.jproteome.8b00802] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 11/28/2022]
Abstract
Decoy database search with target-decoy competition (TDC) provides an intuitive, easy-to-implement method for estimating the false discovery rate (FDR) associated with spectrum identifications from shotgun proteomics data. However, the procedure can yield different results for a fixed data set analyzed with different decoy databases, and this decoy-induced variability is particularly problematic for smaller FDR thresholds, data sets, or databases. The average TDC (aTDC) protocol combats this problem by exploiting multiple independently shuffled decoy databases to provide an FDR estimate with reduced variability. We provide a tutorial introduction to aTDC, describe an improved variant of the protocol that offers increased statistical power, and discuss how to deploy aTDC in practice using the Crux software toolkit.
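To make the idea concrete, here is a minimal sketch of a TDC-style FDR estimate and of averaging it over several decoy sets. It is an illustration under stated assumptions, not the published aTDC protocol and not the Crux implementation: the function names are hypothetical, ties are given to the decoy, the `+1` in the numerator is a commonly used conservative correction, and the averaging step simply takes the mean of the per-decoy estimates.

```python
from statistics import mean


def tdc_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR at `threshold` by target-decoy competition.

    target_scores[i] and decoy_scores[i] are the best target and best decoy
    match scores for spectrum i (higher is better).
    """
    target_wins = 0
    decoy_wins = 0
    for t, d in zip(target_scores, decoy_scores):
        # Competition: only the better of the two matches is kept.
        if t > d:
            if t >= threshold:
                target_wins += 1
        elif d >= threshold:
            decoy_wins += 1
    # Decoy wins stand in for the unknown number of false target wins.
    return min(1.0, (decoy_wins + 1) / max(target_wins, 1))


def averaged_tdc_fdr(target_scores, decoy_score_sets, threshold):
    """Average the TDC estimate over several independently drawn decoy sets
    (a simplification of the averaging idea, assumed for illustration)."""
    return mean(tdc_fdr(target_scores, d, threshold) for d in decoy_score_sets)


# Invented scores for five spectra and two decoy databases.
targets = [10, 9, 8, 2, 1]
decoys_a = [1, 2, 9, 3, 0.5]
decoys_b = [1, 2, 3, 3, 0.5]
estimate = averaged_tdc_fdr(targets, [decoys_a, decoys_b], threshold=5)
```

With so few spectra the two decoy sets give very different single estimates, which is the decoy-induced variability the abstract describes; averaging dampens it.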
Affiliation(s)
- Uri Keich: School of Mathematics and Statistics F07, University of Sydney, Sydney, New South Wales 2006, Australia
- Kaipo Tamura: Department of Genome Sciences, University of Washington, Foege Building S220B, 3720 15th Avenue NE, Seattle, Washington 98195-5065, United States
- William Stafford Noble: Department of Genome Sciences, University of Washington, Foege Building S220B, 3720 15th Avenue NE, Seattle, Washington 98195-5065, United States; Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195-5065, United States
3
Pace MC, Xu G, Fromholt S, Howard J, Crosby K, Giasson BI, Lewis J, Borchelt DR. Changes in proteome solubility indicate widespread proteostatic disruption in mouse models of neurodegenerative disease. Acta Neuropathol 2018; 136:919-938. [PMID: 30140941 DOI: 10.1007/s00401-018-1895-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 02/12/2018] [Accepted: 08/02/2018] [Indexed: 12/17/2022]
Abstract
The deposition of pathologic misfolded proteins in neurodegenerative disorders such as Alzheimer's disease, Parkinson's disease, frontotemporal dementia and amyotrophic lateral sclerosis is hypothesized to burden protein homeostatic (proteostatic) machinery, potentially leading to insufficient capacity to maintain the proteome. This hypothesis has been supported by previous work in our laboratory, as evidenced by the perturbation of cytosolic protein solubility in response to amyloid plaques in a mouse model of Alzheimer's amyloidosis. In the current study, we demonstrate that changes in proteome solubility are a pathology common to mouse models of neurodegenerative disease. Pathological accumulations of misfolded tau, α-synuclein and mutant superoxide dismutase 1 in CNS tissues of transgenic mice were associated with changes in the solubility of hundreds of CNS proteins in each model. We observed that changes in proteome solubility were progressive and, using the rTg4510 model of inducible tau pathology, demonstrated that these changes were dependent upon sustained expression of the primary pathologic protein. In all of the models examined, changes in proteome solubility were robust, easily detected, and provided a sensitive indicator of proteostatic disruption. Interestingly, a subset of the proteins that display a shift towards insolubility was common to these different models, suggesting that a specific subset of the proteome is vulnerable to proteostatic disruption. Overall, our data suggest that neurodegenerative proteinopathies modeled in mice impose a burden on the proteostatic network that diminishes the ability of neural cells to prevent aberrant conformational changes that alter the solubility of hundreds of abundant cellular proteins.
Affiliation(s)
- Michael C Pace: Department of Neuroscience, Center for Translational Research in Neurodegenerative Disease, McKnight Brain Institute, University of Florida, Gainesville, FL, 32610-0244, USA
- Guilian Xu: Department of Neuroscience, Center for Translational Research in Neurodegenerative Disease, McKnight Brain Institute, University of Florida, Gainesville, FL, 32610-0244, USA
- Susan Fromholt: Department of Neuroscience, Center for Translational Research in Neurodegenerative Disease, McKnight Brain Institute, University of Florida, Gainesville, FL, 32610-0244, USA
- John Howard: Department of Neuroscience, Center for Translational Research in Neurodegenerative Disease, McKnight Brain Institute, University of Florida, Gainesville, FL, 32610-0244, USA
- Keith Crosby: Department of Neuroscience, Center for Translational Research in Neurodegenerative Disease, McKnight Brain Institute, University of Florida, Gainesville, FL, 32610-0244, USA
- Benoit I Giasson: Department of Neuroscience, Center for Translational Research in Neurodegenerative Disease, McKnight Brain Institute, University of Florida, Gainesville, FL, 32610-0244, USA
- Jada Lewis: Department of Neuroscience, Center for Translational Research in Neurodegenerative Disease, McKnight Brain Institute, University of Florida, Gainesville, FL, 32610-0244, USA
- David R Borchelt: Department of Neuroscience, Center for Translational Research in Neurodegenerative Disease, McKnight Brain Institute, University of Florida, Gainesville, FL, 32610-0244, USA; SantaFe Healthcare Alzheimer's Disease Research Center, Gainesville, FL, USA
4
Characterization of gene regulation and protein interaction networks for Matrin 3 encoding mutations linked to amyotrophic lateral sclerosis and myopathy. Sci Rep 2018; 8:4049. [PMID: 29511296 PMCID: PMC5840295 DOI: 10.1038/s41598-018-21371-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 11/18/2017] [Accepted: 02/02/2018] [Indexed: 02/08/2023] Open
Abstract
To understand how mutations in Matrin 3 (MATR3) cause amyotrophic lateral sclerosis (ALS) and distal myopathy, we used transcriptome and interactome analysis, coupled with microscopy. Over-expression of wild-type (WT) or F115C mutant MATR3 had little impact on gene expression in neuroglia cells. Only 23 genes expressed at levels of >100 transcripts showed ≥1.6-fold changes in expression upon transfection with WT or mutant MATR3:YFP vectors. We identified ~123 proteins that bound MATR3, with proteins associated with stress granules and RNA processing/splicing being prominent. The interactomes of myopathic S85C and ALS-variant F115C MATR3 were virtually identical to that of the WT protein. Deletion of the RNA recognition motif (RRM1) or Zn finger motifs (ZnF1 or ZnF2) diminished the binding of a subset of MATR3-interacting proteins. Remarkably, deletion of the RRM2 motif caused enhanced binding of >100 proteins. In live cells, MATR3 lacking RRM2 (ΔRRM2) formed intranuclear spherical structures that fused over time into large structures. Our findings in the cell models used here suggest that MATR3 with disease-causing mutations is not dramatically different from WT protein in modulating gene regulation or in binding to normal interacting partners. The intranuclear localization and interaction network of MATR3 are strongly modulated by its RRM2 domain.
5
Liang X, Xia Z, Jian L, Niu X, Link A. An adaptive classification model for peptide identification. BMC Genomics 2015; 16 Suppl 11:S1. [PMID: 26578406 PMCID: PMC4652454 DOI: 10.1186/1471-2164-16-s11-s1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 01/14/2023] Open
Abstract
Background: Peptide sequence assignment is the central task in protein identification with MS/MS-based strategies. Although a number of post-database search algorithms for filtering target peptide spectrum matches (PSMs) have been developed, the discrepancy among the output PSMs is usually significant, leaving a few disputable PSMs. Current studies show that a number of target PSMs which are close to decoy PSMs can hardly be separated from those decoys by using the discrimination function alone. Results: In this paper, we assign each target PSM a weight showing its possibility of being correct. We employ an SVM-based learning model to search for the optimal weight for each target PSM and develop a new scoring system, CRanker, to rank all target PSMs. Due to the large PSM datasets generated in routine database searches, we use the Cholesky factorization technique for storing a kernel matrix to reduce the memory requirement. Conclusions: Compared with PeptideProphet and Percolator, CRanker has identified more PSMs under similar false discovery rates over different datasets. CRanker has shown consistent performance on different test sets, validating the reasonableness of the proposed model.
6
Sikdar S, Gill R, Datta S. Improving protein identification from tandem mass spectrometry data by one-step methods and integrating data from other platforms. Brief Bioinform 2015; 17:262-9. [PMID: 26141827 DOI: 10.1093/bib/bbv043] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/10/2015] [Indexed: 01/28/2023] Open
Abstract
MOTIVATION: Many approaches have been proposed for the protein identification problem based on tandem mass spectrometry (MS/MS) data. In these experiments, proteins are digested into peptides and the resulting peptide mixture is subjected to mass spectrometry. Some interesting putative peptide features (peaks) are selected from the mass spectra. Following that, the precursor ions undergo fragmentation and are analyzed by MS/MS. The process of identifying peptides from the mass spectra, and the constituent proteins in the sample, is called protein identification from MS/MS data. There are many two-step protein identification procedures, reviewed in the literature, which first attempt to identify the peptides in a separate process and then use these results to infer the proteins. However, in recent years, there have been attempts to provide a one-step solution to protein identification, which simultaneously identifies the proteins and the peptides in the sample. RESULTS: In this review, we briefly introduce the most popular two-step protein identification procedure, PeptideProphet coupled with ProteinProphet. Following that, we describe the difficulties with two-step procedures and review some recently introduced one-step protein/peptide identification procedures that do not suffer from these issues. The focus of this review is on one-step procedures that are based on statistical likelihood-based models, but some discussion of other one-step procedures is also included. We report comparative performances of one-step and two-step methods, which support the overall superiority of one-step procedures. We also cover some recent efforts to improve protein identification by incorporating other molecular data along with MS/MS data.
7
Quantification of cellular NEMO content and its impact on NF-κB activation by genotoxic stress. PLoS One 2015; 10:e0116374. [PMID: 25742655 PMCID: PMC4350935 DOI: 10.1371/journal.pone.0116374] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/25/2014] [Accepted: 12/08/2014] [Indexed: 12/28/2022] Open
Abstract
NF-κB essential modulator, NEMO, plays a key role in canonical NF-κB signaling induced by a variety of stimuli, including cytokines and genotoxic agents. To dissect the different biochemical and functional roles of NEMO in NF-κB signaling, various mutant forms of NEMO have been previously analyzed. However, transient or stable overexpression of wild-type NEMO can significantly inhibit NF-κB activation, thereby confounding the analysis of NEMO mutant phenotypes. What levels of NEMO overexpression lead to such an artifact and what levels are tolerated with no significant impact on NEMO function in NF-κB activation are currently unknown. Here we purified full-length recombinant human NEMO protein and used it as a standard to quantify the average number of NEMO molecules per cell in a 1.3E2 NEMO-deficient murine pre-B cell clone stably reconstituted with full-length human NEMO (C5). We determined that the C5 cell clone has an average of 4 × 10^5 molecules of NEMO per cell. Stable reconstitution of 1.3E2 cells with different numbers of NEMO molecules per cell has demonstrated that a 10-fold range of NEMO expression (0.6–6 × 10^5 molecules per cell) yields statistically equivalent NF-κB activation in response to the DNA damaging agent etoposide. Using the C5 cell line, we also quantified the number of NEMO molecules per cell in several commonly employed human cell lines. These results establish baseline numbers of endogenous NEMO per cell and highlight surprisingly normal functionality of NEMO in the DNA damage pathway over a wide range of expression levels that can provide a guideline for future NEMO reconstitution studies.
8
Keich U, Noble WS. On the importance of well-calibrated scores for identifying shotgun proteomics spectra. J Proteome Res 2014; 14:1147-60. [PMID: 25482958 PMCID: PMC4324453 DOI: 10.1021/pr5010983] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 11/30/2022]
Abstract
Identifying the peptide responsible for generating an observed fragmentation spectrum requires scoring a collection of candidate peptides and then identifying the peptide that achieves the highest score. However, analysis of a large collection of such spectra requires that the score assigned to one spectrum be well-calibrated with respect to the scores assigned to other spectra. In this work, we define the notion of calibration in the context of shotgun proteomics spectrum identification, and we introduce a simple, albeit computationally intensive, technique to calibrate an arbitrary score function. We demonstrate that this calibration procedure yields an increased number of identified spectra at a fixed false discovery rate (FDR) threshold. We also show that proper calibration of scores has a surprising effect on a previously described FDR estimation procedure, making the procedure less conservative. Finally, we provide empirical results suggesting that even partial calibration, which is much less computationally demanding, can yield significant increases in spectrum identification. Overall, we argue that accurate shotgun proteomics analysis requires careful attention to score calibration.
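The abstract does not spell out the calibration technique, so the following is only one way to picture what calibration means: replace each raw score by an empirical p-value computed against a null distribution specific to that spectrum. This is a sketch under that assumption, with hypothetical function names and invented numbers; it is not presented as the authors' procedure.

```python
def empirical_p_value(score, null_scores):
    """Fraction of null scores at least as large as `score`, with the usual
    +1 correction so the p-value is never exactly zero."""
    exceed = sum(1 for s in null_scores if s >= score)
    return (exceed + 1) / (len(null_scores) + 1)


def calibrate(observed, null_by_spectrum):
    """Map each spectrum's raw score to a spectrum-specific p-value.

    observed:         {spectrum_id: raw score of the top match}
    null_by_spectrum: {spectrum_id: scores of that spectrum against decoys}
    """
    return {
        sid: empirical_p_value(score, null_by_spectrum[sid])
        for sid, score in observed.items()
    }


# Two spectra with the same raw score but different null distributions.
observed = {"s1": 50.0, "s2": 50.0}
nulls = {"s1": [10, 20, 30, 40], "s2": [45, 55, 60, 70]}
calibrated = calibrate(observed, nulls)
```

The same raw score of 50 is strong evidence for `s1` and weak evidence for `s2`, which is why uncalibrated scores cannot be compared fairly across spectra. The cost grows with the number of null scores drawn per spectrum, in line with the abstract's remark that full calibration is computationally intensive.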
Affiliation(s)
- Uri Keich: School of Mathematics and Statistics F07, University of Sydney, NSW 2006, Australia
9
Abstract
Background: Sequence database searching has been the dominant method for peptide identification, in which a large number of peptide spectra generated from LC/MS/MS experiments are searched using a search engine against theoretical fragmentation spectra derived from a protein sequence database or a spectral library. Selecting trustworthy peptide spectrum matches (PSMs) remains a challenge. Results: A novel scoring method named FC-Ranker is developed to assign a nonnegative weight to each target PSM based on the possibility of its being correct. In particular, the scores of PSMs are updated iteratively by using a fuzzy SVM classification model and a fuzzy silhouette index. Trustworthy PSMs will be assigned high scores when the algorithm stops. Conclusions: Our experimental studies show that FC-Ranker outperforms other post-database search algorithms over a variety of datasets, and it can be extended to solve a general classification problem with uncertain labels.
10
Dumaual CM, Steere BA, Walls CD, Wang M, Zhang ZY, Randall SK. Integrated analysis of global mRNA and protein expression data in HEK293 cells overexpressing PRL-1. PLoS One 2013; 8:e72977. [PMID: 24019887 PMCID: PMC3760866 DOI: 10.1371/journal.pone.0072977] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 01/23/2013] [Accepted: 07/17/2013] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND: The protein tyrosine phosphatase PRL-1 represents a putative oncogene with wide-ranging cellular effects. Overexpression of PRL-1 can promote cell proliferation, survival, migration, invasion, and metastasis, but the underlying mechanisms by which it influences these processes remain poorly understood. METHODOLOGY: To increase our comprehension of PRL-1-mediated signaling events, we employed transcriptional profiling (DNA microarray) and proteomics (mass spectrometry) to perform a thorough characterization of the global molecular changes in gene expression that occur in response to stable PRL-1 overexpression in a relevant model system (HEK293). PRINCIPAL FINDINGS: Overexpression of PRL-1 led to several significant changes in the mRNA and protein expression profiles of HEK293 cells. The differentially expressed gene set was highly enriched in genes involved in cytoskeletal remodeling, integrin-mediated cell-matrix adhesion, and RNA recognition and splicing. In particular, members of the Rho signaling pathway and molecules that converge on this pathway were heavily influenced by PRL-1 overexpression, supporting observations from previous studies that link PRL-1 to the Rho GTPase signaling network. In addition, several genes not previously associated with PRL-1 were found to be significantly altered by its expression. Most notable among these were Filamin A, RhoGDIα, SPARC, hnRNPH2, and PRDX2. CONCLUSIONS AND SIGNIFICANCE: This systems-level approach sheds new light on the molecular networks underlying PRL-1 action and presents several novel directions for future, hypothesis-based studies.
Affiliation(s)
- Carmen M. Dumaual: Department of Biology, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana, United States of America
- Boyd A. Steere: Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana, United States of America
- Chad D. Walls: Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- Mu Wang: Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- Zhong-Yin Zhang: Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- Stephen K. Randall: Department of Biology, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana, United States of America
11
Kumar S, Zou Y, Bao Q, Wang M, Dai G. Proteomic analysis of immediate-early response plasma proteins after 70% and 90% partial hepatectomy. Hepatol Res 2013; 43:876-89. [PMID: 23279269 PMCID: PMC4354878 DOI: 10.1111/hepr.12030] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 08/30/2012] [Revised: 11/18/2012] [Accepted: 11/20/2012] [Indexed: 02/08/2023]
Abstract
AIM: Partial hepatectomy (PH) induces robust hepatic regenerative and metabolic responses that are considered to be triggered by humoral factors. The aim of the study was to identify plasma protein factors that potentially trigger or reflect the body's immediate-early responses to liver mass reduction. METHODS: Male C57BL/6 mice were subjected to sham operation, 70% PH or 90% PH. Blood was collected from the inferior vena cava at 20, 60 and 180 min after surgery. RESULTS: Using a label-free quantitative mass spectrometry-based proteomics approach, we identified 399 proteins exhibiting significant changes in plasma expression between any two groups. Of the 399 proteins, 167 proteins had multiple unique sequences and high peptide ID confidence (>90%) and were defined as priority 1 proteins. A group of plasma proteins largely associated with metabolism is enriched after 70% PH. Among the plasma proteins that respond to 90% PH are a dominant group of proteins that are also associated with metabolism and one known cytokine (platelet factor 4). Ninety percent PH and 70% PH induce similar changes in plasma protein profile. CONCLUSION: Our findings enable us to gain insight into the immediate-early response of plasma proteins to liver mass loss. Our data support the notion that increased metabolic demands of the body after massive liver mass loss may function as a sensor that calibrates hepatic regenerative response.
Affiliation(s)
- Sudhanshu Kumar: Department of Biology, School of Science, Center for Regenerative Biology and Medicine, Indiana University-Purdue University Indianapolis, Indiana
- Yuhong Zou: Department of Biology, School of Science, Center for Regenerative Biology and Medicine, Indiana University-Purdue University Indianapolis, Indiana
- Qi Bao: Department of Biology, School of Science, Center for Regenerative Biology and Medicine, Indiana University-Purdue University Indianapolis, Indiana
- Mu Wang: Department of Biochemistry and Molecular Biology, School of Medicine, Indiana University, Indianapolis, Indiana
- Guoli Dai: Department of Biology, School of Science, Center for Regenerative Biology and Medicine, Indiana University-Purdue University Indianapolis, Indiana
12
Langley RJ, Tsalik EL, van Velkinburgh JC, Glickman SW, Rice BJ, Wang C, Chen B, Carin L, Suarez A, Mohney RP, Freeman DH, Wang M, You J, Wulff J, Thompson JW, Moseley MA, Reisinger S, Edmonds BT, Grinnell B, Nelson DR, Dinwiddie DL, Miller NA, Saunders CJ, Soden SS, Rogers AJ, Gazourian L, Fredenburgh LE, Massaro AF, Baron RM, Choi AMK, Corey GR, Ginsburg GS, Cairns CB, Otero RM, Fowler VG, Rivers EP, Woods CW, Kingsmore SF. An integrated clinico-metabolomic model improves prediction of death in sepsis. Sci Transl Med 2013; 5:195ra95. [PMID: 23884467 DOI: 10.1126/scitranslmed.3005893] [Citation(s) in RCA: 329] [Impact Index Per Article: 29.9] [Indexed: 12/18/2022]
Abstract
Sepsis is a common cause of death, but outcomes in individual patients are difficult to predict. Elucidating the molecular processes that differ between sepsis patients who survive and those who die may permit more appropriate treatments to be deployed. We examined the clinical features and the plasma metabolome and proteome of patients with and without community-acquired sepsis, upon their arrival at hospital emergency departments and 24 hours later. The metabolomes and proteomes of patients at hospital admittance who would ultimately die differed markedly from those of patients who would survive. The different profiles of proteins and metabolites clustered into the following groups: fatty acid transport and β-oxidation, gluconeogenesis, and the citric acid cycle. They differed consistently among several sets of patients, and diverged more as death approached. In contrast, the metabolomes and proteomes of surviving patients with mild sepsis did not differ from survivors with severe sepsis or septic shock. An algorithm derived from clinical features together with measurements of five metabolites predicted patient survival. This algorithm may help to guide the treatment of individual patients with sepsis.
13
Granholm V, Noble WS, Käll L. A cross-validation scheme for machine learning algorithms in shotgun proteomics. BMC Bioinformatics 2012; 13 Suppl 16:S3. [PMID: 23176259 PMCID: PMC3489528 DOI: 10.1186/1471-2105-13-s16-s3] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 11/10/2022] Open
Abstract
Peptides are routinely identified from mass spectrometry-based proteomics experiments by matching observed spectra to peptides derived from protein databases. The error rates of these identifications can be estimated by target-decoy analysis, which involves matching spectra to shuffled or reversed peptides. Besides estimating error rates, decoy searches can be used by semi-supervised machine learning algorithms to increase the number of confidently identified peptides. As for all machine learning algorithms, however, the results must be validated to avoid issues such as overfitting or biased learning, which would produce unreliable peptide identifications. Here, we discuss how the target-decoy method is employed in machine learning for shotgun proteomics, focusing on how the results can be validated by cross-validation, a frequently used validation scheme in machine learning. We also use simulated data to demonstrate the proposed cross-validation scheme's ability to detect overfitting.
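As an illustration of the general principle rather than the specific scheme proposed in the paper, the sketch below scores every PSM with a model trained only on the other folds, so that no PSM is scored by a model that has already seen it. All names are hypothetical, the default of three folds is an arbitrary choice, and the "learner" is a deliberately trivial stand-in for a real semi-supervised classifier.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Shuffle item indices and deal them into k disjoint folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_scores(items, train, k=3, seed=0):
    """Score each item with a model trained on the other k-1 folds.

    `train(training_items)` must return a function mapping an item to a score.
    """
    scores = [None] * len(items)
    for held_out in k_fold_indices(len(items), k, seed):
        held = set(held_out)
        training = [items[i] for i in range(len(items)) if i not in held]
        scorer = train(training)
        for i in held_out:
            scores[i] = scorer(items[i])
    return scores


def train_decoy_baseline(training):
    """Toy learner: score = feature minus the mean decoy feature seen in training."""
    decoys = [x for x, is_target in training if not is_target]
    baseline = sum(decoys) / len(decoys) if decoys else 0.0
    return lambda item: item[0] - baseline


# Invented PSMs as (feature, is_target) pairs.
psms = [(float(i), i % 2 == 0) for i in range(9)]
cv_scores = cross_validated_scores(psms, train_decoy_baseline, k=3)
```

Because each fold is scored by a different model, scores from different folds are not automatically on the same scale; any real pipeline would need to address that before merging them.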
Affiliation(s)
- Viktor Granholm: Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
14
Cox JM, Troutt JS, Knierman MD, Siegel RW, Qian YW, Ackermann BL, Konrad RJ. Determination of cathepsin S abundance and activity in human plasma and implications for clinical investigation. Anal Biochem 2012; 430:130-7. [PMID: 22922382 DOI: 10.1016/j.ab.2012.08.011] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 05/02/2012] [Revised: 07/16/2012] [Accepted: 08/16/2012] [Indexed: 11/24/2022]
Abstract
There is strong experimental evidence associating cathepsin S with the pathogenesis of atherosclerosis, with emerging data to support its role in diseases such as abdominal aortic aneurysm, obesity, and type 2 diabetes. To further our understanding of cathepsin S, we have developed a novel sandwich immunoassay to measure the mature form of cathepsin S in plasma (mean values from 12 healthy donors of 53 ± 17 ng/ml, range = 39-102). We also developed a targeted liquid chromatography-tandem mass spectrometry (LC-MS/MS) assay to measure in vitro cathepsin S activity to compare activity levels with the protein mass levels determined by enzyme-linked immunosorbent assay (ELISA). Interestingly, we observed that only 0.4 to 1.1% of circulating cathepsin S was enzymatically active. We subsequently demonstrated that the attenuated activity we observed resulted from binding between cathepsin S and its endogenous inhibitor cystatin C in plasma. These data were obtained through immunoprecipitation coupled with either Western blotting analysis or in-gel tryptic digestion and LC-MS/MS characterization of Coomassie-stained gel bands. Although many laboratories have explored the relationship between cathepsin S and cystatin C, this is the first study to demonstrate their association in human circulation, a finding that could prove to be important in furthering our understanding of cathepsin S biology.
Affiliation(s)
- Jennifer M Cox: Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, IN 46285, USA
15
Chen VC, Gouw JW, Naus CC, Foster LJ. Connexin multi-site phosphorylation: mass spectrometry-based proteomics fills the gap. Biochimica et Biophysica Acta - Biomembranes 2012; 1828:23-34. [PMID: 22421184 DOI: 10.1016/j.bbamem.2012.02.028] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 12/15/2011] [Revised: 02/19/2012] [Accepted: 02/28/2012] [Indexed: 10/28/2022]
Abstract
Connexins require an integrated network for protein synthesis, assembly, gating, internalization, degradation and feedback control to regulate the biosynthesis and turnover of gap junction channels. At the most fundamental level, sequence-altering modifications change protein conformation, activity, charge, stability and localization. Understanding the sites, patterns and magnitude of protein post-translational modification, including phosphorylation, is therefore critical. Historically, connexin phosphorylation has been examined under the assumption that one or a small number of modification sites corresponds strictly to one molecular function. However, the release of high-profile proteomic datasets appears to challenge this dogma by demonstrating that connexins undergo multiple levels of multi-site phosphorylation. With the growing prominence of mass spectrometry in biology and medicine, we are now getting a glimpse of the richness of connexin phosphate signals. Because these signals have implications for health and disease, this review provides an overview of technologies in the context of targeted and discovery proteomics, and discusses how these techniques are being applied to "fill the gaps" in understanding of connexin post-translational control. This article is part of a Special Issue entitled: The Communicating junctions, roles and dysfunctions.
Affiliation(s)
- Vincent C Chen
- Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, British Columbia, Canada.
|
16
|
Bell LN, Vuppalanchi R, Watkins PB, Bonkovsky HL, Serrano J, Fontana RJ, Wang M, Rochon J, Chalasani N. Serum proteomic profiling in patients with drug-induced liver injury. Aliment Pharmacol Ther 2012; 35:600-12. [PMID: 22403816 PMCID: PMC3654532 DOI: 10.1111/j.1365-2036.2011.04982.x] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Indexed: 12/11/2022]
Abstract
BACKGROUND Idiosyncratic drug-induced liver injury (DILI) is a complex disorder that is difficult to predict, diagnose and treat. AIM To describe the global serum proteome of patients with DILI and controls. METHODS A label-free, mass spectrometry-based quantitative proteomic approach was used to explore protein expression in serum samples from 74 DILI patients (collected within 14 days of DILI onset) and 40 controls. A longitudinal analysis was conducted in a subset of 21 DILI patients with available 6-month follow-up serum samples. RESULTS Comparison of DILI patients based on pattern, severity and causality assessment of liver injury revealed many differentially expressed priority 1 proteins among groups. Expression of fumarylacetoacetase was correlated with alanine aminotransferase (ALT; r = 0.237; P = 0.047), aspartate aminotransferase (AST; r = 0.389; P = 0.001) and alkaline phosphatase (r = -0.240; P = 0.043), and this was the only protein with significant differential expression when comparing patients with hepatocellular vs. cholestatic or mixed injury. In the longitudinal analysis, expression of 53 priority 1 proteins changed significantly from onset of DILI to 6-month follow-up, and nearly all proteins returned to expression levels comparable to control subjects. Ninety-two serum priority 1 proteins with significant differential expression were identified when comparing the DILI and control groups. Pattern analysis revealed proteins that are components of inflammation, immune system activation and several hepatotoxicity-specific pathways. Apolipoprotein E expression had the greatest power to differentiate DILI patients from controls (89% correct classification; AUROC = 0.97). CONCLUSION This proteomic analysis identified differentially expressed proteins that are components of pathways previously implicated in the pathogenesis of idiosyncratic drug-induced liver injury.
Affiliation(s)
- L. N. Bell
- Division of Gastroenterology/Hepatology, Indiana University, Indianapolis, IN, USA
- R. Vuppalanchi
- Division of Gastroenterology/Hepatology, Indiana University, Indianapolis, IN, USA
- P. B. Watkins
- Department of Internal Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- H. L. Bonkovsky
- Department of Internal Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Cannon Research Center and Center for Liver and Digestive Diseases, Carolinas Medical Center, Charlotte, NC, USA
- Department of Internal Medicine, University of Connecticut, Farmington, CT, USA
- J. Serrano
- Liver Disease Research Branch, NIH/NIDDK, Bethesda, MD, USA
- R. J. Fontana
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- M. Wang
- Protein Analysis Research Center, Indiana University, Indianapolis, IN, USA
- J. Rochon
- Duke Clinical Research Institute, Durham, NC, USA
- N. Chalasani
- Division of Gastroenterology/Hepatology, Indiana University, Indianapolis, IN, USA
|
17
|
Abstract
OBJECTIVES The aims of this study were to characterize the proteome of normal pancreatic juice, to analyze the effect of secretin on the normal proteome, and to compare these results with published data from patients with pancreatic cancer. METHODS Paired pancreatic fluid specimens (before and after intravenous secretin stimulation) were obtained during endoscopic pancreatography from 3 patients without significant pancreatic pathology. Proteins were identified and quantified by mass spectrometry-based protein quantification technology. The human RefSeq (NCBI) database was used to compare the data in samples from patients without pancreatic disease with published data from 3 patients with pancreatic cancer. RESULTS A total of 285 proteins were identified in normal pancreatic juice. Ninety had sufficient amino acid sequences identified to characterize the protein with a high level of confidence. All 90 proteins were present before and after secretin administration, but their relative concentrations changed after stimulation, usually by 1- to 2-fold. Comparison with 170 published pancreatic cancer proteins yielded an overlap of only 42 proteins. CONCLUSIONS Normal pancreatic juice contains multiple proteins related to many biological processes. Secretin alters the concentration but not the spectrum of these proteins. The pancreatic juice proteome of patients without pancreatic disease and that of patients with pancreatic cancer differ markedly.
|
18
|
Sheng Q, Dai J, Wu Y, Tang H, Zeng R. BuildSummary: Using a Group-Based Approach To Improve the Sensitivity of Peptide/Protein Identification in Shotgun Proteomics. J Proteome Res 2012; 11:1494-502. [DOI: 10.1021/pr200194p] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Indexed: 11/29/2022]
Affiliation(s)
- Quanhu Sheng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Jie Dai
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yibo Wu
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Haixu Tang
- School of Informatics and Computing, Indiana University, Bloomington, Indiana 47406, United States
- Rong Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
|
19
|
Abstract
Major technological advances have made proteomics an extremely active field for biomarker discovery in recent years due primarily to the development of newer mass spectrometric technologies and the explosion in genomic and protein bioinformatics. This leads to an increased emphasis on larger scale, faster, and more efficient methods for detecting protein biomarkers in human tissues, cells, and biofluids. Most current proteomic methodologies for biomarker discovery, however, are not highly automated and are generally labor-intensive and expensive. More automation and improved software programs capable of handling a large amount of data are essential to reduce the cost of discovery and to increase throughput. In this chapter, we discuss and describe mass spectrometry-based proteomic methods for quantitative protein analysis.
Affiliation(s)
- Mu Wang
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, 635 Barnhill Drive, MS 4053, Indianapolis, IN 46202, USA.
|
20
|
Zieger MAJ, Gupta MP, Wang M. Proteomic analysis of endothelial cold-adaptation. BMC Genomics 2011; 12:630. [PMID: 22192797 PMCID: PMC3270058 DOI: 10.1186/1471-2164-12-630] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 08/08/2011] [Accepted: 12/22/2011] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Understanding how human cells in tissue culture adapt to hypothermia may aid in developing new clinical procedures for improved ischemic and hypothermic protection. Human coronary artery endothelial cells grown to confluence at 37°C and then transferred to 25°C become resistant over time to oxidative stress and injury induced by 0°C storage and rewarming. This protection correlates with an increase in intracellular glutathione at 25°C. To help understand the molecular basis of endothelial cold-adaptation, isolated proteins from cold-adapted (25°C/72 h) and pre-adapted cells were analyzed by quantitative proteomic methods and differentially expressed proteins were categorized using the DAVID Bioinformatics Resource. RESULTS Cells adapted to 25°C expressed changes in the abundance of 219 unique proteins representing a broad range of categories such as translation, glycolysis, biosynthetic (anabolic) processes, NAD, cytoskeletal organization, RNA processing, oxidoreductase activity, response-to-stress and cell redox homeostasis. The number of proteins that decreased significantly with cold-adaptation exceeded the number that increased by 2:1. Almost half of the decreases were associated with protein metabolic processes and a third were related to anabolic processes including protein, DNA and fatty acid synthesis. Changes consistent with the suppression of cytoskeletal dynamics provided further evidence that cold-adapted cells are in an energy conserving state. Among the specific changes were increases in the abundance and activity of redox proteins glutathione S-transferase, thioredoxin and thioredoxin reductase, which correlated with a decrease in oxidative stress, an increase in protein glutathionylation, and a recovery of reduced protein thiols during rewarming from 0°C. 
Increases in S-adenosylhomocysteine hydrolase and nicotinamide phosphoribosyltransferase implicate a central role for the methionine-cysteine transulfuration pathway in increasing glutathione levels and the NAD salvage pathway in increasing the reducing capacity of cold-adapted cells. CONCLUSIONS Endothelial adaptation to mild-moderate hypothermia down-regulates anabolic processes and increases the reducing capacity of cells to enhance their resistance to oxidation and injury associated with 0°C storage and rewarming. Inducing these characteristics in a clinical setting could potentially limit the damaging effects of energy insufficiency due to ischemia and prevent the disruption of integrated metabolism at low temperatures.
Affiliation(s)
- Michael A J Zieger
- Methodist Research Institute, Indiana University Health, Indianapolis, IN 46202, USA.
|
21
|
Proteomic Characterization of Cerebrospinal Fluid from Ataxia-Telangiectasia (A-T) Patients Using a LC/MS-Based Label-Free Protein Quantification Technology. International Journal of Proteomics 2011; 2011:578903. [PMID: 22084690 PMCID: PMC3200215 DOI: 10.1155/2011/578903] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 01/24/2011] [Accepted: 03/18/2011] [Indexed: 11/18/2022]
Abstract
Cerebrospinal fluid (CSF) has been used for biomarker discovery of neurodegenerative diseases in humans since biological changes in the brain can be seen in this biofluid. Inactivation of A-T-mutated protein (ATM), a multifunctional protein kinase, is responsible for A-T, yet biochemical studies have not succeeded in conclusively identifying the molecular mechanism(s) underlying the neurodegeneration seen in A-T patients or the proteins that can be used as biomarkers for neurologic assessment of A-T or as potential therapeutic targets. In this study, we applied a high-throughput LC/MS-based label-free protein quantification technology to quantitatively characterize the proteins in CSF samples in order to identify differentially expressed proteins that can serve as potential biomarker candidates for A-T. Among 204 identified CSF proteins with high peptide-identification confidence, 13 showed significant protein expression changes. Bioinformatic analysis revealed that these 13 proteins are involved in either neurodegenerative disorders or cancer. Future molecular and functional characterization of these proteins would provide more insights into the potential therapeutic targets for the treatment of A-T and the biomarkers that can be used to monitor or predict A-T disease progression. Clinical validation studies are required before any of these proteins can be developed into clinically useful biomarkers.
|
22
|
van den Toorn HWP, Muñoz J, Mohammed S, Raijmakers R, Heck AJR, van Breukelen B. RockerBox: Analysis and Filtering of Massive Proteomics Search Results. J Proteome Res 2011; 10:1420-4. [DOI: 10.1021/pr1010185] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 12/24/2022]
Affiliation(s)
- Henk W. P. van den Toorn
- Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Netherlands Proteomics Centre, The Netherlands
- Netherlands Bioinformatics Centre, The Netherlands
- Javier Muñoz
- Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Netherlands Proteomics Centre, The Netherlands
- Shabaz Mohammed
- Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Netherlands Proteomics Centre, The Netherlands
- Reinout Raijmakers
- Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Netherlands Proteomics Centre, The Netherlands
- Albert J. R. Heck
- Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Netherlands Proteomics Centre, The Netherlands
- Bas van Breukelen
- Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Netherlands Proteomics Centre, The Netherlands
- Netherlands Bioinformatics Centre, The Netherlands
|
23
|
Nesvizhskii AI. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J Proteomics 2010; 73:2092-123. [PMID: 20816881 DOI: 10.1016/j.jprot.2010.08.009] [Citation(s) in RCA: 358] [Impact Index Per Article: 25.6] [Received: 07/02/2010] [Revised: 08/25/2010] [Accepted: 08/25/2010] [Indexed: 12/18/2022]
Abstract
This manuscript provides a comprehensive review of the peptide and protein identification process using tandem mass spectrometry (MS/MS) data generated in shotgun proteomic experiments. The commonly used methods for assigning peptide sequences to MS/MS spectra are critically discussed and compared, from basic strategies to advanced multi-stage approaches. Particular attention is paid to the problem of false-positive identifications. Existing statistical approaches for assessing the significance of peptide to spectrum matches are surveyed, ranging from single-spectrum approaches such as expectation values to global error rate estimation procedures such as false discovery rates and posterior probabilities. The importance of using auxiliary discriminant information (mass accuracy, peptide separation coordinates, digestion properties, etc.) is discussed, and advanced computational approaches for joint modeling of multiple sources of information are presented. This review also includes a detailed analysis of the issues affecting the interpretation of data at the protein level, including the amplification of error rates when going from peptide to protein level, and the ambiguities in inferring the identities of sample proteins in the presence of shared peptides. Commonly used methods for computing protein-level confidence scores are discussed in detail. The review concludes with a discussion of several outstanding computational issues.
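As an illustration of the global error-rate idea mentioned in this abstract, the following is a minimal sketch of one widely used estimate, the target-decoy false discovery rate, in which the count of decoy matches above a score threshold stands in for the count of false target matches. It is an illustration only and is not taken from the review; the function name `decoy_fdr` and the `(score, is_decoy)` tuple layout are assumptions made for this example.

```python
def decoy_fdr(psms, threshold):
    """Estimate the FDR among target matches scoring at or above threshold.

    psms is a list of (score, is_decoy) tuples (hypothetical layout).
    The estimate is (#decoys passing) / (#targets passing), capped at 1.0.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        # No accepted target matches, so there is nothing to be wrong about.
        return 0.0
    return min(1.0, decoys / targets)


# Toy data: higher scores are better; True marks a decoy match.
example_psms = [(10, False), (9, False), (8, True),
                (7, False), (6, False), (5, True)]
fdr_at_7 = decoy_fdr(example_psms, 7)  # 1 decoy vs. 3 targets pass
fdr_at_9 = decoy_fdr(example_psms, 9)  # 0 decoys vs. 2 targets pass
```

Raising the threshold trades sensitivity for a lower estimated error rate, which is the trade-off that the surveyed procedures aim to quantify.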
|
24
|
McHugh LC, Arthur JW. Harvest: an open-source tool for the validation and improvement of peptide identification metrics and fragmentation exploration. BMC Bioinformatics 2010; 11:448. [PMID: 20815925 PMCID: PMC2941693 DOI: 10.1186/1471-2105-11-448] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/23/2010] [Accepted: 09/06/2010] [Indexed: 01/21/2023] Open
Abstract
Background Protein identification using mass spectrometry is an important tool in many areas of the life sciences, and in proteomics research in particular. Increasing the number of proteins correctly identified is dependent on the ability to include new knowledge about the mass spectrometry fragmentation process, into computational algorithms designed to separate true matches of peptides to unidentified mass spectra from spurious matches. This discrimination is achieved by computing a function of the various features of the potential match between the observed and theoretical spectra to give a numerical approximation of their similarity. It is these underlying "metrics" that determine the ability of a protein identification package to maximise correct identifications while limiting false discovery rates. There is currently no software available specifically for the simple implementation and analysis of arbitrary novel metrics for peptide matching and for the exploration of fragmentation patterns for a given dataset. Results We present Harvest: an open source software tool for analysing fragmentation patterns and assessing the power of a new piece of information about the MS/MS fragmentation process to more clearly differentiate between correct and random peptide assignments. We demonstrate this functionality using data metrics derived from the properties of individual datasets in a peptide identification context. Using Harvest, we demonstrate how the development of such metrics may improve correct peptide assignment confidence in the context of a high-throughput proteomics experiment and characterise properties of peptide fragmentation. Conclusions Harvest provides a simple framework in C++ for analysing and prototyping metrics for peptide matching, the core of the protein identification problem. 
It is not a protein identification package and answers a different research question to packages such as Sequest, Mascot, X!Tandem, and other protein identification packages. It does not aim to maximise the number of assigned peptides from a set of unknown spectra, but instead provides a method by which researchers can explore fragmentation properties and assess the power of novel metrics for peptide matching in the context of a given experiment. Metrics developed using Harvest may then become candidates for later integration into protein identification packages.
Affiliation(s)
- Leo C McHugh
- Discipline of Medicine, Sydney Medical School, University of Sydney, Sydney, Australia
|
25
|
Renard BY, Timm W, Kirchner M, Steen JAJ, Hamprecht FA, Steen H. Estimating the confidence of peptide identifications without decoy databases. Anal Chem 2010; 82:4314-8. [PMID: 20455556 DOI: 10.1021/ac902892j] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 11/29/2022]
Abstract
Using decoy databases to compute the confidence of peptide identifications has become the standard procedure for mass spectrometry driven proteomics. While decoy databases have numerous advantages, they double the run time and are not applicable to all peptide identification problems such as error-tolerant or de novo searches or the large-scale identification of cross-linked peptides. Instead, we propose a fast, simple and robust mixture modeling approach to estimate the confidence of peptide identifications without the need for decoy database searches, which automatically checks whether its underlying assumptions are fulfilled. This approach is then evaluated on 41 LC/MS data sets of varying complexity and origin. The results are very similar to those of the decoy database strategy at a negligible computational cost. Our approach is applicable not only to standard protein identification workflows, but also to proteomics problems for which meaningful decoy databases cannot be constructed.
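The abstract does not give the details of the authors' mixture model, so the following is only a generic sketch of the underlying idea: fit a two-component mixture to the match scores and read off the posterior probability that a score belongs to the higher-scoring ("correct") component. The choice of Gaussian components, the EM initialisation, and the names `fit_two_gaussians` and `posterior_correct` are all assumptions made for this illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(scores, iterations=100):
    """Fit a two-component Gaussian mixture to scores with plain EM."""
    lo, hi = min(scores), max(scores)
    means = [lo, hi]                      # start at the extremes
    sds = [max((hi - lo) / 4.0, 1e-3)] * 2
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each score.
        resp = []
        for x in scores:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in (0, 1)]
            total = sum(p) or 1e-300
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate weight, mean and spread of each component.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            means[k] = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            var = sum(r[k] * (x - means[k]) ** 2
                      for r, x in zip(resp, scores)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor avoids collapse
            weights[k] = nk / len(scores)
    return weights, means, sds


def posterior_correct(x, weights, means, sds):
    """Posterior probability that score x comes from the higher-mean component."""
    k_hi = 0 if means[0] > means[1] else 1
    p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in (0, 1)]
    total = sum(p)
    return p[k_hi] / total if total > 0 else 0.0


# Toy scores: five low-scoring (random) and five high-scoring (correct) matches.
toy_scores = [0.8, 1.0, 1.2, 0.9, 1.1, 4.8, 5.0, 5.2, 4.9, 5.1]
toy_weights, toy_means, toy_sds = fit_two_gaussians(toy_scores)
```

A real implementation would also need the check the abstract mentions, namely whether the mixture assumptions actually hold for a given data set; this sketch omits it.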
Affiliation(s)
- Bernhard Y Renard
- Interdisciplinary Center for Scientific Computing, University of Heidelberg, Speyerer Strasse 6, 69115 Heidelberg, Germany
|
26
|
Bell LN, Lee L, Saxena R, Bemis KG, Wang M, Theodorakis JL, Vuppalanchi R, Alloosh M, Sturek M, Chalasani N. Serum proteomic analysis of diet-induced steatohepatitis and metabolic syndrome in the Ossabaw miniature swine. Am J Physiol Gastrointest Liver Physiol 2010; 298:G746-54. [PMID: 20167877 PMCID: PMC3774260 DOI: 10.1152/ajpgi.00485.2009] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Indexed: 01/31/2023]
Abstract
We recently developed a nutritional model of steatohepatitis and metabolic syndrome in Ossabaw pigs. Here we describe changes in the serum proteome of pigs fed standard chow (control group; n = 7), atherogenic diet (n = 5), or modified atherogenic diet (M-ath diet group; n = 6). Pigs fed atherogenic diet developed metabolic syndrome and mildly abnormal liver histology, whereas pigs fed M-ath diet exhibited severe metabolic syndrome and liver injury closely resembling human nonalcoholic steatohepatitis (NASH). Using a label-free mass spectrometry-based proteomics approach, we identified 1,096 serum proteins, 162 of which changed significantly between any two diet groups (false discovery rate <5%). Biological classification of proteins with significant changes revealed functions previously implicated in development of NASH in humans, including immune system regulation and inflammation (orosomucoid 1, serum amyloid P component, paraoxonase 1, protein similar to alpha-2-macroglobulin precursor, beta-2-microglobulin, p101 protein, and complement components 2 and C8G), lipid metabolism (apolipoproteins C-III, E, E precursor, B, and N), structural and extracellular matrix proteins (transthyretin and endopeptidase 24.16 type M2), and coagulation [carboxypeptidase B2 (plasma)]. Several proteins with significant differential expression in pigs were also identified in our recent human proteomics study as changing significantly in serum from patients across the spectrum of nonalcoholic fatty liver disease, including apolipoproteins C-III and B, orosomucoid 1, serum amyloid P component, transthyretin, paraoxonase 1, and a protein similar to alpha-2-macroglobulin precursor. This serum proteomic analysis provides additional information about the pathogenesis of NASH and further characterizes our large animal model of diet-induced steatohepatitis and metabolic syndrome in Ossabaw pigs.
Affiliation(s)
- Lauren N. Bell
- 1 Division of Clinical Pharmacology; 2 Division of Gastroenterology/Hepatology
- Lydia Lee
- 2 Division of Gastroenterology/Hepatology
- Romil Saxena
- 2 Division of Gastroenterology/Hepatology; 4 Department of Pathology and Laboratory Medicine
- Mu Wang
- 3 Monarch LifeSciences; 5 Department of Biochemistry and Molecular Biology
- Raj Vuppalanchi
- 1 Division of Clinical Pharmacology; 2 Division of Gastroenterology/Hepatology
- Mouhamad Alloosh
- 6 Department of Cellular and Integrative Physiology, Indiana University School of Medicine, Indianapolis, Indiana
- Michael Sturek
- 6 Department of Cellular and Integrative Physiology, Indiana University School of Medicine, Indianapolis, Indiana
- Naga Chalasani
- 1 Division of Clinical Pharmacology; 2 Division of Gastroenterology/Hepatology
|
27
|
Werner SR, Saha JK, Broderick CL, Zhen EY, Higgs RE, Duffin KL, Smith RC. Proteomic analysis of demyelinated and remyelinating brain tissue following dietary cuprizone administration. J Mol Neurosci 2010; 42:210-25. [PMID: 20401640 DOI: 10.1007/s12031-010-9354-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Received: 01/05/2010] [Accepted: 03/18/2010] [Indexed: 10/19/2022]
Abstract
Cuprizone intoxication is a commonly used model of demyelination that allows the temporal separation of demyelination and remyelination. The underlying biochemical alterations leading to demyelination, using this model, remain unclear and may be multifold. Analysis of proteomic changes within the brains of cuprizone-exposed animals may help elucidate key cellular processes. In the current study, we report the results of the liquid chromatography tandem mass spectrometry analysis of total protein from the brain hemispheres of control and toxin-exposed mice at 6 weeks of exposure and after 3 and 6 weeks of recovery to identify protein changes during the remyelination phase. We found that at 6 weeks of cuprizone exposure, myelin proteins were reduced compared to controls and increased throughout the course of recovery, as expected. In contrast, other protein groups, such as proteins related to mitochondrial function, were increased at 6 weeks of treatment compared to untreated controls and returned toward control levels following withdrawal of toxin. These results suggest that a global proteomic analysis of the brain tissue of cuprizone-treated mice can identify changes related to the demyelination/remyelination process.
Affiliation(s)
- Sean R Werner
- Biotechnology Discovery Research, Eli Lilly and Company, Lilly Research Laboratories, Lilly Corporate Center, Indianapolis, IN 46285, USA.
|
28
|
Bell LN, Theodorakis JL, Vuppalanchi R, Saxena R, Bemis KG, Wang M, Chalasani N. Serum proteomics and biomarker discovery across the spectrum of nonalcoholic fatty liver disease. Hepatology 2010; 51:111-20. [PMID: 19885878 PMCID: PMC2903216 DOI: 10.1002/hep.23271] [Citation(s) in RCA: 103] [Impact Index Per Article: 7.4] [Indexed: 02/06/2023]
Abstract
UNLABELLED Nonalcoholic fatty liver disease (NAFLD), ranging from relatively benign simple steatosis to progressive nonalcoholic steatohepatitis (NASH) and fibrosis, is an increasingly common chronic liver disease. Liver biopsy is currently the only reliable tool for staging the subtypes of NAFLD; therefore, noninvasive serum biomarkers for evaluation of liver disease and fibrosis are urgently needed. We performed this study to describe changes in the serum proteome and identify biomarker candidates in serum samples from 69 patients with varying stages of NAFLD (simple steatosis, NASH, and NASH with advanced bridging [F3/F4] fibrosis) and 16 obese controls. Using a label-free mass spectrometry-based approach we identified over 1,700 serum proteins with a peptide identification (ID) confidence level of >75%, 605 of which changed significantly between any two patient groups (false discovery rate <5%). Importantly, expression levels of 55 and 15 proteins changed significantly between the simple steatosis and NASH F3/F4 group and the NASH and NASH F3/F4 group, respectively. Classification of proteins with significant changes showed involvement in immune system regulation and inflammation, coagulation, cellular and extracellular matrix structure and function, and roles as carrier proteins in the blood. Further, many of these proteins are synthesized exclusively by the liver and could potentially serve as diagnostic biomarkers for identifying and staging NAFLD. CONCLUSION This proteomic analysis reveals important information regarding the pathogenesis/progression of NAFLD and NASH and demonstrates key changes in serum protein expression levels between control subjects and patients with different stages of fatty liver. Future validation of these potential biomarkers is needed such that these proteins may be used in place of liver biopsy to facilitate diagnosis and treatment of patients with NAFLD.
Affiliation(s)
- Lauren N. Bell
- Division of Clinical Pharmacology, Indiana University, Indianapolis, IN; Division of Gastroenterology/Hepatology, Indiana University, Indianapolis, IN
- Raj Vuppalanchi
- Division of Clinical Pharmacology, Indiana University, Indianapolis, IN; Division of Gastroenterology/Hepatology, Indiana University, Indianapolis, IN
- Romil Saxena
- Division of Gastroenterology/Hepatology, Indiana University, Indianapolis, IN; Department of Pathology and Laboratory Medicine, Indiana University, Indianapolis, IN
- Mu Wang
- Monarch LifeSciences, Indianapolis, IN; Department of Biochemistry and Molecular Biology, Indiana University, Indianapolis, IN
- Naga Chalasani
- Division of Clinical Pharmacology, Indiana University, Indianapolis, IN; Division of Gastroenterology/Hepatology, Indiana University, Indianapolis, IN
|
29
|
Menschaert G, Vandekerckhove TTM, Landuyt B, Hayakawa E, Schoofs L, Luyten W, Van Criekinge W. Spectral clustering in peptidomics studies helps to unravel modification profile of biologically active peptides and enhances peptide identification rate. Proteomics 2009; 9:4381-8. [PMID: 19658089 DOI: 10.1002/pmic.200900248] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 01/05/2023]
Abstract
When studying the set of biologically active peptides (the so-called peptidome) of a cell type, organ, or entire organism, the identification of peptides is mostly attempted by MS. However, identification rates are often unsatisfactory. Many failed or missed identifications may be attributable to the wealth of modifications on peptides, some of which may originate from in vivo post-translational processes that activate the molecule, whereas others could be introduced during the tissue preparation procedures. Preliminary knowledge of the modification profile of specific peptidome samples would greatly improve identification rates. To this end we developed an approach that clusters mass spectra so that spectra having similar peak patterns over significant segments are grouped together. Comparing members of one spectral group enables us to assess the modifications (expressed as mass shifts in Dalton) present in a peptidome sample. The clustering algorithm in this study is called Bonanza, and it was applied to MALDI-TOF/TOF MS spectra from the mouse. Peptide identification rates went up from 17 to 36% for 278 spectra obtained from the pancreatic islets and from 21 to 43% for 163 pituitary spectra. As this report shows, spectral clustering with subsequent advanced database search may result in the discovery of new biologically active peptides and modifications thereof.
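The grouping-and-comparison idea in this abstract can be sketched roughly as follows: count how many fragment peaks two spectra share within a tolerance, and, for spectra that share enough peaks, report the difference between their precursor masses as a candidate modification mass shift. This is not the Bonanza algorithm, whose details the abstract does not give; the function names, the 0.5 m/z tolerance, and the toy peak lists are assumptions made for illustration.

```python
def shared_peaks(peaks_a, peaks_b, tol=0.5):
    """Count peaks in peaks_a that have a partner in peaks_b within tol (m/z)."""
    return sum(1 for a in peaks_a
               if any(abs(a - b) <= tol for b in peaks_b))


def mass_shift(precursor_a, precursor_b):
    """Precursor mass difference in Dalton (b relative to a)."""
    return precursor_b - precursor_a


# Toy example: two spectra that agree on three of four fragment peaks.
spectrum_a = {"precursor": 1000.5, "peaks": [175.1, 262.1, 375.2, 488.3]}
spectrum_b = {"precursor": 1042.5, "peaks": [175.1, 262.1, 375.2, 530.3]}

n_shared = shared_peaks(spectrum_a["peaks"], spectrum_b["peaks"])
shift = mass_shift(spectrum_a["precursor"], spectrum_b["precursor"])
```

In this toy pair the shift of about 42 Da is consistent with an acetylation, which illustrates how a recurring mass difference inside a spectral group can hint at a modification before any database search is run.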
Affiliation(s)
- Gerben Menschaert
- Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Laboratory for Bioinformatics and Computational Genomics, Ghent University, Ghent, Belgium.
|
30
|
Salem M, Kenney PB, Rexroad CE, Yao J. Proteomic signature of muscle atrophy in rainbow trout. J Proteomics 2009; 73:778-89. [PMID: 19903543 DOI: 10.1016/j.jprot.2009.10.014] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Received: 09/03/2009] [Revised: 10/13/2009] [Accepted: 10/31/2009] [Indexed: 02/06/2023]
Abstract
Muscle deterioration arises as a physiological response to elevated energetic demands of fish during sexual maturation and spawning. Previously, we used this model to characterize the transcriptomic mechanisms associated with fish muscle degradation and identified potential biological markers of muscle growth and quality. However, transcriptional measurements do not necessarily reflect changes in active mature proteins. Here we report the characterization of the proteomic profile in degenerating muscle of rainbow trout in relation to the female reproductive cycle using an LC/MS-based label-free protein quantification method. A total of 146 significantly changed proteins in atrophying muscles (FDR <5%) were identified. Proteins were clustered according to their gene ontology identifiers. Muscle atrophy was associated with decreased abundance in proteins of anaerobic respiration, protein biosynthesis, monooxygenases, follistatins, and myogenin, as well as growth hormone, interleukin-1 and estrogen receptors. In contrast, proteins of MAPK/ERK kinase, glutamine synthetase, transcription factors, Stat3, JunB, Id2, and NFkappaB inhibitor were more abundant in atrophying muscle. These changes are discussed in light of the mammalian muscle atrophy paradigm and proposed fish-specific mechanisms of muscle degradation. These data will help identify genes associated with muscle degeneration and superior flesh quality in rainbow trout, facilitating identification of genetic markers for muscle growth and quality.
Affiliation(s)
- Mohamed Salem
- Laboratory of Animal Biotechnology and Genomics, Division of Animal and Nutritional Sciences, West Virginia University, Morgantown, WV 26506-6108, United States
|
31
|
Klammer AA, Park CY, Noble WS. Statistical calibration of the SEQUEST XCorr function. J Proteome Res 2009; 8:2106-13. [PMID: 19275164 DOI: 10.1021/pr8011107] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Indexed: 11/28/2022]
Abstract
Obtaining accurate peptide identifications from shotgun proteomics liquid chromatography tandem mass spectrometry (LC-MS/MS) experiments requires a score function that consistently ranks correct peptide-spectrum matches (PSMs) above incorrect matches. We have observed that, for the Sequest score function Xcorr, the inability to discriminate between correct and incorrect PSMs is due in part to spectrum-specific properties of the score distribution. In other words, some spectra score well regardless of which peptides they are scored against, and other spectra score well because they are scored against a large number of peptides. We describe a protocol for calibrating PSM score functions, and we demonstrate its application to Xcorr and the preliminary Sequest score function Sp. The protocol accounts for spectrum- and peptide-specific effects by calculating p values for each spectrum individually, using only that spectrum's score distribution. We demonstrate that these calculated p values are uniform under a null distribution and therefore accurately measure significance. These p values can be used to estimate the false discovery rate, thereby eliminating the need for an extra search against a decoy database. In addition, we show that the p values are better calibrated than their underlying scores; consequently, when ranking top-scoring PSMs from multiple spectra, p values are better at discriminating between correct and incorrect PSMs. The calibration protocol is generally applicable to any PSM score function for which an appropriate parametric family can be identified.
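The idea of scoring each spectrum against its own score distribution can be illustrated with a minimal sketch. It is not the protocol from the paper: the abstract describes fitting a parametric family, whereas this sketch substitutes a plain empirical tail fraction, and the function name `spectrum_p_value` and the example scores are hypothetical.

```python
def spectrum_p_value(top_score, candidate_scores):
    """Empirical p-value of one spectrum's top score.

    Assumption: candidate_scores holds the scores of the same spectrum
    against all candidate peptides, and serves as a stand-in for the
    spectrum-specific null distribution. The paper fits a parametric
    family instead; this tail fraction is only an illustration.
    """
    n = len(candidate_scores)
    # Count candidates scoring at least as well as the top match.
    exceed = sum(1 for s in candidate_scores if s >= top_score)
    # Add-one correction keeps the p-value strictly above zero.
    return (exceed + 1) / (n + 1)


if __name__ == "__main__":
    # Hypothetical Xcorr-like scores for a single spectrum.
    scores = [1.0, 2.0, 3.0, 0.5]
    print(spectrum_p_value(3.0, scores))
```

Because each spectrum is compared only with its own candidates, a spectrum that scores well against every peptide no longer looks significant just because its raw scores are high.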
Affiliation(s)
- Aaron A Klammer
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
|
32
|
Salmi J, Nyman TA, Nevalainen OS, Aittokallio T. Filtering strategies for improving protein identification in high-throughput MS/MS studies. Proteomics 2009; 9:848-60. [PMID: 19160393 DOI: 10.1002/pmic.200800517] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Indexed: 11/11/2022]
Abstract
Despite the recent advances in streamlining high-throughput proteomic pipelines using tandem mass spectrometry (MS/MS), reliable identification of peptides and proteins on a larger scale has remained a challenging task, still involving a considerable degree of user interaction. Recently, a number of papers have proposed computational strategies both for distinguishing poor MS/MS spectra prior to database search (pre-filtering) as well as for verifying the peptide identifications made by the search programs (post-filtering). Both of these filtering approaches can be very beneficial to the overall protein identification pipeline, since they can remove a substantial part of the time consuming manual validation work and convert large sets of MS/MS spectra into more reliable and interpretable proteome information. The choice of the filtering method depends both on the properties of the data and on the goals of the experiment. This review discusses the different pre- and post-filtering strategies available to the researchers, together with their relative merits and potential pitfalls. We also highlight some additional research topics, such as spectral denoising and statistical assessment of the identification results, which aim at further improving the coverage and accuracy of high-throughput protein identification studies.
Affiliation(s)
- Jussi Salmi
- Department of Information Technology, University of Turku, Turku, Finland.
|
33
|
Saxena C, Bonacci TM, Huss KL, Bloem LJ, Higgs RE, Hale JE. Capture of Drug Targets from Live Cells Using a Multipurpose Immuno-Chemo-Proteomics Tool. J Proteome Res 2009; 8:3951-7. [DOI: 10.1021/pr900277x] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/28/2022]
Affiliation(s)
- Chaitanya Saxena
- Integrative Biology, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285
- Tabetha M. Bonacci
- Integrative Biology, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285
- Karen L. Huss
- Integrative Biology, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285
- Laura J. Bloem
- Integrative Biology, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285
- Richard E. Higgs
- Integrative Biology, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285
- John E. Hale
- Integrative Biology, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285
|
34
|
Jorrín-Novo JV, Maldonado AM, Echevarría-Zomeño S, Valledor L, Castillejo MA, Curto M, Valero J, Sghaier B, Donoso G, Redondo I. Plant proteomics update (2007–2008): Second-generation proteomic techniques, an appropriate experimental design, and data analysis to fulfill MIAPE standards, increase plant proteome coverage and expand biological knowledge. J Proteomics 2009; 72:285-314. [DOI: 10.1016/j.jprot.2009.01.026] [Citation(s) in RCA: 174] [Impact Index Per Article: 11.6] [Indexed: 12/18/2022]
|
35
|
Edwards N, Wu X, Tseng CW. An Unsupervised, Model-Free, Machine-Learning Combiner for Peptide Identifications from Tandem Mass Spectra. Clin Proteomics 2009. [DOI: 10.1007/s12014-009-9024-5] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Indexed: 01/14/2023] Open
Abstract
As the speed of mass spectrometers, sophistication of sample fractionation, and complexity of experimental designs increase, the volume of tandem mass spectra requiring reliable automated analysis continues to grow. Software tools that quickly, effectively, and robustly determine the peptide associated with each spectrum with high confidence are sorely needed. Currently available tools that postprocess the output of sequence-database search engines use three techniques to distinguish the correct peptide identifications from the incorrect: statistical significance re-estimation, supervised machine learning scoring and prediction, and combining or merging of search engine results. We present a unifying framework that encompasses each of these techniques in a single model-free machine-learning framework that can be trained in an unsupervised manner. The predictor is trained on the fly for each new set of search results without user intervention, making it robust for different instruments, search engines, and search engine parameters. We demonstrate the performance of the technique using mixtures of known proteins and by using shuffled databases to estimate false discovery rates, from data acquired on three different instruments with two different ionization technologies. We show that this approach outperforms machine-learning techniques applied to a single search engine’s output, and demonstrate that combining search engine results provides additional benefit. We show that the performance of the commercial Mascot tool can be bested by the machine-learning combination of two open-source tools X!Tandem and OMSSA, but that the use of all three search engines boosts performance further still. The Peptide identification Arbiter by Machine Learning (PepArML) unsupervised, model-free, combining framework can be easily extended to support an arbitrary number of additional searches, search engines, or specialized peptide–spectrum match metrics for each spectrum data set. 
PepArML is open-source and is available from http://peparml.sourceforge.net.
|
36
|
Proteomic analysis of HCV cirrhosis and HCV-induced HCC: identifying biomarkers for monitoring HCV-cirrhotic patients awaiting liver transplantation. Transplantation 2009; 87:143-52. [PMID: 19136905 DOI: 10.1097/tp.0b013e318191c68d] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Indexed: 02/07/2023]
Abstract
BACKGROUND Progression from chronic hepatitis C virus (HCV) infection to cirrhosis and hepatocellular carcinoma (HCC) results in protein changes in the peripheral blood. We evaluated global protein expression in plasma samples of HCV-cirrhotic and HCV-cirrhotic-HCC patients. PATIENTS AND METHODS Plasma samples from 25 HCV-cirrhotic-HCC and 10 HCV-cirrhotic patients were quantitatively evaluated for protein expression. Tryptic peptides were analyzed using a Thermo linear ion-trap mass spectrometer (LTQ) coupled with a Surveyor HPLC system (Thermo). SEQUEST and X!Tandem database search algorithms were used for peptide sequence identification. Protein relative quantification was performed using the area under the curve from the selected ion chromatogram. A significant fold change between groups was based on controlling the false discovery rate (FDR) at less than 5%. RESULTS We identified and quantified 2320 proteins from the analysis of the different protein patterns between HCV-cirrhosis and HCV-HCC samples. Gene ontology terms classified the more important biologic processes related to these proteins as signal transduction, regulation of transcription DNA-dependent, protein amino acid phosphorylation, cell adhesion, transport, and immune response. Seven proteins showed significant expression changes with an FDR less than 5% between cirrhosis and tumor groups. Moreover, 18 proteins showed significant expression changes (FDR <5%) when plasma samples from HCV-cirrhosis were compared with early HCV-HCC. CONCLUSIONS Differential protein expression was observed between samples from HCV patients with cirrhosis with and without HCC. Also, differences were observed between early and advanced HCV-HCC samples. This study provides important information for discovery of potential biomarkers for early HCC diagnosis in HCV cirrhotic patients.
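The abstract does not say which procedure was used to control the FDR at 5%. As one common possibility, the Benjamini-Hochberg step-up procedure can be sketched as follows; the choice of procedure, the function name `benjamini_hochberg`, and the example p values are assumptions made for illustration only.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Assumption: this is the standard Benjamini-Hochberg step-up
    procedure, shown as one way to control the FDR at level alpha;
    the study may have used a different method.
    """
    m = len(p_values)
    # Indices of the p values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical per-protein p values from a group comparison.
    print(benjamini_hochberg([0.001, 0.04, 0.03, 0.2]))
```

A protein counts as significantly changed only if its p value falls at or below the largest rank-scaled threshold that any p value satisfies, which is what keeps the expected share of false positives among the reported proteins under alpha.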
|
37
|
Differential effects of ethanol in the nucleus accumbens shell of alcohol-preferring (P), alcohol-non-preferring (NP) and Wistar rats: a proteomics study. Pharmacol Biochem Behav 2009; 92:304-13. [PMID: 19166871 DOI: 10.1016/j.pbb.2008.12.019] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Received: 07/24/2008] [Revised: 12/15/2008] [Accepted: 12/20/2008] [Indexed: 11/22/2022]
Abstract
The objective of this study was to determine the effects of ethanol injections on protein expression in the nucleus accumbens shell (ACB-sh) of alcohol-preferring (P), alcohol-non-preferring (NP) and Wistar (W) rats. Rats were injected for 5 consecutive days with either saline or 1 g/kg ethanol; 24 h after the last injection, rats were killed and brains obtained. Micro-punch samples of the ACB-sh were homogenized; extracted proteins were subjected to trypsin digestion and analyzed with a liquid chromatography-mass spectrometer procedure. Ethanol changed expression levels (1.15-fold or higher) of 128 proteins in NP rats, 22 proteins in P, and 28 proteins in W rats. Few of the changes observed with ethanol treatment for NP rats were observed for P and W rats. Many of the changes occurred in calcium-calmodulin signaling systems, G-protein signaling systems, synaptic structure and histones. Approximately half the changes observed in the ACB-sh of P rats were also observed for W rats. Overall, the results indicate a unique response to ethanol of the ACB-sh of NP rats compared to P and W rats; this unique response may reflect changes in neuronal function in the ACB-sh that could contribute to the low alcohol drinking behavior of the NP line.
|
38
|
Jiang X, Dong X, Ye M, Zou H. Instance Based Algorithm for Posterior Probability Calculation by Target−Decoy Strategy to Improve Protein Identifications. Anal Chem 2008; 80:9326-35. [DOI: 10.1021/ac8017229] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
Affiliation(s)
- Xinning Jiang
- National Chromatographic R&A Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China, Graduate School of Chinese Academy of Sciences, Beijing 100049, China, and Department of Chemistry, Xixi Campus, Zhejiang University, Hangzhou 310028, China
- Xiaoli Dong
- National Chromatographic R&A Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China, Graduate School of Chinese Academy of Sciences, Beijing 100049, China, and Department of Chemistry, Xixi Campus, Zhejiang University, Hangzhou 310028, China
- Mingliang Ye
- National Chromatographic R&A Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China, Graduate School of Chinese Academy of Sciences, Beijing 100049, China, and Department of Chemistry, Xixi Campus, Zhejiang University, Hangzhou 310028, China
- Hanfa Zou
- National Chromatographic R&A Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China, Graduate School of Chinese Academy of Sciences, Beijing 100049, China, and Department of Chemistry, Xixi Campus, Zhejiang University, Hangzhou 310028, China
|
39
|
Käll L, Storey JD, Noble WS. Non-parametric estimation of posterior error probabilities associated with peptides identified by tandem mass spectrometry. Bioinformatics 2008; 24:i42-8. [PMID: 18689838 DOI: 10.1093/bioinformatics/btn294] [Citation(s) in RCA: 115] [Impact Index Per Article: 7.2] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION A mass spectrum produced via tandem mass spectrometry can be tentatively matched to a peptide sequence via database search. Here, we address the problem of assigning a posterior error probability (PEP) to a given peptide-spectrum match (PSM). This problem is considerably more difficult than the related problem of estimating the error rate associated with a large collection of PSMs. Existing methods for estimating PEPs rely on a parametric or semiparametric model of the underlying score distribution. RESULTS We demonstrate how to apply non-parametric logistic regression to this problem. The method makes no explicit assumptions about the form of the underlying score distribution; instead, the method relies upon decoy PSMs, produced by searching the spectra against a decoy sequence database, to provide a model of the null score distribution. We show that our non-parametric logistic regression method produces accurate PEP estimates for six different commonly used PSM score functions. In particular, the estimates produced by our method are comparable in accuracy to those of PeptideProphet, which uses a parametric or semiparametric model designed specifically to work with SEQUEST. The advantage of the non-parametric approach is applicability and robustness to new score functions and new types of data. AVAILABILITY C++ code implementing the method as well as supplementary information is available at http://noble.gs.washington.edu/proj/qvality
Affiliation(s)
- Lukas Käll
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
|
40
|
Kim S, Gupta N, Pevzner PA. Spectral probabilities and generating functions of tandem mass spectra: a strike against decoy databases. J Proteome Res 2008; 7:3354-63. [PMID: 18597511 PMCID: PMC2689316 DOI: 10.1021/pr8001244] [Citation(s) in RCA: 332] [Impact Index Per Article: 20.8] [Indexed: 12/19/2022]
Abstract
A key problem in computational proteomics is distinguishing between correct and false peptide identifications. We argue that evaluating the error rates of peptide identifications is not unlike computing generating functions in combinatorics. We show that the generating functions and their derivatives (spectral energy and spectral probability) represent new features of tandem mass spectra that, similarly to Delta-scores, significantly improve peptide identifications. Furthermore, the spectral probability provides a rigorous solution to the problem of computing statistical significance of spectral identifications. The spectral energy/probability approach improves the sensitivity-specificity tradeoff of existing MS/MS search tools, addresses the notoriously difficult problem of "one-hit-wonders" in mass spectrometry, and often eliminates the need for decoy database searches. We therefore argue that the generating function approach has the potential to increase the number of peptide identifications in MS/MS searches.
Affiliation(s)
- Sangtae Kim
- Department of Computer Science and Engineering, University of California San Diego, La Jolla CA 92093, USA
|
41
|
Saxena C, Zhen E, Higgs RE, Hale JE. An immuno-chemo-proteomics method for drug target deconvolution. J Proteome Res 2008; 7:3490-7. [PMID: 18590316 DOI: 10.1021/pr800222q] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
Abstract
Chemical proteomics is an emerging technique for drug target deconvolution and profiling the toxicity of known drugs. With the use of this technique, the specificity of a small molecule inhibitor toward its potential targets can be characterized and information thus obtained can be used in optimizing lead compounds. Most commonly, small molecules are immobilized on solid supports and used as affinity chromatography resins to bind targets. However, it is difficult to evaluate the effect of immobilization on the affinity of the compounds to their targets. Here, we describe the development and application of a soluble probe where a small molecule was coupled with a peptide epitope which was used to affinity isolate binding proteins from cell lysate. The soluble probe allowed direct verification that the compound after coupling with peptide epitope retained its binding characteristics. The PKC-alpha inhibitor Bisindolylmaleimide-III was coupled with a peptide containing the FLAG epitope. Following incubation with cellular lysates, the compound and associated proteins were affinity isolated using anti-FLAG antibody beads. Using this approach, we identified the known Bisindolylmaleimide-III targets, PKC-alpha, GSK3-beta, CaMKII, adenosine kinase, CDK2, and quinone reductase type 2, as well as previously unidentified targets PKAC-alpha, prohibitin, VDAC and heme binding proteins. This method was directly compared to the solid-phase method (small molecule was immobilized to a solid support) providing an orthogonal strategy to aid in target deconvolution and help to eliminate false positives originating from nonspecific binding of the proteins to the matrix.
Affiliation(s)
- Chaitanya Saxena
- Integrative Biology, Greenfield Laboratories, Eli Lilly and Company, Greenfield, IN 46140, USA.
|
42
|
Wang M, You J, Bemis KG, Tegeler TJ, Brown DPG. Label-free mass spectrometry-based protein quantification technologies in proteomic analysis. BRIEFINGS IN FUNCTIONAL GENOMICS AND PROTEOMICS 2008; 7:329-39. [PMID: 18579615 DOI: 10.1093/bfgp/eln031] [Citation(s) in RCA: 99] [Impact Index Per Article: 6.2] [Indexed: 11/14/2022]
Abstract
Major technological advances have made proteomics an extremely active field for biomarker discovery and validation in recent years. These improvements have led to an increased emphasis on larger scale, faster and more efficient methods for protein biomarker discoveries in human tissues, cells and biofluids. However, most current proteomic methodologies for biomarker discovery and validation are not highly automated and are generally labour intensive and expensive. Improved automation as well as software programs capable of handling a large amount of data are essential in order to reduce the cost of discovery and increase the throughput. In this review, we will discuss and describe the label-free mass spectrometry-based protein quantification technologies and a case study utilizing one of these methods for biomarker discovery.
Affiliation(s)
- Mu Wang
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, 1345 W, 16th Street, Room 312, Indianapolis, IN 46202, USA.
|
43
|
Shen Y, Tolić N, Hixson KK, Purvine SO, Pasa-Tolić L, Qian WJ, Adkins JN, Moore RJ, Smith RD. Proteome-wide identification of proteins and their modifications with decreased ambiguities and improved false discovery rates using unique sequence tags. Anal Chem 2008; 80:1871-82. [PMID: 18271604 PMCID: PMC2600587 DOI: 10.1021/ac702328x] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Indexed: 11/29/2022]
Abstract
Identifying proteins and their modification states with known levels of confidence remains a significant challenge for proteomics. Random or decoy peptide databases are increasingly being used to estimate the false discovery rate (FDR), e.g., from liquid chromatography-tandem mass spectrometry (LC-MS/MS) analyses of tryptic digests. We show that this approach can significantly underestimate the FDR and describe an approach for more confident protein identifications that uses unique partial sequences derived from a combination of database searching and amino acid residue sequencing using high-accuracy MS/MS data. Applied to a Saccharomyces cerevisiae tryptic digest, the approach provided 3132 confident peptide identifications (approximately 5% modified in some fashion), covering 575 proteins with an estimated zero FDR. The conventional approach provided 3359 peptide identifications and 656 proteins with 0.3% FDR based upon a decoy database analysis. However, the present approach revealed approximately 5% of the 3359 identifications to be incorrect and many more as potentially ambiguous (e.g., due to not considering certain amino acid substitutions and modifications). In addition, 677 peptides and 39 proteins were identified that had been missed by conventional analysis, including nontryptic peptides, peptides with a variety of expected/unexpected chemical modifications, known/unknown post-translational modifications, single nucleotide polymorphisms or gene encoding errors, and multiple modifications of individual peptides.
Affiliation(s)
- Yufeng Shen
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, USA
|
44
|
Higgs RE, Knierman MD, Gelfanova V, Butler JP, Hale JE. Label-free LC-MS method for the identification of biomarkers. Methods Mol Biol 2008; 428:209-230. [PMID: 18287776 DOI: 10.1007/978-1-59745-117-8_12] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Indexed: 05/25/2023]
Abstract
Pharmaceutical companies and regulatory agencies are pursuing biomarkers as a means to increase the productivity of drug development. Quantifying differential levels of proteins from complex biological samples like plasma or cerebrospinal fluid is one specific approach being used to identify markers of drug action, efficacy, toxicity, etc. Academic investigators are also interested in markers that are diagnostic or prognostic of disease states. We report a comprehensive, fully automated, and label-free approach to relative protein quantification including: sample preparation, proteolytic protein digestion, LC-MS/MS data acquisition, de-noising, mass and charge state estimation, chromatographic alignment, and peptide quantification via integration of extracted ion chromatograms. Additionally, we describe methods for transformation and normalization of the quantitative peptide levels in multiplexed measurements to improve precision for statistical analysis. Lastly, we outline how the described methods can be used to design and power biomarker discovery studies.
|
45
|
Choi H, Nesvizhskii AI. Semisupervised Model-Based Validation of Peptide Identifications in Mass Spectrometry-Based Proteomics. J Proteome Res 2008; 7:254-65. [DOI: 10.1021/pr070542g] [Citation(s) in RCA: 119] [Impact Index Per Article: 7.4] [Indexed: 11/28/2022]
|
46
|
Ackermann BL, Berna MJ, Eckstein JA, Ott LW, Chaudhary AK. Current applications of liquid chromatography/mass spectrometry in pharmaceutical discovery after a decade of innovation. ANNUAL REVIEW OF ANALYTICAL CHEMISTRY (PALO ALTO, CALIF.) 2008; 1:357-396. [PMID: 20636083 DOI: 10.1146/annurev.anchem.1.031207.112855] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Indexed: 05/29/2023]
Abstract
Current drug discovery involves a highly iterative process pertaining to three core disciplines: biology, chemistry, and drug disposition. For most pharmaceutical companies the path to a drug candidate comprises similar stages: target identification, biological screening, lead generation, lead optimization, and candidate selection. Over the past decade, the overall efficiency of drug discovery has been greatly improved by a single instrumental technique, liquid chromatography/mass spectrometry (LC/MS). Transformed by the commercial introduction of the atmospheric pressure ionization interface in the mid-1990s, LC/MS has expanded into almost every area of drug discovery. In many cases, drug discovery workflow has been changed owing to vastly improved efficiency. This review examines recent trends for these three core disciplines and presents seminal examples where LC/MS has altered the current approach to drug discovery.
Affiliation(s)
- Bradley L Ackermann
- Eli Lilly and Company, Greenfield Laboratories, Greenfield, Indiana 46140, USA.
|
47
|
Käll L, Storey JD, MacCoss MJ, Noble WS. Assigning Significance to Peptides Identified by Tandem Mass Spectrometry Using Decoy Databases. J Proteome Res 2008; 7:29-34. [DOI: 10.1021/pr700600n] [Citation(s) in RCA: 472] [Impact Index Per Article: 29.5] [Indexed: 11/29/2022]
|
48
|
Choi H, Ghosh D, Nesvizhskii AI. Statistical validation of peptide identifications in large-scale proteomics using the target-decoy database search strategy and flexible mixture modeling. J Proteome Res 2007; 7:286-92. [PMID: 18078310 DOI: 10.1021/pr7006818] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.6] [Indexed: 01/23/2023]
Abstract
Reliable statistical validation of peptide and protein identifications is a top priority in large-scale mass spectrometry based proteomics. PeptideProphet is one of the computational tools commonly used for assessing the statistical confidence in peptide assignments to tandem mass spectra obtained using database search programs such as SEQUEST, MASCOT, or X!TANDEM. We present two flexible methods, the variable component mixture model and the semiparametric mixture model, that remove the restrictive parametric assumptions in the mixture modeling approach of PeptideProphet. Using a control protein mixture data set generated on a linear ion trap Fourier transform (LTQ-FT) mass spectrometer, we demonstrate that both methods improve on parametric models in terms of the accuracy of probability estimates and the power to detect correct identifications while controlling the false discovery rate to the same degree. The statistical approaches presented here require that the data set contain a sufficient number of decoy (known to be incorrect) peptide identifications, which can be obtained using the target-decoy database search strategy.
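The target-decoy strategy mentioned above is often summarized by a simple counting estimate, sketched here for orientation. This is not the mixture modeling of the paper: it assumes a separate or concatenated search in which decoy hits occur about as often as incorrect target hits, and the function name `decoy_fdr` and the example PSM list are hypothetical.

```python
def decoy_fdr(psms, threshold):
    """Estimate the FDR among target PSMs scoring at or above threshold.

    psms is a list of (score, is_decoy) pairs. Assumption: the number
    of decoy hits above the threshold approximates the number of
    incorrect target hits, so FDR is estimated as decoys / targets.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        # No accepted target PSMs, so there is nothing to be wrong about.
        return 0.0
    # Cap at 1.0 because a rate above 100% is not meaningful.
    return min(1.0, decoys / targets)


if __name__ == "__main__":
    # Hypothetical search results: (score, is_decoy).
    results = [(5.0, False), (4.0, False), (3.5, True),
               (3.0, False), (2.0, True)]
    print(decoy_fdr(results, threshold=3.0))
    print(decoy_fdr(results, threshold=4.0))
```

The estimate is only as good as the number of decoys above the threshold, which is why the abstract stresses that the data set must contain enough decoy identifications.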
Affiliation(s)
- Hyungwon Choi
- Department of Pathology and Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
|
49
|
Choi H, Nesvizhskii AI. False discovery rates and related statistical concepts in mass spectrometry-based proteomics. J Proteome Res 2007; 7:47-50. [PMID: 18067251 DOI: 10.1021/pr700747q] [Citation(s) in RCA: 161] [Impact Index Per Article: 9.5] [Indexed: 12/30/2022]
Abstract
Development of statistical methods for assessing the significance of peptide assignments to tandem mass spectra obtained using database searching remains an important problem. In the past several years, a number of different approaches have emerged, including the concept of expectation values, the target-decoy strategy, and the probability mixture modeling approach of PeptideProphet. In this work, we provide a background on statistical significance analysis in the field of mass spectrometry-based proteomics, and present our perspective on the current and future developments in this area.
Affiliation(s)
- Hyungwon Choi
- Department of Pathology, University of Michigan, Ann Arbor, Michigan 48109, USA
|
50
|
Tanner S, Payne SH, Dasari S, Shen Z, Wilmarth PA, David LL, Loomis WF, Briggs SP, Bafna V. Accurate annotation of peptide modifications through unrestrictive database search. J Proteome Res 2007; 7:170-81. [PMID: 18034453 DOI: 10.1021/pr070444v] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Indexed: 11/28/2022]
Abstract
Proteins are extensively modified after translation due to cellular regulation, signal transduction, or chemical damage. Peptide tandem mass spectrometry can discover post-translational modifications, as well as sequence polymorphisms. Recent efforts have studied modifications at the proteomic scale. In this context, it becomes crucial to assess the accuracy of modification discovery. We discuss methods to quantify the false discovery rate from a search and demonstrate how several features can be used to distinguish valid modifications from search artifacts. We present a tool, PTMFinder, which implements these methods. We summarize the corpus of post-translational modifications identified on large data sets. Thousands of known and novel modification sites are identified, including site-specific modifications conserved over vast evolutionary distances.
Affiliation(s)
- Stephen Tanner
- Bioinformatics Program, University of California San Diego, La Jolla, California 92093, USA.
|