1
Ansari M, White AD. Learning peptide properties with positive examples only. Digital Discovery 2024; 3:977-986. [PMID: 38756224 PMCID: PMC11094695 DOI: 10.1039/d3dd00218g] [Received: 11/05/2023] [Accepted: 03/30/2024]
Abstract
Deep learning can create accurate predictive models by exploiting existing large-scale experimental data, and can guide the design of molecules. However, a major barrier is the requirement of both positive and negative examples in classical supervised learning frameworks. Notably, most peptide databases come with missing information and a low number of observations on negative examples, as such sequences are hard to obtain using high-throughput screening methods. To address this challenge, we solely exploit the limited known positive examples in a semi-supervised setting, and discover peptide sequences that are likely to map to certain antimicrobial properties via positive-unlabeled (PU) learning. In particular, we use the two learning strategies of adapting the base classifier and reliable negative identification to build deep learning models for inferring solubility, hemolysis, binding against SHP-2, and non-fouling activity of peptides, given their sequence. We evaluate the predictive performance of our PU learning method and show that, by only using the positive data, it can achieve competitive performance when compared with the classical positive-negative (PN) classification approach, where there is access to both positive and negative examples.
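The abstract names two general PU strategies without spelling them out. As a rough illustration only, the sketch below implements a generic version of each on synthetic 2-D data: "adapting the base classifier" via the Elkan and Noto label-frequency correction (train a labeled-vs-unlabeled classifier g, estimate c = mean g over held-out positives, report g/c), and "reliable negative identification" as a two-step scheme that treats the lowest-scoring unlabeled points as negatives before ordinary PN training. This is not the authors' deep learning model; the logistic-regression base learner, the toy features, the 20% hold-out fraction and the 30% reliable-negative cutoff are all assumptions.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.1, epochs=500):
    """Plain batch gradient-descent logistic regression; returns a scorer."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def pu_elkan_noto(positives, unlabeled, holdout_frac=0.2, seed=0):
    """Strategy 1 (assumed form): adapt the base classifier.
    g(x) estimates P(labeled | x); c estimates P(labeled | positive)."""
    pos = list(positives)
    random.Random(seed).shuffle(pos)
    k = max(1, int(len(pos) * holdout_frac))
    held_out, train_pos = pos[:k], pos[k:]
    g = train_logreg(train_pos + unlabeled,
                     [1] * len(train_pos) + [0] * len(unlabeled))
    c = sum(g(x) for x in held_out) / len(held_out)
    # Corrected score approximates P(positive | x); clipped to a valid probability.
    return lambda x: min(1.0, g(x) / c)


def pu_reliable_negatives(positives, unlabeled, neg_frac=0.3):
    """Strategy 2 (assumed form): pick the lowest-scoring unlabeled points
    as reliable negatives, then train a standard positive-negative model."""
    g = train_logreg(positives + unlabeled,
                     [1] * len(positives) + [0] * len(unlabeled))
    k = max(1, int(len(unlabeled) * neg_frac))
    negatives = sorted(unlabeled, key=g)[:k]
    return train_logreg(positives + negatives,
                        [1] * len(positives) + [0] * len(negatives))


def blob(rng, cx, cy, n, sd=0.7):
    # Hypothetical toy features standing in for peptide descriptors.
    return [[rng.gauss(cx, sd), rng.gauss(cy, sd)] for _ in range(n)]


rng = random.Random(1)
labeled_pos = blob(rng, 2, 2, 40)
# Unlabeled pool: hidden positives near (2, 2) mixed with negatives near (-2, -2).
unlabeled = blob(rng, 2, 2, 20) + blob(rng, -2, -2, 60)

f1 = pu_elkan_noto(labeled_pos, unlabeled)
f2 = pu_reliable_negatives(labeled_pos, unlabeled)
print(round(f1([2, 2]), 2), round(f1([-2, -2]), 2))
print(round(f2([2, 2]), 2), round(f2([-2, -2]), 2))
```

Both scorers should rank points near (2, 2) above points near (-2, -2) even though no negative was ever labeled. The Elkan and Noto correction assumes labeled positives are a random sample of all positives; the two-step scheme instead depends on how cleanly the first-pass scores separate the unlabeled pool.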
Affiliation(s)
- Mehrad Ansari
  - Department of Chemical Engineering, University of Rochester, Rochester, NY 14627, USA
- Andrew D White
  - Department of Chemical Engineering, University of Rochester, Rochester, NY 14627, USA

2
Guntuboina C, Das A, Mollaei P, Kim S, Barati Farimani A. PeptideBERT: A Language Model Based on Transformers for Peptide Property Prediction. J Phys Chem Lett 2023; 14:10427-10434. [PMID: 37956397 PMCID: PMC10683064 DOI: 10.1021/acs.jpclett.3c02398] [Received: 08/27/2023] [Revised: 11/04/2023] [Accepted: 11/07/2023]
Abstract
Recent advances in language models have provided the protein modeling community with a powerful tool that uses transformers to represent protein sequences as text. This breakthrough enables sequence-to-property prediction for peptides without relying on explicit structural data. Inspired by recent progress in large language models, we present PeptideBERT, a protein language model specifically tailored for predicting essential peptide properties such as hemolysis, solubility, and nonfouling. PeptideBERT uses the ProtBERT pretrained transformer model with 12 attention heads and 12 hidden layers. Through fine-tuning the pretrained model for the three downstream tasks, our model achieves state-of-the-art (SOTA) performance in predicting hemolysis, which is crucial for determining a peptide's potential to induce red blood cell lysis, as well as nonfouling properties. Leveraging primarily shorter sequences and a data set with negative samples predominantly associated with insoluble peptides, our model showcases remarkable performance.
Affiliation(s)
- Chakradhar Guntuboina
  - Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Adrita Das
  - Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Parisa Mollaei
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Seongwon Kim
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Amir Barati Farimani
  - Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States

3
Ansari M, White AD. Learning Peptide Properties with Positive Examples Only. bioRxiv 2023:2023.06.01.543289. [PMID: 37333233 PMCID: PMC10274696 DOI: 10.1101/2023.06.01.543289]
Abstract
Deep learning can create accurate predictive models by exploiting existing large-scale experimental data, and can guide the design of molecules. However, a major barrier is the requirement of both positive and negative examples in classical supervised learning frameworks. Notably, most peptide databases come with missing information and a low number of observations on negative examples, as such sequences are hard to obtain using high-throughput screening methods. To address this challenge, we solely exploit the limited known positive examples in a semi-supervised setting, and discover peptide sequences that are likely to map to certain antimicrobial properties via positive-unlabeled (PU) learning. In particular, we use the two learning strategies of adapting the base classifier and reliable negative identification to build deep learning models for inferring solubility, hemolysis, binding against SHP-2, and non-fouling activity of peptides, given their sequence. We evaluate the predictive performance of our PU learning method and show that, by only using the positive data, it can achieve competitive performance when compared with the classical positive-negative (PN) classification approach, where there is access to both positive and negative examples.
Affiliation(s)
- Mehrad Ansari
  - Department of Chemical Engineering, University of Rochester, Rochester, NY 14627, USA
- Andrew D. White
  - Department of Chemical Engineering, University of Rochester, Rochester, NY 14627, USA

4
Frutiger A, Tanno A, Hwu S, Tiefenauer RF, Vörös J, Nakatsuka N. Nonspecific Binding: Fundamental Concepts and Consequences for Biosensing Applications. Chem Rev 2021; 121:8095-8160. [PMID: 34105942 DOI: 10.1021/acs.chemrev.1c00044]
Abstract
Nature achieves differentiation of specific and nonspecific binding in molecular interactions through precise control of biomolecules in space and time. Artificial systems such as biosensors that rely on distinguishing specific molecular binding events in a sea of nonspecific interactions have struggled to overcome this issue. Despite numerous advances in biosensor technology, nonspecific binding has remained a critical bottleneck due to the lack of a fundamental understanding of the phenomenon. To date, the identity, cause, and influence of nonspecific binding remain topics of debate within the scientific community. In this review, we discuss the evolution of the concept of nonspecific binding over the past five decades based upon the thermodynamic, intermolecular, and structural perspectives to provide classification frameworks for biomolecular interactions. Further, we introduce various theoretical models that predict the expected behavior of biosensors in physiologically relevant environments to calculate the theoretical detection limit and to optimize sensor performance. We conclude by discussing existing practical approaches to tackle the nonspecific binding challenge in vitro for biosensing platforms and how we can both address and harness nonspecific interactions for in vivo systems.
Affiliation(s)
- Andreas Frutiger
  - Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich CH-8092, Switzerland
- Alexander Tanno
  - Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich CH-8092, Switzerland
- Stephanie Hwu
  - Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich CH-8092, Switzerland
- Raphael F Tiefenauer
  - Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich CH-8092, Switzerland
- János Vörös
  - Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich CH-8092, Switzerland
- Nako Nakatsuka
  - Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich CH-8092, Switzerland

5
Kim MK, Yoon CS, Kim SG, Park YW, Lee SS, Lee SK. Effects of 4-Hexylresorcinol on Protein Expressions in RAW 264.7 Cells as Determined by Immunoprecipitation High Performance Liquid Chromatography. Sci Rep 2019; 9:3379. [PMID: 30833641 PMCID: PMC6399215 DOI: 10.1038/s41598-019-38946-4] [Received: 09/05/2018] [Accepted: 01/11/2019]
Abstract
4-Hexylresorcinol (4HR) is a small organic compound that is used as an antiseptic and antioxidant additive, but its molecular properties have not been clearly elucidated. The present study explored the cellular effects of 4HR on RAW 264.7 cells by immunoprecipitation high-performance liquid chromatography (IP-HPLC) using 216 antisera. 4HR-treated cells showed significant decreases in the expression of proteins related to proliferation, the cMyc/MAX/MAD network, p53/Rb/E2F and Wnt/β-catenin signaling, epigenetic modification, and protein translation. Furthermore, 4HR suppressed the expression of growth factors and of proteins associated with RAS signaling, NFkB signaling, inflammation, and osteogenesis, but elevated the expression of proteins associated with p53-mediated and FAS-mediated apoptosis, T-cell immunity, angiogenesis, antioxidant activity, and oncogenic signaling. In a 4HR adherence assay, TNFα, PKC, osteopontin, and GADD45 were strongly adherent to 4HR-coated beads, whereas IL-6, c-caspase 3, CDK4, and c-caspase 9 were not. Many 4HR-adherent proteins were expressed at lower levels in 4HR-treated RAW 264.7 cells than in non-treated controls, whereas 4HR non-adherent proteins were expressed at higher levels. These observations suggest that 4HR affects protein expression in an adhesion-dependent manner and that its effects on proteins are characteristic and global in RAW 264.7 cells.
Affiliation(s)
- Min Keun Kim
  - Department of Oral and Maxillofacial Surgery, College of Dentistry, Gangneung-Wonju National University, and Institute of Oral Science, Gangneung, Korea
- Cheol Soo Yoon
  - Department of Oral Pathology, College of Dentistry, Gangneung-Wonju National University, and Institute of Oral Science, Gangneung, Korea
- Seong Gon Kim
  - Department of Oral and Maxillofacial Surgery, College of Dentistry, Gangneung-Wonju National University, and Institute of Oral Science, Gangneung, Korea
- Young Wook Park
  - Department of Oral and Maxillofacial Surgery, College of Dentistry, Gangneung-Wonju National University, and Institute of Oral Science, Gangneung, Korea
- Sang Shin Lee
  - Department of Oral Pathology, College of Dentistry, Gangneung-Wonju National University, and Institute of Oral Science, Gangneung, Korea
- Suk Keun Lee
  - Department of Oral Pathology, College of Dentistry, Gangneung-Wonju National University, and Institute of Oral Science, Gangneung, Korea

6
Barrett R, Jiang S, White AD. Classifying antimicrobial and multifunctional peptides with Bayesian network models. Pept Sci (Hoboken) 2018. [DOI: 10.1002/pep2.24079]
Affiliation(s)
- Rainier Barrett
  - Department of Chemical Engineering, University of Rochester, Rochester, New York
- Shaoyi Jiang
  - Department of Chemical Engineering, University of Washington, Seattle, Washington
- Andrew D. White
  - Department of Chemical Engineering, University of Rochester, Rochester, New York

7
Wang W, Woodbury NW. Unstructured interactions between peptides and proteins: exploring the role of sequence motifs in affinity and specificity. Acta Biomater 2015; 11:88-95. [PMID: 25266506 DOI: 10.1016/j.actbio.2014.09.039] [Received: 02/28/2014] [Revised: 09/17/2014] [Accepted: 09/22/2014]
Abstract
Unstructured interactions between proteins and other molecules or surfaces are often described as nonspecific, and have received relatively little attention in terms of their role in biology. However, despite their lack of a specific binding structure, these unstructured interactions can in fact be very selective. The lack of a specific structure makes these interactions more difficult to study in a chemically meaningful way, but one approach is statistical, i.e., simply examining a large number of different ligands and using them to understand the chemistry of binding. Surface-bound peptide arrays are useful in this regard, and have been used previously as a model for this purpose (Wang and Woodbury, 2014). In that study, the binding of several proteins, including β-galactosidase, to all possible dipeptides, tripeptides and tetrapeptides (using seven selected amino acids) was measured and analyzed in terms of the charge characteristics, hydrophobicity, etc., of the binding interaction. The current work builds upon that study by starting with a representative subset of the tetrapeptides characterized previously and either extending them by adding all possible combinations of one, two and three amino acids, or concatenating 57 of the previously characterized tetrapeptides to each other in all possible combinations (including order). The extended and concatenated libraries were analyzed either by binding labeled β-galactosidase to them or by binding a mixture of 10 different labeled proteins of various sizes, hydrophobicities and charge characteristics to the peptide arrays. By comparing the binding signals from the tetrapeptides or amino acid extensions alone to the binding signals from the complete extended or concatenated sequences, it was possible to evaluate the extent to which the affinity and specificity of the whole sequence depend on its constituent subsequences.
The conclusion is that while joining two component sequences together can either greatly increase or decrease overall binding and specificity (relative to the component sequences alone), the contribution to the binding affinity and specificity of the individual binding components is strongly dependent on their position in the peptide; component sequences that bind strongly at the C-terminus of the peptide do not necessarily add substantially to binding and specificity when placed at the N-terminus.
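For scale, assuming "all possible" sequences means every ordered sequence over the seven selected amino acids with repeats allowed, the earlier short-peptide library size works out as follows. This count is derived here from the abstract's description; it is not a figure the abstract reports.

$$
N = \sum_{L=2}^{4} 7^{L} = 7^{2} + 7^{3} + 7^{4} = 49 + 343 + 2401 = 2793
$$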
Affiliation(s)
- Wei Wang
  - Department of Chemistry and Biochemistry, Arizona State University, Tempe, AZ 85287-5001, USA; The Center for Innovations in Medicine, The Biodesign Institute at Arizona State University, 1001 S McAllister Ave., Tempe, AZ 85287-5001, USA
- Neal W Woodbury
  - Department of Chemistry and Biochemistry, Arizona State University, Tempe, AZ 85287-5001, USA; The Center for Innovations in Medicine, The Biodesign Institute at Arizona State University, 1001 S McAllister Ave., Tempe, AZ 85287-5001, USA

8
Nowinski AK, White AD, Keefe AJ, Jiang S. Biologically inspired stealth peptide-capped gold nanoparticles. Langmuir 2014; 30:1864-1870. [PMID: 24483727 DOI: 10.1021/la404980g]
Abstract
Upon introduction into the human body, most nanoparticle systems are susceptible to aggregation via nonspecific protein binding. Here, we developed a peptide-capped gold nanoparticle platform that withstands aggregation in undiluted human serum at 37 °C for 24 h. This biocompatible and natural system is based on mimicking human proteins, which are enriched in negatively charged glutamic acid and positively charged lysine residues on their surface. The multifunctional EKEKEKE-PPPPC-Am peptide sequence consists of a stealth glutamic acid/lysine portion combined with a surface anchoring linker containing four prolines and a cysteine. Particle stability was measured via optical spectroscopy and dynamic light scattering in single protein, high salt, and undiluted human serum solutions. In vitro cell experiments demonstrate that EKEKEKE-PPPPC-Am capped gold nanoparticles effectively minimize nonspecific cell uptake by nonphagocytic bovine aortic endothelial cells and phagocytic murine macrophage RAW 264.7 cells. Cytotoxicity studies show that peptide-capped gold nanoparticles do not affect cell viability. Finally, the peptide EKEKEKE-PPPPC-Am was extended with cyclic RGD to demonstrate specific cell targeting and stealth without using poly(ethylene glycol). Adding the functional peptide via peptide sequence extension avoids the complex conjugation chemistries used for connection to synthetic materials. Inductively coupled plasma mass spectrometry results indicate high uptake of c[RGDfE(SGG-KEKEKE-PPPPC-Am)] capped gold nanoparticles by bovine aortic endothelial cells and low uptake of gold nanoparticles capped with the control scrambled sequence c[RDGfE(SGG-KEKEKE-PPPPC-Am)].
Affiliation(s)
- Ann K Nowinski
  - Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, United States

9
White AD, Keefe AJ, Ella-Menye JR, Nowinski AK, Shao Q, Pfaendtner J, Jiang S. Free Energy of Solvated Salt Bridges: A Simulation and Experimental Study. J Phys Chem B 2013; 117:7254-7259. [DOI: 10.1021/jp4024469]
Affiliation(s)
- Andrew D. White
  - Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, United States
- Andrew J. Keefe
  - Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, United States
- Jean-Rene Ella-Menye
  - Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, United States
- Ann K. Nowinski
  - Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, United States
- Qing Shao
  - Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, United States
- Jim Pfaendtner
  - Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, United States
- Shaoyi Jiang
  - Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, United States