Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Sledzieski S, Singh R, Cowen L, Berger B. D-SCRIPT translates genome to phenome with sequence-based, structure-aware, genome-scale predictions of protein-protein interactions. Cell Syst 2021;12:969-982.e6. [PMID: 34536380 PMCID: PMC8586911 DOI: 10.1016/j.cels.2021.08.010] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2021] [Revised: 08/01/2021] [Accepted: 08/19/2021] [Indexed: 11/29/2022]

For:	Sledzieski S, Singh R, Cowen L, Berger B. D-SCRIPT translates genome to phenome with sequence-based, structure-aware, genome-scale predictions of protein-protein interactions. Cell Syst 2021;12:969-982.e6. [PMID: 34536380 PMCID: PMC8586911 DOI: 10.1016/j.cels.2021.08.010] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2021] [Revised: 08/01/2021] [Accepted: 08/19/2021] [Indexed: 11/29/2022]

Number

Cited by Other Article(s)

Cheng P, Mao C, Tang J, Yang S, Cheng Y, Wang W, Gu Q, Han W, Chen H, Li S, Chen Y, Zhou J, Li W, Pan A, Zhao S, Huang X, Zhu S, Zhang J, Shu W, Wang S. Zero-shot prediction of mutation effects with multimodal deep representation learning guides protein engineering. Cell Res 2024;34:630-647. [PMID: 38969803 PMCID: PMC11369238 DOI: 10.1038/s41422-024-00989-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2024] [Accepted: 06/03/2024] [Indexed: 07/07/2024] Open

Abstract

Mutations in amino acid sequences can provoke changes in protein function. Accurate and unsupervised prediction of mutation effects is critical in biotechnology and biomedicine, but remains a fundamental challenge. To resolve this challenge, here we present Protein Mutational Effect Predictor (ProMEP), a general and multiple sequence alignment-free method that enables zero-shot prediction of mutation effects. A multimodal deep representation learning model embedded in ProMEP was developed to comprehensively learn both sequence and structure contexts from ~160 million proteins. ProMEP achieves state-of-the-art performance in mutational effect prediction and accomplishes a tremendous improvement in speed, enabling efficient and intelligent protein engineering. Specifically, ProMEP accurately forecasts mutational consequences on the gene-editing enzymes TnpB and TadA, and successfully guides the development of high-performance gene-editing tools with their engineered variants. The gene-editing efficiency of a 5-site mutant of TnpB reaches up to 74.04% (vs 24.66% for the wild type); and the base editing tool developed on the basis of a TadA 15-site mutant (in addition to the A106V/D108N double mutation that renders deoxyadenosine deaminase activity to TadA) exhibits an A-to-G conversion frequency of up to 77.27% (vs 69.80% for ABE8e, a previous TadA-based adenine base editor) with significantly reduced bystander and off-target effects compared to ABE8e. ProMEP not only showcases superior performance in predicting mutational effects on proteins but also demonstrates a great capability to guide protein engineering. Therefore, ProMEP enables efficient exploration of the gigantic protein space and facilitates practical design of proteins, thereby advancing studies in biomedicine and synthetic biology.

Collapse

Affiliation(s)

Peng Cheng Bioinformatics Center of AMMS, Beijing, China
Cong Mao State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
Jin Tang Zhejiang Lab, Hangzhou, Zhejiang, China
Sen Yang Bioinformatics Center of AMMS, Beijing, China
Yu Cheng State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
Wuke Wang Zhejiang Lab, Hangzhou, Zhejiang, China
Qiuxi Gu State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
Wei Han Zhejiang Lab, Hangzhou, Zhejiang, China
Hao Chen State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
Sihan Li State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
Yaofeng Chen Bioinformatics Center of AMMS, Beijing, China
Jianglin Zhou Bioinformatics Center of AMMS, Beijing, China
Wuju Li Bioinformatics Center of AMMS, Beijing, China
Aimin Pan Zhejiang Lab, Hangzhou, Zhejiang, China
Suwen Zhao iHuman Institute, ShanghaiTech University, Shanghai, China School of Life Science and Technology, ShanghaiTech University, Shanghai, China
Xingxu Huang Zhejiang Lab, Hangzhou, Zhejiang, China School of Life Science and Technology, ShanghaiTech University, Shanghai, China
Shiqiang Zhu Zhejiang Lab, Hangzhou, Zhejiang, China.
Jun Zhang State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China.
Wenjie Shu Bioinformatics Center of AMMS, Beijing, China.
Shengqi Wang Bioinformatics Center of AMMS, Beijing, China.

Collapse

Gao Q, Zhang C, Li M, Yu T. Protein-Protein Interaction Prediction Model Based on ProtBert-BiGRU-Attention. J Comput Biol 2024;31:797-814. [PMID: 39069885 DOI: 10.1089/cmb.2023.0297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/30/2024] Open

Zitnik M, Li MM, Wells A, Glass K, Morselli Gysi D, Krishnan A, Murali TM, Radivojac P, Roy S, Baudot A, Bozdag S, Chen DZ, Cowen L, Devkota K, Gitter A, Gosline SJC, Gu P, Guzzi PH, Huang H, Jiang M, Kesimoglu ZN, Koyuturk M, Ma J, Pico AR, Pržulj N, Przytycka TM, Raphael BJ, Ritz A, Sharan R, Shen Y, Singh M, Slonim DK, Tong H, Yang XH, Yoon BJ, Yu H, Milenković T. Current and future directions in network biology. BIOINFORMATICS ADVANCES 2024;4:vbae099. [PMID: 39143982 PMCID: PMC11321866 DOI: 10.1093/bioadv/vbae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/27/2023] [Revised: 05/31/2024] [Accepted: 07/08/2024] [Indexed: 08/16/2024]

Affiliation(s)

Marinka Zitnik Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
Michelle M Li Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
Aydin Wells Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
Kimberly Glass Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
Deisy Morselli Gysi Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States Department of Statistics, Federal University of Paraná, Curitiba, Paraná 81530-015, Brazil Department of Physics, Northeastern University, Boston, MA 02115, United States
Arjun Krishnan Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
T M Murali Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
Predrag Radivojac Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
Sushmita Roy Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States Wisconsin Institute for Discovery, Madison, WI 53715, United States
Anaïs Baudot Aix Marseille Université, INSERM, MMG, Marseille, France
Serdar Bozdag Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States Department of Mathematics, University of North Texas, Denton, TX 76203, United States
Danny Z Chen Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
Lenore Cowen Department of Computer Science, Tufts University, Medford, MA 02155, United States
Kapil Devkota Department of Computer Science, Tufts University, Medford, MA 02155, United States
Anthony Gitter Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States Morgridge Institute for Research, Madison, WI 53715, United States
Sara J C Gosline Biological Sciences Division, Pacific Northwest National Laboratory, Seattle, WA 98109, United States
Pengfei Gu Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
Pietro H Guzzi Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, 88100, Italy
Heng Huang Department of Computer Science, University of Maryland College Park, College Park, MD 20742, United States
Meng Jiang Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
Ziynet Nesibe Kesimoglu Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
Mehmet Koyuturk Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
Jian Ma Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
Alexander R Pico Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, United States
Nataša Pržulj Department of Computer Science, University College London, London, WC1E 6BT, England ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, 08010, Spain Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
Teresa M Przytycka National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
Benjamin J Raphael Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
Anna Ritz Department of Biology, Reed College, Portland, OR 97202, United States
Roded Sharan School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
Yang Shen Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
Mona Singh Department of Computer Science, Princeton University, Princeton, NJ 08544, United States Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, United States
Donna K Slonim Department of Computer Science, Tufts University, Medford, MA 02155, United States
Hanghang Tong Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
Xinan Holly Yang Department of Pediatrics, University of Chicago, Chicago, IL 60637, United States
Byung-Jun Yoon Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, United States
Haiyuan Yu Department of Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, United States
Tijana Milenković Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States

Collapse

Ko YS, Parkinson J, Liu C, Wang W. TUnA: an uncertainty-aware transformer model for sequence-based protein-protein interaction prediction. Brief Bioinform 2024;25:bbae359. [PMID: 39051117 PMCID: PMC11269822 DOI: 10.1093/bib/bbae359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2024] [Revised: 05/31/2024] [Accepted: 07/10/2024] [Indexed: 07/27/2024] Open

Szymborski J, Emad A. INTREPPPID-an orthologue-informed quintuplet network for cross-species prediction of protein-protein interaction. Brief Bioinform 2024;25:bbae405. [PMID: 39171984 PMCID: PMC11339867 DOI: 10.1093/bib/bbae405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2024] [Revised: 07/25/2024] [Accepted: 07/31/2024] [Indexed: 08/23/2024] Open

Volzhenin K, Bittner L, Carbone A. SENSE-PPI reconstructs interactomes within, across, and between species at the genome scale. iScience 2024;27:110371. [PMID: 39055916 PMCID: PMC11269938 DOI: 10.1016/j.isci.2024.110371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2023] [Revised: 05/04/2024] [Accepted: 06/21/2024] [Indexed: 07/28/2024] Open

Si Y, Zou J, Gao Y, Chuai G, Liu Q, Chen L. Foundation models in molecular biology. BIOPHYSICS REPORTS 2024;10:135-151. [PMID: 39027316 PMCID: PMC11252241 DOI: 10.52601/bpr.2024.240006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2024] [Accepted: 03/04/2024] [Indexed: 07/20/2024] Open

Affiliation(s)

Yunda Si Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
Jiawei Zou Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, China
Yicheng Gao Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
Guohui Chuai Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
Qi Liu Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
Luonan Chen Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, China

Collapse

Sledzieski S, Kshirsagar M, Baek M, Dodhia R, Lavista Ferres J, Berger B. Democratizing protein language models with parameter-efficient fine-tuning. Proc Natl Acad Sci U S A 2024;121:e2405840121. [PMID: 38900798 PMCID: PMC11214071 DOI: 10.1073/pnas.2405840121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2024] [Accepted: 05/09/2024] [Indexed: 06/22/2024] Open

Yang T, Wang Y, He Y. TEC-miTarget: enhancing microRNA target prediction based on deep learning of ribonucleic acid sequences. BMC Bioinformatics 2024;25:159. [PMID: 38643080 PMCID: PMC11032603 DOI: 10.1186/s12859-024-05780-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2023] [Accepted: 04/12/2024] [Indexed: 04/22/2024] Open

Abstract

BACKGROUND

MicroRNAs play a critical role in regulating gene expression by binding to specific target sites within gene transcripts, making the identification of microRNA targets a prominent focus of research. Conventional experimental methods for identifying microRNA targets are both time-consuming and expensive, prompting the development of computational tools for target prediction. However, the existing computational tools exhibit limited performance in meeting the demands of practical applications, highlighting the need to improve the performance of microRNA target prediction models.

RESULTS

In this paper, we utilize the most popular natural language processing and computer vision technologies to propose a novel approach, called TEC-miTarget, for microRNA target prediction based on transformer encoder and convolutional neural networks. TEC-miTarget treats RNA sequences as a natural language and encodes them using a transformer encoder, a widely used encoder in natural language processing. It then combines the representations of a pair of microRNA and its candidate target site sequences into a contact map, which is a three-dimensional array similar to a multi-channel image. Therefore, the contact map's features are extracted using a four-layer convolutional neural network, enabling the prediction of interactions between microRNA and its candidate target sites. We applied a series of comparative experiments to demonstrate that TEC-miTarget significantly improves microRNA target prediction, compared with existing state-of-the-art models. Our approach is the first approach to perform comparisons with other approaches at both sequence and transcript levels. Furthermore, it is the first approach compared with both deep learning-based and seed-match-based methods. We first compared TEC-miTarget's performance with approaches at the sequence level, and our approach delivers substantial improvements in performance using the same datasets and evaluation metrics. Moreover, we utilized TEC-miTarget to predict microRNA targets in long mRNA sequences, which involves two steps: selecting candidate target site sequences and applying sequence-level predictions. We finally showed that TEC-miTarget outperforms other approaches at the transcript level, including the popular seed match methods widely used in previous years.

CONCLUSIONS

We propose a novel approach for predicting microRNA targets at both sequence and transcript levels, and demonstrate that our approach outperforms other methods based on deep learning or seed match. We also provide our approach as an easy-to-use software, TEC-miTarget, at https://github.com/tingpeng17/TEC-miTarget . Our results provide new perspectives for microRNA target prediction.

Collapse

Si Y, Yan C. Protein language model-embedded geometric graphs power inter-protein contact prediction. eLife 2024;12:RP92184. [PMID: 38564241 PMCID: PMC10987090 DOI: 10.7554/elife.92184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/04/2024] Open

Lin P, Li H, Huang SY. Deep learning in modeling protein complex structures: From contact prediction to end-to-end approaches. Curr Opin Struct Biol 2024;85:102789. [PMID: 38402744 DOI: 10.1016/j.sbi.2024.102789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2023] [Revised: 01/16/2024] [Accepted: 02/06/2024] [Indexed: 02/27/2024]

Jia P, Zhang F, Wu C, Li M. A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond. Brief Bioinform 2024;25:bbae162. [PMID: 38739759 PMCID: PMC11089422 DOI: 10.1093/bib/bbae162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/01/2024] [Revised: 02/17/2024] [Accepted: 03/31/2024] [Indexed: 05/16/2024] Open

Dang TH, Vu TA. xCAPT5: protein-protein interaction prediction using deep and wide multi-kernel pooling convolutional neural networks with protein language model. BMC Bioinformatics 2024;25:106. [PMID: 38461247 PMCID: PMC10924985 DOI: 10.1186/s12859-024-05725-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2024] [Accepted: 02/28/2024] [Indexed: 03/11/2024] Open

Waksman T, Astin E, Fisher SR, Hunter WN, Bos JIB. Computational Prediction of Structure, Function, and Interaction of Myzus persicae (Green Peach Aphid) Salivary Effector Proteins. MOLECULAR PLANT-MICROBE INTERACTIONS : MPMI 2024;37:338-346. [PMID: 38171380 DOI: 10.1094/mpmi-10-23-0154-fi] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/05/2024]

Ghosh S, Mitra P. MaTPIP: A deep-learning architecture with eXplainable AI for sequence-driven, feature mixed protein-protein interaction prediction. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024;244:107955. [PMID: 38064959 DOI: 10.1016/j.cmpb.2023.107955] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/07/2023] [Revised: 11/09/2023] [Accepted: 11/26/2023] [Indexed: 01/26/2024]

Abstract

BACKGROUND AND OBJECTIVE

Protein-protein interaction (PPI) is a vital process in all living cells, controlling essential cell functions such as cell cycle regulation, signal transduction, and metabolic processes with broad applications that include antibody therapeutics, vaccines, and drug discovery. The problem of sequence-based PPI prediction has been a long-standing issue in computational biology.

METHODS

We introduce MaTPIP, a cutting-edge deep-learning framework for predicting PPI. MaTPIP stands out due to its innovative design, fusing pre-trained Protein Language Model (PLM)-based features with manually curated protein sequence attributes, emphasizing the part-whole relationship by incorporating two-dimensional granular part (amino-acid) level features and one-dimensional whole-level (protein) features. What sets MaTPIP apart is its ability to integrate these features across three different input terminals seamlessly. MatPIP also includes a distinctive configuration of Convolutional Neural Network (CNN) with Transformer components for concurrent utilization of CNN and sequential characteristics in each iteration and a one-dimensional to two-dimensional converter followed by a unified embedding. The statistical significance of this classifier is validated using McNemar's test.

RESULTS

MaTPIP outperformed the existing methods on both the Human PPI benchmark and cross-species PPI testing datasets, demonstrating its immense generalization capability for PPI prediction. We used seven diverse datasets with varying PPI target class distributions. Notably, within the novel PPI scenario, the most challenging category for Human PPI Benchmark, MaTPIP improves the existing state-of-the-art score from 74.1% to 78.6% (measured in Area under ROC Curve), from 23.2% to 32.8% (in average precision) and from 4.9% to 9.5% (in precision at 3% recall) for 50%, 10% and 0.3% target class distributions, respectively. In cross-species PPI evaluation, hybrid MaTPIP establishes a new benchmark score (measured in Area Under precision-recall curve) of 81.1% from the previous 60.9% for Mouse, 80.9% from 56.2% for Fly, 78.1% from 55.9% for Worm, 59.9% from 41.7% for Yeast, and 66.2% from 58.8% for E.coli. Our eXplainable AI-based assessment reveals an average contribution of different feature families per prediction on these datasets.

CONCLUSIONS

MaTPIP mixes manually curated features with the feature extracted from the pre-trained PLM to predict sequence-based protein-protein association. Furthermore, MaTPIP demonstrates strong generalization capabilities for cross-species PPI predictions.

Collapse

Lannelongue L, Inouye M. Pitfalls of machine learning models for protein-protein interaction networks. Bioinformatics 2024;40:btae012. [PMID: 38200587 PMCID: PMC10868344 DOI: 10.1093/bioinformatics/btae012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2023] [Revised: 11/24/2023] [Accepted: 01/09/2024] [Indexed: 01/12/2024] Open

Bernett J, Blumenthal DB, List M. Cracking the black box of deep sequence-based protein-protein interaction prediction. Brief Bioinform 2024;25:bbae076. [PMID: 38446741 PMCID: PMC10939362 DOI: 10.1093/bib/bbae076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2023] [Revised: 01/09/2024] [Indexed: 03/08/2024] Open

Lei C, Zhou K, Zheng J, Zhao M, Huang Y, He H, Yang S, Zhang Z. AraPathogen2.0: An Improved Prediction of Plant-Pathogen Protein-Protein Interactions Empowered by the Natural Language Processing Technique. J Proteome Res 2024;23:494-499. [PMID: 38069805 DOI: 10.1021/acs.jproteome.3c00364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2024]

Sledzieski S, Kshirsagar M, Baek M, Berger B, Dodhia R, Ferres JL. Democratizing Protein Language Models with Parameter-Efficient Fine-Tuning. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.09.566187. [PMID: 37986761 PMCID: PMC10659351 DOI: 10.1101/2023.11.09.566187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/22/2023]

Abstract

Proteomics has been revolutionized by large pre-trained protein language models, which learn unsupervised representations from large corpora of sequences. The parameters of these models are then fine-tuned in a supervised setting to tailor the model to a specific downstream task. However, as model size increases, the computational and memory footprint of fine-tuning becomes a barrier for many research groups. In the field of natural language processing, which has seen a similar explosion in the size of models, these challenges have been addressed by methods for parameter-efficient fine-tuning (PEFT). In this work, we newly bring parameter-efficient fine-tuning methods to proteomics. Using the parameter-efficient method LoRA, we train new models for two important proteomic tasks: predicting protein-protein interactions (PPI) and predicting the symmetry of homooligomers. We show that for homooligomer symmetry prediction, these approaches achieve performance competitive with traditional fine-tuning while requiring reduced memory and using three orders of magnitude fewer parameters. On the PPI prediction task, we surprisingly find that PEFT models actually outperform traditional fine-tuning while using two orders of magnitude fewer parameters. Here, we go even further to show that freezing the parameters of the language model and training only a classification head also outperforms fine-tuning, using five orders of magnitude fewer parameters, and that both of these models outperform state-of-the-art PPI prediction methods with substantially reduced compute. We also demonstrate that PEFT is robust to variations in training hyper-parameters, and elucidate where best practices for PEFT in proteomics differ from in natural language processing. Thus, we provide a blueprint to democratize the power of protein language model tuning to groups which have limited computational resources.

Collapse

Smith CH, Mejia-Trujillo R, Breton S, Pinto BJ, Kirkpatrick M, Havird JC. Mitonuclear Sex Determination? Empirical Evidence from Bivalves. Mol Biol Evol 2023;40:msad240. [PMID: 37935058 PMCID: PMC10653589 DOI: 10.1093/molbev/msad240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Revised: 10/04/2023] [Accepted: 10/26/2023] [Indexed: 11/09/2023] Open

Sledzieski S, Devkota K, Singh R, Cowen L, Berger B. TT3D: Leveraging precomputed protein 3D sequence models to predict protein-protein interactions. Bioinformatics 2023;39:btad663. [PMID: 37897686 PMCID: PMC10640393 DOI: 10.1093/bioinformatics/btad663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Revised: 09/24/2023] [Accepted: 10/27/2023] [Indexed: 10/30/2023] Open

Xie S, Xie X, Zhao X, Liu F, Wang Y, Ping J, Ji Z. HNSPPI: a hybrid computational model combing network and sequence information for predicting protein-protein interaction. Brief Bioinform 2023;24:bbad261. [PMID: 37480553 DOI: 10.1093/bib/bbad261] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2023] [Revised: 06/24/2023] [Accepted: 06/26/2023] [Indexed: 07/24/2023] Open

Will I, Beckerson WC, de Bekker C. Using machine learning to predict protein-protein interactions between a zombie ant fungus and its carpenter ant host. Sci Rep 2023;13:13821. [PMID: 37620441 PMCID: PMC10449854 DOI: 10.1038/s41598-023-40764-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2023] [Accepted: 08/16/2023] [Indexed: 08/26/2023] Open

Xie L, Xie L. Elucidation of genome-wide understudied proteins targeted by PROTAC-induced degradation using interpretable machine learning. PLoS Comput Biol 2023;19:e1010974. [PMID: 37590332 PMCID: PMC10464998 DOI: 10.1371/journal.pcbi.1010974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2023] [Revised: 08/29/2023] [Accepted: 07/27/2023] [Indexed: 08/19/2023] Open

Smith CH, Mejia-Trujillo R, Breton S, Pinto BJ, Kirkpatrick M, Havird JC. Mitonuclear sex determination? Empirical evidence from bivalves. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.05.547839. [PMID: 37461691 PMCID: PMC10349986 DOI: 10.1101/2023.07.05.547839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 07/24/2023]

Lee M. Recent Advances in Deep Learning for Protein-Protein Interaction Analysis: A Comprehensive Review. Molecules 2023;28:5169. [PMID: 37446831 DOI: 10.3390/molecules28135169] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2023] [Revised: 06/30/2023] [Accepted: 06/30/2023] [Indexed: 07/15/2023] Open

Singh R, Sledzieski S, Bryson B, Cowen L, Berger B. Contrastive learning in protein language space predicts interactions between drugs and protein targets. Proc Natl Acad Sci U S A 2023;120:e2220778120. [PMID: 37289807 PMCID: PMC10268324 DOI: 10.1073/pnas.2220778120] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2022] [Accepted: 04/10/2023] [Indexed: 06/10/2023] Open

Zheng J, Yang X, Huang Y, Yang S, Wuchty S, Zhang Z. Deep learning-assisted prediction of protein-protein interactions in Arabidopsis thaliana. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2023;114:984-994. [PMID: 36919205 DOI: 10.1111/tpj.16188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/17/2022] [Revised: 02/20/2023] [Accepted: 03/09/2023] [Indexed: 05/27/2023]

Durham J, Zhang J, Humphreys IR, Pei J, Cong Q. Recent advances in predicting and modeling protein-protein interactions. Trends Biochem Sci 2023;48:527-538. [PMID: 37061423 DOI: 10.1016/j.tibs.2023.03.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2022] [Revised: 03/03/2023] [Accepted: 03/17/2023] [Indexed: 04/17/2023]

Huang Y, Wuchty S, Zhou Y, Zhang Z. SGPPI: structure-aware prediction of protein-protein interactions in rigorous conditions with graph convolutional network. Brief Bioinform 2023;24:6995378. [PMID: 36682013 DOI: 10.1093/bib/bbad020] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2022] [Revised: 11/17/2022] [Accepted: 01/05/2023] [Indexed: 01/23/2023] Open

Chen S, Liang C, Li H, Yu W, Prothiwa M, Kopczynski D, Loroch S, Fransen M, Verhelst SHL. Pepstatin-Based Probes for Photoaffinity Labeling of Aspartic Proteases and Application to Target Identification. ACS Chem Biol 2023;18:686-692. [PMID: 36920024 DOI: 10.1021/acschembio.2c00946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/16/2023]

Kibar G, Vingron M. Prediction of protein-protein interactions using sequences of intrinsically disordered regions. Proteins 2023. [PMID: 36908253 DOI: 10.1002/prot.26486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2022] [Revised: 02/15/2023] [Accepted: 03/07/2023] [Indexed: 03/14/2023]

Petrey D, Zhao H, Trudeau S, Murray D, Honig B. PrePPI: A structure informed proteome-wide database of protein-protein interactions. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.27.530276. [PMID: 36909476 PMCID: PMC10002632 DOI: 10.1101/2023.02.27.530276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/04/2023]

A graph neural network model for deciphering the biological mechanisms of plant electrical signal classification. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2023.110153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/06/2023]

Soleymani F, Paquet E, Viktor HL, Michalowski W, Spinello D. ProtInteract: A deep learning framework for predicting protein-protein interactions. Comput Struct Biotechnol J 2023;21:1324-1348. [PMID: 36817951 PMCID: PMC9929211 DOI: 10.1016/j.csbj.2023.01.028] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2022] [Revised: 01/20/2023] [Accepted: 01/20/2023] [Indexed: 01/26/2023] Open

Abstract

Proteins mainly perform their functions by interacting with other proteins. Protein-protein interactions underpin various biological activities such as metabolic cycles, signal transduction, and immune response. However, due to the sheer number of proteins, experimental methods for finding interacting and non-interacting protein pairs are time-consuming and costly. We therefore developed the ProtInteract framework to predict protein-protein interaction. ProtInteract comprises two components: first, a novel autoencoder architecture that encodes each protein's primary structure to a lower-dimensional vector while preserving its underlying sequence attributes. This leads to faster training of the second network, a deep convolutional neural network (CNN) that receives encoded proteins and predicts their interaction under three different scenarios. In each scenario, the deep CNN predicts the class of a given encoded protein pair. Each class indicates different ranges of confidence scores corresponding to the probability of whether a predicted interaction occurs or not. The proposed framework features significantly low computational complexity and relatively fast response. The contributions of this work are twofold. First, ProtInteract assimilates the protein's primary structure into a pseudo-time series. Therefore, we leverage the nature of the time series of proteins and their physicochemical properties to encode a protein's amino acid sequence into a lower-dimensional vector space. This approach enables extracting highly informative sequence attributes while reducing computational complexity. Second, the ProtInteract framework utilises this information to identify protein interactions with other proteins based on its amino acid configuration. Our results suggest that the proposed framework performs with high accuracy and efficiency in predicting protein-protein interactions.

Collapse

Rogers JR, Nikolényi G, AlQuraishi M. Growing ecosystem of deep learning methods for modeling protein-protein interactions. Protein Eng Des Sel 2023;36:gzad023. [PMID: 38102755 DOI: 10.1093/protein/gzad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023] Open

Ortiz-Vilchis P, De-la-Cruz-García JS, Ramirez-Arellano A. Identification of Relevant Protein Interactions with Partial Knowledge: A Complex Network and Deep Learning Approach. BIOLOGY 2023;12:140. [PMID: 36671832 PMCID: PMC9856098 DOI: 10.3390/biology12010140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/15/2022] [Revised: 01/11/2023] [Accepted: 01/12/2023] [Indexed: 01/18/2023]

Murakami Y, Mizuguchi K. Recent developments of sequence-based prediction of protein-protein interactions. Biophys Rev 2022;14:1393-1411. [PMID: 36589735 PMCID: PMC9789376 DOI: 10.1007/s12551-022-01038-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 12/08/2022] [Indexed: 12/25/2022] Open

Huang Y, Zhang Z, Zhou Y. AbAgIntPre: A deep learning method for predicting antibody-antigen interactions based on sequence information. Front Immunol 2022;13:1053617. [PMID: 36618397 PMCID: PMC9813736 DOI: 10.3389/fimmu.2022.1053617] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2022] [Accepted: 12/14/2022] [Indexed: 12/24/2022] Open

Yang X, Zheng Y, Xing X, Sui X, Jia W, Pan H. Immune subtype identification and multi-layer perceptron classifier construction for breast cancer. Front Oncol 2022;12:943874. [PMID: 36568197 PMCID: PMC9780074 DOI: 10.3389/fonc.2022.943874] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2022] [Accepted: 11/17/2022] [Indexed: 12/13/2022] Open

Abstract

Introduction

Breast cancer is a heterogeneous tumor. Tumor microenvironment (TME) has an important effect on the proliferation, metastasis, treatment, and prognosis of breast cancer.

Methods

In this study, we calculated the relative proportion of tumor infiltrating immune cells (TIICs) in the breast cancer TME, and used the consensus clustering algorithm to cluster the breast cancer subtypes. We also developed a multi-layer perceptron (MLP) classifier based on a deep learning framework to detect breast cancer subtypes, which 70% of the breast cancer research cohort was used for the model training and 30% for validation.

Results

By performing the K-means clustering algorithm, the research cohort was clustered into two subtypes. The Kaplan-Meier survival estimate analysis showed significant differences in the overall survival (OS) between the two identified subtypes. Estimating the difference in the relative proportion of TIICs showed that the two subtypes had significant differences in multiple immune cells, such as CD8, CD4, and regulatory T cells. Further, the expression level of immune checkpoint molecules (PDL1, CTLA4, LAG3, TIGIT, CD27, IDO1, ICOS) and tumor mutational burden (TMB) also showed significant differences between the two subtypes, indicating the clinical value of the two subtypes. Finally, we identified a 38-gene signature and developed a multilayer perceptron (MLP) classifier that combined multi-gene signature to identify breast cancer subtypes. The results showed that the classifier had an accuracy rate of 93.56% and can be robustly used for the breast cancer subtype diagnosis.

Conclusion

Identification of breast cancer subtypes based on the immune signature in the tumor microenvironment can assist clinicians to effectively and accurately assess the progression of breast cancer and formulate different treatment strategies for different subtypes.

Collapse

Koca MB, Nourani E, Abbasoğlu F, Karadeniz İ, Sevilgen FE. Graph convolutional network based virus-human protein-protein interaction prediction for novel viruses. Comput Biol Chem 2022;101:107755. [PMID: 36037723 DOI: 10.1016/j.compbiolchem.2022.107755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2022] [Revised: 07/07/2022] [Accepted: 08/10/2022] [Indexed: 11/03/2022]

Bernhofer M, Rost B. TMbed: transmembrane proteins predicted through language model embeddings. BMC Bioinformatics 2022;23:326. [PMID: 35941534 PMCID: PMC9358067 DOI: 10.1186/s12859-022-04873-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2022] [Accepted: 08/03/2022] [Indexed: 12/30/2022] Open

Abstract

BACKGROUND

Despite the immense importance of transmembrane proteins (TMP) for molecular biology and medicine, experimental 3D structures for TMPs remain about 4-5 times underrepresented compared to non-TMPs. Today's top methods such as AlphaFold2 accurately predict 3D structures for many TMPs, but annotating transmembrane regions remains a limiting step for proteome-wide predictions.

RESULTS

Here, we present TMbed, a novel method inputting embeddings from protein Language Models (pLMs, here ProtT5), to predict for each residue one of four classes: transmembrane helix (TMH), transmembrane strand (TMB), signal peptide, or other. TMbed completes predictions for entire proteomes within hours on a single consumer-grade desktop machine at performance levels similar or better than methods, which are using evolutionary information from multiple sequence alignments (MSAs) of protein families. On the per-protein level, TMbed correctly identified 94 ± 8% of the beta barrel TMPs (53 of 57) and 98 ± 1% of the alpha helical TMPs (557 of 571) in a non-redundant data set, at false positive rates well below 1% (erred on 30 of 5654 non-membrane proteins). On the per-segment level, TMbed correctly placed, on average, 9 of 10 transmembrane segments within five residues of the experimental observation. Our method can handle sequences of up to 4200 residues on standard graphics cards used in desktop PCs (e.g., NVIDIA GeForce RTX 3060).

CONCLUSIONS

Based on embeddings from pLMs and two novel filters (Gaussian and Viterbi), TMbed predicts alpha helical and beta barrel TMPs at least as accurately as any other method but at lower false positive rates. Given the few false positives and its outstanding speed, TMbed might be ideal to sieve through millions of 3D structures soon to be predicted, e.g., by AlphaFold2.

Collapse

Bell EW, Schwartz JH, Freddolino PL, Zhang Y. PEPPI: Whole-proteome Protein-protein Interaction Prediction through Structure and Sequence Similarity, Functional Association, and Machine Learning. J Mol Biol 2022;434:167530. [PMID: 35662463 PMCID: PMC8897833 DOI: 10.1016/j.jmb.2022.167530] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2021] [Revised: 02/17/2022] [Accepted: 03/03/2022] [Indexed: 01/31/2023]

Harmalkar A, Mahajan SP, Gray JJ. Induced fit with replica exchange improves protein complex structure prediction. PLoS Comput Biol 2022;18:e1010124. [PMID: 35658008 PMCID: PMC9200320 DOI: 10.1371/journal.pcbi.1010124] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2021] [Revised: 06/15/2022] [Accepted: 04/20/2022] [Indexed: 11/19/2022] Open

Cowen LJ, Putnam HM. Bioinformatics of Corals: Investigating Heterogeneous Omics Data from Coral Holobionts for Insight into Reef Health and Resilience. Annu Rev Biomed Data Sci 2022;5:205-231. [PMID: 35537462 DOI: 10.1146/annurev-biodatasci-122120-030732] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Casadio R, Martelli PL, Savojardo C. Machine learning solutions for predicting protein–protein interactions. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]

Singh R, Devkota K, Sledzieski S, Berger B, Cowen L. OUP accepted manuscript. Bioinformatics 2022;38:i264-i272. [PMID: 35758793 PMCID: PMC9235477 DOI: 10.1093/bioinformatics/btac258] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Abstract

Summary

Computational methods to predict protein–protein interaction (PPI) typically segregate into sequence-based ‘bottom-up’ methods that infer properties from the characteristics of the individual protein sequences, or global ‘top-down’ methods that infer properties from the pattern of already known PPIs in the species of interest. However, a way to incorporate top-down insights into sequence-based bottom-up PPI prediction methods has been elusive. We thus introduce Topsy-Turvy, a method that newly synthesizes both views in a sequence-based, multi-scale, deep-learning model for PPI prediction. While Topsy-Turvy makes predictions using only sequence data, during the training phase it takes a transfer-learning approach by incorporating patterns from both global and molecular-level views of protein interaction. In a cross-species context, we show it achieves state-of-the-art performance, offering the ability to perform genome-scale, interpretable PPI prediction for non-model organisms with no existing experimental PPI data. In species with available experimental PPI data, we further present a Topsy-Turvy hybrid (TT-Hybrid) model which integrates Topsy-Turvy with a purely network-based model for link prediction that provides information about species-specific network rewiring. TT-Hybrid makes accurate predictions for both well- and sparsely-characterized proteins, outperforming both its constituent components as well as other state-of-the-art PPI prediction methods. Furthermore, running Topsy-Turvy and TT-Hybrid screens is feasible for whole genomes, and thus these methods scale to settings where other methods (e.g. AlphaFold-Multimer) might be infeasible. The generalizability, accuracy and genome-level scalability of Topsy-Turvy and TT-Hybrid unlocks a more comprehensive map of protein interaction and organization in both model and non-model organisms.

Availability and implementation

https://topsyturvy.csail.mit.edu.

Supplementary information

Supplementary data are available at Bioinformatics online.

Collapse

Hu X, Feng C, Ling T, Chen M. Deep learning frameworks for protein–protein interaction prediction. Comput Struct Biotechnol J 2022;20:3223-3233. [PMID: 35832624 PMCID: PMC9249595 DOI: 10.1016/j.csbj.2022.06.025] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2022] [Revised: 05/27/2022] [Accepted: 06/12/2022] [Indexed: 11/26/2022] Open

Kulmanov M, Hoehndorf R. OUP accepted manuscript. Bioinformatics 2022;38:i238-i245. [PMID: 35758802 PMCID: PMC9235501 DOI: 10.1093/bioinformatics/btac256] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open