Tam C, Zhang KYJ. FPredX: Interpretable models for the prediction of spectral maxima, brightness, and oligomeric states of fluorescent proteins. Proteins 2021;90:732-746. [PMID: 34676905 DOI: 10.1002/prot.26270]
[Received: 05/13/2021] [Revised: 09/19/2021] [Accepted: 10/15/2021] [Indexed: 11/06/2022]
Abstract
Fluorescent protein (FP) design is a challenging protein design problem because of the tradeoffs among the multiple properties to be optimized. Despite accumulated efforts in design and characterization, progress has been slow toward the full understanding of sequence-property relationships needed to tackle the multiobjective design problem in FPs. In this study, we approach this problem by developing FPredX, a collection of gradient-boosted decision tree models that map FP sequences to four major design targets of FPs: excitation maximum, emission maximum, brightness, and oligomeric state. Trained on one-hot encoded multiple sequence alignments, with hyperparameter optimization for each model, the FPredX models showed excellent prediction performance for all target properties compared with existing methods. We further interpreted the FPredX models by comparing the importance of positions along the aligned FP sequence to predictive performance, and we suggest positions that the FPredX models deemed differentially important for the prediction of each target property.
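The encoding and position-level interpretation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the alphabet, the function names `one_hot_encode` and `position_importance`, the handling of unknown symbols, and the toy alignment are all assumptions made for illustration. The sketch flattens each aligned sequence into a 0/1 feature vector and shows how per-feature importances from any tree-based model could be summed back to one score per alignment column.

```python
# Assumed alphabet: the 20 standard amino acids plus the alignment gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot_encode(aligned_seq, alphabet=ALPHABET):
    """Flatten an aligned sequence into a 0/1 vector of length len(seq) * len(alphabet)."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    k = len(alphabet)
    vec = [0] * (len(aligned_seq) * k)
    for pos, ch in enumerate(aligned_seq.upper()):
        # Assumption: symbols outside the alphabet (e.g. X) stay an all-zero block.
        if ch in index:
            vec[pos * k + index[ch]] = 1
    return vec


def position_importance(feature_importances, n_positions, alphabet=ALPHABET):
    """Sum per-feature importances over the alphabet: one score per alignment column."""
    k = len(alphabet)
    return [sum(feature_importances[p * k:(p + 1) * k]) for p in range(n_positions)]


# Toy alignment (made-up sequences, all the same aligned length).
msa = ["MSKG-E", "MSKGEE", "M-KGAE"]
X = [one_hot_encode(seq) for seq in msa]
```

A feature matrix such as `X` could then be passed to a gradient-boosted tree library, and the fitted model's feature importances passed to `position_importance` to rank alignment columns for each target property.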