Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Teufel F, Gíslason MH, Almagro Armenteros JJ, Johansen A, Winther O, Nielsen H. GraphPart: homology partitioning for biological sequence analysis. NAR Genom Bioinform 2023;5:lqad088. [PMID: 37850036 PMCID: PMC10578201 DOI: 10.1093/nargab/lqad088] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/25/2023] [Revised: 08/25/2023] [Accepted: 09/19/2023] [Indexed: 10/19/2023]

Cited by Other Article(s)

Fernández-Díaz R, Cossio-Pérez R, Agoni C, Lam HT, Lopez V, Shields DC. AutoPeptideML: a study on how to build more trustworthy peptide bioactivity predictors. Bioinformatics 2024;40:btae555. [PMID: 39292535 PMCID: PMC11438549 DOI: 10.1093/bioinformatics/btae555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2024] [Revised: 08/08/2024] [Accepted: 09/17/2024] [Indexed: 09/20/2024]

Abstract

MOTIVATION

Automated machine learning (AutoML) solutions can bridge the gap between new computational advances and their real-world applications by enabling experimental scientists to build their own custom models. We examine different steps in the development life-cycle of peptide bioactivity binary predictors and identify key steps where automation can not only result in a more accessible method, but also in a more robust and interpretable evaluation, leading to more trustworthy models.

RESULTS

We present a new automated method for drawing negative peptides that achieves better balance between specificity and generalization than current alternatives. We study the effect of homology-based partitioning for generating the training and testing data subsets and demonstrate that model performance is overestimated when no such homology correction is used, which indicates that prior studies may have overestimated their performance when applied to new peptide sequences. We also conduct a systematic analysis of different protein language models as peptide representation methods and find that they can serve as better descriptors than a naive alternative, but that there is no significant difference across models with different sizes or algorithms. Finally, we demonstrate that an ensemble of optimized traditional machine learning algorithms can compete with more complex neural network models, while being more computationally efficient. We integrate these findings into AutoPeptideML, an easy-to-use AutoML tool to allow researchers without a computational background to build new predictive models for peptide bioactivity in a matter of minutes.

AVAILABILITY AND IMPLEMENTATION

Source code, documentation, and data are available at https://github.com/IBM/AutoPeptideML and a dedicated web-server at http://peptide.ucd.ie/AutoPeptideML. A static version of the software to ensure the reproduction of the results is available at https://zenodo.org/records/13363975.
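
The homology-based partitioning mentioned in the abstract can be illustrated with a minimal sketch. This is not the AutoPeptideML or GraphPart algorithm: it assumes a toy similarity score from Python's `difflib` as a stand-in for alignment-based sequence identity, and the names `similarity`, `homology_clusters`, and `homology_split` are hypothetical. Sequences whose similarity reaches the threshold are linked into one cluster, and whole clusters are then assigned to either the training or the testing subset, so no test sequence is a close match to any training sequence.

```python
import random
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Toy, symmetric stand-in for alignment-based sequence identity (assumption)."""
    forward = SequenceMatcher(None, a, b, autojunk=False).ratio()
    backward = SequenceMatcher(None, b, a, autojunk=False).ratio()
    return max(forward, backward)


def homology_clusters(seqs: list[str], threshold: float) -> list[list[int]]:
    """Group sequence indices by single linkage: any pair at or above the threshold is joined."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if similarity(seqs[i], seqs[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters: dict[int, list[int]] = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def homology_split(seqs, test_fraction=0.2, threshold=0.7, seed=0):
    """Assign whole clusters to train or test; returns two lists of indices."""
    clusters = homology_clusters(seqs, threshold)
    random.Random(seed).shuffle(clusters)
    target = test_fraction * len(seqs)
    train, test = [], []
    for cluster in clusters:
        # Fill the test set until it reaches the target size, then fill train.
        (test if len(test) < target else train).extend(cluster)
    return train, test


if __name__ == "__main__":
    peptides = ["ACDEFGHIKL", "ACDEFGHIKM", "WWWWPPPPYY", "WWWWPPPPYF", "MNQRSTVMNQ"]
    train_idx, test_idx = homology_split(peptides, test_fraction=0.4, threshold=0.7)
    print("train:", [peptides[i] for i in train_idx])
    print("test:", [peptides[i] for i in test_idx])
```

Because whole clusters are moved together, the test subset can overshoot the requested fraction when a large cluster is drawn. The all-pairs comparison is quadratic in the number of sequences, so it is only practical for small sets.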

Ferrer Florensa A, Almagro Armenteros J, Nielsen H, Aarestrup F, Clausen P. SpanSeq: similarity-based sequence data splitting method for improved development and assessment of deep learning projects. NAR Genom Bioinform 2024;6:lqae106. [PMID: 39157582 PMCID: PMC11327874 DOI: 10.1093/nargab/lqae106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 07/26/2024] [Accepted: 08/05/2024] [Indexed: 08/20/2024]

Bernett J, Blumenthal DB, Grimm DG, Haselbeck F, Joeres R, Kalinina OV, List M. Guiding questions to avoid data leakage in biological machine learning applications. Nat Methods 2024;21:1444-1453. [PMID: 39122953 DOI: 10.1038/s41592-024-02362-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/10/2024] [Accepted: 06/26/2024] [Indexed: 08/12/2024]

Ødum MT, Teufel F, Thumuluri V, Almagro Armenteros JJ, Johansen AR, Winther O, Nielsen H. DeepLoc 2.1: multi-label membrane protein type prediction using protein language models. Nucleic Acids Res 2024;52:W215-W220. [PMID: 38587188 PMCID: PMC11223819 DOI: 10.1093/nar/gkae237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 03/06/2024] [Accepted: 03/21/2024] [Indexed: 04/09/2024]

Nielsen H, Teufel F, Brunak S, von Heijne G. SignalP: The Evolution of a Web Server. Methods Mol Biol 2024;2836:331-367. [PMID: 38995548 DOI: 10.1007/978-1-0716-4007-4_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2024]

Klie A, Laub D, Talwar JV, Stites H, Jores T, Solvason JJ, Farley EK, Carter H. Predictive analyses of regulatory sequences with EUGENe. Nat Comput Sci 2023;3:946-956. [PMID: 38177592 PMCID: PMC10768637 DOI: 10.1038/s43588-023-00544-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 09/27/2023] [Indexed: 01/06/2024]