Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Raimondi D, Orlando G, Vranken WF, Moreau Y. Exploring the limitations of biophysical propensity scales coupled with machine learning for protein sequence analysis. Sci Rep 2019;9:16932. [PMID: 31729443 PMCID: PMC6858301 DOI: 10.1038/s41598-019-53324-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2019] [Accepted: 10/25/2019] [Indexed: 11/21/2022] Open

For:	Raimondi D, Orlando G, Vranken WF, Moreau Y. Exploring the limitations of biophysical propensity scales coupled with machine learning for protein sequence analysis. Sci Rep 2019;9:16932. [PMID: 31729443 PMCID: PMC6858301 DOI: 10.1038/s41598-019-53324-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2019] [Accepted: 10/25/2019] [Indexed: 11/21/2022] Open

Number

Cited by Other Article(s)

Breimann S, Kamp F, Steiner H, Frishman D. AAontology: An Ontology of Amino Acid Scales for Interpretable Machine Learning. J Mol Biol 2024;436:168717. [PMID: 39053689 DOI: 10.1016/j.jmb.2024.168717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2024] [Revised: 07/15/2024] [Accepted: 07/19/2024] [Indexed: 07/27/2024]

Wagner A. Genotype sampling for deep-learning assisted experimental mapping of a combinatorially complete fitness landscape. Bioinformatics 2024;40:btae317. [PMID: 38745436 PMCID: PMC11132821 DOI: 10.1093/bioinformatics/btae317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2024] [Revised: 03/21/2024] [Accepted: 05/14/2024] [Indexed: 05/16/2024] Open

Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023;13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]

Affiliation(s)

Petr Kouba Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
Pavel Kohout Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
Faraneh Haddadi Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
Anton Bushuiev Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
Raman Samusevich Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
Jiri Sedlar Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
Jiri Damborsky Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
Tomas Pluskal Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
Josef Sivic Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
Stanislav Mazurenko Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic

Collapse

Bourdillon AT, Shah HP, Cohen O, Hajek MA, Mehra S. Novel Machine Learning Model to Predict Interval of Oral Cancer Recurrence for Surveillance Stratification. Laryngoscope 2022. [DOI: 10.1002/lary.30351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2022] [Revised: 07/29/2022] [Accepted: 08/01/2022] [Indexed: 12/24/2022]

Prediction of B cell epitopes in proteins using a novel sequence similarity-based method. Sci Rep 2022;12:13739. [PMID: 35962028 PMCID: PMC9374694 DOI: 10.1038/s41598-022-18021-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2022] [Accepted: 08/03/2022] [Indexed: 11/29/2022] Open

Mckenna A, P N Dubey S. Machine Learning Based Predictive Model for the Analysis of Sequence Activity Relationships Using Protein Spectra and Protein Descriptors. J Biomed Inform 2022;128:104016. [PMID: 35143999 DOI: 10.1016/j.jbi.2022.104016] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2021] [Revised: 12/13/2021] [Accepted: 02/03/2022] [Indexed: 11/26/2022]

Griffith D, Holehouse AS. PARROT is a flexible recurrent neural network framework for analysis of large protein datasets. eLife 2021;10:e70576. [PMID: 34533455 PMCID: PMC8448528 DOI: 10.7554/elife.70576] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2021] [Accepted: 09/06/2021] [Indexed: 11/29/2022] Open

Akbar S, Ahmad A, Hayat M, Rehman AU, Khan S, Ali F. iAtbP-Hyb-EnC: Prediction of antitubercular peptides via heterogeneous feature representation and genetic algorithm based ensemble learning model. Comput Biol Med 2021;137:104778. [PMID: 34481183 DOI: 10.1016/j.compbiomed.2021.104778] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2021] [Revised: 08/16/2021] [Accepted: 08/17/2021] [Indexed: 11/26/2022]

Abstract

Tuberculosis (TB) is a worldwide illness caused by the bacteria Mycobacterium tuberculosis. Owing to the high prevalence of multidrug-resistant tuberculosis, numerous traditional strategies for developing novel alternative therapies have been presented. The effectiveness and dependability of these procedures are not always consistent. Peptide-based therapy has recently been regarded as a preferable alternative due to its excellent selectivity in targeting specific cells without affecting the normal cells. However, due to the rapid growth of the peptide samples, predicting TB accurately has become a challenging task. To effectively identify antitubercular peptides, an intelligent and reliable prediction model is indispensable. An ensemble learning approach was used in this study to improve expected results by compensating for the shortcomings of individual classification algorithms. Initially, three distinct representation approaches were used to formulate the training samples: k-space amino acid composition, composite physiochemical properties, and one-hot encoding. The feature vectors of the applied feature extraction methods are then combined to generate a heterogeneous vector. Finally, utilizing individual and heterogeneous vectors, five distinct nature classification models were used to evaluate prediction rates. In addition, a genetic algorithm-based ensemble model was used to improve the suggested model's prediction and training capabilities. Using Training and independent datasets, the proposed ensemble model achieved an accuracy of 94.47% and 92.68%, respectively. It was observed that our proposed "iAtbP-Hyb-EnC" model outperformed and reported ~10% highest training accuracy than existing predictors. The "iAtbP-Hyb-EnC" model is suggested to be a reliable tool for scientists and might play a valuable role in academic research and drug discovery. The source code and all datasets are publicly available at https://github.com/Farman335/iAtbP-Hyb-EnC.

Collapse

Raimondi D, Orlando G, Michiels E, Pakravan D, Bratek-Skicki A, Van Den Bosch L, Moreau Y, Rousseau F, Schymkowitz J. In-silico prediction of in-vitro protein liquid-liquid phase separation experiments outcomes with multi-head neural attention. Bioinformatics 2021;37:3473-3479. [PMID: 33983381 DOI: 10.1093/bioinformatics/btab350] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2020] [Revised: 04/15/2021] [Accepted: 05/10/2021] [Indexed: 01/21/2023] Open

Pertseva M, Gao B, Neumeier D, Yermanos A, Reddy ST. Applications of Machine and Deep Learning in Adaptive Immunity. Annu Rev Chem Biomol Eng 2021;12:39-62. [PMID: 33852352 DOI: 10.1146/annurev-chembioeng-101420-125021] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Auslander N, Gussow AB, Koonin EV. Incorporating Machine Learning into Established Bioinformatics Frameworks. Int J Mol Sci 2021;22:2903. [PMID: 33809353 PMCID: PMC8000113 DOI: 10.3390/ijms22062903] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2021] [Revised: 03/08/2021] [Accepted: 03/10/2021] [Indexed: 12/23/2022] Open

Raimondi D, Passemiers A, Fariselli P, Moreau Y. Current cancer driver variant predictors learn to recognize driver genes instead of functional variants. BMC Biol 2021;19:3. [PMID: 33441128 PMCID: PMC7807764 DOI: 10.1186/s12915-020-00930-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2020] [Accepted: 11/19/2020] [Indexed: 12/11/2022] Open

ElAbd H, Bromberg Y, Hoarfrost A, Lenz T, Franke A, Wendorff M. Amino acid encoding for deep learning applications. BMC Bioinformatics 2020;21:235. [PMID: 32517697 PMCID: PMC7285590 DOI: 10.1186/s12859-020-03546-x] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2020] [Accepted: 05/12/2020] [Indexed: 12/25/2022] Open

Abstract

BACKGROUND

The number of applications of deep learning algorithms in bioinformatics is increasing as they usually achieve superior performance over classical approaches, especially, when bigger training datasets are available. In deep learning applications, discrete data, e.g. words or n-grams in language, or amino acids or nucleotides in bioinformatics, are generally represented as a continuous vector through an embedding matrix. Recently, learning this embedding matrix directly from the data as part of the continuous iteration of the model to optimize the target prediction - a process called 'end-to-end learning' - has led to state-of-the-art results in many fields. Although usage of embeddings is well described in the bioinformatics literature, the potential of end-to-end learning for single amino acids, as compared to more classical manually-curated encoding strategies, has not been systematically addressed. To this end, we compared classical encoding matrices, namely one-hot, VHSE8 and BLOSUM62, to end-to-end learning of amino acid embeddings for two different prediction tasks using three widely used architectures, namely recurrent neural networks (RNN), convolutional neural networks (CNN), and the hybrid CNN-RNN.

RESULTS

By using different deep learning architectures, we show that end-to-end learning is on par with classical encodings for embeddings of the same dimension even when limited training data is available, and might allow for a reduction in the embedding dimension without performance loss, which is critical when deploying the models to devices with limited computational capacities. We found that the embedding dimension is a major factor in controlling the model performance. Surprisingly, we observed that deep learning models are capable of learning from random vectors of appropriate dimension.

CONCLUSION

Our study shows that end-to-end learning is a flexible and powerful method for amino acid encoding. Further, due to the flexibility of deep learning systems, amino acid encoding schemes should be benchmarked against random vectors of the same dimension to disentangle the information content provided by the encoding scheme from the distinguishability effect provided by the scheme.

Collapse

Raimondi D, Orlando G, Fariselli P, Moreau Y. Insight into the protein solubility driving forces with neural attention. PLoS Comput Biol 2020;16:e1007722. [PMID: 32352965 PMCID: PMC7217484 DOI: 10.1371/journal.pcbi.1007722] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2019] [Revised: 05/12/2020] [Accepted: 02/10/2020] [Indexed: 12/29/2022] Open

Abstract

Protein solubility is a key aspect for many biotechnological, biomedical and industrial processes, such as the production of active proteins and antibodies. In addition, understanding the molecular determinants of the solubility of proteins may be crucial to shed light on the molecular mechanisms of diseases caused by aggregation processes such as amyloidosis. Here we present SKADE, a novel Neural Network protein solubility predictor and we show how it can provide novel insight into the protein solubility mechanisms, thanks to its neural attention architecture. First, we show that SKADE positively compares with state of the art tools while using just the protein sequence as input. Then, thanks to the neural attention mechanism, we use SKADE to investigate the patterns learned during training and we analyse its decision process. We use this peculiarity to show that, while the attention profiles do not correlate with obvious sequence aspects such as biophysical properties of the aminoacids, they suggest that N- and C-termini are the most relevant regions for solubility prediction and are predictive for complex emergent properties such as aggregation-prone regions involved in beta-amyloidosis and contact density. Moreover, SKADE is able to identify mutations that increase or decrease the overall solubility of the protein, allowing it to be used to perform large scale in-silico mutagenesis of proteins in order to maximize their solubility.

The solubility of proteins is a crucial biophysical aspect when it comes to understanding many human diseases and to improve the industrial processes for protein production. Due to its relevance, computational methods have been devised in order to study and possibly optimize the solubility of proteins. In this work we apply a deep-learning technique, called neural attention to predict protein solubility while “opening” the model itself to interpretability, even though Machine Learning models are usually considered black boxes. Thank to the attention mechanism, we show that i) our model implicitly learns complex patterns related to emergent, protein folding-related, aspects such as to recognize β-amyloidosis regions and that ii) the N-and C-termini are the regions with the highes signal fro solubility prediction. When it comes to enhancing the solubility of proteins, we, for the first time, propose to investigate the synergistic effects of tandem mutations instead of “single” mutations, suggesting that this could minimize the number of required proposed mutations.

Collapse