1
Medina-Ortiz D, Contreras S, Amado-Hinojosa J, Torres-Almonacid J, Asenjo JA, Navarrete M, Olivera-Nappa Á. Generalized Property-Based Encoders and Digital Signal Processing Facilitate Predictive Tasks in Protein Engineering. Front Mol Biosci 2022; 9:898627. [PMID: 35911960 PMCID: PMC9329607 DOI: 10.3389/fmolb.2022.898627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Accepted: 06/23/2022] [Indexed: 11/13/2022]
Abstract
Computational methods in protein engineering often require encoding amino acid sequences, i.e., converting them into numeric arrays. Physicochemical properties are a typical choice to define encoders, where we replace each amino acid by its value for a given property. However, what property (or group thereof) is best for a given predictive task remains an open problem. In this work, we generalize property-based encoding strategies to maximize the performance of predictive models in protein engineering. First, combining text mining and unsupervised learning, we partitioned the AAIndex database into eight semantically-consistent groups of properties. We then applied a non-linear PCA within each group to define a single encoder to represent it. Then, in several case studies, we assess the performance of predictive models for protein and peptide function, folding, and biological activity, trained using the proposed encoders and classical methods (One Hot Encoder and TAPE embeddings). Models trained on datasets encoded with our encoders and converted to signals through the Fast Fourier Transform (FFT) increased their precision and reduced their overfitting substantially, outperforming classical approaches in most cases. Finally, we propose a preliminary methodology to create de novo sequences with desired properties. All these results offer simple ways to increase the performance of general and complex predictive tasks in protein engineering without increasing their complexity.
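The encoding-plus-FFT idea in this abstract can be illustrated with a minimal sketch. It is not the authors' encoder: the abstract describes encoders derived from a non-linear PCA over groups of AAIndex properties, whereas this sketch assumes a single well-known property (the Kyte-Doolittle hydropathy scale) as a stand-in. The function names and the padding length of 64 are hypothetical choices made for illustration.

```python
import numpy as np

# Kyte-Doolittle hydropathy values, used here only as a stand-in property.
# The paper's encoders come from a non-linear PCA over AAIndex groups.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_property(sequence, scale=KYTE_DOOLITTLE, length=64):
    """Replace each residue by its property value and zero-pad to `length`."""
    values = np.array([scale[aa] for aa in sequence.upper()], dtype=float)
    if len(values) > length:
        raise ValueError("sequence is longer than the padding length")
    padded = np.zeros(length)
    padded[: len(values)] = values
    return padded


def fft_spectrum(encoded):
    """Return the magnitude spectrum of a real-valued encoded sequence."""
    return np.abs(np.fft.rfft(encoded))


signal = encode_property("MKTAYIAKQR")  # hypothetical example peptide
spectrum = fft_spectrum(signal)
print(spectrum.shape)  # (33,)
```

Zero-padding to a fixed length gives every sequence a spectrum of the same size, so the spectra can be stacked into a feature matrix for any standard predictive model.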
Affiliation(s)
- David Medina-Ortiz
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Santiago, Chile
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Punta Arenas, Chile
- Sebastian Contreras
  - Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
  - *Correspondence: Sebastian Contreras; Álvaro Olivera-Nappa
- Juan Amado-Hinojosa
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Santiago, Chile
  - Departamento de Ingeniería Química, Biotecnología y Materiales, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Santiago, Chile
- Jorge Torres-Almonacid
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Punta Arenas, Chile
- Juan A. Asenjo
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Santiago, Chile
  - Departamento de Ingeniería Química, Biotecnología y Materiales, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Santiago, Chile
- Álvaro Olivera-Nappa
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Santiago, Chile
  - Departamento de Ingeniería Química, Biotecnología y Materiales, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Santiago, Chile
  - *Correspondence: Sebastian Contreras; Álvaro Olivera-Nappa
2
Olivera-Nappa Á, Contreras S, Tevy MF, Medina-Ortiz D, Leschot A, Vigil P, Conca C. Patient-Wise Methodology to Assess Glycemic Health Status: Applications to Quantify the Efficacy and Physiological Targets of Polyphenols on Glycemic Control. Front Nutr 2022; 9:831696. [PMID: 35252308 PMCID: PMC8892255 DOI: 10.3389/fnut.2022.831696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Accepted: 01/25/2022] [Indexed: 11/13/2022]
Abstract
A growing body of evidence indicates that dietary polyphenols could be used as an early intervention to treat glucose-insulin (G-I) dysregulation. However, studies report heterogeneous information, and the targets of the intervention remain largely elusive. In this work, we provide a general methodology to quantify the effects of any given polyphenol-rich food or formula on glycemic regulation in a patient-wise manner using an Oral Glucose Tolerance Test (OGTT). We use a mathematical model to represent individual OGTT curves as the coordinated action of subsystems, each one described by a parameter with a physiological interpretation. Using the parameter values calculated for a cohort of 1198 individuals, we propose a statistical model to calculate the risk of dysglycemia and the coordination among subsystems for each subject, thus providing a continuous and individual health assessment. This method allows identifying individuals at high risk of dysglycemia, who would have been missed by traditional binary diagnostic methods, and enables early nutritional intervention with a polyphenol-supplemented diet where it is most effective and desirable. In addition, the proposed methodology assesses the effectiveness of interventions over time when applied to the OGTT curves of a treated individual. We illustrate the use of this method in a case study to assess the dose-dependent effects of Delphinol® on reducing dysglycemia risk and improving the coordination between subsystems. Finally, this strategy enables, on the one hand, the use of low-cost, non-invasive methods in population-scale nutritional studies. On the other hand, it will help practitioners assess the effectiveness of an intervention based on individual vulnerabilities and adapt the treatment to manage dysglycemia and avoid its progression into disease.
Affiliation(s)
- Álvaro Olivera-Nappa
  - Centre for Biotechnology and Bioengineering (CeBiB), University of Chile, Santiago, Chile
  - Department of Chemical Engineering, Biotechnology and Materials, University of Chile, Santiago, Chile
  - *Correspondence: Álvaro Olivera-Nappa
- Sebastian Contreras
  - Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
- María Florencia Tevy
  - Laboratory of Cell Biology, Institute of Nutrition and Food Technology (INTA), University of Chile, Santiago, Chile
- David Medina-Ortiz
  - Centre for Biotechnology and Bioengineering (CeBiB), University of Chile, Santiago, Chile
  - Department of Chemical Engineering, Biotechnology and Materials, University of Chile, Santiago, Chile
- Pilar Vigil
  - Reproductive Health Research Institute, Santiago, Chile
- Carlos Conca
  - Centre for Biotechnology and Bioengineering (CeBiB), University of Chile, Santiago, Chile
  - Center for Mathematical Modelling (CMM), University of Chile, Santiago, Chile
3
Quiroz C, Saavedra YB, Armijo-Galdames B, Amado-Hinojosa J, Olivera-Nappa Á, Sanchez-Daza A, Medina-Ortiz D. Peptipedia: a user-friendly web application and a comprehensive database for peptide research supported by Machine Learning approach. Database: The Journal of Biological Databases and Curation 2021; 2021:6363751. [PMID: 34478499 PMCID: PMC8415426 DOI: 10.1093/database/baab055] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/30/2021] [Revised: 06/30/2021] [Accepted: 08/11/2021] [Indexed: 12/12/2022]
Abstract
Peptides have attracted attention during the last decades due to their extraordinary therapeutic properties. Different computational tools have been developed to take advantage of existing information, compiling knowledge and making it available to general users. Nevertheless, most available tools are not user-friendly, present redundant information, do not display the data clearly, and are usually specific to particular biological activities. So far, no integrated database with consolidated information exists to support research on peptide sequences. To address these needs, we developed Peptipedia, a user-friendly web application and comprehensive database to search, characterize and analyse peptide sequences. Our tool integrates the information from 30 previously reported databases with a total of 92 055 amino acid sequences, making it the biggest repository of peptides with recorded activities to date. Furthermore, we make available a variety of bioinformatics services and statistical modules to increase our tool's usability. Moreover, we incorporated a robust assembled binary classification system to predict putative biological activities for peptide sequences. These significant differences from existing alternatives make our tool a substantial contribution to the development of biotechnological and bioengineering applications for peptides. Peptipedia is available for non-commercial use as open-access software, licensed under the GNU General Public License, version GPL 3.0. The web platform is publicly available at peptipedia.cl. Database URL: Both the source code and sample data sets are available in the GitHub repository https://github.com/ProteinEngineering-PESB2/peptipedia
Affiliation(s)
- Cristofer Quiroz
  - Facultad de Ingeniería, Universidad Autónoma de Chile, Cinco Pte. 1670, Talca 3467987, Chile
- Yasna Barrera Saavedra
  - Escuela de Ingeniería en Bioinformática, Universidad de Talca, Avenida Lircay SN, Talca 3460000, Chile
- Benjamín Armijo-Galdames
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Beauchef 851, Santiago 8370448, Chile
  - Department of Chemical Engineering, Biotechnology and Materials, Universidad de Chile, Beauchef 851, Santiago 8370448, Chile
- Juan Amado-Hinojosa
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Beauchef 851, Santiago 8370448, Chile
  - Department of Chemical Engineering, Biotechnology and Materials, Universidad de Chile, Beauchef 851, Santiago 8370448, Chile
- Álvaro Olivera-Nappa
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Beauchef 851, Santiago 8370448, Chile
  - Department of Chemical Engineering, Biotechnology and Materials, Universidad de Chile, Beauchef 851, Santiago 8370448, Chile
- Anamaria Sanchez-Daza
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Beauchef 851, Santiago 8370448, Chile
  - Institute for Cell Dynamics and Biotechnology, Beauchef 851, Santiago 8370456, Chile
- David Medina-Ortiz
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Beauchef 851, Santiago 8370448, Chile
  - Department of Chemical Engineering, Biotechnology and Materials, Universidad de Chile, Beauchef 851, Santiago 8370448, Chile
4
Contreras S, Biron-Lattes JP, Villavicencio HA, Medina-Ortiz D, Llanovarced-Kawles N, Olivera-Nappa Á. Statistically-based methodology for revealing real contagion trends and correcting delay-induced errors in the assessment of COVID-19 pandemic. Chaos, Solitons, and Fractals 2020; 139:110087. [PMID: 32834623 PMCID: PMC7341964 DOI: 10.1016/j.chaos.2020.110087] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/26/2020] [Revised: 06/19/2020] [Accepted: 07/02/2020] [Indexed: 05/14/2023]
Abstract
The COVID-19 pandemic has reshaped our world on a timescale much shorter than what we can understand. Particularities of SARS-CoV-2, such as its persistence on surfaces and the lack of a curative treatment or vaccine against COVID-19, have pushed authorities to apply restrictive policies to control its spread. As data drove most of the decisions made in this global contingency, their quality is a critical variable for decision-making actors, and they should therefore be carefully curated. In this work, we analyze the sources of error in typically reported epidemiological variables and in the usual tests used for diagnosis, and their impact on our understanding of COVID-19 spreading dynamics. We address the existence of different delays in the reporting of new cases, induced by the incubation time of the virus and testing-diagnosis time gaps, and other error sources related to the sensitivity/specificity of the tests used to diagnose COVID-19. Using a statistically-based algorithm, we perform a temporal reclassification of cases to avoid delay-induced errors, building new epidemiological curves centered on the day when the contagion effectively occurred. We also statistically enhance the robustness of the discharge/recovery clinical criteria in the absence of a direct test, which is typically the case in non-first world countries, where the limited testing capabilities are fully dedicated to the evaluation of new cases. Finally, we applied our methodology to assess the evolution of the pandemic in Chile through the Effective Reproduction Number Rt, identifying different moments in which data were misleading governmental actions. In doing so, we aim to raise public awareness of the need for proper data reporting and processing protocols for epidemiological modelling and predictions.
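The temporal reclassification described in this abstract can be pictured with a minimal sketch. It is not the authors' algorithm, whose details the abstract does not give. It assumes a known, hypothetical probability distribution over the delay (in days) between contagion and report, and moves each day's reported cases back in time according to that distribution. The function name and the example numbers are illustrative only.

```python
import numpy as np


def reclassify_by_delay(reported, delay_pmf):
    """Move reported cases back to their assumed day of contagion.

    reported[t] is the number of cases reported on day t.
    delay_pmf[d] is the assumed probability that a case is reported
    d days after contagion (a hypothetical input, not from the paper).
    Cases whose assumed contagion day falls before day 0 are dropped.
    """
    reported = np.asarray(reported, dtype=float)
    delay_pmf = np.asarray(delay_pmf, dtype=float)
    delay_pmf = delay_pmf / delay_pmf.sum()  # normalise to a distribution
    shifted = np.zeros_like(reported)
    for t, cases in enumerate(reported):
        for d, p in enumerate(delay_pmf):
            if t - d >= 0:
                shifted[t - d] += cases * p
    return shifted


# Hypothetical data: reports arrive one or two days after contagion,
# each with probability 0.5.
reported = [0, 0, 0, 10, 20]
delay_pmf = [0.0, 0.5, 0.5]
shifted = reclassify_by_delay(reported, delay_pmf)
print(shifted)  # 0, 5, 15, 10, 0 cases on days 0 to 4
```

In this example the total of 30 cases is preserved. The most recent day ends up empty because its contagions have not been reported yet, which shows why a curve built from raw report dates can mislead about the latest trend.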
Affiliation(s)
- Sebastián Contreras
  - Laboratory for Rheology and Fluid Dynamics, Universidad de Chile, Beauchef 850, Santiago 8370448, Chile
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Beauchef 851, Santiago 8370448, Chile
- Juan Pablo Biron-Lattes
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Beauchef 851, Santiago 8370448, Chile
  - Department of Chemical Engineering, Biotechnology, and Materials, Universidad de Chile, Beauchef 851, Santiago 8370448, Chile
- H Andrés Villavicencio
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Beauchef 851, Santiago 8370448, Chile
- David Medina-Ortiz
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Beauchef 851, Santiago 8370448, Chile
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Nyna Llanovarced-Kawles
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Beauchef 851, Santiago 8370448, Chile
  - Department of Chemical Engineering, Biotechnology, and Materials, Universidad de Chile, Beauchef 851, Santiago 8370448, Chile
- Álvaro Olivera-Nappa
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Beauchef 851, Santiago 8370448, Chile
  - Department of Chemical Engineering, Biotechnology, and Materials, Universidad de Chile, Beauchef 851, Santiago 8370448, Chile