1
Pande A, Patiyal S, Lathwal A, Arora C, Kaur D, Dhall A, Mishra G, Kaur H, Sharma N, Jain S, Usmani SS, Agrawal P, Kumar R, Kumar V, Raghava GPS. Pfeature: A Tool for Computing Wide Range of Protein Features and Building Prediction Models. J Comput Biol 2023; 30:204-222. [PMID: 36251780 DOI: 10.1089/cmb.2022.0241] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 01/24/2023] Open
Abstract
Over the last three decades, a wide range of protein features has been discovered for annotating proteins. Numerous attempts have been made to integrate these features into a single software package or platform so that users can compute them from one source. To complement the existing methods, we developed Pfeature, a tool for computing a wide range of protein features. Pfeature can compute more than 200,000 features required for predicting the overall function of a protein, residue-level annotation of a protein, and the function of chemically modified peptides. It has six major modules: composition, binary profiles, evolutionary information, structural features, patterns, and model building. The composition module computes most of the existing compositional features, plus novel features. The binary profile of an amino acid sequence captures the fraction of each type of residue as well as its position. The evolutionary information module computes the evolutionary information of a protein in the form of a position-specific scoring matrix profile generated using Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST); this profile is suitable for annotating a protein and its residues. The structural module computes structural features/descriptors from the tertiary structure of a protein. These features are suitable for predicting the therapeutic potential of a protein containing non-natural or chemically modified residues. The model-building module implements various machine learning techniques for developing classification and regression models, as well as feature selection. Pfeature also allows the generation of overlapping patterns and features from a protein. Pfeature is available as a user-friendly web server, Python library, and stand-alone package.
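The composition and binary-profile features described in this abstract can be illustrated with a minimal sketch. This is not Pfeature's own API: the function names `amino_acid_composition` and `binary_profile`, the percentage normalization, and the example peptide are assumptions made for illustration only.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the percentage of each standard residue in the sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence.upper())
    length = len(sequence)
    return {aa: 100.0 * counts[aa] / length for aa in AMINO_ACIDS}


def binary_profile(sequence):
    """One-hot encode every position: a flat list of 20 * len(sequence) values."""
    profile = []
    for residue in sequence.upper():
        profile.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return profile


peptide = "ACDAA"  # hypothetical peptide, not taken from the paper
composition = amino_acid_composition(peptide)
profile = binary_profile(peptide)
```

The composition vector discards residue order, while the binary profile keeps it at the cost of a length-dependent feature size, which is why the two are usually treated as complementary descriptors.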
Affiliation(s)
- Akshara Pande
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Sumeet Patiyal
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Anjali Lathwal
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Chakit Arora
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Dilraj Kaur
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Anjali Dhall
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Gaurav Mishra
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
  - Department of Electrical Engineering, Shiv Nadar University, Greater Noida, India
- Harpreet Kaur
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
  - Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Neelam Sharma
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Shipra Jain
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Salman Sadullah Usmani
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
  - Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Piyush Agrawal
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
  - Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Rajesh Kumar
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
  - Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Vinod Kumar
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
  - Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Gajendra P S Raghava
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
2
He P, Hou L, Tao H, Dai Q, Yao Y. An Analysis Model of Protein Mass Spectrometry Data and its Application. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191202150844] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Background:
The impact of cancer on society has created the need for new and faster theoretical models for the early diagnosis of cancer.
Methods:
In this work, a mass spectrometry (MS) data analysis method based on the star-like graph of a protein and a support vector machine (SVM) is proposed and applied to the early classification of ovarian cancer in an MS data set. First, the MS data are reduced and transformed into the corresponding protein sequence. Then, the topological indices of the star-like graph are calculated to describe the MS data of each cancer sample. Finally, an SVM model is used to classify the MS data.
Results:
Over 10 independent training and testing experiments used to evaluate the ovarian cancer detection models, the average prediction accuracy, sensitivity, and specificity were 96.45%, 96.88%, and 95.67%, respectively, for [0,1]-normalized data, and 94.43%, 96.25%, and 91.11% for [-1,1]-normalized data.
Conclusion:
Combined with SELDI-TOF-MS technology, the model shows promise for the early clinical detection and diagnosis of ovarian cancer.
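The two normalizations compared in the results can be sketched as follows, assuming they refer to ordinary min-max scaling of each feature into [0,1] or [-1,1]; the abstract does not state the exact procedure. The function name `min_max_scale` and the intensity values are hypothetical.

```python
def min_max_scale(values, lower=0.0, upper=1.0):
    """Linearly rescale values so the minimum maps to lower and the maximum to upper."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant feature")
    span = hi - lo
    return [lower + (upper - lower) * (v - lo) / span for v in values]


intensities = [2.0, 4.0, 6.0, 10.0]  # made-up feature values for illustration

unit_scaled = min_max_scale(intensities)               # range [0, 1]
signed_scaled = min_max_scale(intensities, -1.0, 1.0)  # range [-1, 1]
```

Both variants preserve the ordering and relative spacing of the values; they differ only in offset and scale, which can still matter for kernel-based classifiers such as an SVM.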
Affiliation(s)
- Pingan He
  - School of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Longao Hou
  - School of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Hong Tao
  - School of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Qi Dai
  - College of Life Science, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Yuhua Yao
  - School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China
3
Agüero-Chapin G, Galpert D, Molina-Ruiz R, Ancede-Gallardo E, Pérez-Machado G, De la Riva GA, Antunes A. Graph Theory-Based Sequence Descriptors as Remote Homology Predictors. Biomolecules 2019; 10:E26. [PMID: 31878100 PMCID: PMC7022958 DOI: 10.3390/biom10010026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/19/2019] [Revised: 12/16/2019] [Accepted: 12/18/2019] [Indexed: 12/23/2022] Open
Abstract
Alignment-free (AF) methodologies have increased in popularity in the last decades as alternative tools to alignment-based (AB) algorithms for performing comparative sequence analyses. They have been especially useful for detecting remote homologs within the twilight zone of highly diverse gene/protein families and superfamilies. The most popular alignment-free methodologies, as well as their applications to classification problems, have been described in previous reviews. Although a new set of graph theory-derived sequence/structural descriptors has been gaining relevance in the detection of remote homology, these descriptors have been omitted as AF predictors when the topic is addressed. Here, we first go over the most popular AF approaches used for detecting homology signals within the twilight zone and then present the state-of-the-art tools encoding graph theory-derived sequence/structure descriptors and their success in identifying remote homologs. We also highlight the tendency to integrate AF features/measures with AB ones, either in the same prediction model or by assembling the predictions from different algorithms using voting/weighting strategies, to improve the detection of remote signals. Lastly, we briefly discuss the efforts made to scale up AB and AF features/measures for the comparison of multiple genomes and proteomes. Alongside the experience gained in remote homology detection with both the most popular AF tools and other less known ones, we provide our own using the graphical-numerical methodologies MARCH-INSIDE, TI2BioP, and ProtDCal. We also present a new Python-based tool (SeqDivA) with a friendly graphical user interface (GUI) for delimiting the twilight zone by using several similar criteria.
Affiliation(s)
- Guillermin Agüero-Chapin
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Deborah Galpert
  - Departamento de Ciencia de la Computación, Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara 54830, Cuba
- Reinaldo Molina-Ruiz
  - Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara 54830, Cuba
- Evys Ancede-Gallardo
  - Programa de Doctorado en Fisicoquímica Molecular, Facultad de Ciencias Exactas, Universidad Andrés Bello, Av. República 239, Santiago 8370146, Chile
- Gisselle Pérez-Machado
  - EpiDisease S.L. Spin-Off of Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), 46980 Valencia, Spain
- Gustavo A. De la Riva
  - Laboratorio de Biotecnología Aplicada S. de R.L. de C.V., GRECA Inc., Carretera La Piedad-Carapán, km 3.5, La Piedad, Michoacán 59300, Mexico
  - Tecnológico Nacional de México, Instituto Tecnológico de la Piedad, Av. Ricardo Guzmán Romero, Santa Fe, La Piedad de Cavadas, Michoacán 59370, Mexico
- Agostinho Antunes
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
4
Ma R, Liu Z, Zhang Q, Liu Z, Luo T. Evaluating Polymer Representations via Quantifying Structure-Property Relationships. J Chem Inf Model 2019; 59:3110-3119. [PMID: 31268306 DOI: 10.1021/acs.jcim.9b00358] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 12/15/2022]
Abstract
Machine learning techniques are being applied to quantify structure-property relationships for a wide variety of materials, where properly represented materials play a key role. Although algorithms for representation learning are extensively studied, their applications to domain-specific areas, such as polymers, are limited, largely due to the lack of benchmark databases. In this work, we investigate different types of polymer representations, including Morgan fingerprint (MF), molecular embedding (ME), and molecular graph (MG), based on a benchmark database drawn from a subset of the well-known web-based polymer database PolyInfo. We evaluate the quality of different polymer representations by quantifying the relationships between the representations and polymer properties, including density, melting temperature, and glass transition temperature. Different representation learning schemes for MEs, such as supervised learning, semisupervised learning, and transfer learning, are investigated. In supervised learning, only labeled molecules in our benchmark database are used for representation learning; in semisupervised learning, both labeled and unlabeled molecules in our benchmark database are used; and in transfer learning, molecules from an external database that is different from the benchmark database are used for representation learning. It is found that ME (with an R² of 0.724 in the density case, 0.684 in the melting temperature case, and 0.865 in the glass transition temperature case) outperforms the other representations for structure-property relationship quantification in all cases studied, and MG (with an R² of 0.260 in the density case, -0.149 in the melting temperature case, and 0.711 in the glass transition case) is shown to be much inferior to ME and MF (with an R² of 0.562 in the density case, 0.645 in the melting temperature case, and 0.849 in the glass transition case), likely due to the relatively small volumes of training data available.
For MEs, it is found that the similarities of substructure MEs under different learning schemes (e.g., SL, SSL, and TL) are differently estimated, thus leading to different performance scores in structure-property relation quantification. Combinations of MEs show little effect on predictive performance compared with the single MEs in the corresponding regression tasks, proving no information gain in mixing MEs.
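The negative R² reported for MG in the melting temperature case is possible because the coefficient of determination compares a model against the trivial predictor that always returns the mean. A minimal sketch under the standard definition, R² = 1 - SS_res / SS_tot, with made-up numbers (the function name `r_squared` and the data are illustrative, not from the paper):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]  # made-up property values

perfect = r_squared(observed, [1.0, 2.0, 3.0, 4.0])   # exact predictions
baseline = r_squared(observed, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
worse = r_squared(observed, [4.0, 3.0, 2.0, 1.0])     # worse than the mean
```

A score below zero therefore means the model's squared error on the test data exceeds that of a constant mean prediction.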
Affiliation(s)
- Ruimin Ma
  - Department of Aerospace and Mechanical Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Zeyu Liu
  - Department of Aerospace and Mechanical Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Quanwei Zhang
  - Department of Aerospace and Mechanical Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Zhiyu Liu
  - Department of Aerospace and Mechanical Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Tengfei Luo
  - Department of Aerospace and Mechanical Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
5
Blanco JL, Porto-Pazos AB, Pazos A, Fernandez-Lozano C. Prediction of high anti-angiogenic activity peptides in silico using a generalized linear model and feature selection. Sci Rep 2018; 8:15688. [PMID: 30356060 PMCID: PMC6200741 DOI: 10.1038/s41598-018-33911-z] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 03/29/2018] [Accepted: 10/06/2018] [Indexed: 12/22/2022] Open
Abstract
Screening and in silico modeling are critical activities for the reduction of experimental costs. They also speed up research notably and strengthen the theoretical framework, thus allowing researchers to numerically quantify the importance of a particular subset of information. For example, in fields such as cancer and other highly prevalent diseases, having a reliable prediction method is crucial. The objective of this paper is to classify peptide sequences according to their anti-angiogenic activity to understand the underlying principles via machine learning. First, the peptide sequences were converted into three types of numerical molecular descriptors based on the amino acid composition. We performed different experiments with the descriptors and merged them to obtain baseline results for the performance of the models, particularly of each molecular descriptor subset. A feature selection process was applied to reduce the dimensionality of the problem and remove noisy features, which are highly present in biological problems. After a robust machine learning experimental design under equal conditions (nested resampling, cross-validation, hyperparameter tuning and different runs), we statistically and significantly outperformed the best previously published anti-angiogenic model with a generalized linear model via coordinate descent (glmnet), achieving a mean AUC value greater than 0.96 and an accuracy of 0.86 with 200 molecular descriptors, mixed from the three groups. A final analysis with the top-40 discriminative anti-angiogenic activity peptides is presented along with a discussion of the feature selection process and the individual importance of each molecular descriptor. According to our findings, anti-angiogenic activity peptides are strongly associated with amino acid sequences SP, LSL, PF, DIT, PC, GH, RQ, QD, TC, SC, AS, CLD, ST, MF, GRE, IQ, CQ and HG.
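As a rough illustration of how the short sequence motifs listed at the end of this abstract could be turned into features, the sketch below counts overlapping occurrences of each motif in a peptide. It is not the descriptor pipeline used in the paper; the function name `count_motifs` and the example peptide are assumptions.

```python
# Motifs reported in the abstract as associated with anti-angiogenic activity.
MOTIFS = [
    "SP", "LSL", "PF", "DIT", "PC", "GH", "RQ", "QD", "TC",
    "SC", "AS", "CLD", "ST", "MF", "GRE", "IQ", "CQ", "HG",
]


def count_motifs(peptide, motifs=MOTIFS):
    """Count overlapping occurrences of each motif in the peptide."""
    counts = {}
    for motif in motifs:
        last_start = len(peptide) - len(motif)
        counts[motif] = sum(
            1 for i in range(last_start + 1) if peptide.startswith(motif, i)
        )
    return counts


hits = count_motifs("SPLSLSP")  # hypothetical peptide for illustration
```

Counts like these form a sparse feature vector that a linear model can weight directly, which makes the contribution of each motif easy to inspect.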
Affiliation(s)
- Jose Liñares Blanco
  - Department of Computer Science, Faculty of Computer Science, University of A Coruña, A Coruña, 15071, Spain
- Ana B Porto-Pazos
  - Department of Computer Science, Faculty of Computer Science, University of A Coruña, A Coruña, 15071, Spain
  - Instituto de Investigación Biomédica de A Coruña (INIBIC), Complexo Hospitalario Universitario de A Coruña, A Coruña, Spain
- Alejandro Pazos
  - Department of Computer Science, Faculty of Computer Science, University of A Coruña, A Coruña, 15071, Spain
  - Instituto de Investigación Biomédica de A Coruña (INIBIC), Complexo Hospitalario Universitario de A Coruña, A Coruña, Spain
- Carlos Fernandez-Lozano
  - Department of Computer Science, Faculty of Computer Science, University of A Coruña, A Coruña, 15071, Spain
  - Instituto de Investigación Biomédica de A Coruña (INIBIC), Complexo Hospitalario Universitario de A Coruña, A Coruña, Spain
6
Barreiro E, Munteanu CR, Cruz-Monteagudo M, Pazos A, González-Díaz H. Net-Net Auto Machine Learning (AutoML) Prediction of Complex Ecosystems. Sci Rep 2018; 8:12340. [PMID: 30120369 PMCID: PMC6098100 DOI: 10.1038/s41598-018-30637-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/09/2018] [Accepted: 07/24/2018] [Indexed: 11/09/2022] Open
Abstract
Biological Ecosystem Networks (BENs) are webs of biological species (nodes) establishing trophic relationships (links). Experimental confirmation of all possible links is difficult and generates a huge volume of information. Consequently, computational prediction becomes an important goal. Artificial Neural Networks (ANNs) are Machine Learning (ML) algorithms that may be used to predict BENs, using as input Shannon entropy information measures (Shk) of known ecosystems to train them. However, it is difficult to select a priori which ANN topology will have a higher accuracy. Interestingly, Auto Machine Learning (AutoML) methods focus on the automatic selection of the more efficient ML algorithms for specific problems. In this work, a preliminary study of a new approach to AutoML selection of ANNs is proposed for the prediction of BENs. We call it the Net-Net AutoML approach, because it uses for the first time Shk values of both networks involving BENs (networks to be predicted) and ANN topologies (networks to be tested). Twelve types of classifiers have been tested for the Net-Net model including linear, Bayesian, trees-based methods, multilayer perceptrons and deep neuronal networks. The best Net-Net AutoML model for 338,050 outputs of 10 ANN topologies for links of 69 BENs was obtained with a deep fully connected neuronal network, characterized by a test accuracy of 0.866 and a test AUROC of 0.935. This work paves the way for the application of Net-Net AutoML to other systems or ML algorithms.
Affiliation(s)
- Enrique Barreiro
  - Department of Computation, Computer Science Faculty, University of A Coruna (UDC), 15071, A Coruña, Spain
  - Center for Computational Science (CCS), University of Miami (UM), Miami, 33136, FL, USA
  - West Coast University, Miami Campus, 33178, FL, USA
- Cristian R Munteanu
  - Department of Computation, Computer Science Faculty, University of A Coruna (UDC), 15071, A Coruña, Spain
- Maykel Cruz-Monteagudo
  - Center for Computational Science (CCS), University of Miami (UM), Miami, 33136, FL, USA
  - West Coast University, Miami Campus, 33178, FL, USA
- Alejandro Pazos
  - Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), A Coruña, 15006, Spain
- Humbert González-Díaz
  - Faculty of Science and Technology, University of the Basque Country (UPV/EHU), 48940, Biscay, Spain
  - IKERBASQUE, Basque Foundation for Science, 48011, Bilbao, Biscay, Spain
7
Jalilvand A, Akbari B, Zare Mirakabad F. S-FLN: A sequence-based hierarchical approach for functional linkage network construction. J Theor Biol 2018; 437:149-162. [PMID: 29080781 DOI: 10.1016/j.jtbi.2017.10.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2016] [Revised: 07/27/2017] [Accepted: 10/18/2017] [Indexed: 11/24/2022]
Abstract
Construction of the functional linkage network (FLN) is a primary and important step in drug discovery and disease gene prioritization methods. Several methods for constructing an FLN have been introduced, based on the integration of various biological data. Although there are impressive ideas behind these methods, they suffer from the low quality of the biological data. In this paper, a hierarchical sequence-based approach is proposed to construct the FLN. The proposed approach, denoted S-FLN (Sequence-based Functional Linkage Network), uses protein sequences as the primary data in three main steps. First, the physicochemical properties of amino acids are employed to describe the functionality of proteins. Because protein sequences are a more comprehensive and accurate primary data source, more reliable relations are obtained. Second, seven different descriptor methods are used to extract feature vectors from the protein sequences. Using different descriptor methods yields diverse ensemble learners in the next step. Finally, a two-layer ensemble learning structure is proposed to calculate the score of protein pairs. The proposed approach has been evaluated using two biological datasets, S. cerevisiae and H. pylori, and achieved precision rates of 93.9% and 91.15%, respectively. The results of various experiments indicate the efficiency and validity of the proposed approach.
Affiliation(s)
- A Jalilvand
  - Department of Electronic and Computer Engineering, Tarbiat Modares University, Tehran, Iran
- B Akbari
  - Department of Electronic and Computer Engineering, Tarbiat Modares University, Tehran, Iran
- F Zare Mirakabad
  - Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
8
Akbar S, Hayat M, Iqbal M, Jan MA. iACP-GAEnsC: Evolutionary genetic algorithm based ensemble classification of anticancer peptides by utilizing hybrid feature space. Artif Intell Med 2017; 79:62-70. [PMID: 28655440 DOI: 10.1016/j.artmed.2017.06.008] [Citation(s) in RCA: 90] [Impact Index Per Article: 12.9] [Received: 04/26/2017] [Revised: 06/12/2017] [Accepted: 06/16/2017] [Indexed: 01/10/2023]
Abstract
Cancer is a fatal disease, responsible for one-quarter of all deaths in developed countries. Traditional anticancer therapies, such as chemotherapy and radiation, are highly expensive, susceptible to errors, and ineffective. These conventional techniques induce severe side effects on human cells. Given the perilous impact of cancer, the development of an accurate and highly efficient intelligent computational model is desirable for the identification of anticancer peptides. In this paper, an evolutionary intelligent genetic algorithm-based ensemble model, 'iACP-GAEnsC', is proposed for the identification of anticancer peptides. In this model, the protein sequences are formulated using three different discrete feature representation methods, i.e., amphiphilic pseudo amino acid composition, g-gap dipeptide composition, and reduced amino acid alphabet composition. The performance of the extracted feature spaces is investigated separately, and the feature spaces are then merged to exhibit the significance of hybridization. In addition, the predicted results of individual classifiers are combined using an optimized genetic algorithm and a simple majority technique in order to enhance the true classification rate. It is observed that genetic algorithm-based ensemble classification outperforms individual classifiers as well as the simple majority voting-based ensemble. The genetic algorithm-based ensemble classification performs best on the hybrid feature space, with an accuracy of 96.45%. In comparison to the existing techniques, the 'iACP-GAEnsC' model has achieved remarkable improvement in terms of various performance metrics. Based on the simulation results, it is observed that the 'iACP-GAEnsC' model might be a leading tool in the field of drug design and proteomics for researchers.
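Of the three representations named in this abstract, g-gap dipeptide composition is the simplest to sketch: it is the frequency of every ordered residue pair separated by g intervening positions. The version below is a minimal illustration, not the iACP-GAEnsC implementation; the function name, the normalization by the number of pairs, and the example sequence are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def g_gap_dipeptide_composition(sequence, gap=0):
    """Frequency of each residue pair (x, y) where y lies gap + 1 positions after x."""
    n_pairs = len(sequence) - gap - 1
    if n_pairs <= 0:
        raise ValueError("sequence is too short for this gap")
    counts = {"".join(p): 0 for p in product(AMINO_ACIDS, repeat=2)}
    for i in range(n_pairs):
        pair = sequence[i] + sequence[i + gap + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    return {pair: n / n_pairs for pair, n in counts.items()}


features = g_gap_dipeptide_composition("ACACA", gap=1)  # hypothetical sequence
```

With `gap=0` this reduces to ordinary dipeptide composition; larger gaps capture correlations between residues that are not adjacent in the sequence. The feature vector always has 400 entries regardless of peptide length.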
Affiliation(s)
- Shahid Akbar
  - Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan
- Maqsood Hayat
  - Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan
- Muhammad Iqbal
  - Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan
- Mian Ahmad Jan
  - Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan
9
Fernandez-Lozano C, Cuiñas RF, Seoane JA, Fernández-Blanco E, Dorado J, Munteanu CR. Classification of signaling proteins based on molecular star graph descriptors using Machine Learning models. J Theor Biol 2015; 384:50-8. [DOI: 10.1016/j.jtbi.2015.07.038] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 03/23/2015] [Revised: 07/20/2015] [Accepted: 07/27/2015] [Indexed: 12/11/2022]
10
Martins F, Gonçalves R, Oliveira J, Cruz-Monteagudo M, Nieto-Villar JM, Paz-y-Miño C, Rebelo I, Tejera E. Unravelling the relationship between protein sequence and low-complexity regions entropies: Interactome implications. J Theor Biol 2015; 382:320-7. [PMID: 26164061 DOI: 10.1016/j.jtbi.2015.06.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2015] [Revised: 06/12/2015] [Accepted: 06/28/2015] [Indexed: 10/23/2022]
Abstract
Low-complexity regions are sub-sequences of biased composition in a protein sequence. The influence of these regions on protein evolution, specific functions, and highly interactive capacities is well known. Although protein sequence entropy has been widely studied, its relationship with low-complexity regions and the subsequent effects on protein function remain unclear. In this work we propose a theoretical and empirical model integrating the sequence entropy with local complexity parameters. Our results indicate that the protein sequence entropy is related to the protein length, the entropies inside and outside the low-complexity regions, and the number and average size of these regions. We found a small but significant increment in the sequence entropy of hub proteins. In agreement with our theoretical model, this increment is highly dependent on the balance between the increment in protein length and the average size of the low-complexity regions. Finally, our models and protein analyses provide evidence that modifications in the average size of low-complexity regions are more relevant in hub proteins than changes in their number.
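The contrast between a low-complexity region and a compositionally diverse stretch can be made concrete with a short sketch. It assumes the sequence entropy is the Shannon entropy of residue frequencies, in bits; the abstract does not give its exact formulation, and the function name and example sequences are illustrative.

```python
import math
from collections import Counter


def sequence_entropy(sequence):
    """Shannon entropy (bits) of the residue frequency distribution."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    n = len(sequence)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(sequence).values())


low_complexity = sequence_entropy("QQQQQQQQ")  # homopolymer run: single residue type
two_residue = sequence_entropy("AACC")         # two residue types, equally frequent
diverse = sequence_entropy("ACDE")             # four residue types, equally frequent
```

Under this definition the entropy is bounded above by log2 of the number of distinct residue types, so a region dominated by one or two residues pulls the whole-sequence value down in proportion to its share of the protein length.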
Affiliation(s)
- F Martins
  - Department of Biochemistry, Faculty of Pharmacy, University of Porto, Portugal
- R Gonçalves
  - Department of Biochemistry, Faculty of Pharmacy, University of Porto, Portugal
- J Oliveira
  - Department of Biochemistry, Faculty of Pharmacy, University of Porto, Portugal
- M Cruz-Monteagudo
  - Instituto de Investigaciones Biomédicas, Universidad de las Américas, Quito, Ecuador
- J M Nieto-Villar
  - Dpto. de Química-Física, Fac. de Química, Universidad de La Habana, Cuba. Cátedra de Sistemas Complejos "H. Poincaré", Universidad de La Habana, Cuba
- C Paz-y-Miño
  - Instituto de Investigaciones Biomédicas, Universidad de las Américas, Quito, Ecuador
- I Rebelo
  - Department of Biochemistry, Faculty of Pharmacy, University of Porto, Portugal; UCIBIO@REQUIMTE, Portugal
- E Tejera
  - Instituto de Investigaciones Biomédicas, Universidad de las Américas, Quito, Ecuador
11
Cai ZQ, Si SB, Chen C, Zhao Y, Ma YY, Wang L, Geng ZM. Analysis of prognostic factors for survival after hepatectomy for hepatocellular carcinoma based on a bayesian network. PLoS One 2015; 10:e0120805. [PMID: 25826337 PMCID: PMC4380349 DOI: 10.1371/journal.pone.0120805] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 10/16/2014] [Accepted: 02/06/2015] [Indexed: 12/22/2022] Open
Abstract
Background: The prognosis of hepatocellular carcinoma (HCC) after hepatectomy involves many factors. Previous studies have evaluated the separate influences of single factors; few have considered the combined influence of various factors. This paper combines the Bayesian network (BN) with importance measures to identify key factors that have significant effects on survival time. Methods: A dataset of 299 patients with HCC after hepatectomy was studied to establish a BN using a tree-augmented naïve Bayes algorithm that could mine relationships between factors. The composite importance measure was applied to rank the impact of factors on survival time. Results: 124 patients (>10 months) and 77 patients (≤10 months) were correctly classified. The accuracy of the BN model was 67.2%. For patients with long survival time (>10 months), the true-positive rate of the model was 83.22% and the false-positive rate was 48.67%. According to the model, the preoperative alpha fetoprotein (AFP) level and postoperative performance of transcatheter arterial chemoembolization (TACE) were independent factors for survival of HCC patients. The grade of preoperative liver function reflected the tendency for postoperative complications. Intraoperative blood loss, tumor size, portal vein tumor thrombosis (PVTT), time of clamping the porta hepatis, tumor number, operative method, and metastasis were dependent variables in survival time prediction. PVTT was considered the most significant for the prognosis of survival time. Conclusions: Using the BN and importance measures, PVTT was identified as the most significant predictor of survival time for patients with HCC after hepatectomy.
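The reported accuracy and rates can be checked against the counts given in the results. In the sketch below, the correctly classified counts (124 and 77) and the cohort size (299) come from the abstract; the class sizes of 149 long-survival and 150 short-survival patients are not stated there and are inferred here as the split consistent with the reported 83.22% and 48.67%.

```python
# Counts stated in the abstract.
true_positives = 124  # long-survival patients (>10 months) classified correctly
true_negatives = 77   # short-survival patients (<=10 months) classified correctly
n_patients = 299

# Assumption: class sizes inferred from the reported rates, not stated in the abstract.
n_long = 149
n_short = n_patients - n_long  # 150

false_positives = n_short - true_negatives  # short-survival patients labeled long

accuracy = (true_positives + true_negatives) / n_patients
true_positive_rate = true_positives / n_long
false_positive_rate = false_positives / n_short
```

Under that inferred split the three figures reproduce the reported 67.2%, 83.22%, and 48.67%, which also shows that nearly half of the short-survival patients were predicted to survive longer than 10 months.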
Collapse
Affiliation(s)
- Zhi-qiang Cai: Department of Hepatobiliary Surgery, First Affiliated Hospital of Xi’an Jiaotong University, College of Medicine, Xi’an 710061, Shaanxi, China; Department of Industrial Engineering, School of Mechanical Engineering, Northwestern Polytechnical University, Xi’an 710072, Shaanxi, China
- Shu-bin Si: Department of Industrial Engineering, School of Mechanical Engineering, Northwestern Polytechnical University, Xi’an 710072, Shaanxi, China
- Chen Chen: Department of Hepatobiliary Surgery, First Affiliated Hospital of Xi’an Jiaotong University, College of Medicine, Xi’an 710061, Shaanxi, China
- Yaling Zhao: Department of Epidemiology and Biostatistics, School of Public Health, Xi’an Jiaotong University Health Science Center, Xi’an 710061, Shaanxi, China
- Yong-yi Ma: Department of Hepatobiliary Surgery, First Affiliated Hospital of Xi’an Jiaotong University, College of Medicine, Xi’an 710061, Shaanxi, China
- Lin Wang: Department of Hepatobiliary Surgery, First Affiliated Hospital of Xi’an Jiaotong University, College of Medicine, Xi’an 710061, Shaanxi, China
- Zhi-min Geng: Department of Hepatobiliary Surgery, First Affiliated Hospital of Xi’an Jiaotong University, College of Medicine, Xi’an 710061, Shaanxi, China
|
12
|
Barresi V, Bonaccorso C, Consiglio G, Goracci L, Musso N, Musumarra G, Satriano C, Fortuna CG. Modeling, design and synthesis of new heteroaryl ethylenes active against the MCF-7 breast cancer cell-line. Mol Biosyst 2013; 9:2426-9. [DOI: 10.1039/c3mb70151d] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 12/12/2022]
|
13
|
Fernández-Blanco E, Aguiar-Pulido V, Munteanu CR, Dorado J. Random Forest classification based on star graph topological indices for antioxidant proteins. J Theor Biol 2013; 317:331-7. [DOI: 10.1016/j.jtbi.2012.10.006] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 07/09/2012] [Revised: 09/17/2012] [Accepted: 10/02/2012] [Indexed: 10/27/2022]
|
14
|
Ghosh A, Chattopadhyay S, Chawla-Sarkar M, Nandy P, Nandy A. In silico study of rotavirus VP7 surface accessible conserved regions for antiviral drug/vaccine design. PLoS One 2012; 7:e40749. [PMID: 22844409 PMCID: PMC3406019 DOI: 10.1371/journal.pone.0040749] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 03/26/2012] [Accepted: 06/12/2012] [Indexed: 11/23/2022] Open
Abstract
Background: Rotaviral diarrhea kills about half a million children annually in developing countries and accounts for one third of diarrhea-related hospitalizations. Drugs and vaccines against the rotavirus are handicapped, as in all viral diseases, by the rapid mutational changes that take place in the DNA and protein sequences, rendering most of them ineffective. Currently, only two vaccines are licensed and approved by the World Health Organization (WHO), but they display reduced efficacy in the underdeveloped countries where the disease is more prevalent. We approached this issue by trying to identify surface-exposed conserved segments on the surface glycoproteins of the virion, which may then be targeted by specific peptide vaccines. We had previously developed a bioinformatics protocol for this kind of problem with reference to the influenza neuraminidase protein, which we have refined and expanded to analyze the rotavirus issue.
Results: Our analysis of 433 VP7 (viral protein 7 from rotavirus) surface protein sequences across 17 subtypes encompassing mammalian hosts, using a 20D Graphical Representation and Numerical Characterization method, identified four possible highly conserved peptide segments. Solvent accessibility prediction servers indicated that these segments are predominantly surface situated. These regions, analyzed through selected epitope prediction servers for their epitopic properties towards possible T-cell and B-cell activation, showed good results as epitopic candidates (dry lab confirmation only).
Conclusions: The main reasons for developing alternative vaccine strategies for the rotavirus are the failure of current vaccines and high production costs that inhibit their application in developing countries. We expect that the surface-exposed protein regions identified in our study could be used as targets for peptide vaccines and drug designs for stable immunity against divergent strains of the rotavirus.
Though this study is fully dependent on computational prediction algorithms, it provides a platform for wet lab experiments.
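To illustrate the general idea of scanning many sequences for conserved segments, here is a minimal sketch. It does not reproduce the 20D Graphical Representation and Numerical Characterization method named in the abstract; it swaps in a simpler, commonly used stand-in, per-column Shannon entropy averaged over a sliding window. It also assumes the sequences are already aligned to equal length. The function names, the window size, the entropy threshold, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conserved_windows(aligned, window=5, max_mean_entropy=0.1):
    """Return (start, segment) pairs for windows with low mean column entropy.

    `aligned` is a list of equal-length, pre-aligned sequences. The reported
    segment is taken from the first sequence, purely for display.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    entropies = [column_entropy([seq[i] for seq in aligned]) for i in range(length)]
    hits = []
    for start in range(length - window + 1):
        mean_h = sum(entropies[start:start + window]) / window
        if mean_h <= max_mean_entropy:
            hits.append((start, aligned[0][start:start + window]))
    return hits

# Toy, made-up sequences for illustration only (NOT real VP7 data).
toy = [
    "MKTAYLLGQWNE",
    "MRTAYLLGQWDE",
    "MKSAYLLGQWNQ",
    "MHTAYLLGQWSE",
]

for start, segment in conserved_windows(toy):
    print(start, segment)
```

In a real analysis the candidate windows would still need the downstream checks the abstract describes, such as solvent accessibility and epitope prediction, before being treated as vaccine targets.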
Affiliation(s)
- Ambarnil Ghosh: Physics Department, Jadavpur University, Kolkata, West Bengal, India
- Shiladitya Chattopadhyay: Division of Virology, National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Mamta Chawla-Sarkar: Division of Virology, National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Papiya Nandy: Physics Department, Jadavpur University, Kolkata, West Bengal, India
- Ashesh Nandy: Centre for Interdisciplinary Research and Education, Kolkata, West Bengal, India
|