1
Shahid M, Ilyas M, Hussain W, Khan YD. ORI-Deep: improving the accuracy for predicting origin of replication sites by using a blend of features and long short-term memory network. Brief Bioinform 2022; 23:6511972. [PMID: 35048955 DOI: 10.1093/bib/bbac001] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/04/2021] [Revised: 12/30/2021] [Accepted: 01/02/2022] [Indexed: 11/14/2022] Open
Abstract
Replication of DNA is an important process for the cell division cycle, gene expression regulation and other biological evolution processes. It also has a crucial role in a living organism's physical growth and structure. Replication of DNA comprises three stages known as initiation, elongation and termination, and the origin of replication site (ORI) is the location where the DNA replication process is initiated. Various methodologies exist to identify ORIs in genomic sequences; however, these methods either require extensive computation or are poorly optimized for large datasets. Herein, a model called ORI-Deep is proposed to identify ORIs from multiple cell type genomic sequence benchmark data. An efficient method is proposed using a deep neural network to identify ORIs for four different eukaryotic species. For better representation of the data, a feature vector is constructed using statistical moments for training and testing and is then fed to a long short-term memory (LSTM) network. To prove the effectiveness of the proposed model, we applied several validation techniques at different levels to obtain seven accuracy metrics; the accuracy scores for self-consistency, 10-fold cross-validation, jackknife and the independent set test were 0.977, 0.948, 0.976 and 0.977, respectively. Based on the results, it can be concluded that ORI-Deep can efficiently predict origin of replication sites in DNA sequences with high accuracy. A web server for ORI-Deep is available at https://share.streamlit.io/waqarhusain/orideep/main/app.py, and the source code is available at https://github.com/WaqarHusain/OriDeep.
Affiliation(s)
- Mahwish Shahid: School of Systems and Technologies, University of Management and Technology, Lahore, Pakistan
- Maham Ilyas: University of Management and Technology, Lahore, Pakistan
- Waqar Hussain: University of Management and Technology, Lahore, Pakistan
- Yaser Daanial Khan: Department of Computer Science, University of Management and Technology, Lahore, Pakistan
2
Malebary SJ, Khan YD. Evaluating machine learning methodologies for identification of cancer driver genes. Sci Rep 2021; 11:12281. [PMID: 34112883 PMCID: PMC8192921 DOI: 10.1038/s41598-021-91656-8] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 01/06/2021] [Accepted: 05/19/2021] [Indexed: 02/06/2023] Open
Abstract
Cancer is driven by distinctive sorts of changes and basic variations in genes. Recognizing cancer driver genes is basic for accurate oncological analysis. Numerous methodologies to distinguish and identify drivers presently exist, but efficient tools to combine and optimize them on huge datasets are few. Most strategies for prioritizing transformations depend basically on frequency-based criteria. Strategies are required to dependably prioritize organically dynamic driver changes over inert passengers in high-throughput sequencing cancer information sets. This study proposes a model named PCDG-Pred, which works as a utility capable of distinguishing cancer driver and passenger attributes of genes based on sequencing data. Keeping in view the significance of cancer driver genes, an efficient method is proposed to identify them. Further, various validation techniques are applied at different levels to establish the effectiveness of the model and to obtain metrics such as accuracy, Matthews correlation coefficient, sensitivity, and specificity. The results of the study strongly indicate that the proposed strategy provides a fundamental functional advantage over other existing strategies for cancer driver gene identification. Careful experiments show that the accuracy values obtained for self-consistency, independent set, and cross-validation tests are 91.08%, 87.26%, and 92.48%, respectively.
Affiliation(s)
- Sharaf J Malebary: Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, P.O. Box 344, Rabigh, 21911, Saudi Arabia
- Yaser Daanial Khan: Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
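The four metrics named in the abstract of entry 2 (accuracy, Matthews correlation coefficient, sensitivity, specificity) all follow from a binary confusion matrix. The sketch below is only an illustration of those standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the PCDG-Pred paper.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return standard binary classification metrics from confusion-matrix counts.

    tp/tn/fp/fn are hypothetical counts (e.g. drivers correctly called,
    passengers correctly called, and the two kinds of error).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Sensitivity: fraction of true positives that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true negatives that were recovered.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Illustrative counts only.
    print(binary_metrics(tp=40, tn=45, fp=5, fn=10))
```

Unlike accuracy, MCC stays informative when the two classes are unbalanced, which is presumably why studies of this kind report it alongside accuracy.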
3
Liu GH, Zhang BW, Qian G, Wang B, Mao B, Bichindaritz I. Bioimage-Based Prediction of Protein Subcellular Location in Human Tissue with Ensemble Features and Deep Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1966-1980. [PMID: 31107658 DOI: 10.1109/tcbb.2019.2917429] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 06/09/2023]
Abstract
Prediction of protein subcellular location has currently become a hot topic because it has been proven to be useful for understanding both the disease mechanisms and novel drug design. With the rapid development of automated microscopic imaging technology in recent years, classification methods of bioimage-based protein subcellular location have attracted considerable attention for images can describe the protein distribution intuitively and in detail. In the current study, a prediction method of protein subcellular location was proposed based on multi-view image features that are extracted from three different views, including the four texture features of the original image, the global and local features of the protein extracted from the protein channel images after color segmentation, and the global features of DNA extracted from the DNA channel image. Finally, the extracted features were combined together to improve the performance of subcellular localization prediction. From the performance comparison of different combination features under the same classifier, the best ensemble features could be obtained. In this work, a classifier based on Stacked Auto-encoders and the random forest was also put forward. To improve the prediction results, the deep network was combined with the traditional statistical classification methods. Stringent cross-validation and independent validation tests on the benchmark dataset demonstrated the efficacy of the proposed method.
4
Agrawal K, Shankar J, Kumar R, Verma P. Insight into multicopper oxidase laccase from Myrothecium verrucaria ITCC-8447: a case study using in silico and experimental analysis. Journal of Environmental Science and Health. Part B, Pesticides, Food Contaminants, and Agricultural Wastes 2020; 55:1048-1060. [PMID: 32877269 DOI: 10.1080/03601234.2020.1812334] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/11/2023]
Abstract
The oxidation activity of multicopper oxidases overlaps with different substrates of laccases and bilirubin oxidases; thus, in the present study an integrated bioinformatics approach using homology modeling, docking, and experimental validation was used to confirm the type of multicopper oxidase in Myrothecium verrucaria ITCC-8447. The peptide sequence of M. verrucaria ITCC-8447 enabled prediction of the 3D structure of the multicopper oxidase. It was superimposed on the structure of laccase, and the root mean square deviation (RMSD) was 1.53 Å for 533 and 171 residues. The low binding energy with azino-bis (3-ethylbenzothiazoline-6-sulfonic acid) (ABTS) (-5.64) as compared to bilirubin (-4.39) suggested that M. verrucaria ITCC-8447 has laccase-like activity. The experimental analysis confirmed high activity with the laccase-specific substrates phenol (18.3 U/L), ampyrone (172.4 U/L) and ampyrone phenol coupling (50 U/L) as compared to the bilirubin oxidase substrate bilirubin (16.6 U/L). In addition, the lowest binding energies with ABTS (-5.64), syringaldazine SYZ (-4.83), guaiacol GCL (-4.42), and 2,6-dimethoxyphenol DMP (-4.41) confirmed the presence of laccase. Further, complete remediation of two hazardous model pollutants, i.e., phenol and resorcinol (1.5 mM), after 12 h of incubation and low binding energies of -4.32 and -4.85, respectively, confirmed their removal by laccase. The results confirmed the presence of laccase in M. verrucaria ITCC-8447 and its effective bioremediation potential.
Affiliation(s)
- Komal Agrawal: Bioprocess and Bioenergy Laboratory, Department of Microbiology, Central University of Rajasthan, Ajmer, India
- Jata Shankar: Genomics Laboratory, Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology, Solan, India
- Raj Kumar: Genomics Laboratory, Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology, Solan, India
- Pradeep Verma: Bioprocess and Bioenergy Laboratory, Department of Microbiology, Central University of Rajasthan, Ajmer, India
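The abstract of entry 4 reports a root mean square deviation (RMSD) between a modeled structure and a laccase structure. As a minimal sketch of what that number measures, the function below computes RMSD over already matched and already superimposed atom coordinates. The function name and the toy coordinates are hypothetical, and the optimal superposition step that structure-comparison tools normally perform first is deliberately left out.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes atom i in coords_a corresponds to atom i in coords_b and that
    both structures are already superimposed; no alignment is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates in angstroms, for illustration only.
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(rmsd(model, reference))
```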
5
Agrawal K, Shankar J, Verma P. Multicopper oxidase (MCO) laccase from Stropharia sp. ITCC-8422: an apparent authentication using integrated experimental and in silico analysis. 3 Biotech 2020; 10:413. [PMID: 32983824 DOI: 10.1007/s13205-020-02399-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/27/2020] [Accepted: 08/17/2020] [Indexed: 11/29/2022] Open
Abstract
In the present study, the specificity of laccase from Stropharia sp. ITCC-8422 against various substrates, i.e. 2,2-azino-bis (3-ethylbenzothiazoline-6-sulfonic acid) (ABTS), 2,6-dimethoxyphenol (DMP), guaiacol (GCL) and syringaldazine (SYZ), was determined. It exhibited maximum affinity for ABTS, followed by DMP, and negligible activity for GCL and SYZ. As the concentration of substrate increased from 0.5 to 1.5 mM (ABTS) and from 1 to 5 mM (DMP), the activity increased from 301.1 to 567.8 U/L and from 254.4 to 436.2 U/L, respectively. Further, quadrupole time-of-flight liquid chromatography mass spectrometry (QTOF-LCMS) analysis of the extracellular proteome of Stropharia sp. ITCC-8422 identified eighty-four (84) extracellular proteins. The peptide sequence for the enzyme of interest exhibited sequence similarity with laccase-5 of Trametes pubescens. Using the high molecular mass sequence of laccase-5, the protein structure of laccase was modelled, and the binding energy of laccase with four substrates, i.e. ABTS (-5.65), DMP (-4.65), GCL (-4.66) and SYZ (-5.5), was determined using the AutoDock tool. The experimental and in silico analyses revealed maximum activity of laccase and lowest binding energy with ABTS. In addition, laccase was purified; it exhibited 2.1-fold purification with a purification yield of 20.4% and had a stability of 70% at pH 5-9 and 30-40 ℃. The bioremediation potential of laccase was also explored by in silico analysis, where the binding energy of laccase with alizarin cyanine green was -6.37, and the in silico and experimental work were in agreement.
Affiliation(s)
- Komal Agrawal: Bioprocess and Bioenergy Laboratory, Department of Microbiology, Central University of Rajasthan, NH-8, Bandarsindari, Kishangarh, Ajmer, 305817 India
- Jata Shankar: Genomics Laboratory, Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology, Solan, 173234 Himachal Pradesh India
- Pradeep Verma: Bioprocess and Bioenergy Laboratory, Department of Microbiology, Central University of Rajasthan, NH-8, Bandarsindari, Kishangarh, Ajmer, 305817 India
6
Ganaie IA, Malik MZ, Naqvi SH, Jain SK, Wajid S. Differential levels of Alpha-1-inhibitor III, Immunoglobulin heavy chain variable region, and Hypertrophied skeletal muscle protein GTF3 in rat mammary tumorigenesis. Biochimie 2020; 174:57-68. [PMID: 32325114 DOI: 10.1016/j.biochi.2020.04.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2019] [Revised: 04/10/2020] [Accepted: 04/12/2020] [Indexed: 11/15/2022]
Abstract
Early detection of breast cancer can be best facilitated by the development of precancerous markers. Serum proteins, being sensitive signatures, can be the ideal choice. We previously demonstrated reduced levels of two serum proteins at a very early stage of tumorigenesis in a breast cancer model developed in Wistar rats by 7,12-dimethylbenz[a]anthracene (DMBA) administration. Here we report the dysregulation of three more proteins in the serum collected at another early stage (15 weeks) of tumorigenesis in the same model. The proteins were identified as Alpha-1-inhibitor III (Mug1), Immunoglobulin heavy chain variable region (IGHV), and Hypertrophied skeletal muscle protein GTF3 by MALDI-TOF MS after screening and fingerprinting of serum samples by one-dimensional (1D) and two-dimensional (2D) electrophoresis, respectively. Relative expression analysis of the corresponding genes was also carried out, and the results supported the proteomic findings. In addition, the candidate proteins of the study and their corresponding ribonucleic acids (RNAs) were subjected to homology modelling and docking (using software such as MODELLER, 3dRNA, Autodock4.0 and GROMACS), which revealed the binding sites for the carcinogen (DMBA) and the nature of its interaction with the proteins and RNAs. Moreover, network analysis by GeneMANIA unraveled the protein/gene functional network in which Mug1, IGHV, and GTF3 are involved. Based on the significant protein and gene expression alterations in early tumorigenesis, these proteins may prove very effective in the search for biomarkers for the early detection of mammary cancer. Further, these proteins can also be tried as targets for chemotherapy.
Affiliation(s)
- Ishfaq Ahmad Ganaie: Department of Biotechnology, School of Chemical and Life Sciences, Jamia Hamdard, New Delhi, 110062, India
- Md Zubbair Malik: School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, 110067, India
- Swatantra Kumar Jain: Department of Biochemistry, Hamdard Institute of Medical Sciences and Research, Jamia Hamdard, New Delhi, 110062, India
- Saima Wajid: Department of Biotechnology, School of Chemical and Life Sciences, Jamia Hamdard, New Delhi, 110062, India
7
Redkar S, Mondal S, Joseph A, Hareesha KS. A Machine Learning Approach for Drug-target Interaction Prediction using Wrapper Feature Selection and Class Balancing. Mol Inform 2020; 39:e1900062. [PMID: 32003548 DOI: 10.1002/minf.201900062] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 05/31/2019] [Accepted: 01/28/2020] [Indexed: 01/19/2023]
Abstract
Drug-target interaction (DTI) plays a crucial role in drug discovery, drug repositioning and understanding drug side effects, which helps to identify new therapeutic profiles for various diseases. However, the exponential growth in genomic and drug data makes it difficult to identify new associations between drugs and targets. Therefore, we use computational methods, as they help accelerate the DTI identification process. Usually, available data-driven sources consisting of known DTIs are used to train a classifier to predict new DTIs. Such datasets often face the problem of class imbalance. Therefore, in this study we address two challenges faced by such datasets, i.e., class imbalance and high dimensionality, to develop a predictive model for DTI prediction. The study is carried out on four protein classes, namely Enzyme, Ion Channel, G Protein-Coupled Receptor (GPCR) and Nuclear Receptor. We encoded the target protein sequence using the dipeptide composition and the drug with a molecular descriptor. A machine learning approach is employed to predict DTIs using wrapper feature selection and the synthetic minority oversampling technique (SMOTE). The ensemble approach achieved, at best, accuracies of 95.9%, 93.4%, 90.8% and 90.6% and precisions of 96.3%, 92.8%, 90.1% and 90.2% on the Enzyme, Ion Channel, GPCR and Nuclear Receptor datasets, respectively, when evaluated excluding SMOTE samples with 10-fold cross-validation. Furthermore, our method could predict new drug-target interactions not contained in the training dataset. Features selected by wrapper feature selection may be important for understanding DTIs for the protein categories under study. Based on our evaluation, the proposed method can be used for understanding and identifying new drug-target interactions.
We provide readers with a standalone package, available at https://github.com/shwetagithub1/predDTI, which provides DTI predictions to users for new query DTI pairs.
Affiliation(s)
- Shweta Redkar: Department of Computer Applications, Manipal Institute of Technology, Manipal Academy of Higher Education, 576104, Manipal, Karnataka, India
- Sukanta Mondal: Department of Biological Sciences, Birla Institute of Technology and Science-Pilani, K.K. Birla Goa Campus, 403726, Zuarinagar, Goa, India
- Alex Joseph: Department of Pharmaceutical Chemistry, Manipal College of Pharmaceutical Sciences, Manipal Academy of Higher Education, 576104, Manipal, Karnataka, India
- K S Hareesha: Department of Computer Applications, Manipal Institute of Technology, Manipal Academy of Higher Education, 576104, Manipal, Karnataka, India
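The abstract of entry 7 says target proteins were encoded with the dipeptide composition. A minimal sketch of that encoding, assuming the usual definition (the relative frequency of each of the 400 ordered pairs of the 20 standard amino acids), is given below. The function name, the handling of non-standard residues, and the example sequence are assumptions made for illustration and are not taken from the predDTI package.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order so every vector has the same layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Encode a protein sequence as a 400-dimensional frequency vector.

    Pairs containing non-standard residues are skipped (an assumption of this
    sketch), and frequencies are normalized by the number of counted pairs.
    """
    sequence = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    counted = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            counted += 1
    if counted == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / counted for d in DIPEPTIDES]


if __name__ == "__main__":
    vector = dipeptide_composition("ACAC")  # hypothetical toy sequence
    print(len(vector), vector[DIPEPTIDES.index("AC")])
```

The fixed length of the vector, whatever the length of the protein, is what lets sequences of different sizes feed the same classifier.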
8
Bagherian M, Sabeti E, Wang K, Sartor MA, Nikolovska-Coleska Z, Najarian K. Machine learning approaches and databases for prediction of drug-target interaction: a survey paper. Brief Bioinform 2020; 22:247-269. [PMID: 31950972 PMCID: PMC7820849 DOI: 10.1093/bib/bbz157] [Citation(s) in RCA: 189] [Impact Index Per Article: 37.8] [Received: 09/04/2019] [Revised: 11/01/2019] [Accepted: 11/07/2019] [Indexed: 12/12/2022] Open
Abstract
The task of predicting the interactions between drugs and targets plays a key role in the process of drug discovery. There is a need to develop novel and efficient prediction approaches in order to avoid determining drug–target interactions (DTIs) by costly, laborious and not-always-deterministic experiments alone. These approaches should be capable of identifying potential DTIs in a timely manner. In this article, we describe the data required for the task of DTI prediction, followed by a comprehensive catalog of machine learning methods and databases that have been proposed and utilized to predict DTIs. The advantages and disadvantages of each set of methods are also briefly discussed. Lastly, the challenges one may face in predicting DTIs using machine learning approaches are highlighted, and we conclude by shedding some light on important future research directions.
Affiliation(s)
- Maryam Bagherian: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Elyas Sabeti: Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI, 48109, USA
- Kai Wang: Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
- Maureen A Sartor: Department of Pathology, University of Michigan, Ann Arbor, MI, 48109, USA
- Kayvan Najarian: Department of Electrical Engineering and Computer Science, College of Engineering, University of Michigan, Ann Arbor, MI, 48109, USA
9
Terán JE, Marrero-Ponce Y, Contreras-Torres E, García-Jacas CR, Vivas-Reyes R, Terán E, Torres FJ. Tensor Algebra-based Geometrical (3D) Biomacro-Molecular Descriptors for Protein Research: Theory, Applications and Comparison with other Methods. Sci Rep 2019; 9:11391. [PMID: 31388082 PMCID: PMC6684663 DOI: 10.1038/s41598-019-47858-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 02/22/2019] [Accepted: 07/22/2019] [Indexed: 11/16/2022] Open
Abstract
In this report, a new type of tridimensional (3D) biomacro-molecular descriptors for proteins are proposed. These descriptors make use of multi-linear algebra concepts based on the application of 3-linear forms (i.e., Canonical Trilinear (Tr), Trilinear Cubic (TrC), Trilinear-Quadratic-Bilinear (TrQB) and so on) as a specific case of the N-linear algebraic forms. The definition of the kth 3-tuple similarity-dissimilarity spatial matrices (Tensor's Form) are used for the transformation and for the representation of the existing chemical information available in the relationships between three amino acids of a protein. Several metrics (Minkowski-type, wave-edge, etc) and multi-metrics (Triangle area, Bond-angle, etc) are proposed for the interaction information extraction, as well as probabilistic transformations (e.g., simple stochastic and mutual probability) to achieve matrix normalization. A generalized procedure considering amino acid level-based indices that can be fused together by using aggregator operators for descriptors calculations is proposed. The obtained results demonstrated that the new proposed 3D biomacro-molecular indices perform better than other approaches in the SCOP-based discrimination and the prediction of folding rate of proteins by using simple linear parametrical models. It can be concluded that the proposed method allows the definition of 3D biomacro-molecular descriptors that contain orthogonal information capable of providing better models for applications in protein science.
Affiliation(s)
- Julio E Terán: Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Translacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador; Universidad San Francisco de Quito (USFQ), Grupo de Química Computacional y Teórica (QCT-USFQ), Departamento de Ingeniería Química, and Instituto de Simulación Computacional (ISC-USFQ), Quito, Pichincha, Ecuador
- Yovani Marrero-Ponce: Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Translacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador; Universidad de San Buenaventura - Cartagena - Facultad de Ciencias de la Salud - Grupo de Investigación Microbiología & Ambiente (GIMA) - Calle Real de Ternera, Diagonal 32, No. 30-966, Cartagena, Código postal: 1300 10, Colombia
- Ernesto Contreras-Torres: Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Translacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador
- César R García-Jacas: Cátedras CONACYT - Departamento de Ciencia de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, Mexico
- Ricardo Vivas-Reyes: Grupo de Química Cuántica y Teórica de la Universidad de Cartagena-Facultad de Ciencias Exactas y Naturales. Programa de Química. Campus de San Pablo and Grupo GINUMED Corporacion Universitaria Rafal Nuñez. Facultad de Salud. Programa de Medicina., Cartagena, Colombia; Grupo CipTec, Facultad de Ingenierias. Fundacion Universitaria Tecnologico Comfenalco - Cartagena, Cartagena, Bolívar, Colombia
- Enrique Terán: Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Translacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador
- F Javier Torres: Universidad San Francisco de Quito (USFQ), Grupo de Química Computacional y Teórica (QCT-USFQ), Departamento de Ingeniería Química, and Instituto de Simulación Computacional (ISC-USFQ), Quito, Pichincha, Ecuador
10
Liu B, Han L, Liu X, Wu J, Ma Q. Computational Prediction of Sigma-54 Promoters in Bacterial Genomes by Integrating Motif Finding and Machine Learning Strategies. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1211-1218. [PMID: 29993815 DOI: 10.1109/tcbb.2018.2816032] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 06/08/2023]
Abstract
Sigma factor, as a unit of RNA polymerase holoenzyme, is a critical factor in the process of gene transcriptional regulation. It recognizes the specific DNA sites and brings the core enzyme of RNA polymerase to the upstream regions of target genes. Therefore, the prediction of the promoters for a particular sigma factor is essential for interpreting functional genomic data and observation. This paper develops a new method to predict sigma-54 promoters in bacterial genomes. The new method organically integrates motif finding and machine learning strategies to capture the intrinsic features of sigma-54 promoters. The experiments on E. coli benchmark test set show that our method has good capability to distinguish sigma-54 promoters from surrounding or randomly selected DNA sequences. The applications of the other three bacterial genomes indicate the potential robustness and applicative power of our method on a large number of bacterial genomes. The source code of our method can be freely downloaded at https://github.com/maqin2001/PromotePredictor.
11
Docking analysis of hexanoic acid and quercetin with seven domains of polyketide synthase A provided insight into quercetin-mediated aflatoxin biosynthesis inhibition in Aspergillus flavus. 3 Biotech 2019; 9:149. [PMID: 30944796 DOI: 10.1007/s13205-019-1675-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 04/04/2018] [Accepted: 03/13/2019] [Indexed: 12/11/2022] Open
Abstract
Studies on phytochemicals, including quercetin, as anti-aflatoxigenic agents have gained importance. Thus, to understand the molecular mechanism behind the inhibition of aflatoxin biosynthesis by quercetin, an interaction study with polyketide synthase A (PksA) of Aspergillus flavus was undertaken. The 3D structures of seven domains of PksA were modeled using the SWISS-MODEL server, and docking studies were performed with Autodock tools-1.5.6. Docking energies of both ligands (quercetin and hexanoic acid) were compared for each of the domains of the PksA enzyme. The binding energy for quercetin was lower, ranging from -7.1 to -5.25 kcal/mol, in comparison to hexanoic acid (-4.74 to -3.54 kcal/mol). LigPlot analysis showed the formation of 12 H bonds in the case of quercetin and 8 H bonds for hexanoic acid. During interaction with the acyltransferase domain, both ligands showed H bond formation at the Arg63 position. Also, in the product template domain, quercetin forms four H bonds in comparison to one for hexanoic acid. Our quantitative RT-PCR analysis of genes from aflatoxin biosynthesis showed downregulation of pksA, aflD, aflR, aflP and aflS at the 24 h time point in comparison to 7 h in quercetin-treated A. flavus. Overall, the results revealed that quercetin exhibited the highest level of binding potential (a greater number of H bonds) with the PksA domains in comparison to hexanoic acid; thus, quercetin possibly inhibits aflatoxin biosynthesis by competitively binding to the domains of polyketide synthase, a key enzyme of the aflatoxin biosynthetic pathway. Further, we propose that key enzymes from the aflatoxin biosynthetic pathway in aflatoxin-producing Aspergilli could be explored further using other phytochemicals as inhibitors.
12
Wu J, Mai G, Deng B, Younseo J, Du D, Chen F, Ma Q. Quantitative Structure-activity Relationship of Acetylcholinesterase Inhibitors based on mRMR Combined with Support Vector Regression. Lett Org Chem 2019. [DOI: 10.2174/1570178615666181008125341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
In this work, support vector regression (SVR), an effective machine learning method proposed by Vapnik, was applied to establish a QSAR model for a series of acetylcholinesterase inhibitors (AChEIs). Fourteen descriptors were selected for constructing the SVR model by using the mRMR-Forward feature selection method. The parameters (ε, C) were tuned by leave-one-out cross-validation (LOOCV), which was used to judge the predictive power of different models. After optimization, one optimal SVR-QSAR model was obtained, and the mean relative error (MRE) of LOOCV using SVR was 1.72%. As a result, LogP negatively affected the activity, whereas Refractivity and Water Accessible Surface Area positively affected the activity.
Affiliation(s)
- Jiaxiang Wu: Shanghai Key Laboratory of Bio-Crops, College of Life Science, Shanghai University, Shanghai, China
- Guozhao Mai: Department of Rehabilitation Medicine, The People's Hospital of Heshan, Guangdong, China
- Bowen Deng: Shanghai Key Laboratory of Bio-Crops, College of Life Science, Shanghai University, Shanghai, China
- Jeong Younseo: Center for Bioinformatics and Computational Biology, Pai Chai University, Daejeon, South Korea
- Dongsu Du: Shanghai Key Laboratory of Bio-Crops, College of Life Science, Shanghai University, Shanghai, China
- Fuxue Chen: Shanghai Key Laboratory of Bio-Crops, College of Life Science, Shanghai University, Shanghai, China
- Qiaorong Ma: Department of Clinical Laboratory, Minzu Hospital of Guangxi Zhuang Autonomous Region, Affiliated Minzu Hospital of Guangxi Medical University, Nanning, Guangxi, China
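The abstract of entry 12 evaluates models by the mean relative error (MRE) of leave-one-out cross-validation (LOOCV). The sketch below shows only the evaluation loop: each sample is held out once, a model is fitted on the rest, and the relative errors of the held-out predictions are averaged. To stay dependency-free it swaps SVR for ordinary least squares on a single descriptor, so it illustrates the LOOCV/MRE procedure, not the paper's model; all names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in for SVR)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def loocv_mre(xs, ys):
    """Mean relative error (in percent) of leave-one-out predictions.

    Assumes every target value in ys is non-zero.
    """
    relative_errors = []
    for i in range(len(xs)):
        # Hold out sample i and fit on all remaining samples.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predicted = slope * xs[i] + intercept
        relative_errors.append(abs(predicted - ys[i]) / abs(ys[i]))
    return 100.0 * sum(relative_errors) / len(relative_errors)


if __name__ == "__main__":
    # Toy descriptor values and activities, for illustration only.
    print(loocv_mre([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

In a hyperparameter search, this loop would be rerun for each candidate setting (for SVR, each pair of ε and C) and the setting with the lowest MRE kept.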
13
Jin W, Li QZ, Zuo YC, Cao YN, Zhang LQ, Hou R, Su WX. Relationship Between DNA Methylation in Key Region and the Differential Expressions of Genes in Human Breast Tumor Tissue. DNA Cell Biol 2018; 38:49-62. [PMID: 30346835 DOI: 10.1089/dna.2018.4276] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 01/27/2023] Open
Abstract
Breast cancer has a high mortality rate in women. Aberrant DNA methylation plays a crucial role in the occurrence and progression of breast carcinoma. By comparing DNA methylation differences between tumor breast tissue and normal breast tissue, we calculate and analyze the distributions of hyper- and hypomethylation sites in different functional regions. Results indicate that enhancer regions are often hypomethylated in breast cancer. CpG islands (CGIs) are mainly hypermethylated, while the regions flanking CGIs (shores and shelves) are more easily hypomethylated. Hypomethylation in the gene body region is related to the upregulation of gene expression, and hypomethylation of enhancer regions is closely associated with gene expression upregulation in breast cancer. Some key hypomethylation sites in enhancer regions and key hypermethylation sites in CGIs that regulate key genes are found, such as the oncogenes ESR1 and ERBB2 and the tumor suppressor genes FBLN2, CEBPA, and FAT4. This suggests that recognizing the methylation status of these genes will be useful for the diagnosis of breast cancer.
Affiliation(s)
- Wen Jin
- 1 Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
| | - Qian-Zhong Li
- 1 Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China; 2 The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Inner Mongolia University, Hohhot, China
| | - Yong-Chun Zuo
- 2 The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Inner Mongolia University, Hohhot, China
| | - Yan-Ni Cao
- 1 Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
| | - Lu-Qiang Zhang
- 1 Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
| | - Rui Hou
- 1 Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
| | - Wen-Xia Su
- 3 College of Science, Inner Mongolia Agricultural University, Hohhot, China
| |
|
14
|
Rahman MS, Aktar U, Jani MR, Shatabda S. iPromoter-FSEn: Identification of bacterial σ70 promoter sequences using feature subspace based ensemble classifier. Genomics 2018; 111:1160-1166. [PMID: 30059731 DOI: 10.1016/j.ygeno.2018.07.011] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 04/25/2018] [Revised: 07/07/2018] [Accepted: 07/12/2018] [Indexed: 10/28/2022]
Abstract
Sigma promoter sequences in bacterial genomes are important due to their role in transcription initiation. Sigma 70 is one of the most important sigma factors. In this paper, we address the problem of identifying σ70 promoter sequences in bacterial genomes. We propose iPromoter-FSEn, a novel predictor for identification of σ70 promoter sequences. Our proposed method is based on a feature subspace based ensemble classifier. A large set of features extracted from the nucleotide sequence is divided into subsets, and each subset is given to an individual classifier to learn. An aggregate decision is then made from the individual decisions by the ensemble voting classifier. We tested our method on a standard benchmark dataset extracted from experimentally validated results. Experimental results show that iPromoter-FSEn significantly improves over the state-of-the-art σ70 promoter sequence predictors. The accuracy and area under the receiver operating characteristic curve of iPromoter-FSEn are 86.32% and 0.9319, respectively. We have also made our method readily available as a web application at http://ipromoterfsen.pythonanywhere.com/server.
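The feature-subspace idea in this abstract (split the features into subsets, train one classifier per subset, then vote) can be illustrated with a minimal sketch. This is not the iPromoter-FSEn implementation: the nearest-centroid base learner, the random partition, and all names below (`split_subspaces`, `CentroidClassifier`, `SubspaceEnsemble`) are assumptions made only for illustration.

```python
import random
from collections import Counter


def split_subspaces(n_features, n_subsets, seed=0):
    """Shuffle feature indices and deal them into disjoint subsets (assumed scheme)."""
    idx = list(range(n_features))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::n_subsets]) for i in range(n_subsets)]


class CentroidClassifier:
    """Stand-in base learner: predicts the class whose mean vector is nearest."""

    def fit(self, X, y):
        self.centroids = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=sq_dist)


class SubspaceEnsemble:
    """One base learner per feature subset, combined by majority vote."""

    def __init__(self, n_subsets=3, seed=0):
        self.n_subsets = n_subsets
        self.seed = seed

    def fit(self, X, y):
        self.subspaces = split_subspaces(len(X[0]), self.n_subsets, self.seed)
        self.models = [
            CentroidClassifier().fit([[row[j] for j in cols] for row in X], y)
            for cols in self.subspaces
        ]
        return self

    def predict_one(self, x):
        votes = [
            model.predict_one([x[j] for j in cols])
            for model, cols in zip(self.models, self.subspaces)
        ]
        return Counter(votes).most_common(1)[0][0]


# Toy data: six features, two well-separated classes (hypothetical values).
X = [
    [0.0, 0.1, 0.0, 0.2, 0.1, 0.0],
    [0.1, 0.0, 0.2, 0.0, 0.0, 0.1],
    [1.0, 0.9, 1.0, 0.8, 0.9, 1.0],
    [0.9, 1.0, 0.8, 1.0, 1.0, 0.9],
]
y = [0, 0, 1, 1]
ensemble = SubspaceEnsemble(n_subsets=3).fit(X, y)
```

Using an odd number of subsets avoids tied votes in the two-class case; any other base learner with the same `fit`/`predict_one` shape could be swapped in.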
Affiliation(s)
- Md Siddiqur Rahman
- Department of Computer Science and Engineering, United International University Madani Avenue, Satarkul, Badda, Dhaka 1212, Bangladesh
| | - Usma Aktar
- Department of Computer Science and Engineering, United International University Madani Avenue, Satarkul, Badda, Dhaka 1212, Bangladesh
| | - Md Rafsan Jani
- Department of Computer Science and Engineering, United International University Madani Avenue, Satarkul, Badda, Dhaka 1212, Bangladesh
| | - Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University Madani Avenue, Satarkul, Badda, Dhaka 1212, Bangladesh.
| |
|
15
|
Zhang Q, Wang S, Pan Y, Su D, Lu Q, Zuo Y, Yang L. Characterization of proteins in different subcellular localizations for Escherichia coli K12. Genomics 2018; 111:1134-1141. [PMID: 30026105 DOI: 10.1016/j.ygeno.2018.07.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/02/2018] [Revised: 07/07/2018] [Accepted: 07/11/2018] [Indexed: 10/28/2022]
Abstract
Comprehensive knowledge of protein subcellular localization is an important step toward understanding protein function. Recent advances in systems biology have allowed us to develop more accurate methods for characterizing proteins at the subcellular localization level. In this study, an analysis method was developed to characterize the topological and biological properties of the cytoplasmic proteins, inner membrane proteins, outer membrane proteins and periplasmic proteins in Escherichia coli (E. coli). Statistically significant differences were found in all topological and biological properties among proteins in different subcellular localizations. In addition, the differences in the compositions of the 20 amino acids were analyzed across the four protein categories, and significant differences were found for all 20 amino acids. These findings may be helpful for understanding the relationship between protein subcellular localization and biological function.
Affiliation(s)
- Qi Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Shiyuan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Yi Pan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Dongqing Su
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Qianzi Lu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Yongchun Zuo
- The State key Laboratory of Reproductive Regulation, Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China.
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
| |
|
16
|
Ning Q, Zhao X, Bao L, Ma Z, Zhao X. Detecting Succinylation sites from protein sequences using ensemble support vector machine. BMC Bioinformatics 2018; 19:237. [PMID: 29940836 PMCID: PMC6016146 DOI: 10.1186/s12859-018-2249-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 09/26/2017] [Accepted: 06/14/2018] [Indexed: 12/14/2022] Open
Abstract
Background: Lysine succinylation is a new kind of post-translational modification which plays a key role in protein conformation regulation and cellular function control. To understand the mechanism of succinylation in depth, it is necessary to identify succinylation sites in proteins accurately. However, traditional experimental approaches are labor-intensive and time-consuming. Computational prediction methods have been proposed in recent years, and they are popular because of their convenience and high speed. In this study, we developed a new method to predict succinylation sites in proteins by combining multiple features, including amino acid composition, binary encoding, physicochemical properties and grey pseudo amino acid composition, with a feature selection scheme (information gain). The model was then trained using SVM (Support Vector Machine) and an ensemble learning algorithm. Results: The method achieved an accuracy of 89.14% and an MCC (Matthews correlation coefficient) of 0.79 using 10-fold cross-validation on the training dataset, and an accuracy of 84.5% and an MCC of 0.2 on the independent dataset. Conclusions: The conclusions drawn from this study can help to better understand the succinylation mechanism. These results suggest that our method is very promising for predicting succinylation sites. The source code and data of this paper are freely available at https://github.com/ningq669/PSuccE. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2249-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Qiao Ning
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
| | - Xiaosa Zhao
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
| | - Lingling Bao
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
| | - Zhiqiang Ma
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China.
| | - Xiaowei Zhao
- Key Laboratory of Intelligent Information Processing of Jilin Universities, Northeast Normal University, Changchun, 130117, China.
| |
|
17
|
Khan YD, Rasool N, Hussain W, Khan SA, Chou KC. iPhosT-PseAAC: Identify phosphothreonine sites by incorporating sequence statistical moments into PseAAC. Anal Biochem 2018; 550:109-116. [DOI: 10.1016/j.ab.2018.04.021] [Citation(s) in RCA: 72] [Impact Index Per Article: 10.3] [Received: 03/28/2018] [Revised: 04/19/2018] [Accepted: 04/21/2018] [Indexed: 01/29/2023]
|
18
|
pLoc_bal-mGpos: Predict subcellular localization of Gram-positive bacterial proteins by quasi-balancing training dataset and PseAAC. Genomics 2018; 111:886-892. [PMID: 29842950 DOI: 10.1016/j.ygeno.2018.05.017] [Citation(s) in RCA: 79] [Impact Index Per Article: 11.3] [Received: 04/24/2018] [Revised: 05/14/2018] [Accepted: 05/18/2018] [Indexed: 12/12/2022]
Abstract
Knowledge of protein subcellular localization is vitally important for both basic research and drug development. With the avalanche of protein sequences emerging in the post-genomic age, it is highly desired to develop computational tools for timely and effectively identifying their subcellular localization purely based on the sequence information alone. Recently, a predictor called "pLoc-mGpos" was developed for identifying the subcellular localization of Gram-positive bacterial proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, called "multiplex proteins", may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mGpos was trained by an extremely skewed dataset in which some subset (subcellular location) was over 11 times the size of the other subsets. Accordingly, it cannot avoid the bias consequence caused by such an uneven training dataset. To alleviate such bias consequence, we have developed a new and bias-reducing predictor called pLoc_bal-mGpos by quasi-balancing the training dataset. Rigorous target jackknife tests on exactly the same experiment-confirmed dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mGpos, the existing state-of-the-art predictor in identifying the subcellular localization of Gram-positive bacterial proteins. To maximize the convenience for most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mGpos/, by which users can easily get their desired results without the need to go through the detailed mathematics.
|
19
|
Muthu Krishnan S. Using Chou's general PseAAC to analyze the evolutionary relationship of receptor associated proteins (RAP) with various folding patterns of protein domains. J Theor Biol 2018; 445:62-74. [DOI: 10.1016/j.jtbi.2018.02.008] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Received: 11/29/2017] [Revised: 01/24/2018] [Accepted: 02/12/2018] [Indexed: 01/31/2023]
|
20
|
Sabooh MF, Iqbal N, Khan M, Khan M, Maqbool HF. Identifying 5-methylcytosine sites in RNA sequence using composite encoding feature into Chou's PseKNC. J Theor Biol 2018; 452:1-9. [PMID: 29727634 DOI: 10.1016/j.jtbi.2018.04.037] [Citation(s) in RCA: 78] [Impact Index Per Article: 11.1] [Received: 02/03/2018] [Revised: 04/24/2018] [Accepted: 04/27/2018] [Indexed: 02/02/2023]
Abstract
This study examines an accurate and efficient computational method for identifying 5-methylcytosine sites in RNA. The occurrence of 5-methylcytosine (m5C) plays a vital role in a number of biological processes. For better comprehension of the biological functions and mechanisms, it is necessary to recognize m5C sites in RNA precisely. Laboratory techniques are available to identify m5C sites in RNA, but these procedures require a lot of time and resources. This study develops a new computational method for extracting the features of RNA sequences. In this method, the RNA sequence is first encoded via a composite feature vector; the minimum-redundancy-maximum-relevance algorithm is then used to select discriminative features. Second, classification is based on a support vector machine evaluated with the jackknife cross-validation test. The suggested method efficiently distinguishes m5C sites from non-m5C sites, and the outcome of the suggested algorithm is 93.33%, with a sensitivity of 90.0 and a specificity of 96.66, on benchmark datasets. The results show that the proposed algorithm achieves significantly better identification performance than the existing computational techniques. This study extends the knowledge about the occurrence sites of RNA modification, which paves the way for better comprehension of the biological uses and mechanisms.
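The jackknife (leave-one-out) test mentioned in this abstract holds out each sample once, trains on the rest, and scores the held-out prediction. A minimal sketch follows; the 1-nearest-neighbour predictor stands in for the support vector machine used in the paper, and the function names and toy data are hypothetical.

```python
def nearest_neighbor_predict(train_X, train_y, x):
    """Stand-in classifier: return the label of the closest training sample."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]


def jackknife_accuracy(X, y, predict=nearest_neighbor_predict):
    """Leave each sample out once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Toy one-dimensional data with two clearly separated classes.
X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y = [0, 0, 0, 1, 1, 1]
accuracy = jackknife_accuracy(X, y)
```

Because every sample is tested exactly once against a model that never saw it, the result does not depend on a random split, which is one reason the test is widely used for small benchmark datasets.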
Affiliation(s)
- M Fazli Sabooh
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
| | - Nadeem Iqbal
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.
| | - Mukhtaj Khan
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
| | - Muslim Khan
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
| | - H F Maqbool
- University of Engineering & Technology Lahore, Pakistan
| |
|
21
|
Khan A, Shah S, Wahid F, Khan FG, Jabeen S. Identification of microRNA precursors using reduced and hybrid features. Mol Biosyst 2018; 13:1640-1645. [PMID: 28686281 DOI: 10.1039/c7mb00115k] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 11/21/2022]
Abstract
MicroRNAs (also called miRNAs) are a group of short non-coding RNA molecules. They play a vital role in the gene expression of transcriptional and post-transcriptional processes. However, abnormality of their expression has been observed in cancer, heart diseases and nervous system disorders. Therefore for basic research and microRNA based therapy, it is imperative to separate real pre-miRNAs from false ones (hairpin sequences similar to pre-miRNA stem loops). Different conservation and machine learning methods have been applied for the identification of miRNAs. However, machine learning algorithms have gained more popularity than conservative based algorithms in terms of sensitivity and overall performance. Due to the avalanche of RNA sequences discovered in a post-genomic age, it is necessary to construct a predictor for the identification of pre-microRNAs in humans. We have developed a predictor called MicroR-Pred in which the RNA sequences are formulated by a hybrid feature vector. The novelty of the new predictor is in the use of the partial least squares technique followed by the Random Forest and SVM (Support Vector Machine) algorithms for dimension reduction and classification. The performance of the MicroR-Pred model is quite promising compared to other state-of-the-art miRNA predictors. It has achieved 88.40% and 93.90% accuracies for RF and SVM.
Affiliation(s)
- Asad Khan
- Department of Computer Science COMSATS Institute of IT, Abbottabad 22060, Pakistan.
| | - Sajid Shah
- Department of Computer Science COMSATS Institute of IT, Abbottabad 22060, Pakistan.
| | - Fazli Wahid
- Department of Environmental Sciences COMSATS Institute of IT, Abbottabad 22060, Pakistan
| | - Fiaz Gul Khan
- Department of Computer Science COMSATS Institute of IT, Abbottabad 22060, Pakistan.
| | - Saima Jabeen
- Department of Computer Science COMSATS Institute of IT, Abbottabad 22060, Pakistan.
| |
|
22
|
Abstract
Cancer is a serious health issue worldwide. Traditional treatment methods focus on killing cancer cells by using anticancer drugs or radiation therapy, but these methods are costly and have side effects. With the discovery of anticancer peptides, great progress has been made in cancer treatment. To promote the application of anticancer peptides in cancer treatment, it is necessary to use computational methods to identify anticancer peptides (ACPs). In this paper, we propose a sequence-based model for identifying ACPs (SAP). In our proposed SAP, the peptide is represented by 400D features or 400D features with g-gap dipeptide features, and then the unrelated features are pruned using the maximum relevance-maximum distance method. The experimental results demonstrate that our model performs better than some existing methods. Furthermore, our model has also been extended to other classifiers, and the performance is stable compared with some state-of-the-art works.
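As an illustration of the g-gap dipeptide features named in this abstract, the sketch below counts residue pairs separated by g positions and normalizes the counts into a 400-dimensional vector (20 × 20 amino acid pairs). The abstract does not define the encoding in detail, so the normalization, the handling of non-standard residues, and the function name are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order: AA, AC, AD, ...
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_dipeptide_composition(peptide, g=0):
    """Return 400 frequencies of residue pairs separated by g intervening residues.

    g=0 gives the ordinary (adjacent) dipeptide composition.
    Pairs containing non-standard residues are skipped (an assumption).
    """
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(peptide) - g - 1):
        pair = peptide[i] + peptide[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


# "ACAC" with g=1 yields the pairs A_A and C_C, each with frequency 0.5.
vector = g_gap_dipeptide_composition("ACAC", g=1)
```

Vectors for several values of g can be concatenated to form a longer feature vector before any feature-selection step is applied.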
Affiliation(s)
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518060, China.
| | - Guangmin Liang
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518060, China.
| | - Longjie Wang
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518060, China.
| | - Changrui Liao
- Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province, College of Optoelectronic Engineering, Shenzhen University, Shenzhen 518060, China.
| |
|
23
|
Qiu WR, Sun BQ, Xiao X, Xu ZC, Chou KC. iHyd-PseCp: Identify hydroxyproline and hydroxylysine in proteins by incorporating sequence-coupled effects into general PseAAC. Oncotarget 2018; 7:44310-44321. [PMID: 27322424 PMCID: PMC5190098 DOI: 10.18632/oncotarget.10027] [Citation(s) in RCA: 141] [Impact Index Per Article: 20.1] [Received: 04/20/2016] [Accepted: 05/29/2016] [Indexed: 12/30/2022] Open
Abstract
Protein hydroxylation is a posttranslational modification (PTM), in which a CH group in Pro (P) or Lys (K) residue has been converted into a COH group, or a hydroxyl group (−OH) is converted into an organic compound. Closely associated with cellular signaling activities, this type of PTM is also involved in some major diseases, such as stomach cancer and lung cancer. Therefore, from the angles of both basic research and drug development, we are facing a challenging problem: for an uncharacterized protein sequence containing many residues of P or K, which ones can be hydroxylated, and which ones cannot? With the explosive growth of protein sequences in the post-genomic age, the problem has become even more urgent. To address such a problem, we have developed a predictor called iHyd-PseCp by incorporating the sequence-coupled information into the general pseudo amino acid composition (PseAAC) and introducing the “Random Forest” algorithm to operate the calculation. Rigorous jackknife tests indicated that the new predictor remarkably outperformed the existing state-of-the-art prediction method for the same purpose. For the convenience of most experimental scientists, a user-friendly web-server for iHyd-PseCp has been established at http://www.jci-bioinfo.cn/iHyd-PseCp, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved.
Affiliation(s)
- Wang-Ren Qiu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China.,Department of Computer Science and Bond Life Science Center, University of Missouri, Columbia, MO, USA
| | - Bi-Qian Sun
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
| | - Xuan Xiao
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China.,Gordon Life Science Institute, Boston, MA, USA
| | - Zhao-Chun Xu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA, USA.,Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia.,Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
| |
|
24
|
iOri-Human: identify human origin of replication by incorporating dinucleotide physicochemical properties into pseudo nucleotide composition. Oncotarget 2018; 7:69783-69793. [PMID: 27626500 PMCID: PMC5342515 DOI: 10.18632/oncotarget.11975] [Citation(s) in RCA: 157] [Impact Index Per Article: 22.4] [Received: 07/06/2016] [Accepted: 09/06/2016] [Indexed: 02/07/2023] Open
Abstract
The initiation of replication is an extremely important process in the DNA life cycle. Given an uncharacterized DNA sequence, can we identify where its origin of replication (ORI) is located? It is undoubtedly a fundamental problem in genome analysis. In particular, with the rapid development of genome sequencing technology that results in a huge amount of sequence data, it is highly desirable to develop computational methods for rapidly and effectively identifying the ORIs in these genomes. Unfortunately, the existing computational methods, such as sequence alignment or kmer strategies, can hardly achieve decent success rates. To address this problem, we developed a predictor called “iOri-Human”. Rigorous jackknife tests have shown that its overall accuracy and stability in identifying human ORIs are over 75% and 50%, respectively. In the predictor, it is through the pseudo nucleotide composition (an extension of pseudo amino acid composition) that 96 physicochemical properties for the 16 possible constituent dinucleotides have been incorporated to reflect the global sequence patterns in DNA as well as its local sequence patterns. Moreover, a user-friendly web-server for iOri-Human has been established at http://lin.uestc.edu.cn/server/iOri-Human.html, by which users can easily get their desired results without the need to go through the complicated mathematics involved.
|
25
|
Qiu WR, Xiao X, Xu ZC, Chou KC. iPhos-PseEn: identifying phosphorylation sites in proteins by fusing different pseudo components into an ensemble classifier. Oncotarget 2018; 7:51270-51283. [PMID: 27323404 PMCID: PMC5239474 DOI: 10.18632/oncotarget.9987] [Citation(s) in RCA: 132] [Impact Index Per Article: 18.9] [Received: 04/05/2016] [Accepted: 05/23/2016] [Indexed: 11/26/2022] Open
Abstract
Protein phosphorylation is a posttranslational modification (PTM or PTLM), where a phosphoryl group is added to the residue(s) of a protein molecule. The most commonly phosphorylated amino acids occur at serine (S), threonine (T), and tyrosine (Y). Protein phosphorylation plays a significant role in a wide range of cellular processes; meanwhile its dysregulation is also involved with many diseases. Therefore, from the angles of both basic research and drug development, we are facing a challenging problem: for an uncharacterized protein sequence containing many residues of S, T, or Y, which ones can be phosphorylated, and which ones cannot? To address this problem, we have developed a predictor called iPhos-PseEn by fusing four different pseudo component approaches (amino acids’ disorder scores, nearest neighbor scores, occurrence frequencies, and position weights) into an ensemble classifier via a voting system. Rigorous cross-validations indicated that the proposed predictor remarkably outperformed its existing counterparts. For the convenience of most experimental scientists, a user-friendly web-server for iPhos-PseEn has been established at http://www.jci-bioinfo.cn/iPhos-PseEn, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved.
Affiliation(s)
- Wang-Ren Qiu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China.,Department of Computer Science and Bond Life Science Center, University of Missouri, Columbia, MO, USA
| | - Xuan Xiao
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China.,Gordon Life Science Institute, Boston, MA, USA
| | - Zhao-Chun Xu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA, USA.,Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia.,Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
| |
|
26
|
Feng P, Yang H, Ding H, Lin H, Chen W, Chou KC. iDNA6mA-PseKNC: Identifying DNA N6-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC. Genomics 2018; 111:96-102. [PMID: 29360500 DOI: 10.1016/j.ygeno.2018.01.005] [Citation(s) in RCA: 188] [Impact Index Per Article: 26.9] [Received: 10/27/2017] [Revised: 12/24/2017] [Accepted: 01/07/2018] [Indexed: 11/29/2022]
Abstract
N6-methyladenine (6mA) is one kind of post-replication modification (PTM or PTRM) occurring in a wide range of DNA sequences. Accurate identification of its sites will be very helpful for revealing the biological functions of 6mA, but it is time-consuming and expensive to determine them by experiments alone. Unfortunately, so far, no bioinformatics tool is available to do so. To fill in such an empty area, we have proposed a novel predictor called iDNA6mA-PseKNC that is established by incorporating nucleotide physicochemical properties into Pseudo K-tuple Nucleotide Composition (PseKNC). It has been observed via rigorous cross-validations that the predictor's sensitivity (Sn), specificity (Sp), accuracy (Acc), and stability (MCC) are 93%, 100%, 96%, and 0.93, respectively. For the convenience of most experimental scientists, a user-friendly web server for iDNA6mA-PseKNC has been established at http://lin-group.cn/server/iDNA6mA-PseKNC, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved.
Affiliation(s)
- Pengmian Feng
- Hebei Province Key Laboratory of Occupational Health and Safety for Coal Industry, School of Public Health, North China University of Science and Technology, Tangshan 063000, China
| | - Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; Gordon Life Science Institute, Boston, MA 02478, USA.
| | - Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, Tangshan 063000, China; Gordon Life Science Institute, Boston, MA 02478, USA.
| | - Kuo-Chen Chou
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; Gordon Life Science Institute, Boston, MA 02478, USA.
| |
|
27
|
Jia J, Liu Z, Xiao X, Liu B, Chou KC. iCar-PseCp: identify carbonylation sites in proteins by Monte Carlo sampling and incorporating sequence coupled effects into general PseAAC. Oncotarget 2018; 7:34558-70. [PMID: 27153555 PMCID: PMC5085176 DOI: 10.18632/oncotarget.9148] [Citation(s) in RCA: 149] [Impact Index Per Article: 21.3] [Received: 03/06/2016] [Accepted: 04/09/2016] [Indexed: 01/22/2023] Open
Abstract
Carbonylation is a posttranslational modification (PTM or PTLM), where a carbonyl group is added to lysine (K), proline (P), arginine (R), and threonine (T) residue of a protein molecule. Carbonylation plays an important role in orchestrating various biological processes but it is also associated with many diseases such as diabetes, chronic lung disease, Parkinson's disease, Alzheimer's disease, chronic renal failure, and sepsis. Therefore, from the angles of both basic research and drug development, we are facing a challenging problem: for an uncharacterized protein sequence containing many residues of K, P, R, or T, which ones can be carbonylated, and which ones cannot? To address this problem, we have developed a predictor called iCar-PseCp by incorporating the sequence-coupled information into the general pseudo amino acid composition, and balancing out skewed training dataset by Monte Carlo sampling to expand positive subset. Rigorous target cross-validations on a same set of carbonylation-known proteins indicated that the new predictor remarkably outperformed its existing counterparts. For the convenience of most experimental scientists, a user-friendly web-server for iCar-PseCp has been established at http://www.jci-bioinfo.cn/iCar-PseCp, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. It has not escaped our notice that the formulation and approach presented here can also be used to analyze many other problems in computational proteomics.
Affiliation(s)
- Jianhua Jia
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403 China.,Gordon Life Science Institute, Boston, MA 02478, USA
| | - Zi Liu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China
| | - Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403 China.,Gordon Life Science Institute, Boston, MA 02478, USA
| | - Bingxiang Liu
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403 China
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, USA.,Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
| |
|
28
|
Inchauspe AÁ. Solitons: A Cutting-Edge Scientific Proposal Explaining the Mechanisms of Acupuntural Action. Chin Med 2018. [DOI: 10.4236/cm.2018.91002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
|
29
|
iACP: a sequence-based tool for identifying anticancer peptides. Oncotarget 2017; 7:16895-909. [PMID: 26942877 PMCID: PMC4941358 DOI: 10.18632/oncotarget.7815] [Citation(s) in RCA: 313] [Impact Index Per Article: 39.1] [Received: 01/06/2016] [Accepted: 02/11/2016] [Indexed: 02/07/2023] Open
Abstract
Cancer remains a major killer worldwide. Traditional methods of cancer treatment are expensive and have some deleterious side effects on normal cells. Fortunately, the discovery of anticancer peptides (ACPs) has paved a new way for cancer treatment. With the explosive growth of peptide sequences generated in the post genomic age, it is highly desired to develop computational methods for rapidly and effectively identifying ACPs, so as to speed up their application in treating cancer. Here we report a sequence-based predictor called iACP developed by the approach of optimizing the g-gap dipeptide components. It was demonstrated by rigorous cross-validations that the new predictor remarkably outperformed the existing predictors for the same purpose in both overall accuracy and stability. For the convenience of most experimental scientists, a publicly accessible web-server for iACP has been established at http://lin.uestc.edu.cn/server/iACP, by which users can easily obtain their desired results.
|
30
|
Bi-PSSM: Position specific scoring matrix based intelligent computational model for identification of mycobacterial membrane proteins. J Theor Biol 2017; 435:116-124. [DOI: 10.1016/j.jtbi.2017.09.013] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 05/28/2017] [Revised: 09/12/2017] [Accepted: 09/15/2017] [Indexed: 02/08/2023]
|
31
|
Cheng X, Xiao X, Chou KC. pLoc-mHum: predict subcellular localization of multi-location human proteins via general PseAAC to winnow out the crucial GO information. Bioinformatics 2017; 34:1448-1456. [DOI: 10.1093/bioinformatics/btx711] [Citation(s) in RCA: 127] [Impact Index Per Article: 15.9] [Received: 07/28/2017] [Accepted: 10/31/2017] [Indexed: 01/19/2023] Open
Affiliation(s)
- Xiang Cheng: Computer Science, Jingdezhen Ceramic Institute, Jingdezhen, China; Computational Biology, Gordon Life Science Institute, Boston, MA, USA
- Xuan Xiao: Computer Science, Jingdezhen Ceramic Institute, Jingdezhen, China; Computational Biology, Gordon Life Science Institute, Boston, MA, USA
- Kuo-Chen Chou: Computer Science, Jingdezhen Ceramic Institute, Jingdezhen, China; Computational Biology, Gordon Life Science Institute, Boston, MA, USA; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
32
|
A Two-Layer Computational Model for Discrimination of Enhancer and Their Types Using Hybrid Features Pace of Pseudo K-Tuple Nucleotide Composition. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2017. [DOI: 10.1007/s13369-017-2818-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 12/12/2022]
|
33
|
He L, Li Y, He RL, Yau SST. A novel alignment-free vector method to cluster protein sequences. J Theor Biol 2017; 427:41-52. [DOI: 10.1016/j.jtbi.2017.06.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/17/2017] [Revised: 05/04/2017] [Accepted: 06/02/2017] [Indexed: 11/29/2022]
|
34
|
Tahir M, Hayat M. iNuc-STNC: a sequence-based predictor for identification of nucleosome positioning in genomes by extending the concept of SAAC and Chou's PseAAC. MOLECULAR BIOSYSTEMS 2017; 12:2587-93. [PMID: 27271822 DOI: 10.1039/c6mb00221h] [Citation(s) in RCA: 89] [Impact Index Per Article: 11.1] [Indexed: 01/14/2023]
Abstract
The nucleosome is the fundamental unit of eukaryotic chromatin, which participates in regulating different cellular processes. Owing to the rapid accumulation of new DNA primary sequences, it is indispensable to develop an automated model. However, identification of novel protein sequences using conventional methods is difficult or sometimes impossible because of vague motifs and the intricate structure of DNA. In this regard, an effective and high-throughput automated model, "iNuc-STNC", has been proposed to identify nucleosome positioning in genomes accurately and reliably. In this proposed model, DNA sequences are represented using three distinct feature extraction strategies: dinucleotide composition, trinucleotide composition and split trinucleotide composition (STNC). Various statistical models were utilized as learner hypotheses. The jackknife test was employed to evaluate the success rates of the proposed model. The experimental results showed that SVM, in combination with STNC, obtained an outstanding performance on all benchmark datasets. The prediction performance of the proposed model "iNuc-STNC" is higher than that of the current state-of-the-art methods in the literature so far. It is anticipated that the "iNuc-STNC" model will provide a rudimentary framework for the pharmaceutical industry in the development of drug design.
Affiliation(s)
- Muhammad Tahir: Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
- Maqsood Hayat: Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
|
35
|
Liu B, Yang F, Chou KC. 2L-piRNA: A Two-Layer Ensemble Classifier for Identifying Piwi-Interacting RNAs and Their Function. MOLECULAR THERAPY-NUCLEIC ACIDS 2017. [PMID: 28624202 PMCID: PMC5415553 DOI: 10.1016/j.omtn.2017.04.008] [Citation(s) in RCA: 195] [Impact Index Per Article: 24.4] [Indexed: 11/24/2022]
Abstract
Involved with important cellular or gene functions and implicated with many kinds of cancers, piRNAs, or piwi-interacting RNAs, are small non-coding RNAs of around 19–33 nt in length. Given a small non-coding RNA molecule, can we predict whether it is a piRNA according to its sequence information alone? Furthermore, there are two types of piRNA: one has the function of instructing target mRNA deadenylation, and the other does not. Can we discriminate one from the other? With the avalanche of RNA sequences emerging in the postgenomic age, it is urgent to address the two problems for both basic research and drug development. Unfortunately, to the best of our knowledge, so far no computational methods whatsoever could be used to deal with the second problem, let alone deal with the two problems together. Here, by incorporating the physicochemical properties of nucleotides into the pseudo K-tuple nucleotide composition (PseKNC), we proposed a powerful predictor called 2L-piRNA. It is a two-layer ensemble classifier, in which the first layer is for identifying whether a query RNA molecule is piRNA or non-piRNA, and the second layer for identifying whether a piRNA is with or without the function of instructing target mRNA deadenylation. Rigorous cross-validations have indicated that the success rates achieved by the proposed predictor are quite high. For the convenience of most biologists and drug development scientists, the web server for 2L-piRNA has been established at http://bioinformatics.hitsz.edu.cn/2L-piRNA/, by which users can easily get their desired results without the need to go through the mathematical details.
Affiliation(s)
- Bin Liu: School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Gordon Life Science Institute, Belmont, MA 02478, USA
- Fan Yang: School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Kuo-Chen Chou: Gordon Life Science Institute, Belmont, MA 02478, USA; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
|
36
|
Sequence-based discrimination of protein-RNA interacting residues using a probabilistic approach. J Theor Biol 2017; 418:77-83. [DOI: 10.1016/j.jtbi.2017.01.040] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 11/09/2016] [Revised: 01/06/2017] [Accepted: 01/27/2017] [Indexed: 11/22/2022]
|
37
|
OOgenesis_Pred: A sequence-based method for predicting oogenesis proteins by six different modes of Chou's pseudo amino acid composition. J Theor Biol 2017; 414:128-136. [DOI: 10.1016/j.jtbi.2016.11.028] [Citation(s) in RCA: 68] [Impact Index Per Article: 8.5] [Received: 09/17/2016] [Revised: 11/25/2016] [Accepted: 11/29/2016] [Indexed: 12/22/2022]
|
38
|
Xiao X, Cheng X, Su S, Mao Q, Chou KC. pLoc-mGpos: Incorporate Key Gene Ontology Information into General PseAAC for Predicting Subcellular Localization of Gram-Positive Bacterial Proteins. Natural Science 2017. [DOI: 10.4236/ns.2017.99032] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Indexed: 11/20/2022]
|
39
|
Liu B, Wu H, Chou KC. Pse-in-One 2.0: An Improved Package of Web Servers for Generating Various Modes of Pseudo Components of DNA, RNA, and Protein Sequences. Natural Science 2017. [DOI: 10.4236/ns.2017.94007] [Citation(s) in RCA: 91] [Impact Index Per Article: 11.4] [Indexed: 11/20/2022]
|
40
|
Behbahani M, Mohabatkar H, Nosrati M. Analysis and comparison of lignin peroxidases between fungi and bacteria using three different modes of Chou’s general pseudo amino acid composition. J Theor Biol 2016; 411:1-5. [DOI: 10.1016/j.jtbi.2016.09.001] [Citation(s) in RCA: 77] [Impact Index Per Article: 8.6] [Received: 05/25/2016] [Revised: 07/27/2016] [Accepted: 09/01/2016] [Indexed: 02/02/2023]
|
41
|
ProFold: Protein Fold Classification with Additional Structural Features and a Novel Ensemble Classifier. BIOMED RESEARCH INTERNATIONAL 2016; 2016:6802832. [PMID: 27660761 PMCID: PMC5021882 DOI: 10.1155/2016/6802832] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 06/01/2016] [Revised: 07/15/2016] [Accepted: 08/07/2016] [Indexed: 11/17/2022]
Abstract
Protein fold classification plays an important role in both protein functional analysis and drug design. The number of proteins in PDB is very large, but only a very small part is categorized and stored in the SCOPe database. Therefore, it is necessary to develop an efficient method for protein fold classification. In recent years, a variety of classification methods have been used in many protein fold classification studies. In this study, we propose a novel classification method called proFold. We import protein tertiary structure in the period of feature extraction and employ a novel ensemble strategy in the period of classifier training. Compared with existing similar ensemble classifiers using the same widely used dataset (DD-dataset), proFold achieves 76.2% overall accuracy. Another two commonly used datasets, EDD-dataset and TG-dataset, are also tested, of which the accuracies are 93.2% and 94.3%, higher than the existing methods. ProFold is available to the public as a web-server.
|
42
|
Muthu Krishnan S. Classify vertebrate hemoglobin proteins by incorporating the evolutionary information into the general PseAAC with the hybrid approach. J Theor Biol 2016; 409:27-37. [PMID: 27575465 DOI: 10.1016/j.jtbi.2016.08.027] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/17/2016] [Revised: 08/11/2016] [Accepted: 08/16/2016] [Indexed: 01/26/2023]
Abstract
Hemoglobin is an oxygen-binding protein widely present in all kingdoms of life, from prokaryotes to eukaryotes, but well established in vertebrates. An attempt was made to classify vertebrate hemoglobin (VerHb) proteins according to their animal classification, based on evolutionary profiles in the general pseudo amino acid composition (PseAAC) and a hybrid approach. The support vector machine (SVM) was applied to develop all models, and the prediction results were further compared according to animal classification. The performance of the approaches was estimated using five-fold cross-validation. The prediction performance was further investigated using receiver operating characteristic (ROC) and prediction score graphs. The prediction accuracy (ACC), sensitivity (SN) and specificity (SP) were examined to find accurate predictions at the threshold level. Based on the approach, a web-tool has been developed for identifying VerHb proteins.
Affiliation(s)
- S Muthu Krishnan: CSIR - Institute of Microbial Technology (IMTECH), Sector-39A, Chandigarh, India
|
43
|
Identification of DNA binding proteins using evolutionary profiles position specific scoring matrix. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.03.025] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Indexed: 11/20/2022]
|
44
|
Jia J, Zhang L, Liu Z, Xiao X, Chou KC. pSumo-CD: predicting sumoylation sites in proteins with covariance discriminant algorithm by incorporating sequence-coupled effects into general PseAAC. Bioinformatics 2016; 32:3133-3141. [DOI: 10.1093/bioinformatics/btw387] [Citation(s) in RCA: 160] [Impact Index Per Article: 17.8] [Received: 05/12/2016] [Accepted: 06/15/2016] [Indexed: 11/13/2022] Open
|
45
|
Qiu WR, Sun BQ, Xiao X, Xu ZC, Chou KC. iPTM-mLys: identifying multiple lysine PTM sites and their different types. Bioinformatics 2016; 32:3116-3123. [DOI: 10.1093/bioinformatics/btw380] [Citation(s) in RCA: 216] [Impact Index Per Article: 24.0] [Received: 05/23/2016] [Accepted: 06/13/2016] [Indexed: 11/13/2022] Open
|
46
|
Qiu WR, Sun BQ, Xiao X, Xu D, Chou KC. iPhos-PseEvo: Identifying Human Phosphorylated Proteins by Incorporating Evolutionary Information into General PseAAC via Grey System Theory. Mol Inform 2016; 36. [DOI: 10.1002/minf.201600010] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Received: 01/21/2016] [Accepted: 04/05/2016] [Indexed: 01/04/2023]
Affiliation(s)
- Wang-Ren Qiu: Computer Department, Jingdezhen Ceramic Institute, Jingdezhen 333403, China; Department of Computer Science and Bond Life Science Center, University of Missouri, Columbia, MO, USA
- Bi-Qian Sun: Computer Department, Jingdezhen Ceramic Institute, Jingdezhen 333403, China
- Xuan Xiao: Computer Department, Jingdezhen Ceramic Institute, Jingdezhen 333403, China; Gordon Life Science Institute, Boston, Massachusetts 02478, USA
- Dong Xu: Department of Computer Science and Bond Life Science Center, University of Missouri, Columbia, MO, USA
- Kuo-Chen Chou: Gordon Life Science Institute, Boston, Massachusetts 02478, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
|
47
|
Ju Z, Cao JZ, Gu H. Predicting lysine phosphoglycerylation with fuzzy SVM by incorporating k-spaced amino acid pairs into Chou's general PseAAC. J Theor Biol 2016; 397:145-50. [DOI: 10.1016/j.jtbi.2016.02.020] [Citation(s) in RCA: 86] [Impact Index Per Article: 9.6] [Received: 11/27/2015] [Revised: 02/14/2016] [Accepted: 02/15/2016] [Indexed: 11/30/2022]
|
48
|
Iqbal M, Hayat M. "iSS-Hyb-mRMR": Identification of splicing sites using hybrid space of pseudo trinucleotide and pseudo tetranucleotide composition. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2016; 128:1-11. [PMID: 27040827 DOI: 10.1016/j.cmpb.2016.02.006] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 08/24/2015] [Accepted: 02/16/2016] [Indexed: 06/05/2023]
Abstract
BACKGROUND AND OBJECTIVES: Gene splicing is a vital source of protein diversity. Precisely removing introns and joining exons is a prominent task in eukaryotic gene expression, as exons are usually interrupted by introns. Identification of splicing sites through experimental techniques is a complicated and time-consuming task. With the avalanche of genome sequences generated in the post genomic age, it remains a complicated and challenging task to develop an automatic, robust and reliable computational method for fast and effective identification of splicing sites. METHODS: In this study, a hybrid model "iSS-Hyb-mRMR" is proposed for quick and accurate identification of splicing sites. Two sample representation methods, namely pseudo trinucleotide composition (PseTNC) and pseudo tetranucleotide composition (PseTetraNC), were used to extract numerical descriptors from DNA sequences. The hybrid model was developed by concatenating PseTNC and PseTetraNC. In order to select highly discriminative features, the minimum redundancy maximum relevance algorithm was applied to the hybrid feature space. The performance of these feature representation methods was tested using various classification algorithms including K-nearest neighbor, probabilistic neural network, general regression neural network, and fitting network. The jackknife test was used to evaluate performance on two benchmark datasets, S1 and S2. RESULTS: The predictor proposed in the current study achieved an accuracy of 93.26%, sensitivity of 88.77%, and specificity of 97.78% for S1, and an accuracy of 94.12%, sensitivity of 87.14%, and specificity of 98.64% for S2. CONCLUSION: The performance of the proposed model is higher than that of the existing methods in the literature so far, and it should be useful for research on the mechanism of RNA splicing and in other research areas.
Affiliation(s)
- Muhammad Iqbal: Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
- Maqsood Hayat: Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
|
49
|
An unusual chimeric amylosucrase generated by domain-swapping mutagenesis. Enzyme Microb Technol 2016; 86:7-16. [DOI: 10.1016/j.enzmictec.2016.01.004] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 08/20/2015] [Revised: 01/05/2016] [Accepted: 01/13/2016] [Indexed: 11/19/2022]
|
50
|
Jia J, Liu Z, Xiao X, Liu B, Chou KC. pSuc-Lys: Predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach. J Theor Biol 2016; 394:223-230. [DOI: 10.1016/j.jtbi.2016.01.020] [Citation(s) in RCA: 231] [Impact Index Per Article: 25.7] [Received: 11/20/2015] [Revised: 01/06/2016] [Accepted: 01/07/2016] [Indexed: 10/22/2022]
|