1
|
Nguyen TTD, Ho QT, Le NQK, Phan VD, Ou YY. Use Chou's 5-Steps Rule With Different Word Embedding Types to Boost Performance of Electron Transport Protein Prediction Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1235-1244. [PMID: 32750894 DOI: 10.1109/tcbb.2020.3010975] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/11/2023]
Abstract
Living organisms obtain the energy they need from cellular respiration, in which electron storage and transport rely on electron transport chains. Identifying electron transport proteins is therefore an important task. How well these proteins can be identified depends directly on the choice of feature extraction method and machine learning algorithm. In this study, protein sequences were treated as natural-language sentences composed of words. The proposed feature sets, built on word embeddings and protein motif frequencies, were used for feature selection; five word embedding types and a variety of combined features were examined. A support vector machine was then used for classification. In 5-fold cross-validation, the average accuracy, specificity, sensitivity, and MCC all exceed 0.95. On the independent test, accuracy, specificity, and sensitivity are 96.82%, 97.16%, and 95.76%, and the MCC is 0.9. Compared with state-of-the-art predictors, the proposed method performs better on all metrics, indicating its effectiveness in identifying electron transport proteins. The study also offers insight into how well various word embeddings capture information in the surveyed sequences.
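The four metrics reported in this abstract can all be derived from a binary confusion matrix. The following is a minimal Python sketch under the usual textbook definitions; the function name and the counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity, MCC) from confusion counts.

    Assumes at least one positive and one negative sample, so the
    sensitivity and specificity denominators are non-zero.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc


# Hypothetical counts for illustration only (not the paper's data)
acc, sens, spec, mcc = binary_metrics(tp=90, tn=95, fp=5, fn=10)
```

Unlike accuracy, MCC stays informative when the positive and negative classes are imbalanced, which may be why it is reported alongside the three rates.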
|
2
|
Godin R, Durrant JR. Dynamics of photoconversion processes: the energetic cost of lifetime gain in photosynthetic and photovoltaic systems. Chem Soc Rev 2021; 50:13372-13409. [PMID: 34786578 DOI: 10.1039/d1cs00577d] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/21/2022]
Abstract
The continued development of solar energy conversion technologies relies on an improved understanding of their limitations. In this review, we focus on a comparison of the charge carrier dynamics underlying the function of photovoltaic devices with those of both natural and artificial photosynthetic systems. The solar energy conversion efficiency is determined by the product of the rate of generation of high energy species (charges for solar cells, chemical fuels for photosynthesis) and the energy contained in these species. It is known that the underlying kinetics of the photophysical and charge transfer processes affect the production yield of high energy species. Comparatively little attention has been paid to how these kinetics are linked to the energy contained in the high energy species or the energy lost in driving the forward reactions. Here we review the operational parameters of both photovoltaic and photosynthetic systems to highlight the energy cost of extending the lifetime of charge carriers to levels that enable function. We show a strong correlation between the energy lost within the device and the necessary lifetime gain, even when considering natural photosynthesis alongside artificial systems. From consideration of experimental data across all these systems, the empirical energetic cost of each 10-fold increase in lifetime is 87 meV. This energetic cost of lifetime gain is approximately 50% greater than the 59 meV predicted from a simple kinetic model. Broadly speaking, photovoltaic devices show smaller energy losses compared to photosynthetic devices due to the smaller lifetime gains needed. This is because of faster charge extraction processes in photovoltaic devices compared to the complex multi-electron, multi-proton redox reactions that produce fuels in photosynthetic devices. The result is that in photosynthetic systems, larger energetic costs are paid to overcome unfavorable kinetic competition between the excited state lifetime and the rate of interfacial reactions. We apply this framework to leading examples of photovoltaic and photosynthetic devices to identify kinetic sources of energy loss and identify possible strategies to reduce this energy loss. The kinetic and energetic analyses undertaken are applicable to both photovoltaic and photosynthetic systems, allowing for a holistic comparison of both types of solar energy conversion approaches.
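The 59 meV figure coincides with k_B·T·ln(10) near room temperature, the energy that changes a Boltzmann factor by one order of magnitude. Whether this is exactly the "simple kinetic model" the review refers to is an assumption here; the Python sketch below only checks the arithmetic, with 298 K as an assumed temperature and hypothetical names throughout.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K
T_KELVIN = 298.0               # assumed room temperature (not stated in the abstract)


def cost_per_decade_mev(temperature_k=T_KELVIN):
    """Energy in meV that changes a Boltzmann factor exp(-E / kT) by 10x."""
    return K_B_EV_PER_K * temperature_k * math.log(10) * 1000.0


predicted_mev = cost_per_decade_mev()          # about 59 meV at 298 K
empirical_mev = 87.0                           # value quoted in the abstract
excess = empirical_mev / predicted_mev - 1.0   # fractional excess over the prediction
```

With these inputs the empirical cost comes out a little under 50% above the thermal estimate, in line with the "approximately 50% greater" stated in the abstract.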
Affiliation(s)
- Robert Godin: Department of Chemistry, The University of British Columbia, 3247 University Way, Kelowna, British Columbia, V1V 1V7, Canada; Clean Energy Research Center, University of British Columbia, 2360 East Mall, Vancouver, British Columbia, V6T 1Z3, Canada; Okanagan Institute for Biodiversity, Resilience, and Ecosystem Services, University of British Columbia, Kelowna, British Columbia, Canada
- James R Durrant: Department of Chemistry and Centre for Processable Electronics, Imperial College London, Exhibition Road, London SW7 2AZ, UK
|
3
|
Chou KC. An Insightful 10-year Recollection Since the Emergence of the 5-steps Rule. Curr Pharm Des 2020; 25:4223-4234. [PMID: 31782354 DOI: 10.2174/1381612825666191129164042] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/18/2019] [Accepted: 11/25/2019] [Indexed: 11/22/2022]
Abstract
OBJECTIVE: One of the most challenging problems is how to represent a biological sequence as a vector while still retaining much of its sequence-order information. METHODS: To address this problem, the approach of Pseudo Amino Acid Components (PseAAC) was developed. RESULTS AND CONCLUSION: This 10-year recollection makes it increasingly clear that the proposal has indeed been very powerful.
Affiliation(s)
- Kuo-Chen Chou: Gordon Life Science Institute, Boston, Massachusetts 02478, United States; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
4
|
|
5
|
|
6
|
Identifying FL11 subtype by characterizing tumor immune microenvironment in prostate adenocarcinoma via Chou's 5-steps rule. Genomics 2020; 112:1500-1515. [DOI: 10.1016/j.ygeno.2019.08.021] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 06/28/2019] [Revised: 08/03/2019] [Accepted: 08/26/2019] [Indexed: 12/14/2022]
|
7
|
Chou KC. Impacts of Pseudo Amino Acid Components and 5-steps Rule to Proteomics and Proteome Analysis. Curr Top Med Chem 2019; 19:2283-2300. [DOI: 10.2174/1568026619666191018100141] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 08/07/2019] [Revised: 08/18/2019] [Accepted: 08/26/2019] [Indexed: 01/27/2023]
Abstract
Stimulated by the 5-steps rule during the last decade or so, computational proteomics has achieved remarkable progress in the following three areas: (1) protein structural class prediction; (2) protein subcellular location prediction; (3) post-translational modification (PTM) site prediction. The results obtained by these predictions are very useful not only for an in-depth study of the functions of proteins and their biological processes in a cell, but also for developing novel drugs against major diseases such as cancers, Alzheimer’s, and Parkinson’s. Moreover, since the targets to be predicted may have the multi-label feature, two sets of metrics are introduced: one for inspecting the global prediction quality, the other for the local prediction quality. All the predictors covered in this review have a user-friendly web-server, through which the majority of experimental scientists can easily obtain their desired data without the need to go through the complicated mathematics.
Affiliation(s)
- Kuo-Chen Chou: Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
|
8
|
Malebary SJ, Rehman MSU, Khan YD. iCrotoK-PseAAC: Identify lysine crotonylation sites by blending position relative statistical features according to the Chou's 5-step rule. PLoS One 2019; 14:e0223993. [PMID: 31751380 PMCID: PMC6874067 DOI: 10.1371/journal.pone.0223993] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 07/08/2019] [Accepted: 10/02/2019] [Indexed: 01/22/2023]
Abstract
Among the different post-translational modifications (PTMs), one of the most important is lysine crotonylation in proteins. Its relevance to various diseases and essential biological processes cannot be overlooked. The key step in uncovering the hidden mechanisms of crotonylation, along with its occurrence sites, is to fully understand the mechanism behind this biological process. In previously reported studies, researchers have used different techniques, such as position weighted matrix (PWM), support vector machine (SVM), k nearest neighbors (KNN), and many others. However, the maximum prediction accuracy achieved was not high. To address this, we propose an improved predictor for lysine crotonylation sites named iCrotoK-PseAAC, in which we have incorporated various position and composition relative features along with statistical moments into PseAAC. The results of self-consistency testing were 100% accurate, while 10-fold cross-validation gave 99.0% accuracy. Based on the validation and comparison of models, it is concluded that iCrotoK-PseAAC is more accurate than the previously proposed models.
Affiliation(s)
- Sharaf Jameel Malebary: Department of Information Technology, King Abdul Aziz University, Rabigh, Kingdom of Saudi Arabia
- Muhammad Safi ur Rehman: Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Yaser Daanial Khan: Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
|
9
|
Behbahani M, Nosrati M, Moradi M, Mohabatkar H. Using Chou's General Pseudo Amino Acid Composition to Classify Laccases from Bacterial and Fungal Sources via Chou's Five-Step Rule. Appl Biochem Biotechnol 2019; 190:1035-1048. [PMID: 31659712 DOI: 10.1007/s12010-019-03141-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 06/14/2019] [Accepted: 09/12/2019] [Indexed: 01/28/2023]
Abstract
Laccases are a group of enzymes with a critical activity in the degradation of both phenolic and non-phenolic compounds. These enzymes are present in a diverse array of species, including fungi and bacteria. Since this enzyme is marketed for uses ranging from industry to medicine, better knowledge of its structures and properties from diverse sources will help in selecting the most appropriate candidate for each purpose. In the current study, sequence- and structure-based characteristics of these enzymes from fungi and bacteria, including pseudo amino acid composition (PseAAC), physicochemical characteristics, and secondary structures, are compared and classified. Autodock 4 software was used for docking analysis between these laccases and some phenolic and non-phenolic compounds. The results indicated that features including molecular weight, aliphatic, extinction coefficient, and random coil percentage of these protein groups present high degrees of diversity in most cases. Categorization of these enzymes by PseAAC showed over 96% accuracy. The binding free energy between fungal laccases and their substrates was shown to be considerably higher than that of bacterial ones. According to the outcomes of the current study, data mining methods using different machine learning algorithms, especially neural networks, could provide valuable information for a fair comparison between fungal and bacterial laccases. These results also suggest an association between efficacy and physicochemical features of laccase enzymes from different sources.
Affiliation(s)
- Mandana Behbahani: Department of Biotechnology, Faculty of Biological Science and Technology, University of Isfahan, Isfahan, Iran
- Mokhtar Nosrati: Department of Biotechnology, Faculty of Biological Science and Technology, University of Isfahan, Isfahan, Iran
- Mohammad Moradi: Department of Biotechnology, Faculty of Biological Science and Technology, University of Isfahan, Isfahan, Iran
- Hassan Mohabatkar: Department of Biotechnology, Faculty of Biological Science and Technology, University of Isfahan, Isfahan, Iran
|
10
|
Chou KC. Proposing Pseudo Amino Acid Components is an Important Milestone for Proteome and Genome Analyses. Int J Pept Res Ther 2019. [DOI: 10.1007/s10989-019-09910-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 12/28/2022]
|
11
|
|
12
|
Khan S, Khan M, Iqbal N, Hussain T, Khan SA, Chou KC. A Two-Level Computation Model Based on Deep Learning Algorithm for Identification of piRNA and Their Functions via Chou’s 5-Steps Rule. Int J Pept Res Ther 2019. [DOI: 10.1007/s10989-019-09887-3] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Indexed: 12/15/2022]
|
13
|
Zhao Y, Xue X, Xie X. An alignment-free measure based on physicochemical properties of amino acids for protein sequence comparison. Comput Biol Chem 2019; 80:10-15. [PMID: 30851619 DOI: 10.1016/j.compbiolchem.2019.01.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/08/2018] [Revised: 12/30/2018] [Accepted: 01/17/2019] [Indexed: 01/21/2023]
Abstract
Sequence comparison is an important topic in bioinformatics. With the exponential increase in biological sequences, traditional alignment-based protein sequence comparison methods have become limiting, so alignment-free methods have been widely proposed over the past two decades. In this paper, we considered not only six typical physicochemical properties of amino acids, but also their frequency and positional distribution. A 51-dimensional vector was obtained to describe each protein sequence. We obtained a pairwise distance matrix by computing the standardized Euclidean distance, on which discriminant analysis and phylogenetic analysis can be performed. The results on the Influenza A virus and ND5 datasets indicate that our method is accurate and efficient for classifying proteins and inferring the phylogeny of species.
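The pairwise-distance step described in this abstract can be illustrated with the standard definition of the standardized Euclidean distance, in which each squared component difference is divided by that component's variance across the dataset. The Python sketch below uses two-dimensional toy vectors in place of the paper's 51-dimensional descriptors, which are not reproduced here; the function name and the use of the sample variance are assumptions.

```python
import math
from statistics import variance


def standardized_euclidean_matrix(vectors):
    """Pairwise standardized Euclidean distances between equal-length vectors."""
    n_dims = len(vectors[0])
    # Sample variance of each component across the whole dataset (assumed choice)
    variances = [variance([v[i] for v in vectors]) for i in range(n_dims)]
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            total = sum(
                (vectors[a][i] - vectors[b][i]) ** 2 / variances[i]
                for i in range(n_dims)
                if variances[i] > 0  # skip components that never vary
            )
            dist[a][b] = dist[b][a] = math.sqrt(total)
    return dist


# Toy feature vectors for illustration only
features = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
D = standardized_euclidean_matrix(features)
```

Dividing by the variance keeps components measured on large scales from dominating the distance, which matters when a descriptor mixes frequencies with physicochemical values.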
Affiliation(s)
- Yunxiu Zhao: College of Science, Northwest A&F University, Yangling, Shaanxi 712100, PR China
- Xiaolong Xue: College of Science, Northwest A&F University, Yangling, Shaanxi 712100, PR China
- Xiaoli Xie: College of Science, Northwest A&F University, Yangling, Shaanxi 712100, PR China
|
14
|
A study of the Immune Epitope Database for some fungi species using network topological indices. Mol Divers 2017; 21:713-718. [DOI: 10.1007/s11030-017-9749-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 12/19/2016] [Accepted: 05/09/2017] [Indexed: 10/19/2022]
|
15
|
Ding Y, Wang X, Mou Z. Communities in the iron superoxide dismutase amino acid network. J Theor Biol 2015; 367:278-285. [PMID: 25500180 DOI: 10.1016/j.jtbi.2014.11.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2014] [Revised: 11/24/2014] [Accepted: 11/28/2014] [Indexed: 10/24/2022]
Abstract
Amino acid network (AAN) analysis is a new way to reveal the relationship between protein structure and function. We constructed six different types of AANs based on iron superoxide dismutase (Fe-SOD) three-dimensional structure information. These Fe-SOD AANs have clear community structures when modularized by different methods. In particular, the detected communities are related to Fe-SOD secondary structures. Regular structures show better correlations with detected communities than irregular structures, and loops weaken these correlations, which suggests that secondary structure is the unit element in the Fe-SOD folding process. In addition, a comparative analysis of the communities of mesophilic and thermophilic Fe-SOD AANs revealed that thermostable Fe-SOD AANs had more highly associated community structures than mesophilic ones. Thermophilic Fe-SOD AANs also had higher similarity between communities and secondary structures than mesophilic Fe-SOD AANs. The communities in Fe-SOD AANs show that dense interactions in modules can help to stabilize thermophilic Fe-SOD.
Affiliation(s)
- Yanrui Ding: School of Digital Media, Jiangnan University, Wuxi, Jiangsu, 214122, P. R. China; Key Laboratory of Industrial Biotechnology, Jiangnan University, Wuxi, Jiangsu, 214122, P. R. China
- Xueqin Wang: School of Digital Media, Jiangnan University, Wuxi, Jiangsu, 214122, P. R. China
- Zhaolin Mou: School of Digital Media, Jiangnan University, Wuxi, Jiangsu, 214122, P. R. China
|
16
|
A novel k-word relative measure for sequence comparison. Comput Biol Chem 2014; 53PB:331-338. [PMID: 25462340 DOI: 10.1016/j.compbiolchem.2014.10.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 04/26/2014] [Revised: 08/10/2014] [Accepted: 10/25/2014] [Indexed: 12/28/2022]
Abstract
To extract phylogenetic information from DNA sequences, a new normalized k-word average relative distance is proposed in this paper. The proposed measure was tested by discriminant analysis and phylogenetic analysis. Phylogenetic trees based on the Manhattan distance measure were reconstructed with k ranging from 1 to 12. In addition, a new method is suggested for reducing the matrix dimension, which can greatly reduce the amount of computation and the running time. The experimental assessment demonstrated that our measure was efficient. Moreover, comparison with the results of other methods shows that our method is feasible and powerful for phylogenetic analysis.
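As background for the k-word approach, the Python sketch below builds plain k-word frequency vectors for two DNA sequences and compares them with the Manhattan distance. It does not implement the paper's normalized average relative distance or its dimension-reduction step, neither of which is specified in the abstract; the function names and example sequences are hypothetical.

```python
from collections import Counter
from itertools import product


def kword_frequencies(seq, k, alphabet="ACGT"):
    """Relative frequency of every possible k-word, in lexicographic order."""
    n_windows = max(len(seq) - k + 1, 1)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Words containing symbols outside the alphabet are counted in n_windows
    # but have no slot in the output vector.
    return [counts["".join(w)] / n_windows for w in product(alphabet, repeat=k)]


def manhattan(u, v):
    """Sum of absolute component-wise differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Toy sequences for illustration only
f1 = kword_frequencies("ACGTACGT", 2)
f2 = kword_frequencies("ACGTTTTT", 2)
d = manhattan(f1, f2)
```

The vector length grows as 4^k, so k = 12 already gives about 16.8 million components per sequence, which suggests why a dimension-reduction step is useful at the upper end of the range the abstract mentions.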
|
17
|
An effective haplotype assembly algorithm based on hypergraph partitioning. J Theor Biol 2014; 358:85-92. [DOI: 10.1016/j.jtbi.2014.05.034] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 01/13/2014] [Revised: 05/08/2014] [Accepted: 05/25/2014] [Indexed: 11/20/2022]
|
18
|
iSS-PseDNC: identifying splicing sites using pseudo dinucleotide composition. BioMed Research International 2014; 2014:623149. [PMID: 24967386 PMCID: PMC4055483 DOI: 10.1155/2014/623149] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Received: 02/19/2014] [Revised: 04/22/2014] [Accepted: 04/23/2014] [Indexed: 11/17/2022]
Abstract
In eukaryotic genes, exons are generally interrupted by introns. Accurately removing introns and joining exons together are essential processes in eukaryotic gene expression. With the avalanche of genome sequences generated in the postgenomic age, it is highly desired to develop automated methods for rapid and effective detection of splice sites that play important roles in gene structure annotation and even in RNA splicing. Although a series of computational methods were proposed for splice site identification, most of them neglected the intrinsic local structural properties. In the present study, a predictor called “iSS-PseDNC” was developed for identifying splice sites. In the new predictor, the sequences were formulated by a novel feature-vector called “pseudo dinucleotide composition” (PseDNC) into which six DNA local structural properties were incorporated. It was observed by the rigorous cross-validation tests on two benchmark datasets that the overall success rates achieved by iSS-PseDNC in identifying splice donor site and splice acceptor site were 85.45% and 87.73%, respectively. It is anticipated that iSS-PseDNC may become a useful tool for identifying splice sites and that the six DNA local structural properties described in this paper may provide novel insights for in-depth investigations into the mechanism of RNA splicing.
|
19
|
A QSPR-like model for multilocus genotype networks of Fasciola hepatica in Northwest Spain. J Theor Biol 2014; 343:16-24. [DOI: 10.1016/j.jtbi.2013.11.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/28/2013] [Revised: 11/08/2013] [Accepted: 11/11/2013] [Indexed: 11/23/2022]
|
20
|
Liu Q, Chen YPP, Li J. k-Partite cliques of protein interactions: A novel subgraph topology for functional coherence analysis on PPI networks. J Theor Biol 2014; 340:146-54. [PMID: 24056214 DOI: 10.1016/j.jtbi.2013.09.013] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 06/04/2013] [Revised: 08/09/2013] [Accepted: 09/10/2013] [Indexed: 01/02/2023]
Abstract
Many studies aim to identify dense clusters/subgraphs in protein-protein interaction (PPI) networks for protein function prediction. However, prediction based on dense clusters actually performs worse than a simple guilt-by-association method using neighbor counting. This indicates that the local topological structures and properties of PPI networks are still open to new theoretical investigation and empirical exploration. We introduce a novel topological structure for studying PPI networks, called k-partite cliques of protein interactions: a functionally coherent but not necessarily dense subgraph topology. A k-partite protein clique is a maximal k-partite clique comprising two or more nonoverlapping protein subsets, between any two of which full interactions are exhibited. To detect maximal k-partite cliques in a PPI network, we propose transforming the network into induced K-partite graphs in which edges exist only between the partites. We then present a maximal k-partite clique mining (MaCMik) algorithm to enumerate maximal k-partite cliques from K-partite graphs and apply it to a yeast PPI network. We observed interesting and unusually high functional coherence in k-partite protein cliques: the majority of the proteins in a clique, especially those in the same partite, share the same functions, even though k-partite protein cliques, unlike dense subgraph patterns or (quasi-)cliques, are not required to be dense. The idea of k-partite protein cliques provides a novel approach to characterizing PPI networks and should help function prediction for unknown proteins.
Affiliation(s)
- Qian Liu: Advanced Analytics Institute, University of Technology Sydney, Sydney, Australia
|
21
|
Grappling the high altitude for safe edible bamboo shoots with rich nutritional attributes and escaping cyanogenic toxicity. BioMed Research International 2013; 2013:289285. [PMID: 24350255 PMCID: PMC3852316 DOI: 10.1155/2013/289285] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 06/18/2013] [Accepted: 10/03/2013] [Indexed: 11/17/2022]
Abstract
Consumption of bamboo species with high level of total cyanogenic content (TCC) in Asia by many ethnic groups is significantly associated with food poisoning and occasionally Konzo (a neurological disorder). Adequate characterization of edible bamboo species with low level of TCC and high nutritious attributes is required for consumer's safety as well as for the conservation of the gene pool. Here, we employed morphological descriptors, atomic absorption spectrophotometer, RAPD, and trnL-F intergenic spacer to characterize 15 indigenous edible bamboo species of north-east India. The study indicates that morphologically and genetically evolved edible bamboo species having large and robust bamboo-shoot texture and growing at low altitude contain high level of TCC, low antioxidant properties, and low levels of beneficial macronutrients and micronutrients. Importantly, Dendrocalamus species are shown to be rich in TCC irrespective of the growing altitude while Bambusa species are found to have moderate level of TCC. The findings clearly demonstrated that Chimonobambusa callosa growing at high altitude represents safe edible bamboo species with nutritious attributes.
|
22
|
Al-Mamun M, Brown L, Hossain M, Fall C, Wagstaff L, Bass R. A hybrid computational model for the effects of maspin on cancer cell dynamics. J Theor Biol 2013; 337:150-60. [DOI: 10.1016/j.jtbi.2013.08.016] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 06/20/2013] [Revised: 08/07/2013] [Accepted: 08/15/2013] [Indexed: 01/01/2023]
|
23
|
Yang X, Wang T. Linear regression model of short k-word: a similarity distance suitable for biological sequences with various lengths. J Theor Biol 2013; 337:61-70. [PMID: 23933105 DOI: 10.1016/j.jtbi.2013.07.028] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 04/20/2013] [Revised: 07/02/2013] [Accepted: 07/30/2013] [Indexed: 01/10/2023]
Abstract
Originating from sequences' length difference, both k-word based methods and graphical representation approaches have uncovered biological information in their distinct ways. However, it is unlikely that the mechanisms of information storage vary with sequence length. A similarity distance suitable for sequences of various lengths would be much closer to the mechanisms of information storage. In this paper, new sub-sequences of k-words were extracted from biological sequences under a one-to-one mapping. The new sub-sequences were evaluated by a linear regression model. Moreover, a new distance was defined on the invariants from the linear regression model. In comparison with other alignment-free distances, the results of four experiments demonstrated that our similarity distance was more efficient.
Affiliation(s)
- Xiwu Yang: School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning 116024, PR China; School of Mathematics, Liaoning Normal University, Dalian, Liaoning 116029, PR China
- Tianming Wang: School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning 116024, PR China
|
24
|
Sequence and structure space model of protein divergence driven by point mutations. J Theor Biol 2013; 330:1-8. [DOI: 10.1016/j.jtbi.2013.03.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/27/2012] [Revised: 03/07/2013] [Accepted: 03/18/2013] [Indexed: 12/11/2022]
|
25
|
Sharma A, Costantini S, Colonna G. The protein-protein interaction network of the human Sirtuin family. Biochimica et Biophysica Acta - Proteins and Proteomics 2013; 1834:1998-2009. [PMID: 23811471 DOI: 10.1016/j.bbapap.2013.06.012] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 04/20/2013] [Revised: 05/31/2013] [Accepted: 06/18/2013] [Indexed: 12/15/2022]
Abstract
Protein-protein interaction networks are useful for studying human diseases and for looking for possible health care through a holistic approach. Networks are playing an increasingly important role in the understanding of physiological processes such as homeostasis, signaling, spatial and temporal organization, and pathological conditions. In this article we show the complex system of interactions determined by human Sirtuins (Sirt), which are largely involved in many metabolic processes as well as in different diseases. The Sirtuin family consists of seven homologous Sirt-s having structurally similar cores but different terminal segments, which are rather variable in length and/or intrinsically disordered. Many studies have determined their cellular location as well as biological functions, although the molecular mechanisms through which they act are little known. Therefore, the aim of this work was to define, explore and understand the Sirtuin-related human interactome. As a first step, we integrated the experimentally determined protein-protein interactions of the Sirtuin family, as well as their first and second neighbors, into a Sirtuin-related sub-interactome. Our data showed that the second-neighbor network of Sirtuins encompasses 25% of the entire human interactome, and exhibits a scale-free degree distribution and interconnectedness among top degree nodes. Moreover, the Sirtuin sub-interactome showed a modular structure around the core comprising mixed functions. Finally, we extracted from the Sirtuin sub-interactome subnets related to cancer, aging and post-translational modifications for information on key nodes and topological space of the subnets in the Sirt family network.
Affiliation(s)
- Ankush Sharma: Biochemistry, Biophysics and General Pathology Department, Second University of Naples, Naples, Italy
|
26
|
Signal propagation in protein interaction network during colorectal cancer progression. BioMed Research International 2013; 2013:287019. [PMID: 23586028 PMCID: PMC3615629 DOI: 10.1155/2013/287019] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 01/16/2013] [Accepted: 02/18/2013] [Indexed: 11/18/2022]
Abstract
Colorectal cancer is generally categorized into four stages according to its development or severity: Dukes A, B, C, and D. Since different stages of colorectal cancer correspond to different activated regions of the network, the transitions between network states may reflect its pathological changes. In view of this, we compared gene expression among colorectal cancer patients in the four stages and obtained early-stage and late-stage biomarkers, respectively. Subsequently, both kinds of biomarkers were mapped onto the protein interaction network. If an early biomarker and a late biomarker were close in the network and their expression levels were correlated in the Dukes B and C patients, then a signal propagation path from the early-stage biomarker to the late one was identified. Many transition genes in the signal propagation paths were involved in signal transduction, cell communication, and cellular process regulation. Some transition hubs were known colorectal cancer genes. The findings reported here may provide useful insights for revealing the mechanism of colorectal cancer progression at the cellular systems biology level.
|
27
|
Xiao X, Wang P, Lin WZ, Jia JH, Chou KC. iAMP-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types. Anal Biochem 2013; 436:168-77. [PMID: 23395824 DOI: 10.1016/j.ab.2013.01.019] [Citation(s) in RCA: 367] [Impact Index Per Article: 33.4] [Received: 10/30/2012] [Revised: 01/10/2013] [Accepted: 01/21/2013] [Indexed: 12/14/2022]
Abstract
Antimicrobial peptides (AMPs), also called host defense peptides, are an evolutionarily conserved component of the innate immune response and are found among all classes of life. According to their special functions, AMPs are generally classified into ten categories: Antibacterial Peptides, Anticancer/tumor Peptides, Antifungal Peptides, Anti-HIV Peptides, Antiviral Peptides, Antiparasital Peptides, Anti-protist Peptides, AMPs with Chemotactic Activity, Insecticidal Peptides, and Spermicidal Peptides. Given a query peptide, how can we identify whether it is an AMP or non-AMP? If it is, can we identify which functional type or types it belongs to? In particular, how can we deal with the multi-type problem, since an AMP may belong to two or more functional types? To address these problems, which are important to both basic research and drug development, a multi-label classifier was developed based on the pseudo amino acid composition (PseAAC) and the fuzzy K-nearest neighbor (FKNN) algorithm, where the components of PseAAC incorporate five physicochemical properties. The novel classifier is called iAMP-2L, where "2L" means that it is a 2-level predictor. The 1st level answers the 1st question above, while the 2nd level answers the 2nd and 3rd questions, which are beyond the reach of any existing methods in this area. For the convenience of users, a web server for iAMP-2L was established at http://www.jci-bioinfo.cn/iAMP-2L.
Affiliation(s)
- Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China.
|
28
|
Li ZC, Lai YH, Chen LL, Chen C, Xie Y, Dai Z, Zou XY. Identifying subcellular localizations of mammalian protein complexes based on graph theory with a random forest algorithm. Mol Biosyst 2013; 9:658-67. [DOI: 10.1039/c3mb25451h] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/28/2023]
|
29
|
A graph spectrum based geometric biclustering algorithm. J Theor Biol 2013; 317:200-11. [DOI: 10.1016/j.jtbi.2012.10.012] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/27/2012] [Revised: 10/04/2012] [Accepted: 10/06/2012] [Indexed: 11/22/2022]
|
30
|
A simple k-word interval method for phylogenetic analysis of DNA sequences. J Theor Biol 2013; 317:192-9. [DOI: 10.1016/j.jtbi.2012.10.010] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 06/19/2012] [Revised: 10/02/2012] [Accepted: 10/06/2012] [Indexed: 11/18/2022]
|
31
|
Lin SX, Lapointe J. Theoretical and experimental biology in one: A symposium in honour of Professor Kuo-Chen Chou’s 50th anniversary and Professor Richard Giegé’s 40th anniversary of their scientific careers. J Biomed Sci Eng 2013. [DOI: 10.4236/jbise.2013.64054] [Citation(s) in RCA: 132] [Impact Index Per Article: 12.0] [Indexed: 11/20/2022]
|
32
|
iNuc-PhysChem: a sequence-based predictor for identifying nucleosomes via physicochemical properties. PLoS One 2012; 7:e47843. [PMID: 23144709 PMCID: PMC3483203 DOI: 10.1371/journal.pone.0047843] [Citation(s) in RCA: 165] [Impact Index Per Article: 13.8] [Received: 07/25/2012] [Accepted: 09/21/2012] [Indexed: 01/14/2023]
Abstract
Nucleosome positioning has important roles in key cellular processes. Although intensive efforts have been made in this area, the rules defining nucleosome positioning are still elusive and debated. In this study, we carried out a systematic comparison of the profiles of twelve DNA physicochemical features between the nucleosomal and linker sequences in the Saccharomyces cerevisiae genome. We found that nucleosomal sequences have some position-specific physicochemical features, which can be used for in-depth study of nucleosomes. Meanwhile, a new predictor, called iNuc-PhysChem, was developed for identification of nucleosomal sequences by incorporating these physicochemical properties into a 1788-D (dimensional) feature vector, which was further reduced to an 884-D vector via the IFS (incremental feature selection) procedure to optimize the feature set. A cross-validation test on a benchmark dataset showed that the overall success rate achieved by iNuc-PhysChem was over 96% in identifying nucleosomal or linker sequences. As a web server, iNuc-PhysChem is freely accessible to the public at http://lin.uestc.edu.cn/server/iNuc-PhysChem. For the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web server to get the desired results without the need to follow the complicated mathematics that were presented just for the integrity of developing the predictor. Meanwhile, for those who prefer to run predictions on their own computers, the predictor's code can be easily downloaded from the web server. It is anticipated that iNuc-PhysChem may become a useful high-throughput tool for both basic research and drug design.
|
33
|
Fernandez-Blanco E, Rivero D, Rabuñal J, Dorado J, Pazos A, Munteanu CR. Automatic seizure detection based on star graph topological indices. J Neurosci Methods 2012; 209:410-9. [DOI: 10.1016/j.jneumeth.2012.07.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 05/08/2012] [Revised: 06/28/2012] [Accepted: 07/10/2012] [Indexed: 11/27/2022]
|
34
|
Predicting Anatomical Therapeutic Chemical (ATC) classification of drugs by integrating chemical-chemical interactions and similarities. PLoS One 2012; 7:e35254. [PMID: 22514724 PMCID: PMC3325992 DOI: 10.1371/journal.pone.0035254] [Citation(s) in RCA: 140] [Impact Index Per Article: 11.7] [Received: 11/08/2011] [Accepted: 03/14/2012] [Indexed: 12/25/2022]
Abstract
The Anatomical Therapeutic Chemical (ATC) classification system, recommended by the World Health Organization, categorizes drugs into different classes according to their therapeutic and chemical characteristics. For a set of query compounds, how can we identify which ATC-class (or classes) they belong to? It is an important and challenging problem because the information thus obtained would be quite useful for drug development and utilization. By hybridizing the information of chemical-chemical interactions and chemical-chemical similarities, a novel method was developed for this purpose. The jackknife test on a benchmark dataset of 3,883 drug compounds showed that the overall success rate achieved by the prediction method was about 73% in identifying the drugs among the following 14 main ATC-classes: (1) alimentary tract and metabolism; (2) blood and blood forming organs; (3) cardiovascular system; (4) dermatologicals; (5) genitourinary system and sex hormones; (6) systemic hormonal preparations, excluding sex hormones and insulins; (7) anti-infectives for systemic use; (8) antineoplastic and immunomodulating agents; (9) musculoskeletal system; (10) nervous system; (11) antiparasitic products, insecticides and repellents; (12) respiratory system; (13) sensory organs; (14) various. Such a success rate is substantially higher than the 7% expected from random guessing among 14 classes. It has not escaped our notice that the current method can be straightforwardly extended to identify the drugs for their 2nd-level, 3rd-level, 4th-level, and 5th-level ATC-classifications once statistically significant benchmark data are available for these lower levels.
|
35
|
Identification of colorectal cancer related genes with mRMR and shortest path in protein-protein interaction network. PLoS One 2012; 7:e33393. [PMID: 22496748 PMCID: PMC3319543 DOI: 10.1371/journal.pone.0033393] [Citation(s) in RCA: 131] [Impact Index Per Article: 10.9] [Received: 09/13/2011] [Accepted: 02/13/2012] [Indexed: 11/19/2022]
Abstract
One of the most important and challenging problems in biomedicine and genomics is how to identify disease genes. In this study, we developed a computational method to identify colorectal cancer-related genes based on (i) gene expression profiles, and (ii) shortest path analysis of functional protein association networks. The former has long been used to select differentially expressed genes as disease genes, while the latter has been widely used to study the mechanism of diseases. With the existing protein-protein interaction data from STRING (Search Tool for the Retrieval of Interacting Genes), a weighted functional protein association network was constructed. By means of the mRMR (Maximum Relevance Minimum Redundancy) approach, six genes were identified that can distinguish colorectal tumors from normal adjacent colonic tissues based on their gene expression profiles. Meanwhile, using the shortest path approach, we found an additional 35 genes, of which some have been reported to be relevant to colorectal cancer and some are very likely to be relevant to it. Interestingly, the genes we identified from both the gene expression profiles and the functional protein association network include more cancer genes than the genes identified from the gene expression profiles alone. In addition, these genes had greater functional similarity with the reported colorectal cancer genes than the genes identified from the gene expression profiles alone. All these results indicate that the method presented in this paper is quite promising. The method may become a useful tool, or at least play a complementary role to existing methods, for identifying colorectal cancer genes. It has not escaped our notice that the method can be applied to identify the genes of other diseases as well.
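The shortest path step mentioned in this abstract can be sketched with Dijkstra's algorithm. This is an illustrative sketch, not the authors' implementation: the abstract does not say how STRING confidence scores were turned into edge weights, so the conversion `cost = 1000 - score` is an assumption, and the gene names `G1`, `G2`, `X` and the helper names `build_graph` and `shortest_path` are hypothetical.

```python
import heapq


def build_graph(scored_edges, max_score=1000):
    """Turn {(a, b): confidence} into an undirected cost graph.

    Assumption: higher confidence means lower cost (max_score - score).
    """
    graph = {}
    for (a, b), score in scored_edges.items():
        cost = max_score - score
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the node list or None if unreachable."""
    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if target not in best:
        return None
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]


# Toy scores: the indirect route G1-X-G2 is more confident than G1-G2.
graph = build_graph({("G1", "X"): 900, ("X", "G2"): 800, ("G1", "G2"): 100})
path = shortest_path(graph, "G1", "G2")
# Interior nodes of the path are candidate additional genes.
intermediates = path[1:-1]
```

Collecting the interior nodes of shortest paths between pairs of seed genes is one plausible reading of how additional candidate genes could be gathered; the paper itself should be consulted for the exact procedure.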
|
36
|
Ye H, Tang K, Yang L, Cao Z, Li Y. Study of drug function based on similarity of pathway fingerprint. Protein Cell 2012; 3:132-9. [PMID: 22426982 DOI: 10.1007/s13238-012-2011-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 12/29/2011] [Accepted: 01/04/2012] [Indexed: 02/06/2023]
Abstract
Drugs sharing similar therapeutic function may not bind to the same group of targets. However, their targets may be involved in similar pathway profiles that are associated with certain pathological processes. In this study, the pathway fingerprint was introduced to indicate the profile of significant pathways influenced by the targets of a drug. A drug-drug network was then constructed based on significant similarity of pathway fingerprints. In this way, the functions of a drug may be suggested by the enriched therapeutic functions of its neighboring drugs. In a test of 911 FDA-approved drugs with more than one known target, 471 drugs could be connected into networks. A total of 760 significant drug-therapeutic function associations were generated, of which around 60% were supported by the scientific literature or by ATC codes of drug functional classification. Therefore, pathway fingerprints may be useful for further study of the potential functions of known drugs, or the unknown functions of new drugs.
Affiliation(s)
- Hao Ye
- State Key Laboratory of Bioreactor Engineering, East China University of Science & Technology, Shanghai, 200237, China
|
37
|
Riera-Fernández P, Munteanu CR, Escobar M, Prado-Prado F, Martín-Romalde R, Pereira D, Villalba K, Duardo-Sánchez A, González-Díaz H. New Markov–Shannon Entropy models to assess connectivity quality in complex networks: From molecular to cellular pathway, Parasite–Host, Neural, Industry, and Legal–Social networks. J Theor Biol 2012; 293:174-88. [DOI: 10.1016/j.jtbi.2011.10.016] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 07/22/2011] [Revised: 10/09/2011] [Accepted: 10/14/2011] [Indexed: 11/25/2022]
|
38
|
Qiu Z, Wang X. Prediction of protein-protein interaction sites using patch-based residue characterization. J Theor Biol 2011; 293:143-50. [PMID: 22037062 DOI: 10.1016/j.jtbi.2011.10.021] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 06/25/2011] [Revised: 09/13/2011] [Accepted: 10/15/2011] [Indexed: 10/15/2022]
Abstract
Identifying protein-protein interaction sites provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Using a patch-based model for residue characterization, we trained random forest classifiers for residue-based interface prediction, which was followed by a clustering procedure to produce patches for patch-based interface prediction. For residue-based interface prediction, our method achieves a specificity of 0.7 and a sensitivity of 0.78. For patch-based interface prediction, a success rate of 0.80 is achieved. Based on the same datasets, we also compared it with several published methods. The results show that our method is a successful predictor for residue-based and patch-based interface prediction.
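For readers less familiar with the two rates quoted in this abstract, sensitivity is the fraction of true interface residues that are predicted as interface, and specificity is the fraction of non-interface residues that are predicted as non-interface. The sketch below computes both from binary labels; the label vectors are made-up toy data and the function name is introduced only for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = positive).

    Assumes both classes occur in y_true; otherwise a division by zero
    is raised.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels: 4 interface residues and 6 non-interface residues.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```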
Affiliation(s)
- Zhijun Qiu
- The State Key Laboratory of Structural Analysis of Industrial Equipment, Dalian University of Technology, 2 Ling-Gong Road, Dalian 116024, China
|
39
|
Wavelet images and Chou’s pseudo amino acid composition for protein classification. Amino Acids 2011; 43:657-65. [DOI: 10.1007/s00726-011-1114-9] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.5] [Received: 04/05/2010] [Accepted: 09/28/2011] [Indexed: 10/16/2022]
|
40
|
Huang T, Chen L, Cai YD, Chou KC. Classification and analysis of regulatory pathways using graph property, biochemical and physicochemical property, and functional property. PLoS One 2011; 6:e25297. [PMID: 21980418 PMCID: PMC3182212 DOI: 10.1371/journal.pone.0025297] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.9] [Received: 03/09/2011] [Accepted: 08/31/2011] [Indexed: 12/20/2022]
Abstract
Given a regulatory pathway system consisting of a set of proteins, can we predict which pathway class it belongs to? Such a problem is closely related to the biological function of the pathway in cells and hence is quite fundamental and essential in systems biology and proteomics. This is also an extremely difficult and challenging problem due to its complexity. To address this problem, a novel approach was developed that can be used to predict query pathways among the following six functional categories: (i) “Metabolism”, (ii) “Genetic Information Processing”, (iii) “Environmental Information Processing”, (iv) “Cellular Processes”, (v) “Organismal Systems”, and (vi) “Human Diseases”. The prediction method was established through the following procedures: (i) according to the general form of pseudo amino acid composition (PseAAC), each of the pathways concerned is formulated as a 5570-D (dimensional) vector; (ii) each of the components in the 5570-D vector was derived by a series of feature extractions from the pathway system according to its graphic property, biochemical and physicochemical property, as well as functional property; (iii) the minimum redundancy maximum relevance (mRMR) method was adopted to operate the prediction. A cross-validation by the jackknife test on a benchmark dataset consisting of 146 regulatory pathways indicated that an overall success rate of 78.8% was achieved by our method in identifying query pathways among the above six classes, indicating the outcome is quite promising and encouraging. To the best of our knowledge, the current study represents the first effort in attempting to identify the type of a pathway system or its biological function. It is anticipated that our report may stimulate a series of follow-up investigations in this new and challenging area.
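The jackknife test named in this abstract is leave-one-out cross-validation: each sample is held out in turn, the predictor is built from the remaining samples, and the held-out sample is then classified. The sketch below shows only that validation loop. The 1-nearest-neighbor classifier is a stand-in chosen for brevity, not the predictor used in the paper, and the two-dimensional feature vectors and labels are toy data.

```python
def nearest_neighbor_label(query, samples, labels):
    """Stand-in classifier: label of the closest training sample."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    best = min(range(len(samples)),
               key=lambda i: squared_distance(query, samples[i]))
    return labels[best]


def jackknife_success_rate(samples, labels, classify=nearest_neighbor_label):
    """Hold out each sample once, train on the rest, and count correct calls."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if classify(samples[i], train_x, train_y) == labels[i]:
            correct += 1
    return correct / len(samples)


# Toy feature vectors for two classes; the last "B" sample lies near class A.
samples = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
           (1.0, 1.1), (0.9, 1.0), (0.5, 0.4)]
labels = ["A", "A", "A", "B", "B", "B"]
rate = jackknife_success_rate(samples, labels)
```

Because every sample is tested exactly once and the split is fully determined by the data, the jackknife yields a single, repeatable success rate for a given benchmark dataset.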
Affiliation(s)
- Tao Huang
- Institute of Systems Biology, Shanghai University, Shanghai, People's Republic of China
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
- Shanghai Center for Bioinformation Technology, Shanghai, People's Republic of China
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, People's Republic of China
- Yu-Dong Cai
- Institute of Systems Biology, Shanghai University, Shanghai, People's Republic of China
- Gordon Life Science Institute, San Diego, California, United States of America
- Kuo-Chen Chou
- Gordon Life Science Institute, San Diego, California, United States of America
|
41
|
Jingbo X, Silan Z, Feng S, Huijuan X, Xuehai H, Xiaohui N, Zhi L. Using the concept of pseudo amino acid composition to predict resistance gene against Xanthomonas oryzae pv. oryzae in rice: An approach from chaos games representation. J Theor Biol 2011; 284:16-23. [DOI: 10.1016/j.jtbi.2011.06.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 12/08/2010] [Revised: 06/02/2011] [Accepted: 06/03/2011] [Indexed: 10/18/2022]
|
42
|
Hu LL, Huang T, Cai YD, Chou KC. Prediction of body fluids where proteins are secreted into based on protein interaction network. PLoS One 2011; 6:e22989. [PMID: 21829572 PMCID: PMC3146524 DOI: 10.1371/journal.pone.0022989] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 04/25/2011] [Accepted: 07/08/2011] [Indexed: 12/27/2022]
Abstract
Determining the body fluids into which proteins can be secreted is important for protein function annotation and disease biomarker discovery. In this study, we developed a network-based method to predict which kinds of body fluids human proteins can be secreted into. For a newly constructed benchmark dataset that consists of 529 human secreted proteins, the prediction accuracy for the most likely body fluid location predicted by our method via the jackknife test was 79.02%, significantly higher than the success rate of a random guess (29.36%). The likelihood that the four top-ranked predicted body fluids contain all the true body fluids into which the proteins can be secreted is 62.94%. Our method was further demonstrated with two independent datasets: one contains 57 proteins that can be secreted into blood, while the other contains 61 proteins that can be secreted into plasma/serum and are possible biomarkers associated with various cancers. For the 57 proteins in the first dataset, 55 were correctly predicted as blood-secreted proteins. For the 61 proteins in the second dataset, 58 were predicted to be most likely found in plasma/serum. These encouraging results indicate that the network-based prediction method is quite promising. It is anticipated that the method will benefit the relevant areas for both basic research and drug development.
Affiliation(s)
- Le-Le Hu
- Institute of Systems Biology, Shanghai University, Shanghai, China
- Department of Chemistry, College of Sciences, Shanghai University, Shanghai, China
- Tao Huang
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Shanghai Center for Bioinformation Technology, Shanghai, China
- Yu-Dong Cai
- Institute of Systems Biology, Shanghai University, Shanghai, China
- Centre for Computational Systems Biology, Fudan University, Shanghai, China
- Gordon Life Science Institute, San Diego, California, United States of America
- Kuo-Chen Chou
- Gordon Life Science Institute, San Diego, California, United States of America
|
43
|
García I, Fall Y, García-Mera X, Prado-Prado F. Theoretical study of GSK−3α: neural networks QSAR studies for the design of new inhibitors using 2D descriptors. Mol Divers 2011; 15:947-55. [DOI: 10.1007/s11030-011-9325-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 02/25/2011] [Accepted: 06/20/2011] [Indexed: 10/18/2022]
|
44
|
Self-similarity analysis of eubacteria genome based on weighted graph. J Theor Biol 2011; 280:10-8. [PMID: 21496459 PMCID: PMC7094106 DOI: 10.1016/j.jtbi.2011.03.033] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/18/2010] [Revised: 03/08/2011] [Accepted: 03/26/2011] [Indexed: 11/22/2022]
Abstract
We introduce a weighted graph model to investigate the self-similarity characteristics of eubacteria genomes. Genome similarity comparison is usually performed to discover the evolutionary distance among different genomes. Few studies focus on the overall statistical characteristics of each gene compared with other genes in the same genome. In our model, each genome is mapped to a weighted graph whose topology describes the similarity relationships among genes in the same genome. Based on the related weighted graph theory, we extract some quantified statistical variables from the topology, and give the distribution of some variables derived from the largest social structure in the topology. The 23 eubacteria recently studied by Sorimachi and Okayasu are clearly classified into two different groups by their double logarithmic point-plots describing the similarity relationships among genes of the largest social structure in the genome. The results show that the proposed model may provide some new insights into the structures and evolution patterns determined from complete genomes.
|
45
|
Zhou GP. The disposition of the LZCC protein residues in wenxiang diagram provides new insights into the protein-protein interaction mechanism. J Theor Biol 2011; 284:142-8. [PMID: 21718705 PMCID: PMC7094099 DOI: 10.1016/j.jtbi.2011.06.006] [Citation(s) in RCA: 128] [Impact Index Per Article: 9.8] [Received: 02/10/2011] [Revised: 04/28/2011] [Accepted: 06/07/2011] [Indexed: 01/06/2023]
Abstract
The wenxiang diagram is a new two-dimensional representation that characterizes the disposition of hydrophobic and hydrophilic residues in α-helices. In this research, the hydrophobic and hydrophilic residues of two leucine zipper coiled-coil (LZCC) structural proteins, cGKIα(1-59) and MBS(CT35), are placed on wenxiang diagrams according to the heptad repeat pattern (abcdefg)n. Their wenxiang diagrams clearly demonstrate that residues with the same repeat letters lie on the same side of the spiral diagrams, where most hydrophobic residues are positioned at a and d, and most hydrophilic residues are localized in the b, c, e, f and g polar position regions. The wenxiang diagram of a dimeric LZCC can be represented by the combination of two monomeric wenxiang diagrams, and the wenxiang diagrams of the two LZCC (tetramer) complex structures can also be assembled by using two pairs of their wenxiang diagrams. Furthermore, by comparing the wenxiang diagrams of cGKIα(1-59) and MBS(CT35), the interaction between cGKIα(1-59) and MBS(CT35) is suggested to be weaker. By analyzing the wenxiang diagram of the cGKIα(1-59)·MBS(CT42) complex structure, the residues of cGKIα(1-59) most affected by the interaction with MBS(CT42) are proposed to be at positions d, a, e and g of the LZCC structure. These findings are consistent with our previous NMR results. Combined with NMR spectroscopy, the wenxiang diagrams of LZCC structures may provide novel insights into the interaction mechanisms between dimeric, trimeric, and tetrameric coiled-coil structures.
Affiliation(s)
- Guo-Ping Zhou
- Gordon Life Science Institute, 13784 Torrey Del Mar Drive, San Diego, CA 92130, USA.
|
46
|
Optimal atomic-resolution structures of prion AGAAAAGA amyloid fibrils. J Theor Biol 2011; 279:17-28. [DOI: 10.1016/j.jtbi.2011.02.012] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/28/2010] [Revised: 02/05/2011] [Accepted: 02/16/2011] [Indexed: 11/20/2022]
|
47
|
Huang Y, Yang L, Wang T. Phylogenetic analysis of DNA sequences based on the generalized pseudo-amino acid composition. J Theor Biol 2011; 269:217-23. [DOI: 10.1016/j.jtbi.2010.10.027] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 08/07/2010] [Revised: 10/22/2010] [Accepted: 10/22/2010] [Indexed: 11/15/2022]
|
48
|
|
49
|
Chou KC. Some remarks on protein attribute prediction and pseudo amino acid composition. J Theor Biol 2010; 273:236-47. [PMID: 21168420 PMCID: PMC7125570 DOI: 10.1016/j.jtbi.2010.12.024] [Citation(s) in RCA: 966] [Impact Index Per Article: 69.0] [Received: 10/21/2010] [Revised: 12/08/2010] [Accepted: 12/13/2010] [Indexed: 11/29/2022]
Abstract
With the accomplishment of human genome sequencing, the number of sequence-known proteins has increased explosively. In contrast, the pace is much slower in determining their biological attributes. As a consequence, the gap between sequence-known proteins and attribute-known proteins has become increasingly large. This unbalanced situation, which has critically limited our ability to utilize newly discovered proteins in a timely manner for basic research and drug development, has called for developing computational methods or high-throughput automated tools for rapidly and reliably identifying various attributes of uncharacterized proteins based on their sequence information alone. During the last two decades or so, many methods in this regard have been established in the hope of bridging such a gap. In the course of developing these methods, the following aspects often needed to be considered: (1) benchmark dataset construction, (2) protein sample formulation, (3) operating algorithm (or engine), (4) anticipated accuracy, and (5) web-server establishment. In this review, we discuss each of the five procedures, with a special focus on the introduction of pseudo amino acid composition (PseAAC), its different modes and applications as well as its recent development, particularly in how to use the general formulation of PseAAC to reflect the core and essential features that are deeply hidden in complicated protein sequences.
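As a rough illustration of the protein sample formulation step, the sketch below builds a pseudo amino acid composition vector in the style of the classic sequence-order-correlation form: 20 normalized amino acid frequencies followed by `lam` correlation factors, all divided by a common denominator so the components sum to 1. Several choices here are assumptions made for brevity and are not taken from this abstract: a single property (the Kyte-Doolittle hydropathy scale) is used in place of the several physicochemical properties typically combined, the weight 0.05 and `lam=2` are illustrative defaults, and the example peptide is arbitrary.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here as the only property (assumption).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale):
    """Rescale property values to zero mean and unit variance over 20 residues."""
    mean = sum(scale.values()) / len(scale)
    sd = (sum((v - mean) ** 2 for v in scale.values()) / len(scale)) ** 0.5
    return {aa: (v - mean) / sd for aa, v in scale.items()}


def pse_aac(sequence, lam=2, weight=0.05, scale=KYTE_DOOLITTLE):
    """Return a (20 + lam)-dimensional vector for a standard-residue sequence."""
    n = len(sequence)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    h = standardize(scale)
    freqs = [sequence.count(aa) / n for aa in AMINO_ACIDS]
    thetas = []
    for k in range(1, lam + 1):
        # k-th tier correlation: mean squared property difference at gap k.
        total = sum((h[sequence[i]] - h[sequence[i + k]]) ** 2
                    for i in range(n - k))
        thetas.append(total / (n - k))
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


vector = pse_aac("MKTAYIAKQR", lam=2)  # arbitrary example peptide
```

The first 20 components carry composition, while the trailing `lam` components carry some sequence-order information that plain amino acid composition discards.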
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, 13784 Torrey Del Mar Drive, San Diego, CA 92130, USA.
|
50
|
Qi ZH, Wei RY. A combination dimensionality reduction approach to codon position patterns of eubacteria based on their complete genomes. J Theor Biol 2010; 272:26-34. [PMID: 21163267 DOI: 10.1016/j.jtbi.2010.12.014] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 10/01/2010] [Revised: 12/08/2010] [Accepted: 12/08/2010] [Indexed: 01/11/2023]
Abstract
Graphical techniques have become powerful tools for the visualization and analysis of complicated biological systems. However, such a graphical representation cannot be given in a 2D/3D space when the represented data have more than three dimensions. The proposed method, a combination dimensionality reduction approach (CDR), consists of two parts: (i) principal component analysis (PCA) with a newly defined parameter ρ and (ii) locally linear embedding (LLE) with a proposed graphical selection for its optional parameter k. The CDR approach with ρ and k not only avoids loss of principal information, but also preserves the global high-dimensional structures sufficiently well in a low-dimensional space such as 2D or 3D. Applications of the CDR to characteristic analysis at different codon positions in genomes show that the method is a useful tool by which biologists could find useful biological knowledge.
Affiliation(s)
- Zhao-Hui Qi
- College of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang, Hebei, People's Republic of China.
|