1
Xu L, Jiao S, Zhang D, Wu S, Zhang H, Gao B. Identification of long noncoding RNAs with machine learning methods: a review. Brief Funct Genomics 2021; 20:174-180. [PMID: 33758917 DOI: 10.1093/bfgp/elab017] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/12/2021] [Revised: 02/24/2021] [Accepted: 02/25/2021] [Indexed: 12/11/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) are noncoding RNAs with a length greater than 200 nucleotides. Studies have shown that they play an important role in many life activities. Dozens of lncRNAs have been characterized to some extent, and they are reported to be related to the development of diseases in a variety of cells. However, the biological functions of most lncRNAs are still unclear. Therefore, accurately identifying and predicting lncRNAs would be helpful for research on their biological functions. Because experimental methods are costly and resource-intensive, scientists have developed numerous computational methods to identify and predict lncRNAs in recent years. In this paper, we systematically summarize the machine learning-based lncRNA prediction tools from several perspectives, and discuss the challenges and prospects for future work.
Affiliation(s)
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic
- Shihu Jiao
  - College of Chemistry, Sichuan University, Sichuan, China
- Dandan Zhang
  - Departments of Obstetrics and Gynecology, First Affiliated Hospital of Harbin Medical University
- Song Wu
  - Preventive Treatment of Disease Centre of Qinhuangdao Hospital of Traditional Chinese Medicine
- Haihong Zhang
  - First Affiliated Hospital of Harbin Medical University
- Bo Gao
  - Second Affiliated Hospital, Harbin Medical University, Harbin, China
2
Das S, Chakrabarti S. Classification and prediction of protein-protein interaction interface using machine learning algorithm. Sci Rep 2021; 11:1761. [PMID: 33469042 PMCID: PMC7815773 DOI: 10.1038/s41598-020-80900-2] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 08/04/2020] [Accepted: 12/15/2020] [Indexed: 01/29/2023] Open
Abstract
Structural insight into the protein-protein interaction (PPI) interface can provide knowledge about the kinetics, thermodynamics and molecular functions of the complex while elucidating its role in diseases and further enabling it as a potential therapeutic target. However, owing to the experimental lag in solving protein-protein complex structures, three-dimensional (3D) knowledge of PPI interfaces can be gained via computational approaches such as molecular docking and post-docking analyses. Despite the development of numerous docking tools and techniques, success in identifying native-like interfaces based on docking score functions is limited. Hence, we carried out an in-depth investigation of the structural features of the interface that might successfully delineate native complexes from non-native ones. We identify interface properties that show statistically significant differences between native and non-native interfaces of homo and hetero protein-protein complexes. Utilizing these properties, a support vector machine (SVM) based classification scheme has been implemented to differentiate native-like and non-native-like complexes generated using docking decoys. Benchmarking and comparative analyses suggest very good performance of our SVM classifiers. Further, protein interactions that are proven via experimental findings but not resolved structurally were subjected to this approach, where 3D models of the complexes were generated and the most likely interfaces were predicted. A web server called Protein Complex Prediction by Interface Properties (PCPIP) has been developed to predict whether the interface of a given protein-protein dimer complex resembles known protein interfaces. The server is freely available at http://www.hpppi.iicb.res.in/pcpip/ .
Affiliation(s)
- Subhrangshu Das
  - Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata, WB, India (grid.417635.2; ISNI 0000 0001 2216 5074)
- Saikat Chakrabarti
  - Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata, WB, India (grid.417635.2; ISNI 0000 0001 2216 5074)
3
Wang C, Zhao N, Sun K, Zhang Y. A Cancer Gene Module Mining Method Based on Bio-Network of Multi-Omics Gene Groups. Front Oncol 2020; 10:1159. [PMID: 32637361 PMCID: PMC7317001 DOI: 10.3389/fonc.2020.01159] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/18/2020] [Accepted: 06/08/2020] [Indexed: 11/13/2022] Open
Abstract
The initiation, promotion and progression of cancer are strongly associated with the environment a person lives in as well as with individual genetic factors. In view of the dangers to life and health caused by this abnormally complex systemic disease, many leading research institutions around the world have been actively working to discover the pathogenic mechanisms driving cancer occurrence and development. The emergence of high-throughput sequencing technology has greatly advanced oncology research and revealed important oncogenes and the interrelationships among them. Here, we studied heterogeneous multi-level data in an integrated-data context and introduced lncRNA omics data to construct multi-omics bio-network models, allowing the screening of key cancer-related gene groups. We propose a compactness clustering algorithm based on corrected cumulative rank scores, which uses the functional similarity between groups of genes as a distance measure to extract, through clustering, the key abnormally regulated gene modules contained in gene groups. We also conducted a survival analysis using our results and found that our model could separate groups of different levels very well. The results also demonstrate that, through the integration of multi-omics biological data, key gene modules and their dysregulated gene groups can be discovered, which is crucial for cancer research.
Affiliation(s)
- Chunyu Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Ning Zhao
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Kai Sun
  - Thoracic Surgery Department, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Ying Zhang
  - Department of Pharmacy, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
4
Guo F, Zou Q, Yang G, Wang D, Tang J, Xu J. Identifying protein-protein interface via a novel multi-scale local sequence and structural representation. BMC Bioinformatics 2019; 20:483. [PMID: 31874604 PMCID: PMC6929278 DOI: 10.1186/s12859-019-3048-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/14/2019] [Accepted: 08/21/2019] [Indexed: 12/23/2022] Open
Abstract
Background: Protein-protein interaction plays a key role in a multitude of biological processes, such as signal transduction, de novo drug design, immune responses, and enzymatic activities. Gaining insight into various binding abilities can deepen our understanding of the interaction. It is of great interest to understand how proteins in a complex interact with each other. Many efficient methods have been developed for identifying protein-protein interfaces. Results: In this paper, we obtain local information on the protein-protein interface through multi-scale local average block and hexagon structure construction. Given a pair of proteins, we use a trained support vector regression (SVR) model to select the best configurations. On Benchmark v4.0, our method achieves an average Irmsd value of 3.28 Å and an overall Fnat value of 63%, which improves upon an Irmsd of 3.89 Å and Fnat of 49% for ZRANK, and an Irmsd of 3.99 Å and Fnat of 46% for ClusPro. On CAPRI targets, our method achieves an average Irmsd value of 3.45 Å and an overall Fnat value of 46%, which improves upon an Irmsd of 4.18 Å and Fnat of 40% for ZRANK, and an Irmsd of 5.12 Å and Fnat of 32% for ClusPro. The success rates of our method, FRODOCK 2.0, InterEvDock and SnapDock on Benchmark v4.0 are 41.5%, 29.0%, 29.4% and 37.0%, respectively. Conclusion: Experiments show that our method performs better than some state-of-the-art methods, with improved prediction quality in terms of the CAPRI evaluation criteria. All these results demonstrate that our method is a valuable technological tool for identifying protein-protein interfaces.
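The Fnat figures quoted in this abstract follow the CAPRI convention: the fraction of residue-residue contacts in the native interface that the predicted complex reproduces. A minimal sketch, assuming the contacts have already been extracted as (receptor residue, ligand residue) pairs; the function name and the residue labels are illustrative and do not come from the paper:

```python
def fnat(native_contacts, predicted_contacts):
    """Fraction of native interface contacts recovered by a prediction."""
    native = set(native_contacts)
    if not native:
        raise ValueError("native interface has no contacts")
    # A contact counts as recovered only if the same residue pair
    # is also in contact in the predicted complex.
    recovered = native & set(predicted_contacts)
    return len(recovered) / len(native)


# Hypothetical contact lists, for illustration only.
native = {("A:45", "B:12"), ("A:46", "B:12"), ("A:50", "B:30"), ("A:51", "B:31")}
predicted = {("A:45", "B:12"), ("A:50", "B:30"), ("A:70", "B:80")}
score = fnat(native, predicted)  # 2 of the 4 native contacts are recovered
```

Irmsd additionally requires superposing the interface backbone atoms of the two structures, so it is not covered by this sketch.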
Affiliation(s)
- Fei Guo
  - College of Intelligence and Computing, Tianjin University, Tianjin, People's Republic of China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, People's Republic of China
- Guang Yang
  - School of Economics, Nankai University, Tianjin, People's Republic of China
- Dan Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
- Jijun Tang
  - College of Intelligence and Computing, Tianjin University, Tianjin, People's Republic of China; Department of Computer Science and Engineering, University of South Carolina, Columbia, USA
- Junhai Xu
  - College of Intelligence and Computing, Tianjin University, Tianjin, People's Republic of China
5
Nilofer C, Sukhwal A, Mohanapriya A, Sakharkar MK, Kangueane P. Small protein-protein interfaces rich in electrostatic are often linked to regulatory function. J Biomol Struct Dyn 2019; 38:3260-3279. [PMID: 31495333 DOI: 10.1080/07391102.2019.1657040] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 02/07/2023]
Abstract
Protein-protein interaction (PPI) is critical for several biological functions in living cells through the formation of an interface. Therefore, it is of interest to characterize protein-protein interfaces using an updated non-redundant structural dataset of 2557 homo (identical subunits) and 393 hetero (different subunits) dimer protein complexes determined by X-ray crystallography. We analyzed the interfaces using van der Waals (vdW), hydrogen bonding and electrostatic energies. Results show that, on average, homo and hetero interfaces are similar. Hence, we further grouped the 2950 interfaces, based on the percentage of vdW energy in the total energy, into dominant (≥60%) and sub-dominant (<60%) vdW interfaces. The majority (92%) of interfaces have dominant vdW energy with large interface size (146 ± 87 (homo) and 137 ± 76 (hetero) residues) and interface area (1622 ± 1135 Å² (homo) and 1579 ± 1060 Å² (hetero)). However, a proportion (8%) of interfaces have sub-dominant vdW energy with small interface size (85 ± 46 (homo) and 88 ± 36 (hetero) residues) and interface area (823 ± 538 Å² (homo) and 881 ± 377 Å² (hetero)). It is found that large interfaces have two-fold more interface area and interface size than small interfaces, with increasing hydrogen bonding energy to interface size. However, small interfaces have three-fold more electrostatic energy than large interfaces, with increasing electrostatics to interface size. Thus, the 8% of complexes having small interfaces with limited interface area and sub-dominant vdW energy are rich in electrostatics. It is interesting to observe that complexes having small interfaces are often associated with regulatory function. Hence, the observed structural features with known molecular function provide insights for the better understanding of PPI. Communicated by Ramaswamy H. Sarma.
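The 60% grouping rule described in this abstract can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the energy values are made up, and comparing the energy terms by magnitude is an assumption (favourable interaction energies are usually reported as negative numbers), not something the abstract states.

```python
def vdw_percentage(vdw, hbond, electrostatic):
    """Percentage of the total interface energy contributed by vdW."""
    # Assumption: the three terms are compared by absolute magnitude.
    parts = [abs(vdw), abs(hbond), abs(electrostatic)]
    total = sum(parts)
    if total == 0:
        raise ValueError("all energy terms are zero")
    return 100.0 * parts[0] / total


def classify_interface(vdw, hbond, electrostatic, threshold=60.0):
    """Label an interface 'dominant' (>= threshold) or 'sub-dominant'."""
    pct = vdw_percentage(vdw, hbond, electrostatic)
    return "dominant" if pct >= threshold else "sub-dominant"


# Hypothetical energies in arbitrary units.
large_interface = classify_interface(-70, -20, -10)  # vdW share is 70%
small_interface = classify_interface(-40, -10, -50)  # vdW share is 40%
```

Under this rule, an electrostatics-rich interface such as the second example falls into the sub-dominant group that the abstract links to regulatory function.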
Affiliation(s)
- Christina Nilofer
  - Biomedical Informatics (P) Ltd., Pondicherry, India; School of Biosciences & Technology, VIT University, Vellore, Tamil Nadu, India
- Anshul Sukhwal
  - National Centre for Biological Sciences (NCBS), Bangalore, India
6
Identification of amyloidogenic peptides via optimized integrated features space based on physicochemical properties and PSSM. Anal Biochem 2019; 583:113362. [PMID: 31310738 DOI: 10.1016/j.ab.2019.113362] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/16/2019] [Revised: 07/09/2019] [Accepted: 07/12/2019] [Indexed: 01/08/2023]
Abstract
The identification of amyloid is becoming increasingly important, because its mis-aggregation may cause diseases such as Alzheimer's and Parkinson's diseases. This paper focuses on the classification of amyloidogenic peptides and proposes a novel feature representation called PhyAve_PSSMDwt. It consists of two parts. The first is based on physicochemical properties (hydrophilicity, hydrophobicity, aggregation tendency, packing density and H-bonding) and yields 15-dimensional features in total. The second consists of 60-dimensional features selected by recursive feature elimination from the discrete wavelet transform of the PSSM. At this stage, a sliding window is introduced to reconstruct the PSSM so that the evolutionary information of short sequences can still be extracted. Finally, a support vector machine is adopted as the classifier. The experimental results on the Pep424 dataset show that the PSSM information contributes greatly to performance. Compared with other existing methods, our cross-validation results increase by 3.1%, 3.3%, 0.136 and 0.007 in accuracy, specificity, Matthews correlation coefficient and AUC value, respectively. This indicates that our method is effective and competitive.
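To make the sliding-window and wavelet steps more concrete, here is a rough sketch of the two building blocks. It rests on several assumptions: the abstract does not name the wavelet, so a one-level Haar transform is used as a stand-in; the window width, the padding rule and both function names are hypothetical; and the PSSM column is made-up data.

```python
import math


def sliding_windows(rows, width):
    """Return every run of `width` consecutive PSSM rows."""
    return [rows[i:i + width] for i in range(len(rows) - width + 1)]


def haar_dwt(signal):
    """One-level Haar DWT: returns (approximation, detail) coefficients."""
    values = list(signal)
    if len(values) % 2:
        # Assumption: pad odd-length input by repeating the last value.
        values.append(values[-1])
    root2 = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / root2 for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / root2 for i in range(0, len(values), 2)]
    return approx, detail


# Hypothetical scores for one amino-acid column of a short peptide's PSSM.
column = [1.0, 1.0, 3.0, 3.0]
approx, detail = haar_dwt(column)
windows = sliding_windows([[1], [2], [3]], width=2)
```

In a pipeline like the one the abstract outlines, coefficients of this kind would be pooled into a fixed-length vector, pruned by recursive feature elimination, and passed to the SVM; those steps are not shown here.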
7
Zhang Z, Xu J, Tang J, Zou Q, Guo F. Diagnosis of Brain Diseases via Multi-Scale Time-Series Model. Front Neurosci 2019; 13:197. [PMID: 30930733 PMCID: PMC6427090 DOI: 10.3389/fnins.2019.00197] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/14/2019] [Accepted: 02/19/2019] [Indexed: 01/09/2023] Open
Abstract
Functional magnetic resonance imaging (fMRI) data and brain network analysis have been widely applied to the automated diagnosis of neural or brain diseases. fMRI time-series data contain not only specific numerical information but also rich dynamic temporal information, whereas previous graph-theory approaches focus on local topological structure and lose contextual and global fluctuation information. Here, we propose a novel multi-scale functional connectivity for identifying brain disease via fMRI data. We calculate the discrete probability distribution of co-activity between different brain regions at various intervals. We also consider nonsynchronous information under different time dimensions in order to analyze the contextual information in the fMRI data. Therefore, our proposed method can be applied to the diagnosis of more diseases and to other fMRI data, particularly the automated diagnosis of neural or brain diseases. Finally, we apply a Support Vector Machine (SVM) to our proposed time-series features, which can be used for brain disease classification and even for time-series data in general. Experimental results verify the effectiveness of our proposed method compared with other outstanding approaches on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and a Major Depressive Disorder (MDD) dataset. Therefore, we provide an efficient system that offers a novel perspective for studying brain networks.
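One possible reading of "the discrete probability distribution of co-activity between different brain regions at various intervals" is sketched below. Everything specific here is an assumption made for illustration: the abstract does not say how activity is discretised, so each signal is simply thresholded at its own mean, "interval" is interpreted as a time lag, and the function names and signals are hypothetical.

```python
from collections import Counter


def binarize(series):
    """Mark a time point active (1) if it exceeds the series mean."""
    mean = sum(series) / len(series)
    return [1 if value > mean else 0 for value in series]


def coactivity_distribution(region_a, region_b, lag=0):
    """Joint distribution of (a[t], b[t + lag]) activity states."""
    if len(region_a) != len(region_b):
        raise ValueError("time series must have equal length")
    if not 0 <= lag < len(region_a):
        raise ValueError("lag must be between 0 and len(series) - 1")
    a, b = binarize(region_a), binarize(region_b)
    pairs = list(zip(a[:len(a) - lag], b[lag:]))
    counts = Counter(pairs)
    states = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return {state: counts[state] / len(pairs) for state in states}


# Two hypothetical regional signals that rise and fall together.
signal_a = [1.0, 2.0, 1.0, 2.0]
signal_b = [1.0, 2.0, 1.0, 2.0]
synchronous = coactivity_distribution(signal_a, signal_b, lag=0)
lagged = coactivity_distribution(signal_a, signal_b, lag=1)
```

Computing such distributions over several lags for every pair of regions would give one multi-scale feature vector per subject, which could then be passed to a classifier such as the SVM the abstract mentions.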
Affiliation(s)
- Zehua Zhang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Junhai Xu
  - School of Artificial Intelligence, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jijun Tang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China; Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Fei Guo
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
8
Prediction of protein-protein interactions by label propagation with protein evolutionary and chemical information derived from heterogeneous network. J Theor Biol 2017. [DOI: 10.1016/j.jtbi.2017.06.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/20/2022]
9
Jiao X, Ranganathan S. Prediction of interface residue based on the features of residue interaction network. J Theor Biol 2017; 432:49-54. [PMID: 28818468 DOI: 10.1016/j.jtbi.2017.08.014] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 04/18/2017] [Revised: 07/31/2017] [Accepted: 08/13/2017] [Indexed: 10/19/2022]
Abstract
Protein-protein interaction plays a crucial role in cellular biological processes. Interface prediction can improve our understanding of the molecular mechanisms of the related processes and functions. In this work, we propose a classification method to recognize interface residues based on the features of a weighted residue interaction network. The random forest algorithm is used for the prediction, and 16 network parameters and the B-factor serve as the elements of the input feature vector. Compared with other similar work, the method is feasible and effective. The relative importance of these features is also analyzed to identify the key features for the prediction, and the biological meaning of the important features is explained. The results of this work can be used in related work on structure-function relationship analysis via a residue interaction network model.
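The sketch below shows how a per-residue feature vector might be assembled from a weighted residue interaction network. The abstract does not list its 16 network parameters, so node degree and node strength (the sum of incident edge weights) are used purely as stand-ins; the function name, edge weights and B-factors are hypothetical.

```python
from collections import defaultdict


def node_features(weighted_edges, b_factors):
    """Map each residue to [degree, strength, B-factor]."""
    degree = defaultdict(int)
    strength = defaultdict(float)
    for u, v, weight in weighted_edges:
        # Each undirected edge contributes to both of its end points.
        for residue in (u, v):
            degree[residue] += 1
            strength[residue] += weight
    return {
        residue: [degree[residue], strength[residue], b_factor]
        for residue, b_factor in b_factors.items()
    }


# Hypothetical three-residue network with made-up weights and B-factors.
edges = [("A1", "A2", 0.5), ("A2", "A3", 1.5)]
b_factors = {"A1": 10.0, "A2": 20.0, "A3": 15.0}
features = node_features(edges, b_factors)
```

Vectors built this way, extended with the remaining network parameters, would be the input to a random forest classifier that labels each residue as interface or non-interface; that training step is not shown.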
Affiliation(s)
- Xiong Jiao
  - Institute of Applied Mechanics and Biomedical Engineering, College of Mechanics, Taiyuan University of Technology, Taiyuan 030024, China; Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, New South Wales 2109, Australia
- Shoba Ranganathan
  - Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, New South Wales 2109, Australia