1. Xin XH, Zhang YY, Gao CQ, Min H, Wang L, Du PF. SGII: Systematic Identification of Essential lncRNAs in Mouse and Human Genome With lncRNA-Protein-Protein Heterogeneous Interaction Network. Front Genet 2022; 13:864564. [PMID: 35386279; PMCID: PMC8978670; DOI: 10.3389/fgene.2022.864564] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/28/2022] [Accepted: 03/02/2022] [Indexed: 12/25/2022] [Open Access]
Abstract
Long noncoding RNAs (lncRNAs) play important roles in a variety of biological processes. Knocking out or knocking down some lncRNA genes can lead to death or infertility; these lncRNAs are called essential lncRNAs. Identifying essential lncRNAs is important for the diagnosis and treatment of complex diseases. However, experimental methods for identifying essential lncRNAs are costly and time-consuming, so computational methods can serve as an alternative. We propose a method that identifies essential lncRNAs by combining network centrality measures with lncRNA sequence information. By constructing a lncRNA-protein-protein interaction (LPPI) network, we measure the essentiality of a lncRNA from both its role in the network and its sequence. We call our method the systematic gene importance index (SGII). As far as we can tell, this is the first attempt to identify essential lncRNAs by combining sequence and network information. Our results indicate that essential lncRNAs play roles in the LPPI network similar to those of essential coding genes in the protein-protein interaction (PPI) network. Another encouraging observation is that network information can significantly boost the predictive performance of the sequence-based method. All source code and datasets of SGII have been deposited in a GitHub repository (https://github.com/ninglolo/SGII).
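The idea of scoring a lncRNA from both its network role and its sequence can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy edge list, the sequence scores, the choice of degree centrality, and the weighted sum with `alpha` are hypothetical and are not taken from the SGII paper or its repository.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for every node of an undirected edge list."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    denom = max(len(neighbors) - 1, 1)  # avoid division by zero on tiny graphs
    return {node: len(nbrs) / denom for node, nbrs in neighbors.items()}


def combined_score(centrality, seq_score, alpha=0.5):
    """Weighted sum of a network score and a sequence score (hypothetical rule)."""
    return {
        name: alpha * centrality.get(name, 0.0) + (1 - alpha) * score
        for name, score in seq_score.items()
    }


# Toy lncRNA-protein-protein network: lncRNA-protein and protein-protein edges.
edges = [
    ("lncA", "P1"), ("lncA", "P2"), ("lncB", "P2"),
    ("P1", "P2"), ("P2", "P3"),
]
# Hypothetical sequence-based essentiality scores in [0, 1].
seq_score = {"lncA": 0.4, "lncB": 0.8}

centrality = degree_centrality(edges)
scores = combined_score(centrality, seq_score, alpha=0.5)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Here `alpha` controls how much weight the network term receives; setting it to 0 reduces the sketch to a purely sequence-based ranking.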
Affiliation(s)
- Xiao-Hong Xin
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Ying-Ying Zhang
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Chu-Qiao Gao
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Hui Min
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Likun Wang
  - Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center of Life Sciences, Peking University Health Science Center, Beijing, China
- Pu-Feng Du
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
2. Li S, Zhang Z, Li X, Tan Y, Wang L, Chen Z. An iteration model for identifying essential proteins by combining comprehensive PPI network with biological information. BMC Bioinformatics 2021; 22:430. [PMID: 34496745; PMCID: PMC8425031; DOI: 10.1186/s12859-021-04300-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/04/2020] [Accepted: 07/08/2021] [Indexed: 11/10/2022] [Open Access]
Abstract
Background: Essential proteins have a great impact on cell survival and development, and play important roles in disease analysis and new drug design. However, because identifying essential proteins through biological experiments is inefficient and costly, there is an urgent need for automated and accurate detection methods. In recent years, the recognition of essential proteins in protein-protein interaction (PPI) networks has become a research hotspot, and many computational models for predicting essential proteins have been proposed.
Results: To achieve higher prediction performance, this paper proposes a new prediction model called TGSO. In TGSO, a protein aggregation degree network is first constructed by adopting the node density measurement method for complex networks. At the same time, a protein co-expression interaction network is constructed by combining gene expression information with network connectivity, and a protein co-localization interaction network is constructed from subcellular localization data. These three newly constructed networks are then integrated into a comprehensive protein–protein interaction network. Finally, based on homology information, scores are calculated iteratively for the proteins and used to estimate their importance. To evaluate the identification performance of TGSO, we compared it with 13 recent competitive methods on three yeast databases. Experimental results show that TGSO achieves identification accuracies of 94%, 82% and 72% among the top 1%, 5% and 10% of candidate proteins, respectively, which is to some degree superior to these state-of-the-art competing models.
Conclusions: We constructed a comprehensive interaction network from multi-source data to reduce the noise and errors in the initial PPI network, and combined it with an iterative method to improve the accuracy of essential protein prediction. TGSO may therefore also be conducive to the future development of essential protein recognition.
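The "accuracy among the top 1%, 5% and 10% of candidates" figures quoted above can be read as the fraction of known essential proteins among the highest-scored candidates. The following is a minimal sketch of that evaluation under stated assumptions: the function name `top_fraction_accuracy`, the rounding of the cutoff with `math.ceil`, and the toy scores and labels are all hypothetical and do not come from the TGSO paper.

```python
import math


def top_fraction_accuracy(scores, essential, fraction):
    """Share of known essential proteins among the top `fraction` of ranked candidates."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Assumption: round the cutoff up and always keep at least one candidate.
    k = max(1, math.ceil(len(ranked) * fraction))
    hits = sum(1 for protein in ranked[:k] if protein in essential)
    return hits / k


# Toy data: P0 has the highest predicted score, P9 the lowest.
scores = {f"P{i}": 10 - i for i in range(10)}
essential = {"P0", "P1", "P3", "P8"}  # hypothetical gold-standard labels

acc_top10 = top_fraction_accuracy(scores, essential, 0.1)
acc_top20 = top_fraction_accuracy(scores, essential, 0.2)
acc_top50 = top_fraction_accuracy(scores, essential, 0.5)
print(acc_top10, acc_top20, acc_top50)
```

With real data, `scores` would hold the model's output for every protein in the network and `essential` the experimentally confirmed essential proteins.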
Affiliation(s)
- Shiyuan Li
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, China
  - Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, China
- Zhen Zhang
  - College of Electronic Information and Electrical Engineering, Changsha University, Changsha, 410022, China
- Xueyong Li
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, China
  - Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, China
- Yihong Tan
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, China
  - Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, China
- Lei Wang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, China
  - Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, China
- Zhiping Chen
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, China
  - Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, China
3. Thummadi NB, T M, Vindal V, P M. Prioritizing the candidate genes related to cervical cancer using the moment of inertia tensor. Proteins 2021; 90:363-371. [PMID: 34468998; DOI: 10.1002/prot.26226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2021] [Revised: 08/07/2021] [Accepted: 08/16/2021] [Indexed: 12/24/2022]
Abstract
Cervical cancer is the fourth most common malignancy among women worldwide. Understanding of the molecular associations underlying cervical cancer has improved tremendously, and several studies have reported evidence for the involvement of various genes in disease progression. However, with ever-evolving bioinformatics tools, the number of genes predicted to be responsible for cervical cancer progression has surged, which makes it highly complex to select genes for further evaluation. In this article, we prioritized candidate genes based on sequence similarity analysis against known cancer genes. For this purpose, we used the moment of inertia tensor, which reveals similarities between protein sequences more efficiently by exploring them on the basis of the physicochemical properties of amino acids. From our analysis, we obtained 14 candidate cervical cancer genes that are highly similar to known cervical cancer genes. We further analyzed the GO terms and prioritized these genes by their number of hits in biological processes and molecular functions and by their involvement in KEGG pathways. We also discuss the evidence-based involvement of the prioritized genes in other cancers and list the available drugs for those genes.
Affiliation(s)
- Neelesh Babu Thummadi
  - Department of Animal Biology, School of Life Sciences, University of Hyderabad, Gachibowli, Hyderabad, India
- Mallikarjuna T
  - Department of Biotechnology and Bioinformatics, School of Life Sciences, University of Hyderabad, Gachibowli, Hyderabad, India
- Vaibhav Vindal
  - Department of Biotechnology and Bioinformatics, School of Life Sciences, University of Hyderabad, Gachibowli, Hyderabad, India
- Manimaran P
  - School of Physics, University of Hyderabad, Gachibowli, Hyderabad, India
4. He X, Kuang L, Chen Z, Tan Y, Wang L. Method for Identifying Essential Proteins by Key Features of Proteins in a Novel Protein-Domain Network. Front Genet 2021; 12:708162. [PMID: 34267785; PMCID: PMC8276041; DOI: 10.3389/fgene.2021.708162] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/11/2021] [Accepted: 05/31/2021] [Indexed: 11/21/2022] [Open Access]
Abstract
In recent years, because traditional biological experiments are costly and have low accuracy, more and more computational models have been proposed to infer potential essential proteins. In this paper, a novel prediction method called KFPM is proposed. First, a novel protein-domain heterogeneous network is established by combining known protein-protein interactions with known associations between proteins and domains. Next, key topological characteristics extracted from the newly constructed protein-domain network and functional characteristics extracted from multiple sources of biological information on proteins are integrated by a new computational method, based on an improved PageRank algorithm, to infer potential essential proteins. Finally, to evaluate the performance of KFPM, we compared it with 13 state-of-the-art prediction methods. Experimental results show that, among the top 1%, 5%, and 10% of candidate proteins predicted by KFPM, the prediction accuracy reaches 96.08%, 83.14%, and 70.59%, respectively, which significantly outperforms all 13 competing methods. This suggests that KFPM may be a meaningful tool for predicting potential essential proteins in the future.
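To make the PageRank-based scoring step more concrete, here is a minimal sketch of standard personalized PageRank by power iteration, in which a per-node prior biases the restart distribution. It is not KFPM's improved algorithm: the toy graph, the `prior` values standing in for feature-derived scores, and the function name are assumptions chosen for illustration only.

```python
def personalized_pagerank(adj, prior, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank whose restart vector is proportional to `prior`."""
    nodes = sorted(adj)
    total = sum(prior[n] for n in nodes)
    restart = {n: prior[n] / total for n in nodes}  # normalized restart distribution
    rank = dict(restart)
    for _ in range(max_iter):
        new = {n: (1 - damping) * restart[n] for n in nodes}
        for u in nodes:
            if adj[u]:
                # Spread the damped rank of u evenly over its neighbors.
                share = damping * rank[u] / len(adj[u])
                for v in adj[u]:
                    new[v] += share
            else:
                # Dangling node: hand its rank back through the restart vector.
                for v in nodes:
                    new[v] += damping * rank[u] * restart[v]
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Toy undirected protein network stored as symmetric adjacency lists.
adj = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
# Hypothetical prior, e.g. derived from functional features of each protein.
prior = {"A": 1.0, "B": 1.0, "C": 2.0, "D": 1.0}

rank = personalized_pagerank(adj, prior)
top_protein = max(rank, key=rank.get)
print(top_protein)
```

Proteins would then be ranked by `rank`, and the top fraction taken as candidate essential proteins.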
Affiliation(s)
- Xin He
  - College of Computer, Xiangtan University, Xiangtan, China
- Linai Kuang
  - College of Computer, Xiangtan University, Xiangtan, China
- Zhiping Chen
  - College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
- Yihong Tan
  - College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
- Lei Wang
  - College of Computer, Xiangtan University, Xiangtan, China
  - College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China