1. Wu S, Zhou Y, Dai L, Yang A, Qiao J. Assembly of functional microbial ecosystems: from molecular circuits to communities. FEMS Microbiol Rev 2024; 48:fuae026. [PMID: 39496507 PMCID: PMC11585282 DOI: 10.1093/femsre/fuae026] [Received: 01/30/2024] [Revised: 08/15/2024] [Accepted: 10/17/2024] [Indexed: 11/06/2024]
Abstract
Microbes compete and cooperate with each other via a variety of chemicals and circuits. Recently, to decipher, simulate, or reconstruct microbial communities, many researchers have engineered microbiomes with bottom-up synthetic biology approaches for diverse applications. However, these efforts have focused separately on individual aspects, including genetic circuits, communication tools, microbiome engineering, or promising applications. The strategies for coordinating microbial ecosystems based on different regulation circuits have not been systematically summarized, which calls for a more comprehensive framework for assembling microbial communities. In this review, we summarize diverse cross-talk and orthogonal regulation modules for the de novo, bottom-up assembly of functional microbial ecosystems, with the aim of promoting further consortia-based applications. First, we review cross-talk communication-based regulation in microbial communities, both within and between species. Then, orthogonal regulation is summarized at the metabolite, transcription, translation, and post-translation levels. Furthermore, to guide the design and optimization of microbial ecosystems, we propose a more comprehensive design-build-test-learn procedure comprising function specification, chassis selection, interaction design, system build, performance test, modeling analysis, and global optimization. Finally, current challenges and opportunities for the further development and application of microbial ecosystems are discussed.
Affiliation(s)
- Shengbo Wu
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China
- Zhejiang Institute of Tianjin University, Shaoxing, 312300, China
- Yongsheng Zhou
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China
- Zhejiang Institute of Tianjin University, Shaoxing, 312300, China
- Lei Dai
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Aidong Yang
- Department of Engineering Science, University of Oxford, Oxford, OX1 3PJ, UK
- Jianjun Qiao
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China
- Zhejiang Institute of Tianjin University, Shaoxing, 312300, China
2. Deng L, Jin Y, Zheng X, Yang Y, Feng Y, Zhou H, Zeng Q. Pharmacological and toxicological characteristics of baicalin in preventing spontaneous abortion and recurrent pregnancy loss: A multi-level critical review. Heliyon 2024; 10:e38633. [PMID: 39640688 PMCID: PMC11619987 DOI: 10.1016/j.heliyon.2024.e38633] [Received: 04/10/2024] [Revised: 09/15/2024] [Accepted: 09/26/2024] [Indexed: 12/07/2024]
Abstract
Relevance Spontaneous abortion (SAB) and recurrent pregnancy loss (RPL) occur alone or concurrently, and their incidence has been increasing. Scutellaria baicalensis Georgi (SBG), recognized in ancient China as a "pregnancy-stabilizing herb", has been used to prevent pregnancy loss for thousands of years. Baicalin (BA) and its metabolite baicalein (BE) are the main bioactive flavonoids in the root of SBG. Methods In this study, we focused on the metabolism, toxicology, and pharmacological effects of BA at the maternal-fetal interface, based on biological process prediction by network pharmacology. A systematic review of BA's regulation of immune homeostasis, cell proliferation and invasion, programmed cell death, the inflammatory microenvironment, angiogenesis, oxidative stress, and vascular remodeling at the maternal-fetal interface showed that BA acts through multiple pathways and targets to treat SAB and RPL. We also critically examined the limitations of using BA from a clinical perspective. Results We explored the bioavailability, targeting, and efficacy of BA from new perspectives (optimization of the BA delivery system, BA-based organoid studies, and the potential effects of BA on uterine flora and bioactive components). Finally, we propose a multimodal stereo sequencing study of biologically active components, based on pathological dynamics and incorporating single-cell RNA sequencing, spatially resolved transcriptomics, and single-cell multimodal omics, to investigate the fetal-preserving mechanism of BA more deeply and to promote its clinical application.
Affiliation(s)
- Linwen Deng
- Department of Gynecology, Hospital of Chengdu University of Traditional Chinese Medicine, Sichuan, China
- Yue Jin
- Combined Traditional Chinese Medicine and Western Medicine Clinics, Hospital of Chengdu University of Traditional Chinese Medicine, Sichuan, China
- Xiaoyan Zheng
- College of Acupuncture and Massage, Chengdu University of Traditional Chinese Medicine, Sichuan, China
- Yi Yang
- Combined Traditional Chinese Medicine and Western Medicine Clinics, Mianyang Central Hospital, Sichuan, China
- Yong Feng
- Combined Traditional Chinese Medicine and Western Medicine Clinics, Mianyang Central Hospital, Sichuan, China
- Hang Zhou
- School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Sichuan, China
- Qian Zeng
- Department of Gynecology, Hospital of Chengdu University of Traditional Chinese Medicine, Sichuan, China
3. Yi Y, Liu G, Li Y, Wang C, Zhang B, Lou H, Yu S. Baicalin Ameliorates Depression-like Behaviors via Inhibiting Neuroinflammation and Apoptosis in Mice. Int J Mol Sci 2024; 25:10259. [PMID: 39408591 PMCID: PMC11476789 DOI: 10.3390/ijms251910259] [Received: 07/17/2024] [Revised: 09/09/2024] [Accepted: 09/22/2024] [Indexed: 10/20/2024]
Abstract
Depression is a common neuropsychiatric disease that places an increasing burden on countries worldwide. Baicalin, a flavonoid extracted from the dried roots of Scutellaria, has been reported to exert anti-inflammatory, antioxidant, and neuroprotective effects in the treatment of depression. However, the biological mechanisms underlying its antidepressant effect remain unclear. In the present study, we investigated the potential mechanisms of baicalin's antidepressant effect using network pharmacology methods, including overlapped terms-based analysis, protein-protein interaction (PPI) network topology analysis, and enrichment analysis. These results were further verified through molecular docking, weighted gene co-expression network analysis (WGCNA), differential gene expression analysis, and subsequent animal experiments. We identified forty-one genes as targets of baicalin in the treatment of depression, among which AKT1, IL6, TP53, IL1B, and CASP3 had higher centrality and occupied more central positions in the network. The roles of peripheral genes derived from the direct potential targets were also examined. Our study suggests that biological processes such as the inflammatory response, apoptosis, and oxidative stress may be involved in the therapeutic effect of baicalin on depression. These mechanisms were validated at the structural, gene, protein, and signaling-pathway levels. Taken together, these findings offer a new perspective on the mechanisms underlying baicalin's antidepressant effect and provide a basis for its clinical application.
Affiliation(s)
- Yuhang Yi
- Department of Physiology, School of Basic Medical Sciences, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
- Guiyu Liu
- Department of Physiology, School of Basic Medical Sciences, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
- Ye Li
- Department of Physiology, School of Basic Medical Sciences, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
- Changmin Wang
- Department of Physiology, School of Basic Medical Sciences, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
- Bin Zhang
- Department of Pharmacology, School of Basic Medical Sciences, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
- Haiyan Lou
- Department of Pharmacology, School of Basic Medical Sciences, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
- Shuyan Yu
- Department of Physiology, School of Basic Medical Sciences, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
- Shandong Provincial Key Laboratory of Mental Disorders, School of Basic Medical Sciences, Jinan 250012, China
4. Chen B, Zhang J, Shao C, Bian J, Kang R, Shang X. QIGTD: identifying critical genes in the evolution of lung adenocarcinoma with tensor decomposition. BioData Min 2024; 17:30. [PMID: 39232802 PMCID: PMC11376055 DOI: 10.1186/s13040-024-00386-w] [Received: 02/25/2024] [Accepted: 08/28/2024] [Indexed: 09/06/2024]
Abstract
BACKGROUND Identifying critical genes is important for understanding the pathogenesis of complex diseases. Traditional studies typically compare changes in biomolecules between normal and disease samples, or detect important vertices in a single static biomolecular network, and therefore often overlook the dynamic changes that occur between disease stages. Yet investigating temporal changes in biomolecular networks and identifying critical genes is essential for understanding the occurrence and development of diseases. METHODS A novel method called Quantifying Importance of Genes with Tensor Decomposition (QIGTD) was proposed in this study. It first constructs a time series network by integrating both intra- and inter-temporal network information, preserving connections between networks at adjacent stages according to their local similarities. A tensor is used to describe the connections of this time series network, and a third-order tensor decomposition method was proposed to capture both the topological information of each network snapshot and the time series characteristics of the whole network. QIGTD is also learning-free and efficient, and it can be applied to datasets with small numbers of samples. RESULTS The effectiveness of QIGTD was evaluated on lung adenocarcinoma (LUAD) datasets, with three state-of-the-art methods (T-degree, T-closeness, and T-betweenness) employed as benchmarks. Numerical experiments demonstrate that QIGTD outperforms these methods in terms of both precision and mean average precision (mAP). Notably, of the top 50 genes, 29 have been verified to be highly related to LUAD according to the DisGeNET database, and 36 are significantly enriched in LUAD-related Gene Ontology (GO) terms, including nuclear division, mitotic nuclear division, chromosome segregation, organelle fission, and mitotic sister chromatid segregation.
CONCLUSION QIGTD effectively captures temporal changes in gene networks and identifies critical genes. It provides a valuable tool for studying temporal dynamics in biological networks and can aid in understanding the underlying mechanisms of diseases such as LUAD.
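The abstract does not give QIGTD's decomposition itself, so the following is only a minimal sketch of the data structure it describes. It stacks per-stage gene network snapshots into a third-order tensor (genes × genes × stages). It then computes two scores. The first is a temporal-degree score, assumed here to be a gene's degree summed over all stages; the benchmark T-degree may be defined differently. The second is a generic stand-in for a tensor decomposition: the leading factor of a higher-order SVD, taken from the mode-1 unfolding. The inter-stage coupling by local similarity is omitted. The helper names and the toy network are hypothetical, and NumPy is assumed.

```python
import numpy as np

def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of one undirected network snapshot."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

def build_tensor(snapshots):
    """Stack T adjacency matrices (n x n) into an n x n x T third-order tensor."""
    return np.stack(snapshots, axis=2)

def temporal_degree(tensor):
    """Assumed temporal degree: each gene's degree summed over all stages."""
    return tensor.sum(axis=1).sum(axis=1)

def mode1_scores(tensor):
    """|leading left singular vector| of the mode-1 unfolding (first HOSVD factor)."""
    n = tensor.shape[0]
    unfolded = tensor.reshape(n, -1)  # rows = genes, columns = (partner, stage) pairs
    u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
    return np.abs(u[:, 0])

# Hypothetical three-stage network in which gene 0 gains partners over time.
stages = [adjacency(4, [(0, 1)]),
          adjacency(4, [(0, 1), (0, 2)]),
          adjacency(4, [(0, 1), (0, 2), (0, 3)])]
X = build_tensor(stages)
print(temporal_degree(X).tolist())     # [6.0, 3.0, 2.0, 1.0]
print(int(np.argmax(mode1_scores(X))))  # 0
```

Both scores rank the hub gene first here. A real time series network would additionally add the inter-stage links that QIGTD derives from local similarity.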
Affiliation(s)
- Bolin Chen
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710012, China.
- Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710012, China.
- Jinlei Zhang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710012, China
- Ci Shao
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710012, China
- Jun Bian
- Department of General Surgery, Xi'an Children's Hospital, Xi'an Jiaotong University Affiliated Children's Hospital, Xi'an, 710003, China
- Ruiming Kang
- Rewise (Hangzhou) Information Technology Co., LTD, Hangzhou, 310000, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710012, China
- Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710012, China
5. Hsiao YC, Dutta A. Network Modeling and Control of Dynamic Disease Pathways, Review and Perspectives. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1211-1230. [PMID: 38498762 DOI: 10.1109/tcbb.2024.3378155] [Indexed: 03/20/2024]
Abstract
Dynamic disease pathways are combinations of complex dynamical processes among biomolecules in a cell that lead to disease. Network modeling of disease pathways considers disease-related biomolecules (e.g., DNA, RNA, transcription factors, enzymes, proteins, and metabolites) and their interactions (e.g., DNA methylation, histone modification, alternative splicing, and protein modification) to study disease progression and predict therapeutic responses. These biomolecules and their interactions are the basic elements in studying the misregulation of disease-related gene expression that leads to abnormal cellular responses. Gene regulatory networks, cell signaling networks, and metabolic networks are the three major types of intracellular networks used to study the cellular responses elicited by extracellular signals. Disease-related cellular responses can be prevented or regulated by designing control strategies that manipulate these extracellular or other intracellular signals. This paper reviews the regulatory mechanisms, dynamic models, and control strategies for each type of intracellular network. The applications, limitations, and prospects of modeling and control are also discussed.
6. Wang Y, Yang Y, Liang C, Zhang H. Exploring the Roles of Key Mediators IKBKE and HSPA1A in Alzheimer's Disease and Hepatocellular Carcinoma through Bioinformatics Analysis. Int J Mol Sci 2024; 25:6934. [PMID: 39000042 PMCID: PMC11241202 DOI: 10.3390/ijms25136934] [Received: 05/06/2024] [Revised: 06/18/2024] [Accepted: 06/21/2024] [Indexed: 07/14/2024]
Abstract
Recent studies have hinted at a potential link between Alzheimer's disease (AD) and cancer. Our study therefore focused on finding genes common to AD and liver hepatocellular carcinoma (LIHC), assessing their promise as diagnostic indicators, and guiding future treatment approaches for both conditions. Our research used a broad methodology, including differential gene expression analysis, Weighted Gene Co-expression Network Analysis (WGCNA), gene enrichment analysis, Receiver Operating Characteristic (ROC) curves, and Kaplan-Meier plots, supplemented with immunohistochemistry data from the Human Protein Atlas (HPA) and machine learning techniques, to identify critical genes and significant pathways shared between AD and LIHC. Through differential gene expression analysis, WGCNA, and machine learning methods, we identified nine key genes associated with AD, which served as entry points for the LIHC analysis. Subsequent analyses revealed IKBKE and HSPA1A as pivotal genes shared by patients with AD and LIHC, suggesting them as potential targets for intervention in both conditions. Our study indicates that IKBKE and HSPA1A could influence the onset and progression of AD and LIHC by modulating the infiltration levels of immune cells. This lays a foundation for future research into targeted therapies based on their shared mechanisms.
Affiliation(s)
- Hailin Zhang
- Department of Pharmacology, The Key Laboratory of Neural and Vascular Biology, Ministry of Education, The Key Laboratory of New Drug Pharmacology and Toxicology, Collaborative Innovation Center of Hebei Province for Mechanism, Diagnosis and Treatment of Neuropsychiatric Diseases, Hebei Medical University, Shijiazhuang 050017, China
7. Rout T, Mohapatra A, Kar M. A systematic review of graph-based explorations of PPI networks: methods, resources, and best practices. Network Modeling Analysis in Health Informatics and Bioinformatics 2024; 13:29. [DOI: 10.1007/s13721-024-00467-0] [Received: 11/07/2023] [Revised: 04/09/2024] [Accepted: 05/16/2024] [Indexed: 01/03/2025]
8. Osmanoglu Ö, Gupta SK, Almasi A, Yagci S, Srivastava M, Araujo GHM, Nagy Z, Balkenhol J, Dandekar T. Signaling network analysis reveals fostamatinib as a potential drug to control platelet hyperactivation during SARS-CoV-2 infection. Front Immunol 2023; 14:1285345. [PMID: 38187394 PMCID: PMC10768010 DOI: 10.3389/fimmu.2023.1285345] [Received: 08/29/2023] [Accepted: 12/06/2023] [Indexed: 01/09/2024]
Abstract
Introduction Prothrombotic events are among the prevalent causes of intensive care unit (ICU) admission in COVID-19 patients, although the signaling events in stimulated platelets remain unclear. Methods We conducted a comparative analysis of platelet transcriptome data from healthy donors, ICU, and non-ICU COVID-19 patients to elucidate these mechanisms. To go beyond previous analyses, we constructed models of the involved networks and control cascades by integrating a global human signaling network with the transcriptome data. We investigated the control of platelet hyperactivation and the specific proteins involved. Results Our study revealed that the control requirement of the platelet network is significantly higher in ICU patients than in non-ICU patients: non-ICU patients require control over fewer proteins to manage platelet hyperactivity. Identification of indispensable proteins highlighted key subnetworks that are targetable for system control in COVID-19-related platelet hyperactivity. We scrutinized FDA-approved drugs targeting indispensable proteins and identified fostamatinib as a potent candidate for preventing thrombosis in COVID-19 patients. Discussion Our findings shed light on how SARS-CoV-2 efficiently affects host platelets by targeting indispensable and critical proteins involved in the control of platelet activity. We evaluated several drugs for specific control of platelet hyperactivity in ICU patients. Our approach focuses on repurposing existing drugs for optimal control over the signaling network responsible for platelet hyperactivity in COVID-19 patients. Our study offers specific pharmacological recommendations, with drug prioritization tailored to the distinct network states observed in each patient condition. Interactive networks and detailed results can be accessed at https://fostamatinib.bioinfo-wuerz.eu/.
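The abstract frames network control in terms of two things: how many proteins must be controlled, and which proteins are "indispensable". A common formalization is the structural-controllability framework of Liu, Slotine and Barabási. There, the minimum number of driver nodes of a directed network is N_D = max(N - |M*|, 1), where |M*| is the size of a maximum matching. A node is usually called indispensable if removing it increases N_D, dispensable if N_D decreases, and neutral otherwise. The sketch below computes both, using a simple augmenting-path matching. It assumes, without confirmation from the abstract, that the paper used this formulation. The function names and toy graphs are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]  # directed edge u -> v

def max_matching_size(nodes: Sequence[Hashable], edges: Sequence[Edge]) -> int:
    """Maximum matching of out-copies to in-copies (Kuhn's augmenting paths)."""
    succ: Dict[Hashable, List[Hashable]] = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    match_in: Dict[Hashable, Hashable] = {}  # in-copy v -> out-copy u matched to it

    def augment(u: Hashable, seen: set) -> bool:
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            # Take v if it is free, or if its current partner can be re-matched elsewhere
            if v not in match_in or augment(match_in[v], seen):
                match_in[v] = u
                return True
        return False

    size = 0
    for u in nodes:
        if augment(u, set()):
            size += 1
    return size

def n_driver_nodes(nodes: Sequence[Hashable], edges: Sequence[Edge]) -> int:
    """Minimum number of driver nodes N_D = max(N - |M*|, 1); 0 for an empty graph."""
    if not nodes:
        return 0
    return max(len(nodes) - max_matching_size(nodes, edges), 1)

def classify_nodes(nodes: Sequence[Hashable], edges: Sequence[Edge]) -> Dict[Hashable, str]:
    """Label each node by how its removal changes N_D."""
    base = n_driver_nodes(nodes, edges)
    labels: Dict[Hashable, str] = {}
    for x in nodes:
        rest = [n for n in nodes if n != x]
        rest_edges = [(u, v) for u, v in edges if x not in (u, v)]
        nd = n_driver_nodes(rest, rest_edges)
        labels[x] = "indispensable" if nd > base else ("dispensable" if nd < base else "neutral")
    return labels

chain = [(0, 1), (1, 2)]  # hypothetical cascade 0 -> 1 -> 2
print(n_driver_nodes([0, 1, 2], chain))  # 1
print(classify_nodes([0, 1, 2], chain))  # {0: 'neutral', 1: 'indispensable', 2: 'neutral'}
```

In the chain, one input at node 0 controls everything. Removing the middle protein splits the cascade, so two inputs become necessary, which is what makes it indispensable.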
Affiliation(s)
- Özge Osmanoglu
- Functional Genomics & Systems Biology Group, Department of Bioinformatics, Biocenter, University of Wuerzburg, Wuerzburg, Germany
- Shishir K. Gupta
- Evolutionary Genomics Group, Center for Computational and Theoretical Biology, University of Würzburg, Würzburg, Germany
- Institute of Botany, Heinrich Heine University, Düsseldorf, Germany
- Anna Almasi
- Functional Genomics & Systems Biology Group, Department of Bioinformatics, Biocenter, University of Wuerzburg, Wuerzburg, Germany
- Seray Yagci
- Functional Genomics & Systems Biology Group, Department of Bioinformatics, Biocenter, University of Wuerzburg, Wuerzburg, Germany
- Mugdha Srivastava
- Core Unit Systems Medicine, University of Wuerzburg, Wuerzburg, Germany
- Algorithmic Bioinformatics, Department of Computer Science, Heinrich Heine University, Düsseldorf, Germany
- Gabriel H. M. Araujo
- University Hospital Würzburg, Institute of Experimental Biomedicine, Würzburg, Germany
- Zoltan Nagy
- University Hospital Würzburg, Institute of Experimental Biomedicine, Würzburg, Germany
- Johannes Balkenhol
- Functional Genomics & Systems Biology Group, Department of Bioinformatics, Biocenter, University of Wuerzburg, Wuerzburg, Germany
- Chair of Molecular Microscopy, Rudolf Virchow Center for Integrative and Translational Bioimaging, University of Würzburg, Würzburg, Germany
- Thomas Dandekar
- Functional Genomics & Systems Biology Group, Department of Bioinformatics, Biocenter, University of Wuerzburg, Wuerzburg, Germany
- European Molecular Biology Laboratory (EMBL) Heidelberg, BioComputing Unit, Heidelberg, Germany
9. Khojasteh H, Pirgazi J, Ghanbari Sorkhi A. Improving prediction of drug-target interactions based on fusing multiple features with data balancing and feature selection techniques. PLoS One 2023; 18:e0288173. [PMID: 37535616 PMCID: PMC10399861 DOI: 10.1371/journal.pone.0288173] [Received: 02/08/2023] [Accepted: 06/21/2023] [Indexed: 08/05/2023]
Abstract
Drug discovery relies on predicting drug-target interactions (DTIs), an important and challenging task. DTI prediction aims to identify interactions between drug compounds and protein targets. Because traditional wet-lab experiments are time-consuming and expensive, computational methods based on machine learning have attracted the attention of many researchers in recent years. A dry-lab approach focused on computational interaction prediction can help limit the search space for wet-lab experiments. In this paper, a novel multi-stage approach for DTI prediction, called SRX-DTI, is proposed. In the first stage, descriptors derived from protein sequences and an FP2 fingerprint encoded from each drug are combined into feature vectors. A major challenge in this application is data imbalance caused by the scarcity of known interactions; to address it, the One-SVM-US technique is proposed in the second stage. Next, the FFS-RF algorithm, a forward feature selection algorithm coupled with a random forest (RF) classifier, is developed to maximize predictive performance; it removes irrelevant features to obtain an optimal feature set. Finally, the balanced dataset with the optimal features is given to an XGBoost classifier to identify DTIs. The experimental results demonstrate that SRX-DTI achieves higher performance than other existing methods in predicting DTIs. The datasets and source code are available at: https://github.com/Khojasteh-hb/SRX-DTI.
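The FFS-RF stage is a wrapper-style greedy forward feature selection. The loop adds the single feature that most improves a score and stops when no remaining candidate improves it. Below is a minimal sketch of that loop, assuming a user-supplied scoring function. In SRX-DTI the score would come from a random forest evaluated on the candidate subset; here `score` is a hypothetical callable, and the toy scorer and its weights are invented for illustration. The authors' actual code is in the repository linked above.

```python
from typing import Callable, List, Optional, Sequence

def forward_feature_selection(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    max_features: Optional[int] = None,
    min_gain: float = 0.0,
) -> List[int]:
    """Greedy forward selection: repeatedly add the feature that most improves `score`."""
    selected: List[int] = []
    remaining = list(range(n_features))
    best_score = float("-inf")
    limit = n_features if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Evaluate every remaining candidate added to the current subset
        trials = [(score(selected + [f]), f) for f in remaining]
        cand_score, cand = max(trials, key=lambda t: t[0])
        if cand_score <= best_score + min_gain:
            break  # no candidate improves the score enough
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
    return selected

# Hypothetical toy scorer: per-feature "usefulness" minus a cost of 0.1 per feature.
WEIGHTS = [0.3, 0.05, 0.5, 0.0]

def toy_score(subset: Sequence[int]) -> float:
    return sum(WEIGHTS[f] for f in subset) - 0.1 * len(subset)

print(forward_feature_selection(len(WEIGHTS), toy_score))  # [2, 0]
```

In practice, `score` should be a cross-validated metric rather than training accuracy; otherwise the loop tends to keep adding features that only fit noise.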
Affiliation(s)
- Hakimeh Khojasteh
- Department of Computer Engineering, University of Zanjan, Zanjan, Iran
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Jamshid Pirgazi
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Department of Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran
- Ali Ghanbari Sorkhi
- Department of Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran
10. Hasman M, Mayr M, Theofilatos K. Uncovering Protein Networks in Cardiovascular Proteomics. Mol Cell Proteomics 2023; 22:100607. [PMID: 37356494 PMCID: PMC10460687 DOI: 10.1016/j.mcpro.2023.100607] [Received: 11/14/2022] [Revised: 05/01/2023] [Accepted: 06/20/2023] [Indexed: 06/27/2023]
Abstract
Biological networks have been widely used in many different diseases to identify potential biomarkers and drug targets. In the present review, we describe the main computational techniques for reconstructing and analyzing different types of protein networks and summarize previous applications of these techniques in cardiovascular diseases. Existing tools are critically compared, with a discussion of when each method is preferred, such as co-expression networks for functional annotation of protein clusters and directed networks for inferring regulatory associations. We then present examples of reconstructing protein networks of different types (regulatory, co-expression, and protein-protein interaction networks). We demonstrate the necessity of reconstructing networks separately for each cardiovascular tissue type and disease entity and provide illustrative examples of the importance of taking relevant post-translational modifications into consideration. Finally, we demonstrate and discuss how the findings of protein networks can be interpreted using single-cell RNA-sequencing data.
Affiliation(s)
- Maria Hasman
- King's British Heart Foundation Centre, King's College London, London, United Kingdom
- Manuel Mayr
- King's British Heart Foundation Centre, King's College London, London, United Kingdom
11. Ginsberg SD, Sharma S, Norton L, Chiosis G. Targeting stressor-induced dysfunctions in protein-protein interaction networks via epichaperomes. Trends Pharmacol Sci 2023; 44:20-33. [PMID: 36414432 PMCID: PMC9789192 DOI: 10.1016/j.tips.2022.10.006] [Received: 08/04/2022] [Revised: 10/31/2022] [Accepted: 10/31/2022] [Indexed: 11/21/2022]
Abstract
Diseases are manifestations of complex changes in protein-protein interaction (PPI) networks, whereby stressors (genetic, environmental, or combinations thereof) alter molecular interactions and perturb the individual from the level of cells and tissues to the entire organism. Targeting stressor-induced dysfunctions in PPI networks has therefore become a promising but technically challenging frontier in therapeutics discovery. This opinion article provides a new framework based on disrupting epichaperomes (pathological entities that enable dysfunctional rewiring of PPI networks) as a mechanism to revert context-specific PPI network dysfunction to a normative state. We speculate on the implications of recent research in this area for a precision medicine approach to detecting and treating complex diseases, including cancer and neurodegenerative disorders.
Affiliation(s)
- Stephen D Ginsberg
- Center for Dementia Research, Nathan Kline Institute, Orangeburg, NY 10962, USA; Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA; Department of Neuroscience and Physiology, New York University Grossman School of Medicine, New York, NY 10016, USA; NYU Neuroscience Institute, New York University Grossman School of Medicine, New York, NY 10016, USA
- Sahil Sharma
- Program in Chemical Biology, Sloan Kettering Institute, New York, NY 10065, USA
- Larry Norton
- Breast Cancer Medicine Service, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Gabriela Chiosis
- Program in Chemical Biology, Sloan Kettering Institute, New York, NY 10065, USA; Breast Cancer Medicine Service, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.
12. Manzo M, Giordano M, Maddalena L, Guarracino MR, Granata I. Novel Data Science Methodologies for Essential Genes Identification Based on Network Analysis. Studies in Computational Intelligence 2023:117-145. [DOI: 10.1007/978-3-031-24453-7_7] [Indexed: 09/02/2023]
13. Pan X, Hu L, Hu P, You ZH. Identifying Protein Complexes From Protein-Protein Interaction Networks Based on Fuzzy Clustering and GO Semantic Information. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2882-2893. [PMID: 34242171 DOI: 10.1109/tcbb.2021.3095947] [Indexed: 06/13/2023]
Abstract
Protein complexes provide valuable insights into the mechanisms of the biological processes in which proteins participate. A variety of computational algorithms have thus been proposed to identify protein complexes in protein-protein interaction networks. However, few of them take into account both network topology and protein attribute information within a unified fuzzy clustering framework. Because proteins in the same complex have similar attribute information, and because fuzzy clustering makes it possible to identify overlapping complexes, we propose such a fuzzy clustering framework, named FCAN-PCI, to improve identification accuracy. To this end, the semantic similarity between the attribute information of proteins is calculated and integrated, together with the network topology, into a well-established fuzzy clustering model. A momentum method is then adopted to accelerate the clustering procedure. Finally, FCAN-PCI applies a heuristic search strategy to identify overlapping protein complexes. Extensive experiments comparing FCAN-PCI with state-of-the-art identification algorithms demonstrate its promising performance.
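The abstract does not give FCAN-PCI's objective, which combines GO semantic similarity, network topology, and a momentum term. As a point of reference only, here is plain fuzzy c-means on numeric feature vectors. Overlap comes from a hypothetical thresholding rule: a protein joins every cluster in which its membership is at least a chosen threshold. The feature vectors, initial centers, and threshold are invented assumptions, and no semantic similarity or momentum is included. NumPy is assumed.

```python
import numpy as np

def fuzzy_c_means(X, init_centers, m=2.0, n_iter=100, tol=1e-8):
    """Plain fuzzy c-means. Returns memberships U (c x n) and centers (c x p)."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float)
    for _ in range(n_iter):
        # Euclidean distance from each center to each point, shape (c, n)
        d = np.linalg.norm(centers[:, None, :] - X[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against a point lying exactly on a center
        # u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)); each column of U sums to 1
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=1)
        # Centers are membership^m-weighted means of the points
        W = U ** m
        new_centers = (W @ X) / W.sum(axis=1, keepdims=True)
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    return U, centers

def overlapping_clusters(U, threshold):
    """Hypothetical overlap rule: a point joins every cluster with membership >= threshold."""
    return [set(np.flatnonzero(row >= threshold).tolist()) for row in U]

# Invented 1-D features: two tight groups plus one point midway between them.
X = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2], [2.6]]
U, centers = fuzzy_c_means(X, init_centers=[X[0], X[3]])
print(overlapping_clusters(U, threshold=0.3))  # [{0, 1, 2, 6}, {3, 4, 5, 6}]
```

The midway point receives roughly equal membership in both clusters, which is how fuzzy clustering lets one protein belong to two overlapping complexes.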
14. Wang P, Wang D. Gene Differential Co-Expression Networks Based on RNA-Seq: Construction and Its Applications. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2829-2841. [PMID: 34383649 DOI: 10.1109/tcbb.2021.3103280] [Indexed: 06/13/2023]
Abstract
The gene co-expression network (GCN) has become an increasingly important tool in omics data analysis. A major challenge in GCN construction is that the sample size is far smaller than the number of genes, whereas traditional methods rely on large numbers of samples. Moreover, association signals are likely to be weak, nonlinear, and stochastic, which makes them difficult to identify among thousands of candidates. In this paper, the gray correlation coefficient (GCC) is introduced, and a novel method to construct gene differential co-expression networks (GDCNs) is proposed. Based on the GDCNs, three measures are proposed to explore informative genes. The proposed method makes full use of the information provided by a handful of samples, overcomes the shortcomings of GCNs, and can evaluate changes in co-expression relationships that may be triggered by treatments. GDCNs under multiple experimental conditions are constructed and investigated using RNA-seq data of Brassica napus. The GCC-based method is found to be very robust to data processing. The GDCNs facilitate the inference of gene functions and the identification of informative genes responsible for stress responsiveness. The GDCN-based approaches integrate the 'guilt by association' and 'guilt by rewiring' rules, providing alternative tools for omics data analysis.
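The abstract does not give the GCC formula. A common form of the gray (grey) relational coefficient, from Deng's grey relational analysis, compares two normalized profiles point by point: ξ(k) = (Δmin + ρΔmax) / (Δ(k) + ρΔmax), where Δ(k) = |x₀(k) − x₁(k)| and ρ is a distinguishing coefficient conventionally set to 0.5; the relational grade is the mean of ξ(k). The Python sketch below assumes min-max normalization and this two-profile form; the paper's exact normalization and its three GDCN measures are not reproduced.

```python
from typing import List, Sequence


def _minmax(xs: Sequence[float]) -> List[float]:
    """Scale a profile to [0, 1]; a constant profile maps to all zeros (assumption)."""
    lo, hi = min(xs), max(xs)
    span = hi - lo
    return [0.0 if span == 0 else (x - lo) / span for x in xs]


def grey_relational_grade(ref: Sequence[float], other: Sequence[float], rho: float = 0.5) -> float:
    """Deng-style grey relational grade between two expression profiles over the same samples."""
    if len(ref) != len(other) or len(ref) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    a, b = _minmax(ref), _minmax(other)
    deltas = [abs(x - y) for x, y in zip(a, b)]  # pointwise absolute differences
    d_min, d_max = min(deltas), max(deltas)
    if d_max == 0:
        return 1.0  # identical normalized profiles are perfectly related
    coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in deltas]
    return sum(coeffs) / len(coeffs)  # grade = mean relational coefficient


# Two hypothetical genes measured on four samples, with proportional profiles.
print(grey_relational_grade([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))  # 1.0
```

A differential co-expression edge could then be scored, for example, by the change in grade between two conditions; that scoring rule is an illustrative assumption, not one of the paper's three measures.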
15
Complex Network Analysis of Mass Violation, Specifically Mass Killing. ENTROPY 2022; 24:e24081017. [PMID: 35892998 PMCID: PMC9394321 DOI: 10.3390/e24081017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Revised: 07/15/2022] [Accepted: 07/18/2022] [Indexed: 11/17/2022]
Abstract
News reports in the media carry information about society's social and political conditions. With publicly available digital datasets of events, it is possible to study the complex network of mass violations, specifically mass killings. Multiple approaches have been applied to gain essential insights into the events and the actors involved. Power-law behavior in the tails of the actor-mention, co-actor-mention, and actor-degree distributions reveals the dominance of influential actors whose networks grow over time. The United States, France, Israel, and a few other countries have been identified as major players in the propagation of mass killings over the past 20 years. It is demonstrated that targeting influential actors for removal may stop the spread of such conflict events, which can help policymakers and organizations. This paper aims to identify and formulate conflicts from the actors' perspective at the global level over a period of time. The process generalizes to news at any level; it is not restricted to the global level.
16
Feng C, Wu J, Wei H, Xu L, Zou Q. CRCF: A Method of Identifying Secretory Proteins of Malaria Parasites. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2149-2157. [PMID: 34061749 DOI: 10.1109/tcbb.2021.3085589] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Malaria is a mosquito-borne disease that causes millions of cases and deaths annually. A fast computational method for identifying the secretory proteins of malaria parasites is important for research on antimalarial drugs and vaccines, and such a method was therefore developed. In this method, a reduced amino acid alphabet was selected to recode the original protein sequences, and a feature synthesis method was used to combine three different types of feature information. Finally, a random forest classifier was used to identify the secretory proteins. In addition, a web server was developed to share the proposed algorithm. On the benchmark dataset, the proposed method achieved an overall accuracy greater than 97.8 percent under 10-fold cross-validation. The reduced-alphabet schemes and the performance of the individual features are also analyzed and discussed.
17
Discovering driver nodes in chronic kidney disease-related networks using Trader as a newly developed algorithm. Comput Biol Med 2022; 148:105892. [PMID: 35932730 DOI: 10.1016/j.compbiomed.2022.105892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Revised: 07/04/2022] [Accepted: 07/16/2022] [Indexed: 11/18/2022]
18
Wang Z, Zhang Y, Li Q, Zou Q, Liu Q. A road map for happiness: The psychological factors related cell types in various parts of human body from single cell RNA-seq data analysis. Comput Biol Med 2022; 143:105286. [PMID: 35183972 DOI: 10.1016/j.compbiomed.2022.105286] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/03/2022] [Revised: 01/16/2022] [Accepted: 01/24/2022] [Indexed: 12/13/2022]
Abstract
Extensive evidence from zoology, neurobiology, immunology, and other fields has confirmed that psychological factors can produce remarkable physiological effects. Researchers have long recognized the potential value of these effects and sought to harness them in developing new drugs and therapies, for which mechanistic study is a necessary prerequisite. However, most such studies are restricted to neuroscience, or start from blood samples and fall within immunology. In this study, we focus on the psychological factor of happiness. We mined publicly available single-cell RNA sequencing (scRNA-seq) data for the expression, across all cell types in the samples, of happiness-related genes collected from various literature sources. We found that the expression of these genes is not restricted to neuro-regulated cells or tissue-resident immune cells; on the contrary, cell types that are unique to a tissue or organ and lack direct regulation by the nervous system account for the majority of cells expressing happiness-related genes. Our research is a preliminary, cell-level exploration of where the body responds to the mind, and it lays the foundation for more detailed mechanistic research.
Affiliation(s)
- Ziwei Wang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology, China
- Ying Zhang
- Department of Anesthesiology, Hospital T.C.M Affiliated to Southwest Medical University, Luzhou, China
- Qun Li
- Department of Pain, The Affiliated Traditional Chinese Medicine Hospital of Southwest Medical University, Luzhou, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology, China; Yangtze Delta Region Institute Quzhou, University of Electronic Science and Technology of China, Quzhou, Zhejiang, China.
- Qing Liu
- Department of Algology, Hospital T.C.M Affiliated to Southwest Medical University, Luzhou, China.
19
Qi Y, Su B, Lin X, Zhou H. A New Feature Selection Method Based on Feature Distinguishing Ability and Network Influence. J Biomed Inform 2022; 128:104048. [DOI: 10.1016/j.jbi.2022.104048] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/14/2021] [Revised: 02/04/2022] [Accepted: 03/01/2022] [Indexed: 12/18/2022]
20
Popescu VB, Kanhaiya K, Năstac DI, Czeizler E, Petre I. Network controllability solutions for computational drug repurposing using genetic algorithms. Sci Rep 2022; 12:1437. [PMID: 35082323 PMCID: PMC8791995 DOI: 10.1038/s41598-022-05335-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/03/2021] [Accepted: 12/29/2021] [Indexed: 12/22/2022]
Abstract
Control theory has recently seen impactful applications in network science, especially in network medicine. A key research topic is finding minimal external interventions that offer control over the dynamics of a given network, a problem known as network controllability. In this article, we propose a new solution to this problem based on genetic algorithms. We tailor our solution to computational drug repurposing, seeking to maximize its use of FDA-approved drug targets in a given disease-specific protein-protein interaction network. We demonstrate our algorithm on several cancer networks and on several random networks whose edges are distributed according to the Erdős-Rényi, scale-free, and small-world models. Overall, we show that our new algorithm is more efficient at identifying relevant drug targets in a disease network, advancing the computational solutions needed for new therapeutic and drug repurposing approaches.
Affiliation(s)
- Dumitru Iulian Năstac
- POLITEHNICA University of Bucharest, Faculty of Electronics, Telecommunications and Information Technology, 061071, Bucharest, Romania
- Eugen Czeizler
- Computer Science, Åbo Akademi University, 20500, Turku, Finland
- National Institute for Research and Development in Biological Sciences, 060031, Bucharest, Romania
- Ion Petre
- Department of Mathematics and Statistics, University of Turku, 20014, Turku, Finland.
- National Institute for Research and Development in Biological Sciences, 060031, Bucharest, Romania.
21
Li H, Shi L, Gao W, Zhang Z, Zhang L, Wang G. dPromoter-XGBoost: Detecting promoters and strength by combining multiple descriptors and feature selection using XGBoost. Methods 2022; 204:215-222. [PMID: 34998983 DOI: 10.1016/j.ymeth.2022.01.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/19/2021] [Revised: 12/13/2021] [Accepted: 01/02/2022] [Indexed: 12/12/2022]
Abstract
Promoters play an irreplaceable role in biological processes and genetics: they are responsible for initiating the transcription and expression of specific genes. Promoter abnormalities have been found in some diseases, and the level of promoter-binding transcription factors can serve as a marker before a disease occurs. Hence, detecting promoters in DNA sequences has important biological significance; in particular, distinguishing strong promoters can help elucidate differences in gene expression and the mechanisms of specific diseases. Since the introduction of third-generation sequencing, experimental labeling of promoters has been unable to keep pace with sequencing. Many computational models have been designed to fill this gap and identify unlabeled DNA. However, their feature representations rely on a single type of descriptor and cannot fully reflect the information contained in the original samples. To avoid this information loss, we propose a computational model that represents samples jointly through multiple descriptors and feature selection. Notably, a new feature descriptor called the K-mer word vector is defined. The promoter model, built on multiple feature descriptors dominated by the K-mer word vector, achieves performance similar to existing methods, and its sensitivity of 85.72% distinguishes promoters more effectively than other methods. Furthermore, its performance on promoter strength surpasses published methods, with an accuracy of 77.00% that greatly improves the ability to distinguish strong from weak promoters.
Affiliation(s)
- Hongfei Li
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China; Yangtze Delta Region Institute, University of Electronic Science and Technology, Quzhou, China
- Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China
- Wentao Gao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Zixiao Zhang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Lichao Zhang
- School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen, China
- Guohua Wang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China.
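The dPromoter-XGBoost abstract names a K-mer word vector descriptor without defining it. One plausible reading is that overlapping k-mers are treated as words and mapped to learned embeddings. The Python sketch below shows only that tokenization plus a mean-pooling step over a supplied embedding table; the embedding table (for example, one trained with word2vec), the averaging, and the default k = 3 are assumptions for illustration, not the paper's definition.

```python
from typing import Dict, List, Sequence


def kmer_words(seq: str, k: int = 3) -> List[str]:
    """Split a DNA sequence into overlapping k-mers ("words") with stride 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def sequence_vector(seq: str, embeddings: Dict[str, Sequence[float]], k: int = 3) -> List[float]:
    """Average the embeddings of all k-mers found in the table (hypothetical pooling step)."""
    if not embeddings:
        raise ValueError("embedding table must be non-empty")
    dim = len(next(iter(embeddings.values())))
    words = [w for w in kmer_words(seq, k) if w in embeddings]  # skip out-of-vocabulary k-mers
    if not words:
        return [0.0] * dim
    return [sum(embeddings[w][d] for w in words) / len(words) for d in range(dim)]


# Toy 2-dimensional table standing in for trained k-mer vectors.
toy = {"ACG": [1.0, 0.0], "CGT": [0.0, 1.0]}
print(kmer_words("acgta"))            # ['ACG', 'CGT', 'GTA']
print(sequence_vector("ACGTA", toy))  # [0.5, 0.5]
```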
22
Zhang L, Lv Y, Xu L, Zhou M. A Review of DNA Data Storage Technologies Based on Biomolecules. Curr Bioinform 2022. [DOI: 10.2174/1574893616666210813101237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
In the information age, data storage technology has become key to improving computer systems. Because traditional storage technologies cannot meet the demand for massive storage, DNA storage technology based on biomolecules has attracted much attention. DNA storage refers to technology that uses artificially synthesized deoxynucleotide chains to store and read all kinds of information, such as documents, pictures, and audio. First, the data are encoded into binary strings. Then, the four bases, A (adenine), T (thymine), C (cytosine), and G (guanine), are used to encode the corresponding binary numbers, so that the data can be used to design the target DNA molecules as deoxynucleotide chains. Subsequently, the corresponding DNA molecules are artificially synthesized, storing the data within them. Compared with traditional storage systems, DNA storage has major advantages, such as high storage density, long duration, low hardware cost, high access parallelism, and strong scalability, which satisfy the demands of big data storage. This manuscript first reviews the origin and development of DNA storage technology, then introduces its storage principles, contents, and methods, and finally analyzes its development. From the initial research to the cutting edge of this field and beyond, the advantages, disadvantages, and practical applications of DNA storage technology require continuous exploration.
Affiliation(s)
- Lichao Zhang
- Shenzhen Key Laboratory of Photonic Devices and Sensing Systems for Internet of Things, College of Physics and Optoelectronic Engineering, Shenzhen University, Shenzhen 518060, China
- Yuanyuan Lv
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, Zhejiang, China
- Lei Xu
- School of Electronic and Communication Engineering, ShenZhen Polytechnic, Shenzhen 518000, China
- Murong Zhou
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150000, China
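The encoding pipeline described in the DNA storage abstract (data to binary, binary to bases) can be sketched with the simplest possible mapping, two bits per nucleotide. The table 00→A, 01→C, 10→G, 11→T is an assumption chosen for illustration; practical DNA storage schemes typically add constraints such as balanced GC content and avoidance of long homopolymer runs, plus error correction, none of which is shown here.

```python
# Hypothetical 2-bit-per-base mapping (no GC balancing, homopolymer avoidance, or error correction).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Bytes -> binary string -> one base per 2-bit chunk (4 bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Invert encode(): bases -> 2-bit chunks -> bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")  # 'H' = 01001000, 'i' = 01101001
print(strand)           # CAGACGGC
print(decode(strand))   # b'Hi'
```

At two bits per base, each byte occupies four nucleotides; real systems trade some of that density for the constraints noted above.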
23
Zhang Z, Cui F, Cao C, Wang Q, Zou Q. Single-cell RNA analysis reveals the potential risk of organ-specific cell types vulnerable to SARS-CoV-2 infections. Comput Biol Med 2022; 140:105092. [PMID: 34864302 PMCID: PMC8628631 DOI: 10.1016/j.compbiomed.2021.105092] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Received: 10/19/2021] [Revised: 11/22/2021] [Accepted: 11/26/2021] [Indexed: 12/20/2022]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused a global pandemic of coronavirus disease 2019 (COVID-19) since December 2019, leading to more than 160 million confirmed cases, including 3.3 million deaths. To understand the mechanism by which SARS-CoV-2 invades human cells and to reveal organ-specific cell types susceptible to COVID-19, we conducted a comprehensive bioinformatic analysis of public single-cell RNA sequencing datasets. Using the expression of six confirmed host entry factors for SARS-CoV-2 (ACE2, TMPRSS2, NRP1, AXL, FURIN, and CTSL), we showed that macrophages are the cells most likely to be associated with SARS-CoV-2 pathogenesis in the lung. Besides the widely reported 'chemokine storm', we identified ribosome-related pathways that may also be potential therapeutic targets for COVID-19 patients with lung infection. Moreover, cell-cell communication analysis and trajectory analysis revealed that M1-like macrophages showed the strongest relation to severe COVID-19. We also showed that up-regulation of chemokine pathways generally leads to severe symptoms, whereas down-regulation of ribosome- and RNA activity-related pathways is more likely to be associated with mild symptoms. Analyses of susceptible cell types in other organs could also provide potential targets for COVID-19 therapy. This work provides clues to the pathogenesis of COVID-19 and contributes to understanding how SARS-CoV-2 invades human cells.
Affiliation(s)
- Zilong Zhang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
- Feifei Cui
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
- Chen Cao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
- Qingsuo Wang
- Beidahuang Industry Group General Hospital, Harbin, 150001, China.
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China.
24
Mouse4mC-BGRU: deep learning for predicting DNA N4-methylcytosine sites in mouse genome. Methods 2022; 204:258-262. [DOI: 10.1016/j.ymeth.2022.01.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/14/2021] [Revised: 01/14/2022] [Accepted: 01/24/2022] [Indexed: 12/12/2022]
25
Guo Y, Ju Y, Chen D, Wang L. Research on the Computational Prediction of Essential Genes. Front Cell Dev Biol 2021; 9:803608. [PMID: 34938741 PMCID: PMC8685449 DOI: 10.3389/fcell.2021.803608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Accepted: 11/22/2021] [Indexed: 11/19/2022]
Abstract
Genes, the nucleotide sequences that encode a polypeptide chain or functional RNA, are the basic genetic units controlling biological traits. They underpin the basic structures and functions of organisms and store information related to biological factors and processes such as blood type, gestation, growth, and apoptosis. The environment and genetics jointly affect important physiological processes such as reproduction, cell division, and protein synthesis. Genes are related to a wide range of phenomena, including growth, decline, illness, aging, and death. During evolution, a class of genes has been conserved across multiple species. These genes are often located on the leading strand of DNA and tend to have higher expression levels. The proteins they encode usually either perform very important functions or are responsible for maintaining and repairing these essential functions. Such genes are called persistent genes. Among them, those that are irreplaceable for an organism's life activities are the essential genes. For example, when starch is the only source of energy, the genes related to starch digestion are essential: without them, the organism would die because it could not obtain enough energy to maintain basic functions. The functions of the proteins encoded by these genes are thought to be fundamental to life. Today, DNA can be extracted from blood, saliva, or tissue cells for genetic testing, and detailed genetic information can be obtained with the most advanced scientific instruments and technologies. The information gained from genetic testing is useful for assessing the potential risk of disease and for determining the prognosis and development of diseases. It is also useful for developing personalized medication and providing targeted health guidance to improve quality of life. Therefore, identifying important and essential genes is of great theoretical and practical significance.
In this paper, we review the research status of essential genes and the essential gene database of bacteria, describe a computational method for predicting essential genes based on communication coding theory, and discuss the significance and practical application value of essential genes.
Affiliation(s)
- Yuxin Guo
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China; Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China; Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China; School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Ying Ju
- School of Informatics, Xiamen University, Xiamen, China
- Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
- Lihong Wang
- Beidahuang Industry Group General Hospital, Harbin, China
26
Dou L, Zhou W, Zhang L, Xu L, Han K. Accurate identification of RNA D modification using multiple features. RNA Biol 2021; 18:2236-2246. [PMID: 33729104 PMCID: PMC8632091 DOI: 10.1080/15476286.2021.1898160] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/21/2020] [Revised: 02/13/2021] [Accepted: 02/23/2021] [Indexed: 10/21/2022]
Abstract
Dihydrouridine (D), one of the common post-transcriptional modifications in tRNAs, has prominent effects on regulating tRNA flexibility and has been associated with cancer. Because sequencing techniques for detecting D modification are expensive and time-consuming, precise computational tools can greatly advance research on its molecular mechanisms and medical applications. We propose a novel predictor, called iRNAD_XGBoost, to identify potential D sites using multiple RNA sequence representations. In this method, the class imbalance problem is addressed with the hybrid sampling method SMOTEENN, and the model is constructed from the top 30 features selected by XGBoost. The optimized model showed high sensitivity (Sn) and specificity (Sp) of 97.13% and 97.38%, respectively, in the jackknife test. In the independent test, these two metrics reached 91.67% and 94.74%, respectively. Compared with the iRNAD method, this model showed high generalizability and consistent prediction performance on positive and negative samples, yielding satisfactory MCC scores of 0.94 and 0.86 in the jackknife and independent tests, respectively. We infer that the chemical property and nucleotide density features (CPND), the electron-ion interaction pseudopotential (EIIP and PseEIIP), and dinucleotide composition (DNC) are crucial for recognizing D modification. The proposed predictor is a promising tool to help experimental biologists investigate molecular functions.
Affiliation(s)
- Lijun Dou
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, Guangdong, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Wenyang Zhou
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Lichao Zhang
- School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen, Guangdong, China
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, Guangdong, China
- Ke Han
- School of Computer and Information Engineering, Harbin University of Commerce, Harbin, Heilongjiang, China
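The sensitivity (Sn), specificity (Sp), and Matthews correlation coefficient (MCC) reported in the iRNAD_XGBoost abstract follow the standard confusion-matrix definitions: Sn = TP/(TP+FN), Sp = TN/(TN+FP), and MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The Python helper below computes them; the counts in the example are made up for illustration and are not the paper's data.

```python
import math
from typing import Tuple


def sn_sp_mcc(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Sensitivity, specificity and MCC from confusion-matrix counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # common convention when a margin is zero
    return sn, sp, mcc


# Hypothetical counts: 100 positive and 100 negative samples.
sn, sp, mcc = sn_sp_mcc(tp=95, tn=97, fp=3, fn=5)
print(f"Sn={sn:.2%} Sp={sp:.2%} MCC={mcc:.3f}")
```

Unlike Sn and Sp, MCC uses all four cells of the confusion matrix, which is why it is often quoted alongside them for imbalanced data.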
27
Liu Y, Chen W, He Z. Essential Protein Recognition via Community Significance. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2788-2794. [PMID: 34347602 DOI: 10.1109/tcbb.2021.3102018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
Essential proteins play a vital role in understanding cellular life. With advances in high-throughput technologies, a number of protein-protein interaction (PPI) networks have been constructed, so that essential proteins can be identified from a systems biology perspective. Although a series of network-based essential protein discovery methods have been proposed, these methods still have some drawbacks. Recently, the significance-based method SigEP has been shown to be promising for overcoming defects inherent in currently available essential protein identification methods. However, SigEP is developed under the unrealistic Erdős-Rényi (E-R) model, and its time complexity is very high. Hence, we propose a new significance-based essential protein recognition method, named EPCS, in which essential protein discovery is formulated as a community significance testing problem. Experimental results on four PPI networks show that EPCS performs better than nine state-of-the-art essential protein identification methods as well as SigEP, the only existing significance-based essential protein identification method.
28
Béczi E, Gaskó N. Approaching the bi-objective critical node detection problem with a smart initialization-based evolutionary algorithm. PeerJ Comput Sci 2021; 7:e750. [PMID: 34805505 PMCID: PMC8576564 DOI: 10.7717/peerj-cs.750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2021] [Accepted: 09/28/2021] [Indexed: 06/13/2023]
Abstract
Determining the critical nodes in a complex network is an important computational problem, and several variants of it have emerged because of its wide applicability in network analysis. In this article, we study the bi-objective critical node detection problem (BOCNDP), a new variant of the well-known critical node detection problem that optimizes two objectives simultaneously: maximizing the number of connected components and minimizing the variance of their cardinalities. Evolutionary multi-objective algorithms (EMOAs) are a natural choice for this type of problem. We propose three smart initialization strategies that can be incorporated into any EMOA. These strategies take the basic properties of the networks into account and are based on highest degree, random walk (RW), and depth-first search. Numerical experiments were conducted on synthetic and real-world network data. All three initialization types significantly improve the performance of the EMOA.
Affiliation(s)
- Eliézer Béczi
- Babeş-Bolyai University of Cluj-Napoca, Cluj-Napoca, Romania
- Noémi Gaskó
- Babeş-Bolyai University of Cluj-Napoca, Cluj-Napoca, Romania
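To make the two BOCNDP objectives concrete, the Python sketch below evaluates one candidate solution: it deletes a set of nodes, finds the connected components of the remaining graph with breadth-first search, and returns the component count (to maximize) and the variance of component sizes (to minimize). Using the population variance is an assumption; the EMOA itself and the three initialization strategies are not shown.

```python
from collections import deque
from statistics import pvariance
from typing import Dict, Hashable, Iterable, Set, Tuple


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """Undirected adjacency sets from an edge list."""
    adj: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bocndp_objectives(adj: Dict[Hashable, Set[Hashable]],
                      removed: Iterable[Hashable]) -> Tuple[int, float]:
    """(number of connected components, variance of their sizes) after deleting `removed`."""
    seen = set(removed)  # deleted nodes are treated as already visited
    sizes = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # BFS over one component
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        sizes.append(size)
    variance = float(pvariance(sizes)) if sizes else 0.0
    return len(sizes), variance


# Path 0-1-2-3-4: deleting the middle node splits it into two equal halves.
path = build_graph([(0, 1), (1, 2), (2, 3), (3, 4)])
print(bocndp_objectives(path, {2}))  # (2, 0.0)
print(bocndp_objectives(path, {1}))  # (2, 1.0)
```

In this toy case, removing node 2 dominates removing node 1: both give two components, but node 2 yields equal-sized ones.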
29
A Network Pharmacology and Molecular Docking Strategy to Explore Potential Targets and Mechanisms Underlying the Effect of Curcumin on Osteonecrosis of the Femoral Head in Systemic Lupus Erythematosus. BIOMED RESEARCH INTERNATIONAL 2021; 2021:5538643. [PMID: 34557547 PMCID: PMC8455200 DOI: 10.1155/2021/5538643] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/18/2021] [Accepted: 08/06/2021] [Indexed: 11/25/2022]
Abstract
Background: Systemic lupus erythematosus (SLE) is a refractory immune disease that is often complicated by osteonecrosis of the femoral head (ONFH). Curcumin, the most active ingredient of Curcuma longa, has a variety of biological activities and wide-ranging effects on the body. This study aimed to explore the potential therapeutic targets underlying the effect of curcumin on SLE-ONFH using a network pharmacology approach and a molecular docking strategy. Methods: Curcumin and its drug targets were identified using network analysis. First, the SwissTargetPrediction, GeneCards, and OMIM databases were mined to predict curcumin targets and SLE-ONFH-related targets. Second, the curcumin target gene, SLE-ONFH shared gene, and curcumin-SLE-ONFH target gene networks were created in Cytoscape, and the candidate targets of each component were collected with R. Third, the targets and enriched pathways were examined by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. Finally, a gene-pathway network was constructed and visualized in Cytoscape, and key potential central targets were verified by molecular docking and literature review. Results: 201 potential targets of curcumin and 170 targets related to SLE-ONFH were subjected to network analysis, and the 36 intersection targets were taken as the potential targets of curcumin for treating SLE-ONFH. To obtain more comprehensive and accurate candidate genes, the 36 potential targets were further analyzed by network topology, finally yielding 285 candidate genes. The top 20 biological processes, cellular components, and molecular functions were identified at a corrected P value ≤ 0.05, and 20 related signaling pathways were identified by KEGG analysis at a Bonferroni-corrected P value ≤ 0.05.
Molecular docking showed that the top three genes (TP53, IL6, VEGFA) bind well to curcumin; combined with literature review, other genes such as TNF, CCND1, CASP3, and MMP9 were also identified. Conclusion: The present study explored the potential targets and signaling pathways of curcumin against SLE-ONFH, which could provide a better understanding of its effects in regulating the cell cycle, angiogenesis, immunosuppression, inflammation, and bone destruction.
30
Chen X, Lin Y, Qu Q, Ning B, Chen H, Li X. An epistasis and heterogeneity analysis method based on maximum correlation and maximum consistence criteria. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2021; 18:7711-7726. [PMID: 34814271 DOI: 10.3934/mbe.2021382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Tumor heterogeneity significantly increases the difficulty of tumor treatment, because the same drugs and treatments have different effects on different tumor subtypes. Tumor heterogeneity is therefore one of the main sources of poor prognosis, recurrence, and metastasis. Some computational methods already study tumor heterogeneity at the genome, transcriptome, and histology levels, but they still have certain limitations. In this study, we propose an epistasis and heterogeneity analysis method based on genomic single nucleotide polymorphism (SNP) data. First, maximum correlation and maximum consistence criteria were designed, based on the Bayesian network K2 score and information entropy, to evaluate genomic epistasis. As the number of SNPs increases, the space of epistatic combinations grows sharply, resulting in a combinatorial explosion. We therefore use an improved genetic algorithm to search the SNP epistatic combination space for potentially feasible epistasis solutions. Multiple epistasis solutions represent different pathogenic gene combinations, which may lead to different tumor subtypes, that is, heterogeneity. Finally, an XGBoost classifier is trained on the feature SNPs that constitute the multiple sets of epistatic solutions, to verify that accounting for tumor heterogeneity improves the accuracy of tumor subtype prediction. To demonstrate the effectiveness of our method, both its power to recognize multiple epistatic combinations and its accuracy in tumor subtype classification are evaluated. Extensive simulation results show that our method achieves better power and prediction accuracy than previous methods.
Affiliation(s)
- Xia Chen
- School of Basic Education, Changsha Aeronautical Vocational and Technical College, Changsha, Hunan 410124, China
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Yexiong Lin
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Qiang Qu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Bin Ning
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Haowen Chen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Xiong Li
- School of Software, East China Jiaotong University, Nanchang 330013, China
31
Liu Y, Wei X, Chen W, Hu L, He Z. A graph-traversal approach to identify influential nodes in a network. Patterns 2021; 2:100321. [PMID: 34553168 PMCID: PMC8441579 DOI: 10.1016/j.patter.2021.100321] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/29/2021] [Revised: 05/16/2021] [Accepted: 07/07/2021] [Indexed: 11/19/2022]
Abstract
Influential node identification plays a significant role in understanding network structure and function. Here we propose a general method for detecting influential nodes in a graph-traversal framework. We evaluate the influence of each node by constructing a breadth-first search (BFS) tree whose root is the target node. From the BFS tree, we generate a curve in which the x axis is the level number and the y axis is the cumulative score of all nodes visited so far. The area under this curve is used as the final influence score of the target node. Experimental results on various networks from different domains demonstrate that our method can be significantly superior to widely used centrality measures for influential node detection.
Highlights: We propose an influential node detection method, TARank, in a graph-traversal framework. We evaluate the influence of each node by constructing a breadth-first search tree. TARank is capable of enhancing existing centrality measures. TARank can also yield new, yet effective, centrality measures.
The discovery of influential nodes is a fundamental research issue in network science. Various methods have been presented in the literature to quantify the influence of each node in a network. To the best of our knowledge, no previous research has addressed the influential node identification problem from a graph-traversal perspective. To fill this void, we propose the TARank method, which integrates the information collected from the breadth-first search tree to identify influential nodes. The formulation under the graph-traversal framework opens the door to a fundamentally new class of influential node identification methods, and more effective recognition methods can be expected to be built on this general framework in the future. Since empirical studies have validated the effectiveness of TARank, it is plausible to employ this method in different applications to reveal new findings.
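The graph-traversal scoring described above can be sketched in Python. This is a hedged illustration, not the published TARank implementation. It makes three assumptions: node degree is the base score, every curve is extended flat to a common horizon (the largest BFS depth in the graph) so that nodes whose trees accumulate score sooner get a larger area, and the area under the curve is computed with the trapezoidal rule. The function names are hypothetical.

```python
def bfs_levels(adj, root):
    """Group nodes by breadth-first-search depth from root: levels[d] = nodes at depth d."""
    seen = {root}
    levels = [[root]]
    while True:
        nxt = []
        for u in levels[-1]:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    nxt.append(v)
        if not nxt:
            return levels
        levels.append(nxt)


def traversal_area_scores(adj, base_score):
    """Area under the (BFS level, cumulative base score) curve for every node."""
    all_levels = {u: bfs_levels(adj, u) for u in adj}
    horizon = max(len(lv) for lv in all_levels.values())  # assumed common x range
    scores = {}
    for u, levels in all_levels.items():
        curve, total = [], 0.0
        for level in levels:
            total += sum(base_score[v] for v in level)
            curve.append(total)
        curve += [total] * (horizon - len(curve))  # flat once the BFS tree is exhausted
        # Trapezoidal area between consecutive levels
        scores[u] = sum((curve[i] + curve[i + 1]) / 2 for i in range(horizon - 1))
    return scores


# Star graph: node 0 is the hub, nodes 1-4 are leaves; base score = degree.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
degree = {u: len(nbrs) for u, nbrs in star.items()}
print(traversal_area_scores(star, degree))
# {0: 14.0, 1: 9.5, 2: 9.5, 3: 9.5, 4: 9.5}
```

Without the common horizon, leaves would get more area than the hub simply because their BFS trees have more levels. That is why this sketch pads every curve to the same length.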
Affiliation(s)
- Yan Liu
- School of Software, Dalian University of Technology, Dalian 116024, China
- Xiaoqi Wei
- School of Software, Dalian University of Technology, Dalian 116024, China
- Wenfang Chen
- School of Software, Dalian University of Technology, Dalian 116024, China
- Lianyu Hu
- School of Software, Dalian University of Technology, Dalian 116024, China
- Zengyou He
- School of Software, Dalian University of Technology, Dalian 116024, China
- Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian 116024, China
- Corresponding author
32
Lu X, Wang X, Ding L, Li J, Gao Y, He K. frDriver: A Functional Region Driver Identification for Protein Sequence. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1773-1783. [PMID: 32870797 DOI: 10.1109/tcbb.2020.3020096] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 06/11/2023]
Abstract
Identifying cancer drivers is a crucial challenge in explaining the mechanisms underlying cancer development. Many methods identify cancer drivers based on single mutation sites or entire genes, but they ignore the large number of medium-sized functional elements. It is hypothesized that mutations occurring in different regions of a protein sequence have different effects on cancer progression. Here, we develop a novel functional region driver (frDriver) identification method based on Bayesian probability and multiple linear regression models to identify protein regions that can regulate gene expression levels and have high functional impact potential. Combining gene expression data and somatic mutation data, with functional impact scores (SIFT, PROVEAN) as prior knowledge, we identified the cancer driver regions that are most accurate in predicting gene expression levels. We evaluated the performance of frDriver on the BRCA and GBM datasets from TCGA. The results showed that frDriver identified known cancer drivers and outperformed three other state-of-the-art methods (eDriver, ActiveDriver, and OncodriveCLUST). In addition, KEGG pathway and GO term enrichment analyses indicated that the cancer drivers predicted by frDriver are related to processes such as cancer formation and gene regulation.
33
Wei H, Liao Q, Liu B. iLncRNAdis-FB: Identify lncRNA-Disease Associations by Fusing Biological Feature Blocks Through Deep Neural Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1946-1957. [PMID: 31905146 DOI: 10.1109/tcbb.2020.2964221] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 06/10/2023]
Abstract
Identifying lncRNA-disease associations is important not only for exploring disease mechanisms but also for facilitating molecularly targeted drug discovery. Fusing multiple sources of biological information can give a more comprehensive view of lncRNA-disease association features. However, the existing fusion strategies in this field fail to remove noisy and irrelevant information from each data source, and as a result their predictive performance is still too low for real-world applications. To address this, we propose a novel computational predictor, iLncRNAdis-FB, based on a Convolutional Neural Network (CNN) that integrates different data sources by using feature blocks in a supervised manner. An lncRNA similarity matrix and a disease similarity matrix are constructed, and three-dimensional feature blocks are generated from them. These feature blocks are then fed into the CNN to train a model that predicts unknown lncRNA-disease associations. Experimental results show that iLncRNAdis-FB outperforms other state-of-the-art predictors. Furthermore, a web server for iLncRNAdis-FB has been established at http://bliulab.net/iLncRNAdis-FB/, through which users can submit lncRNA sequences to detect their potentially associated diseases.
34
Wang JH, Chen YH. Network-adjusted Kendall's Tau Measure for Feature Screening with Application to High-dimensional Survival Genomic Data. Bioinformatics 2021; 37:2150-2156. [PMID: 33595070 DOI: 10.1093/bioinformatics/btab064] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/23/2020] [Revised: 12/17/2020] [Accepted: 01/26/2021] [Indexed: 11/13/2022]
Abstract
MOTIVATION In high-dimensional genetic/genomic data, identifying genes related to a clinical survival trait is a challenging and important issue. In particular, right-censored survival outcomes and contaminated biomarker data make relevant feature screening difficult. Several independence screening methods have been developed, but they fail to account for gene-gene dependency information and may be sensitive to outlying feature data. RESULTS We improve the inverse probability-of-censoring weighted (IPCW) Kendall's tau statistic by using Google's PageRank Markov matrix to incorporate feature dependency network information. To handle outlying feature data, the nonparanormal approach, which transforms the feature data into multivariate normal variates, is used in the graphical lasso procedure to estimate the network structure of the feature data. Simulation studies under various scenarios show that the proposed network-adjusted weighted Kendall's tau approach yields more accurate feature selection and survival prediction than methods that do not account for feature dependency network information and outlying feature data. Applications to clinical survival outcome data from patients with diffuse large B-cell lymphoma and from The Cancer Genome Atlas lung adenocarcinoma patients clearly demonstrate the advantages of the new proposal over the alternative methods. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
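The PageRank Markov matrix referred to above is the damped random-walk ("Google") matrix of a graph. The sketch below computes its stationary distribution by power iteration on a feature dependency network given as an adjacency list. It assumes a damping factor of 0.85 and uniform redistribution of rank from dangling nodes. How the paper combines the resulting weights with the IPCW Kendall's tau statistic is not reproduced here. The function name and the toy network are hypothetical.

```python
def pagerank(adj, damping=0.85, tol=1e-12, max_iter=1000):
    """Stationary distribution of the Google matrix of a graph, by power iteration.

    adj: {node: [neighbors]}. Nodes without out-links (dangling nodes)
    spread their rank uniformly over all nodes.
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if not adj[u])
        # Teleportation term plus redistributed dangling mass
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        # Each node passes its damped rank equally to its neighbors
        for u in nodes:
            for v in adj[u]:
                new[v] += damping * rank[u] / len(adj[u])
        if sum(abs(new[u] - rank[u]) for u in nodes) < tol:
            return new
        rank = new
    return rank


# Toy feature network: feature "g1" is linked to three otherwise unlinked features.
features = {"g1": ["g2", "g3", "g4"], "g2": ["g1"], "g3": ["g1"], "g4": ["g1"]}
weights = pagerank(features)
print(max(weights, key=weights.get))  # g1
```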
Affiliation(s)
- Jie-Huei Wang
- Department of Statistics, Feng Chia University, Seatwen, Taichung 40724, Taiwan
- Yi-Hau Chen
- Institute of Statistical Science, Academia Sinica, Nankang, Taipei 11529, Taiwan
35
Xu L, Ru X, Song R. Application of Machine Learning for Drug-Target Interaction Prediction. Front Genet 2021; 12:680117. [PMID: 34234813 PMCID: PMC8255962 DOI: 10.3389/fgene.2021.680117] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/13/2021] [Accepted: 05/28/2021] [Indexed: 11/13/2022]
Abstract
Exploring drug–target interactions through biomedical experiments requires substantial human, financial, and material resources. To save time and cost and meet present-day needs, machine learning methods have been introduced into drug–target interaction prediction. The large amount of drug and target data available in existing databases, evolving and innovative computer technologies, and the inherent characteristics of the various types of machine learning have made machine learning the mainstream approach in drug–target interaction prediction research. In this review, we summarize the specific applications of machine learning in drug–target interaction prediction, analyze the characteristics of each algorithm, and discuss the issues that need to be further addressed and explored in future research. The aim of this review is to provide a sound basis for constructing high-performance models.
Affiliation(s)
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Xiaoqing Ru
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan
- Rong Song
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
36
Alofairi AA, Mabrouk E, Elsemman IE. Constraint-based models for dominating protein interaction networks. IET Syst Biol 2021; 15:148-162. [PMID: 34048146 PMCID: PMC8675806 DOI: 10.1049/syb2.12021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2021] [Revised: 05/10/2021] [Accepted: 05/11/2021] [Indexed: 11/19/2022]
Abstract
The minimum dominating set (MDSet) is the smallest set of graph nodes such that every other graph node is connected to at least one MDSet node. The MDSet has been successfully applied to extract proteins that control protein–protein interaction (PPI) networks and to reveal the correlation between structural analysis and biological functions. Although a PPI network contains many MDSets, identifying multiple MDSets is an NP‐complete problem, and it is difficult to determine the best MDSets, that is, those enriched with biological functions. Therefore, the MDSet model needs to be further expanded and validated to find constrained solutions that differ from those generated by traditional models. Moreover, identifying the critical set of the network, the set of nodes common to all MDSets, can be time-consuming. Herein, the authors adopted the minimisation of metabolic adjustment (MOMA) algorithm to develop a new framework, called maximisation of interaction adjustment (MOIA). MOIA provides three models: the first generates two MDSets with a minimum number of shared proteins, the second generates constrained multiple MDSets (k‐MDSets), and the third generates user-defined MDSets containing the maximum number of essential genes and/or other important genes of the PPI network. In practice, these models significantly reduce the cost of finding the critical set and classifying the graph nodes. The authors term the critical set the k‐critical set, where k is the number of MDSets generated by the proposed model. They then define a new set of proteins, the (k−1)‐critical set, in which each node belongs to (k−1) MDSets. This set has been shown to be as important as the k‐critical set and, like the k‐critical set, contains many essential genes, transcription factors, and protein kinases. The (k−1)‐critical set can be used to extend the search for drug target proteins.
Based on the performance of the MOIA models, the authors believe the proposed methods contribute to answering key questions about the MDSets of PPI networks, and their results and analysis can be extended to other network types.
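The MDSet and critical-set definitions above can be made concrete with an exhaustive search on a toy graph. This brute-force sketch is only feasible for very small graphs, because the search is exponential. It is not the MOIA constraint-based model, which the paper formulates as an optimization problem. Here k is simply the number of MDSets found by enumeration, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_dominating(adj, subset):
    """True if every node is in `subset` or adjacent to a node in it."""
    covered = set(subset)
    for u in subset:
        covered.update(adj[u])
    return len(covered) == len(adj)


def all_minimum_dominating_sets(adj):
    """Enumerate every minimum dominating set by increasing subset size."""
    nodes = sorted(adj)
    for size in range(1, len(nodes) + 1):
        found = [set(c) for c in combinations(nodes, size) if is_dominating(adj, c)]
        if found:
            return found
    return []


def critical_sets(mdsets):
    """Return (k-critical set, (k-1)-critical set) for a list of k MDSets."""
    k = len(mdsets)
    membership = Counter(v for s in mdsets for v in s)
    k_critical = {v for v, c in membership.items() if c == k}
    k_minus_1_critical = {v for v, c in membership.items() if c == k - 1}
    return k_critical, k_minus_1_critical


# Path graph 0-1-2-3-4
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
mdsets = all_minimum_dominating_sets(path)
print(mdsets)                 # [{0, 3}, {1, 3}, {1, 4}]
print(critical_sets(mdsets))  # (set(), {1, 3})
```

In this toy path no node lies in all three MDSets, but nodes 1 and 3 each lie in two of them. This mirrors how a (k−1)-critical set can be informative when the k-critical set is small or empty.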
Affiliation(s)
- Adel A Alofairi
- Department of Computer Science and Information Technology, Faculty of Science, Ibb University, Ibb, Yemen
- Department of Mathematics, Faculty of Science, Assiut University, Assiut, Egypt
- Emad Mabrouk
- Department of Mathematics, Faculty of Science, Assiut University, Assiut, Egypt
- College of Engineering and Technology, American University of the Middle East, Kuwait, Kuwait
- Ibrahim E Elsemman
- Department of Information Systems, Faculty of Computers and Information, Assiut University, Assiut, Egypt
37
Zeng R, Cheng S, Liao M. 4mCPred-MTL: Accurate Identification of DNA 4mC Sites in Multiple Species Using Multi-Task Deep Learning Based on Multi-Head Attention Mechanism. Front Cell Dev Biol 2021; 9:664669. [PMID: 34041243 PMCID: PMC8141656 DOI: 10.3389/fcell.2021.664669] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/05/2021] [Accepted: 03/17/2021] [Indexed: 01/10/2023]
Abstract
DNA methylation is one of the most widespread epigenetic modifications, and DNA 4mC modification plays a key role in regulating chromatin structure and gene expression. In this study, we propose a generic 4mC computational predictor, 4mCPred-MTL, which uses multi-task learning coupled with a Transformer to predict 4mC sites in multiple species. The predictor uses a multi-task learning framework in which each task trains a Transformer-based model on species-specific data. Extensive experimental results show that our multi-task model significantly improves on single-task models and outperforms existing methods in benchmark comparisons. Moreover, we found that our model captures the characteristics of 4mC sites better than commonly used feature descriptors, demonstrating its strong feature learning ability. Based on these results, we expect 4mCPred-MTL to be a useful tool for interested research communities.
Affiliation(s)
- Rao Zeng
- Department of Software Engineering, School of Informatics, Xiamen University, Xiamen, China
- Song Cheng
- Department of Thoracic Surgery, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Minghong Liao
- Department of Software Engineering, School of Informatics, Xiamen University, Xiamen, China
38
Shang Y, Gao L, Zou Q, Yu L. Prediction of drug-target interactions based on multi-layer network representation learning. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.12.068] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Indexed: 02/06/2023]
39
Lv Y, Huang S, Zhang T, Gao B. Application of Multilayer Network Models in Bioinformatics. Front Genet 2021; 12:664860. [PMID: 33868392 PMCID: PMC8044439 DOI: 10.3389/fgene.2021.664860] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/06/2021] [Accepted: 02/26/2021] [Indexed: 11/24/2022]
Abstract
Multilayer networks provide an efficient tool for studying complex systems, and with the current rapid development of bioinformatics tools and accumulation of data, researchers have applied network concepts to all aspects of research problems in biology. Focusing on the combination of multilayer networks and bioinformatics, this review summarizes and classifies the applications of multilayer network models in bioinformatics and presents the latest results. We classify these applications according to the object of study. Furthermore, because of the systemic nature of biology, we group the subjects into hierarchical categories, such as cells, tissues, organs, and groups, following the hierarchical nature of biological composition. Given the complexity of biological systems, we selected brain research for a detailed explanation, describing the application of multilayer networks and chronological networks in brain research to demonstrate the primary ideas behind applying multilayer networks in biological studies. Finally, we mention a quality assessment method for multilayer and single-layer networks as an evaluation method for network studies.
Affiliation(s)
- Yuanyuan Lv
- Hainan Key Laboratory for Computational Science and Application, Hainan Normal University, Haikou, China
- Yangtze Delta Region Institute, University of Electronic Science and Technology of China, Quzhou, China
- Shan Huang
- Department of Neurology, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Tianjiao Zhang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Bo Gao
- Department of Radiology, The Second Affiliated Hospital, Harbin Medical University, Harbin, China
40
Chen Z, Shen Z, Zhang Z, Zhao D, Xu L, Zhang L. RNA-Associated Co-expression Network Identifies Novel Biomarkers for Digestive System Cancer. Front Genet 2021; 12:659788. [PMID: 33841514 PMCID: PMC8033200 DOI: 10.3389/fgene.2021.659788] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/28/2021] [Accepted: 02/25/2021] [Indexed: 01/04/2023]
Abstract
Cancers of the digestive system are malignant diseases. Our study focused on colon cancer, esophageal cancer (ESCC), rectal cancer, gastric cancer (GC), and rectosigmoid junction cancer, with the aim of identifying possible biomarkers for these diseases. Transcriptome data were downloaded from the TCGA database (The Cancer Genome Atlas Program), and a network was constructed using the WGCNA algorithm. Two significant modules were found, and co-expression networks were constructed. CytoHubba was used to identify the hub genes of the two networks. GO analysis suggested that the network genes were involved in metabolic processes, biological regulation, and membrane and protein binding. KEGG analysis indicated that the significant pathways were the calcium signaling pathway, fatty acid biosynthesis, pathways in cancer, and insulin resistance. Some of the most significant hub genes of the two networks were hsa-let-7b-3p, hsa-miR-378a-5p, hsa-miR-26a-5p, hsa-miR-382-5p, and hsa-miR-29b-2-5p, and SECISBP2L, NCOA1, HERC1, HIPK3, and MBNL1, respectively. These genes are predicted to provide a tumor prognostic reference for this patient population.
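The first step of the WGCNA workflow mentioned above turns expression profiles into a weighted network. The sketch below assumes the unsigned soft-thresholding adjacency a_ij = |cor(x_i, x_j)|^beta with an illustrative beta = 6. It ranks nodes by connectivity (the sum of their adjacencies) as a simple hub indicator. It omits the rest of WGCNA (topological overlap, module detection) and is not the CytoHubba ranking used in the study. The function names and toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency a_ij = |cor(x_i, x_j)| ** beta (no self-loops)."""
    genes = list(expr)
    adj = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            adj[g][h] = adj[h][g] = abs(pearson(expr[g], expr[h])) ** beta
    return adj


def connectivity(adj):
    """Whole-network connectivity: sum of a node's adjacencies."""
    return {g: sum(nbrs.values()) for g, nbrs in adj.items()}


# Toy expression profiles across four samples (hypothetical values).
expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, -1, 1, -1]}
k = connectivity(soft_threshold_adjacency(expr))
print({g: round(v, 3) for g, v in k.items()})  # {'A': 1.008, 'B': 1.008, 'C': 0.016}
```

Raising correlations to a power keeps strong correlations (such as A with B) close to 1 while shrinking weak ones (about 0.45 becomes about 0.008). This is the point of soft thresholding.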
Affiliation(s)
- Zheng Chen
- School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, Shenzhen, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Zijie Shen
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Zilong Zhang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Da Zhao
- School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, Shenzhen, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Lijun Zhang
- School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, Shenzhen, China
41
Yang J, Li H, Wang F, Xiao F, Yan W, Hu G. Network-Based Target Prioritization and Drug Candidate Identification for Multiple Sclerosis: From Analyzing "Omics Data" to Druggability Simulations. ACS Chem Neurosci 2021; 12:917-929. [PMID: 33565875 DOI: 10.1021/acschemneuro.1c00011] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
Abstract
Multiple sclerosis (MS) is the most common chronic inflammatory demyelinating disease of the central nervous system. While the drugs currently available for MS provide symptomatic benefit, there is no curative treatment. The emergence of large-scale multiomics data and network theory provides new opportunities for drug discovery in MS, offering promising strategies for developing novel drugs. In this study, we propose a computational framework that combines biomolecular network modeling and structural dynamics analysis to facilitate the discovery of new drugs with potential activity in MS. First, we developed a new shortest path-based algorithm that prioritizes differentially expressed genes through a new topological and functional exploration of the protein-protein interaction network. Then, pathway enrichment analysis and an assessment of target druggability suggested that TNF-α-induced protein 3 (TNFAIP3), which is involved in NF-κB signaling, could be a potential therapeutic target for MS. Finally, druggability simulations and mutation enrichment analysis of the TNFAIP3 dimer revealed two druggable sites. Follow-up pharmacophore model-based virtual screening of the two sites yielded 30 hit compounds with low energy scores. In summary, this novel method, based on analyzing "omics data" and performing druggability simulations, is a systematic approach that unravels disease mechanisms and links them to chemical space to develop treatments, and it can be applied to other complex diseases.
Affiliation(s)
- Ji Yang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Hongchun Li
- Research Center for Computer-Aided Drug Discovery, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Fan Wang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Fei Xiao
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Wenying Yan
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Guang Hu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
42
Wang Z, Wang D, Bao X, Wu T. A parallel biological computing algorithm to solve the vertex coloring problem with polynomial time complexity. Journal of Intelligent & Fuzzy Systems 2021. [DOI: 10.3233/jifs-200025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Abstract
The vertex coloring problem is a well-known combinatorial problem that requires each vertex to be assigned a color such that adjacent vertices receive different colors and the total number of colors used is minimized. It is a famous NP-hard problem in graph theory, and no efficient algorithm for solving it is currently known. As a kind of intelligent computing, DNA computing offers high parallelism and high storage density, so it has been widely used to solve classical combinatorial optimization problems. In this paper, we propose a new DNA algorithm that uses DNA molecular operations to solve the vertex coloring problem. For a simple graph with n vertices and k colors, we use DNA strands to represent edges and vertices. Through basic biochemical reaction operations, the solution to the problem is obtained with O(kn^2) time complexity. Our proposed DNA algorithm is a new attempt at, and application to, solving Nondeterministic Polynomial (NP) problems, and it provides clear evidence of the ability of DNA computing to tackle such difficult computational problems in the future.
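To make the problem statement concrete, the following sketch enumerates all k^n color assignments of an n-vertex graph and keeps the proper ones. This is the kind of candidate space that DNA algorithms typically encode and then filter in parallel through molecular operations. On a conventional computer this enumeration takes exponential time. It is a plain brute-force illustration, not the paper's DNA algorithm, and the function names are hypothetical.

```python
from itertools import product


def proper_colorings(n, edges, k):
    """All assignments of colors 0..k-1 to vertices 0..n-1 with no monochromatic edge."""
    return [c for c in product(range(k), repeat=n)
            if all(c[u] != c[v] for u, v in edges)]


def chromatic_number(n, edges):
    """Smallest k admitting a proper coloring (brute force; exponential in n)."""
    for k in range(1, n + 1):
        if proper_colorings(n, edges, k):
            return k
    return 0  # graph with no vertices


triangle = [(0, 1), (1, 2), (0, 2)]
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(chromatic_number(3, triangle), len(proper_colorings(3, triangle, 3)))  # 3 6
print(chromatic_number(4, square), proper_colorings(4, square, 2))
# 2 [(0, 1, 0, 1), (1, 0, 1, 0)]
```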
Affiliation(s)
- Zhaocai Wang
- State Key Laboratory of Simulation and Regulation of River Basin Water Cycle, China Institute of Water Resources and Hydropower Research, Beijing, P. R. China
- College of Information, Shanghai Ocean University, Shanghai, P. R. China
- Dangwei Wang
- State Key Laboratory of Simulation and Regulation of River Basin Water Cycle, China Institute of Water Resources and Hydropower Research, Beijing, P. R. China
- Xiaoguang Bao
- College of Information, Shanghai Ocean University, Shanghai, P. R. China
- Tunhua Wu
- School of Information Engineering, Wenzhou Business College, Wenzhou, P. R. China
43
Wang F, Han S, Yang J, Yan W, Hu G. Knowledge-Guided "Community Network" Analysis Reveals the Functional Modules and Candidate Targets in Non-Small-Cell Lung Cancer. Cells 2021; 10:cells10020402. [PMID: 33669233 PMCID: PMC7919838 DOI: 10.3390/cells10020402] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/13/2021] [Revised: 02/06/2021] [Accepted: 02/15/2021] [Indexed: 12/24/2022]
Abstract
Non-small-cell lung cancer (NSCLC) represents a heterogeneous group of malignancies that are the leading cause of cancer-related death worldwide. Although many NSCLC-related genes and pathways have been identified, there remains an urgent need to mechanistically understand how these genes and pathways drive NSCLC. Here, we propose a knowledge-guided and network-based integration method, called the node and edge Prioritization-based Community Analysis, to identify functional modules and their candidate targets in NSCLC. The protein–protein interaction network was prioritized by performing a random walk with restart algorithm based on NSCLC seed genes and integrated edge weights, and then a "community network" was constructed by combining the Girvan–Newman and Label Propagation algorithms. This systems biology analysis revealed that the CCNB1-mediated network in the largest community provides a modular biomarker, the second community serves as a drug regulatory module, and the two are connected by some contextual signaling motifs. Moreover, integrating structural information into the signaling network suggested novel protein–protein interactions with therapeutic significance, such as interactions between GNG11 and CXCR2, CXCL3, and PPBP. This study provides new mechanistic insights into the landscape of cellular functions in the context of modular networks and will help in developing therapeutic targets for NSCLC.
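The random walk with restart step described above repeatedly applies p_next = (1 - r) * W * p + r * p0. Here W is the column-normalized weighted adjacency matrix, p0 spreads probability uniformly over the seed genes, and r is the restart probability. The sketch below assumes r = 0.7 and a symmetric weighted network stored as nested dictionaries. The node names, weights, and function name are hypothetical, and the paper's edge-weight integration and community analysis are not reproduced.

```python
def random_walk_with_restart(weights, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state random-walk-with-restart probabilities on a weighted network.

    weights: {node: {neighbor: edge_weight}}, assumed symmetric.
    seeds:   set of seed nodes that receive the restart probability.
    """
    nodes = list(weights)
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    strength = {u: sum(weights[u].values()) for u in nodes}  # column sums before normalization
    p = dict(p0)
    for _ in range(max_iter):
        new = {u: restart * p0[u] for u in nodes}
        for u in nodes:
            if strength[u] > 0:  # isolated nodes pass no probability on
                for v, w in weights[u].items():
                    new[v] += (1 - restart) * p[u] * w / strength[u]
        delta = sum(abs(new[u] - p[u]) for u in nodes)
        p = new
        if delta < tol:
            break
    return p


# Toy path network A-B-C-D with unit weights; A is the only seed.
net = {"A": {"B": 1.0}, "B": {"A": 1.0, "C": 1.0},
       "C": {"B": 1.0, "D": 1.0}, "D": {"C": 1.0}}
scores = random_walk_with_restart(net, {"A"})
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C', 'D']
```

Nodes closer to the seed receive more probability. Prioritizing a network with seed genes in this way ranks genes by their proximity to known disease genes.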
Affiliation(s)
- Fan Wang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Shuqing Han
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Ji Yang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Wenying Yan
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Corresponding author
- Guang Hu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- State Key Laboratory of Radiation Medicine and Protection, Soochow University, Suzhou 215123, China
- Corresponding author
44
Cui F, Zhang Z, Zou Q. Sequence representation approaches for sequence-based protein prediction tasks that use deep learning. Brief Funct Genomics 2021; 20:61-73. [PMID: 33527980 DOI: 10.1093/bfgp/elaa030] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 12/01/2020] [Revised: 12/16/2020] [Accepted: 12/18/2020] [Indexed: 11/12/2022]
Abstract
Deep learning has been increasingly used in bioinformatics, especially in sequence-based protein prediction tasks, as large amounts of biological data have become available and deep learning techniques have developed rapidly in recent years. For sequence-based protein prediction tasks, selecting a suitable model architecture is essential, and the representation of sequence data is a major factor in model performance. Here, we summarize the main approaches used to represent protein sequence data (amino acid sequence encoding or embedding). These include end-to-end embedding methods, non-contextual embedding methods, embedding methods that use transfer learning, and others applied to specific tasks (such as protein sequence embedding based on extracted features for protein structure prediction and graph convolutional network-based embedding for drug discovery tasks). We also review the theory behind the architectures of various types of embedding models and the development of these sequence embedding approaches, to help researchers and users select the model that best suits their requirements.
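As a baseline for the representations surveyed above, the simplest amino acid encoding is one-hot, in which each residue becomes a 20-dimensional indicator vector. The abstract does not single out one-hot encoding, so this sketch is an assumed illustration. Non-standard residues map to an all-zero row, and sequences are padded or truncated to a fixed length because fixed-size model inputs typically require it. The names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq, max_len=None):
    """Encode a protein sequence as rows of 20 indicator values.

    Non-standard residues (e.g. X, B, U) become all-zero rows. If max_len is
    given, the sequence is truncated or zero-padded to exactly max_len rows.
    """
    seq = seq.upper()
    length = len(seq) if max_len is None else max_len
    rows = []
    for pos in range(length):
        row = [0] * len(AMINO_ACIDS)
        if pos < len(seq) and seq[pos] in INDEX:
            row[INDEX[seq[pos]]] = 1
        rows.append(row)
    return rows


encoded = one_hot("MKV", max_len=5)
print(len(encoded), [row.index(1) if 1 in row else None for row in encoded])
# 5 [10, 8, 17, None, None]
```

End-to-end embedding layers can be viewed as learning a dense replacement for these sparse rows, which is one reason one-hot input remains a common starting point.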
Affiliation(s)
- Feifei Cui
- University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Zilong Zhang
- University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
45
Shi W, Chen X, Deng L. A Review of Recent Developments and Progress in Computational Drug Repositioning. Curr Pharm Des 2021; 26:3059-3068. [PMID: 31951162 DOI: 10.2174/1381612826666200116145559] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/04/2019] [Accepted: 01/09/2020] [Indexed: 12/27/2022]
Abstract
Computational drug repositioning is an efficient approach to discovering new indications for existing drugs. In recent years, with the accumulation of online health-related information and the extensive use of biomedical databases, computational drug repositioning approaches have achieved significant progress in drug discovery. In this review, we summarize recent advances in drug repositioning. First, we describe the available data sources that are conducive to identifying novel indications. Next, we summarize the commonly used computational approaches, briefly describing the techniques, case studies, and evaluation criteria for each method. Finally, we discuss the limitations of existing computational approaches.
Affiliation(s)
- Wanwan Shi
- School of Computer Science and Engineering, Central South University, Changsha, China
- Xuegong Chen
- School of Computer Science and Engineering, Central South University, Changsha, China
- Lei Deng
- School of Computer Science and Engineering, Central South University, Changsha, China
46
Wu Z, Liao Q, Fan S, Liu B. idenPC-CAP: Identify protein complexes from weighted RNA-protein heterogeneous interaction networks using co-assemble partner relation. Brief Bioinform 2020; 22:6041167. [PMID: 33333549 DOI: 10.1093/bib/bbaa372] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/30/2020] [Revised: 11/07/2020] [Accepted: 11/20/2020] [Indexed: 12/18/2022]
Abstract
Protein complexes play important roles in most cellular processes. Available genome-wide protein-protein interaction (PPI) data make it possible for computational methods to identify protein complexes from PPI networks. However, PPI datasets usually contain a large proportion of false-positive noise. Moreover, different types of biomolecules in a living cell cooperate to form a unified interaction network. Because previous computational methods focus only on PPIs and ignore other types of biomolecule interactions, their predicted protein complexes often contain many false-positive proteins. In this study, we develop a novel computational method, idenPC-CAP, to identify protein complexes from an RNA-protein heterogeneous interaction network consisting of RNA-RNA interactions, RNA-protein interactions and PPIs. By considering interactions among both proteins and RNAs, the new method reduces the proportion of false-positive proteins in predicted protein complexes. The experimental results demonstrate that idenPC-CAP outperforms other state-of-the-art methods in this field.
Affiliation(s)
- Zhourun Wu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Qing Liao
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Shixi Fan
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
47
Nandi S, Ganguli P, Sarkar RR. Essential gene prediction using limited gene essentiality information-An integrative semi-supervised machine learning strategy. PLoS One 2020; 15:e0242943. [PMID: 33253254 PMCID: PMC7703937 DOI: 10.1371/journal.pone.0242943] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/10/2020] [Accepted: 11/12/2020] [Indexed: 11/24/2022]
Abstract
Essential gene prediction helps identify the minimal set of genes indispensable for the survival of an organism. Machine learning (ML) algorithms have been useful for predicting gene essentiality. However, currently available ML pipelines perform poorly for organisms with limited experimental data. The objective is to develop a new ML pipeline to help annotate the essential genes of less explored disease-causing organisms for which minimal experimental data are available. The proposed strategy combines an unsupervised feature selection technique, dimension reduction using the Kamada-Kawai algorithm, and a semi-supervised ML algorithm employing the Laplacian Support Vector Machine (LapSVM) to predict essential and non-essential genes from genome-scale metabolic networks using a very limited labeled dataset. A novel scoring technique, the Semi-Supervised Model Selection Score, equivalent to the area under the ROC curve (auROC), has been proposed for selecting the best model when supervised performance metrics are difficult to calculate due to lack of data. Unsupervised feature selection followed by dimension reduction revealed a distinct circular pattern in the clustering of essential and non-essential genes. LapSVM then created a curve that dissected this circle to classify and predict essential genes with high accuracy (auROC > 0.85), even with 1% labeled data for model training. After this ML pipeline was successfully validated on both Eukaryotes and Prokaryotes, showing high accuracy even when the labeled dataset is very limited, the strategy was used to predict the essential genes of organisms with inadequate experimentally known data, such as Leishmania sp. Using a graph-based semi-supervised machine learning scheme, a novel integrative approach has been proposed for essential gene prediction that can be applied universally to both Prokaryotes and Eukaryotes with limited labeled data.
The essential genes predicted using the pipeline provide an important lead for the prediction of gene essentiality and identification of novel therapeutic targets for antibiotic and vaccine development against disease-causing parasites.
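The Laplacian Support Vector Machine named in this abstract adds a graph-smoothness penalty to a standard SVM objective, so that labeled and unlabeled genes that are close in a similarity graph receive similar decision values. The Python sketch below computes only that penalty, f^T L f with L = D - W, under assumed choices (a dense Gaussian affinity whose width `sigma` is hypothetical); it is not the authors' pipeline and omits the hinge-loss and kernel-norm terms of the full LapSVM objective.

```python
# Minimal sketch (not the authors' pipeline): the graph Laplacian penalty
# used by LapSVM-style semi-supervised learners. Labeled and unlabeled
# points share one similarity graph; f^T L f is small when strongly
# connected points receive similar decision values f.
import math


def rbf_weights(points, sigma=1.0):
    """Dense Gaussian (RBF) affinity matrix W with a zero diagonal (assumed graph)."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return w


def laplacian(w):
    """Unnormalized graph Laplacian L = D - W, where D holds the row sums of W."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)]
            for i in range(n)]


def smoothness(f, lap):
    """Quadratic form f^T L f, equal to 1/2 * sum_ij W_ij (f_i - f_j)^2."""
    n = len(f)
    return sum(f[i] * lap[i][j] * f[j] for i in range(n) for j in range(n))
```

With two tight, well-separated clusters, a labeling that is constant within each cluster gives a near-zero penalty, whereas one that alternates labels inside a cluster is penalized heavily; this is what lets very few labels propagate through the unlabeled points.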
Affiliation(s)
- Sutanu Nandi
- Chemical Engineering and Process Development, CSIR-National Chemical Laboratory, Pune, Maharashtra, India
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, India
- Piyali Ganguli
- Chemical Engineering and Process Development, CSIR-National Chemical Laboratory, Pune, Maharashtra, India
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, India
- Ram Rup Sarkar
- Chemical Engineering and Process Development, CSIR-National Chemical Laboratory, Pune, Maharashtra, India
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, India
48
Dou L, Li X, Zhang L, Xiang H, Xu L. iGlu_AdaBoost: Identification of Lysine Glutarylation Using the AdaBoost Classifier. J Proteome Res 2020; 20:191-201. [PMID: 33090794 DOI: 10.1021/acs.jproteome.0c00314] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 02/01/2023]
Abstract
Lysine glutarylation is a newly reported post-translational modification (PTM) that plays significant roles in regulating metabolic and mitochondrial processes. Accurate identification of glutarylation sites is the primary step toward better understanding their molecular functions and applications. Because traditional biological sequencing techniques are time-consuming and expensive, and protein data are growing explosively, building precise computational models to rapidly identify glutarylation is a popular and feasible solution. In this work, we propose a novel AdaBoost-based predictor, iGlu_AdaBoost, to distinguish glutarylation from non-glutarylation sequences. To build the model, the top 37 features were selected from a total of 1768 combined features, comprising 188D, the composition of k-spaced amino acid pairs (CKSAAP), and enhanced amino acid composition (EAAC), using Chi2 combined with incremental feature selection (IFS). With the help of the hybrid-sampling method SMOTE-Tomek, the AdaBoost classifier achieved satisfactory recall, specificity, and AUC values of 87.48%, 72.49%, and 0.89 over 10-fold cross-validation, and 72.73%, 71.92%, and 0.63 on the independent test set, respectively. Further feature analysis suggested that the positively charged amino acids R and K play critical roles in glutarylation recognition. Our model showed good generalization ability and consistent prediction results for positive and negative samples, comparable to four published tools. The proposed predictor is an efficient tool for finding potential glutarylation sites and provides helpful suggestions for further research on glutarylation mechanisms and related disease treatments.
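CKSAAP, one of the feature groups mentioned above, counts how often each of the 400 ordered amino acid pairs occurs with exactly k residues between them, normalized by the number of such pairs. The Python sketch below follows that common definition; the default `k_max = 3` and the handling of non-standard residues are assumptions for illustration, since the abstract does not give the authors' settings.

```python
# Minimal sketch of CKSAAP (composition of k-spaced amino acid pairs).
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(sequence: str, k_max: int = 3) -> list:
    """Return 400 * (k_max + 1) features for gaps k = 0..k_max.

    For each gap k, pairs (s[i], s[i + k + 1]) are counted and divided by
    the number of such positions, L - k - 1. Pairs containing non-standard
    residues count toward that denominator but fall into no bin (an
    assumption; implementations differ). Gaps too long for the sequence
    yield all-zero blocks.
    """
    seq = sequence.upper()
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = len(seq) - k - 1  # number of positions with gap k
        for i in range(max(total, 0)):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        features.extend(counts[p] / total if total > 0 else 0.0 for p in PAIRS)
    return features
```

With the default `k_max = 3` the vector has 4 × 400 = 1600 entries. For "ACAC" at k = 0 the pairs are AC, CA, AC, so AC scores 2/3 and CA scores 1/3; a feature selector such as Chi2 then ranks these columns.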
Affiliation(s)
- Lijun Dou
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Xiaoling Li
- Department of Oncology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150000, China
- Lichao Zhang
- School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen 518172, China
- Huaikun Xiang
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
49
Li Q, Zhou W, Wang D, Wang S, Li Q. Prediction of Anticancer Peptides Using a Low-Dimensional Feature Model. Front Bioeng Biotechnol 2020; 8:892. [PMID: 32903381 PMCID: PMC7434836 DOI: 10.3389/fbioe.2020.00892] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 05/25/2020] [Accepted: 07/10/2020] [Indexed: 01/09/2023]
Abstract
Cancer remains a severe global health problem. Cancer therapy traditionally uses radiotherapy or anticancer drugs to kill cancer cells, but these methods are quite expensive and have side effects that can cause great harm to patients. With the discovery of anticancer peptides (ACPs), significant progress has been achieved in tumor therapy. Therefore, accurately identifying anticancer peptides is invaluable. Although biochemical experiments can accomplish this, they are expensive and time-consuming. To promote the application of anticancer peptides in cancer therapy, machine learning can be used to recognize anticancer peptides from feature vectors extracted from their sequences. Nevertheless, machine learning models trained on high-dimensional features often perform poorly in practice. To address this problem, this paper puts forward a 19-dimensional feature model based on anticancer peptide sequences, which has lower dimensionality and better performance than some existing methods. In addition, this paper identifies a model with a low number of dimensions and acceptable performance. The few features identified in this study may represent the important features of anticancer peptides.
Affiliation(s)
- Qingwen Li
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
- Wenyang Zhou
- Center for Bioinformatics, School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
- Donghua Wang
- Department of General Surgery, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Sui Wang
- Key Laboratory of Soybean Biology in Chinese Ministry of Education, Northeast Agricultural University, Harbin, China
- State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, China
- Qingyuan Li
- Forestry and Fruit Tree Research Institute, Wuhan Academy of Agricultural Sciences, Wuhan, China
50
Liu Z, Zhang Y, Han X, Li C, Yang X, Gao J, Xie G, Du N. Identifying Cancer-Related lncRNAs Based on a Convolutional Neural Network. Front Cell Dev Biol 2020; 8:637. [PMID: 32850792 PMCID: PMC7432192 DOI: 10.3389/fcell.2020.00637] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/04/2020] [Accepted: 06/24/2020] [Indexed: 12/15/2022]
Abstract
Millions of people suffer from cancer, but accurate early diagnosis and effective treatment remain difficult. In recent years, long non-coding RNAs (lncRNAs) have been shown to play an important role in diseases, especially cancers. These lncRNAs execute their functions by regulating gene expression. Therefore, identifying cancer-related lncRNAs could help researchers gain a deeper understanding of cancer mechanisms and find treatment options. A large number of relationships between lncRNAs and cancers have been verified by biological experiments, which gives us an opportunity to use computational methods to identify cancer-related lncRNAs. In this paper, we applied a convolutional neural network (CNN) to identify cancer-related lncRNAs using the lncRNAs' target genes and their tissue expression specificity. Because lncRNAs regulate target gene expression and have been reported to show tissue-specific expression, their target genes and expression in different tissues were used as lncRNA features. Next, a deep belief network (DBN) was used to encode the lncRNA features in an unsupervised manner. Finally, the CNN was used to predict cancer-related lncRNAs based on known relationships between lncRNAs and cancers. For each type of cancer, we built a CNN model to predict its related lncRNAs. We identified additional related lncRNAs for 41 kinds of cancer. Ten-fold cross-validation was used to evaluate the performance of our method. The results showed that our method outperforms several previous methods, with an area under the curve (AUC) of 0.81 and an area under the precision–recall curve (AUPR) of 0.79. Case studies were performed to verify the accuracy of our results.
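The two metrics reported above can be computed from a list of true labels and model scores. The Python sketch below uses the pairwise (Mann-Whitney) definition of ROC AUC and average precision as one common estimator of AUPR; it is a generic illustration, not the authors' evaluation code, and libraries may interpolate the precision-recall curve or break ties differently.

```python
# Minimal sketch: ROC AUC and AUPR (as average precision) from scores.


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example.

    Tied scores keep their input order (sorted() is stable even with
    reverse=True); this tie handling is an assumption of the sketch.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos
```

On the toy ranking labels = [1, 0, 1, 0] and scores = [0.9, 0.8, 0.7, 0.1], three of the four positive-negative pairs are ordered correctly, giving AUC 0.75, and average precision is (1/1 + 2/3) / 2 = 5/6. Under ten-fold cross-validation these values are computed per fold or on pooled out-of-fold predictions.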
Affiliation(s)
- Zihao Liu
- Department of Oncology, Medical School of Chinese PLA, Chinese PLA General Hospital, Beijing, China
- Department of Oncology, The Fourth Medical Center, Chinese PLA General Hospital, Beijing, China
- Ying Zhang
- Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Xudong Han
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Chenxi Li
- Department of Oncology, The Fourth Medical Center, Chinese PLA General Hospital, Beijing, China
- Xuhui Yang
- Department of Oncology, Medical School of Chinese PLA, Chinese PLA General Hospital, Beijing, China
- Jie Gao
- Department of Oncology, The Fourth Medical Center, Chinese PLA General Hospital, Beijing, China
- Ganfeng Xie
- Department of Oncology, Southwest Hospital, Army Medical University, Chongqing, China
- Nan Du
- Department of Oncology, Medical School of Chinese PLA, Chinese PLA General Hospital, Beijing, China
- Department of Oncology, The Fourth Medical Center, Chinese PLA General Hospital, Beijing, China