1
|
Jiang Y, Wang J, Zhang Y, Cao Z, Zhang Q, Su J, He S, Bo X. Graph based recurrent network for context specific synthetic lethality prediction. SCIENCE CHINA. LIFE SCIENCES 2025; 68:527-540. [PMID: 39422810 DOI: 10.1007/s11427-023-2618-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/23/2024] [Accepted: 05/10/2024] [Indexed: 10/19/2024]
Abstract
The concept of synthetic lethality (SL) has been successfully used for targeted therapies. To further explore SL for cancer therapy, identifying more SL interactions with therapeutic potential are essential. Recently, graph neural network-based deep learning methods have been proposed for SL prediction, which reduce the SL search space of wet-lab based methods. However, these methods ignore that most SL interactions depend strongly on genetic context, which limits the application of the predicted results. In this study, we proposed a graph recurrent network-based model for specific context-dependent SL prediction (SLGRN). In particular, we introduced a Graph Recurrent Network-based encoder to acquire a context-specific, low-dimensional feature representation for each node, facilitating the prediction of novel SL. SLGRN leveraged gate recurrent unit (GRU) and it incorporated a context-dependent-level state to effectively integrate information from all nodes. As a result, SLGRN outperforms the state-of-the-arts models for SL prediction. We subsequently validate novel SL interactions under different contexts based on combination therapy or patient survival analysis. Through in vitro experiments and retrospective clinical analysis, we emphasize the potential clinical significance of this context-specific SL prediction model.
Collapse
Affiliation(s)
- Yuyang Jiang
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, 300072, China
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
| | - Jing Wang
- School of Medicine, Tsinghua University, Beijing, 100084, China
| | - Yixin Zhang
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
| | - ZhiWei Cao
- School of Informatics, Xiamen University, Xiamen, 361005, China
| | - Qinglong Zhang
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
| | - Jinsong Su
- School of Informatics, Xiamen University, Xiamen, 361005, China.
| | - Song He
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China.
| | - Xiaochen Bo
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China.
| |
Collapse
|
2
|
Feng Y, Long Y, Wang H, Ouyang Y, Li Q, Wu M, Zheng J. Benchmarking machine learning methods for synthetic lethality prediction in cancer. Nat Commun 2024; 15:9058. [PMID: 39428397 PMCID: PMC11491473 DOI: 10.1038/s41467-024-52900-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2023] [Accepted: 09/23/2024] [Indexed: 10/22/2024] Open
Abstract
Synthetic lethality (SL) is a gold mine of anticancer drug targets, exposing cancer-specific dependencies of cellular survival. To complement resource-intensive experimental screening, many machine learning methods for SL prediction have emerged recently. However, a comprehensive benchmarking is lacking. This study systematically benchmarks 12 recent machine learning methods for SL prediction, assessing their performance across diverse data splitting scenarios, negative sample ratios, and negative sampling techniques, on both classification and ranking tasks. We observe that all the methods can perform significantly better by improving data quality, e.g., excluding computationally derived SLs from training and sampling negative labels based on gene expression. Among the methods, SLMGAE performs the best. Furthermore, the methods have limitations in realistic scenarios such as cold-start independent tests and context-specific SLs. These results, together with source code and datasets made freely available, provide guidance for selecting suitable methods and developing more powerful techniques for SL virtual screening.
Collapse
Affiliation(s)
- Yimiao Feng
- School of Information Science and Technology, ShanghaiTech University, Shanghai, China
- Lingang Laboratory, Shanghai, China
| | - Yahui Long
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - He Wang
- School of Information Science and Technology, ShanghaiTech University, Shanghai, China
| | - Yang Ouyang
- School of Information Science and Technology, ShanghaiTech University, Shanghai, China
| | - Quan Li
- School of Information Science and Technology, ShanghaiTech University, Shanghai, China
| | - Min Wu
- Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore.
| | - Jie Zheng
- School of Information Science and Technology, ShanghaiTech University, Shanghai, China.
- Shanghai Engineering Research Center of Intelligent Vision and Imaging, Shanghai, China.
| |
Collapse
|
3
|
Fan K, Gökbağ B, Tang S, Li S, Huang Y, Wang L, Cheng L, Li L. Synthetic lethal connectivity and graph transformer improve synthetic lethality prediction. Brief Bioinform 2024; 25:bbae425. [PMID: 39210507 PMCID: PMC11361842 DOI: 10.1093/bib/bbae425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2024] [Revised: 06/14/2024] [Accepted: 08/16/2024] [Indexed: 09/04/2024] Open
Abstract
Synthetic lethality (SL) has shown great promise for the discovery of novel targets in cancer. CRISPR double-knockout (CDKO) technologies can only screen several hundred genes and their combinations, but not genome-wide. Therefore, good SL prediction models are highly needed for genes and gene pairs selection in CDKO experiments. However, lack of scalable SL properties prevents generalizability of SL interactions to out-of-sample data, thereby hindering modeling efforts. In this paper, we recognize that SL connectivity is a scalable and generalizable SL property. We develop a novel two-step multilayer encoder for individual sample-specific SL prediction model (MLEC-iSL), which predicts SL connectivity first and SL interactions subsequently. MLEC-iSL has three encoders, namely, gene, graph, and transformer encoders. MLEC-iSL achieves high SL prediction performance in K562 (AUPR, 0.73; AUC, 0.72) and Jurkat (AUPR, 0.73; AUC, 0.71) cells, while no existing methods exceed 0.62 AUPR and AUC. The prediction performance of MLEC-iSL is validated in a CDKO experiment in 22Rv1 cells, yielding a 46.8% SL rate among 987 selected gene pairs. The screen also reveals SL dependency between apoptosis and mitosis cell death pathways.
Collapse
Affiliation(s)
- Kunjie Fan
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, United States
| | - Birkan Gökbağ
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, United States
| | - Shan Tang
- Department of Biomedical Informatics, College of Pharmacy, The Ohio State University, 500 W. 12 ave, Columbus, OH 43210, United States
| | - Shangjia Li
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, United States
| | - Yirui Huang
- Department of Biomedical Informatics, College of Pharmacy, The Ohio State University, 500 W. 12 ave, Columbus, OH 43210, United States
| | - Lingling Wang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, United States
| | - Lijun Cheng
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, United States
| | - Lang Li
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, United States
- Department of Biomedical Informatics, College of Pharmacy, The Ohio State University, 500 W. 12 ave, Columbus, OH 43210, United States
| |
Collapse
|
4
|
Fan K, Tang S, Gökbağ B, Cheng L, Li L. Multi-view graph convolutional network for cancer cell-specific synthetic lethality prediction. Front Genet 2023; 13:1103092. [PMID: 36699450 PMCID: PMC9868610 DOI: 10.3389/fgene.2022.1103092] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2022] [Accepted: 12/22/2022] [Indexed: 01/11/2023] Open
Abstract
Synthetic lethal (SL) genetic interactions have been regarded as a promising focus for investigating potential targeted therapeutics to tackle cancer. However, the costly investment of time and labor associated with wet-lab experimental screenings to discover potential SL relationships motivates the development of computational methods. Although graph neural network (GNN) models have performed well in the prediction of SL gene pairs, existing GNN-based models are not designed for predicting cancer cell-specific SL interactions that are more relevant to experimental validation in vitro. Besides, neither have existing methods fully utilized diverse graph representations of biological features to improve prediction performance. In this work, we propose MVGCN-iSL, a novel multi-view graph convolutional network (GCN) model to predict cancer cell-specific SL gene pairs, by incorporating five biological graph features and multi-omics data. Max pooling operation is applied to integrate five graph-specific representations obtained from GCN models. Afterwards, a deep neural network (DNN) model serves as the prediction module to predict the SL interactions in individual cancer cells (iSL). Extensive experiments have validated the model's successful integration of the multiple graph features and state-of-the-art performance in the prediction of potential SL gene pairs as well as generalization ability to novel genes.
Collapse
Affiliation(s)
- Kunjie Fan
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States
| | - Shan Tang
- College of Pharmacy, The Ohio State University, Columbus, OH, United States
| | - Birkan Gökbağ
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States
| | - Lijun Cheng
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States
| | - Lang Li
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States,College of Pharmacy, The Ohio State University, Columbus, OH, United States,*Correspondence: Lang Li,
| |
Collapse
|
5
|
Chen L, Dou P, Li L, Chen Y, Yang H. Transcriptome-wide analysis reveals core transcriptional regulators associated with culm development and variation in Dendrocalamus sinicus, the strongest woody bamboo in the world. Heliyon 2022; 8:e12600. [PMID: 36593818 PMCID: PMC9803789 DOI: 10.1016/j.heliyon.2022.e12600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2022] [Revised: 07/15/2022] [Accepted: 12/15/2022] [Indexed: 12/24/2022] Open
Abstract
Transcription factors (TFs) play indispensable roles in plant development and stress responses. As the largest woody bamboo species in the world, Dendrocalamus sinicus is endemic to Yunnan Province, China, and possesses two natural variants characterized by culm shape, namely straight or bent culms. Understanding the transcriptional regulation network of D. sinicus provides a unique opportunity to clarify the growth and development characteristics of woody bamboos. In this study, 10,236 TF transcripts belonging to 57 families were identified from transcriptome data of two variants at different developmental stages, from which we constructed a transcriptional regulatory network and unigene-coding protein-TFs interactive network of culm development for this attractive species. Gene function enrichment analysis revealed that hormone signaling and MAPK signaling pathways were two most enriched pathways in TF-regulated network. Based on PPI analysis, 50 genes interacting with nine TFs were screened as the core regulation components related to culm development. Of them, 18 synergistic genes of seven TFs, including nuclear cap-binding protein subunit 1, transcription factor GTE9-like, and ATP-dependent DNA helicase DDX11 isoform X1, involved in culm-shape variation. Most of these genes would interact with MYB, C3H, and ARF transcription factors. Six members with two each from ARF, C3H, and MYB transcription factor families and six key interacting genes (IAA3, IAA19, leucine-tRNA ligase, nuclear cap-binding protein subunit 1, elongation factor 2, and coiled-coil domain-containing protein 94) cooperate with these transcription factors were differentially expressed at development stage of young culms, and were validated by quantitative PCR. Our results represent a crucial step towards understanding the regulatory mechanisms of TFs involved in culm development and variation of D. sinicus.
Collapse
Affiliation(s)
- Lingna Chen
- Institute of Highland Forest Science, Chinese Academy of Forestry, Bailongsi, Panlong District, Kunming 650233, PR China,College of Life Science, Xinjiang Normal University, Xinyi Road, Shayibake District, Urumqi 830054, PR China
| | - Peitong Dou
- Institute of Highland Forest Science, Chinese Academy of Forestry, Bailongsi, Panlong District, Kunming 650233, PR China
| | - Lushuang Li
- Institute of Highland Forest Science, Chinese Academy of Forestry, Bailongsi, Panlong District, Kunming 650233, PR China
| | - Yongkun Chen
- College of Life Science, Xinjiang Normal University, Xinyi Road, Shayibake District, Urumqi 830054, PR China,Xinjiang Key Laboratory of Special Species Conservation and Regulatory Biology, Xinyi Road, Shayibake District, Urumqi 830054, PR China,Corresponding author.
| | - Hanqi Yang
- Institute of Highland Forest Science, Chinese Academy of Forestry, Bailongsi, Panlong District, Kunming 650233, PR China,Corresponding author.
| |
Collapse
|
6
|
Zeng Z, Zheng W, Hou P. The role of drug-metabolizing enzymes in synthetic lethality of cancer. Pharmacol Ther 2022; 240:108219. [PMID: 35636517 DOI: 10.1016/j.pharmthera.2022.108219] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2022] [Revised: 05/23/2022] [Accepted: 05/24/2022] [Indexed: 12/14/2022]
Abstract
Drug-metabolizing enzymes (DMEs) have shown increasing importance in anticancer therapy. It is not only due to their effect on activation or deactivation of anticancer drugs, but also because of their extensive connections with pathological and biochemistry changes during tumorigenesis. Meanwhile, it has become more accessible to discovery anticancer drugs that selectively targeted cancer cells with the development of synthetic lethal screen technology. Synthetic lethal strategy makes use of unique genetic markers that different cancer cells from normal tissues to discovery anticancer agents. Dysregulation of DMEs has been found in various cancers, making them promising candidates for synthetic lethal strategy. In this review, we will systematically discuss about the role of DMEs in tumor progression, the application of synthetic lethality strategy in drug discovery, and a link between DMEs and synthetic lethal of cancer.
Collapse
Affiliation(s)
- Zekun Zeng
- Department of Endocrinology, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, PR China
| | - Wenfang Zheng
- Department of Endocrinology, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, PR China
| | - Peng Hou
- Department of Endocrinology, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, PR China; Key Laboratory for Tumor Precision Medicine of Shaanxi Province, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, PR China.
| |
Collapse
|
7
|
Wang J, Zhang Q, Han J, Zhao Y, Zhao C, Yan B, Dai C, Wu L, Wen Y, Zhang Y, Leng D, Wang Z, Yang X, He S, Bo X. Computational methods, databases and tools for synthetic lethality prediction. Brief Bioinform 2022; 23:6555403. [PMID: 35352098 PMCID: PMC9116379 DOI: 10.1093/bib/bbac106] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2021] [Revised: 02/15/2022] [Accepted: 03/02/2022] [Indexed: 12/17/2022] Open
Abstract
Synthetic lethality (SL) occurs between two genes when the inactivation of either gene alone has no effect on cell survival but the inactivation of both genes results in cell death. SL-based therapy has become one of the most promising targeted cancer therapies in the last decade as PARP inhibitors achieve great success in the clinic. The key point to exploiting SL-based cancer therapy is the identification of robust SL pairs. Although many wet-lab-based methods have been developed to screen SL pairs, known SL pairs are less than 0.1% of all potential pairs due to large number of human gene combinations. Computational prediction methods complement wet-lab-based methods to effectively reduce the search space of SL pairs. In this paper, we review the recent applications of computational methods and commonly used databases for SL prediction. First, we introduce the concept of SL and its screening methods. Second, various SL-related data resources are summarized. Then, computational methods including statistical-based methods, network-based methods, classical machine learning methods and deep learning methods for SL prediction are summarized. In particular, we elaborate on the negative sampling methods applied in these models. Next, representative tools for SL prediction are introduced. Finally, the challenges and future work for SL prediction are discussed.
Collapse
Affiliation(s)
- Jing Wang
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Qinglong Zhang
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Junshan Han
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Yanpeng Zhao
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Caiyun Zhao
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Bowei Yan
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Chong Dai
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Lianlian Wu
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Yuqi Wen
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Yixin Zhang
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Dongjin Leng
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Zhongming Wang
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Xiaoxi Yang
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Song He
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Xiaochen Bo
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| |
Collapse
|
8
|
Lai M, Chen G, Yang H, Yang J, Jiang Z, Wu M, Zheng J. Predicting Synthetic Lethality in Human Cancers via Multi-Graph Ensemble Neural Network. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2021; 2021:1731-1734. [PMID: 34891621 DOI: 10.1109/embc46164.2021.9630716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Synthetic lethality (SL) is currently one of the most effective methods to identify new drugs for cancer treatment. It means that simultaneous inactivation target of two non-lethal genes will cause cell death, but loss of either will not. However, detecting SL pair is challenging due to the experimental costs. Artificial intelligence (AI) is a low-cost way to predict the potential SL relation between two genes. In this paper, a new Multi-Graph Ensemble (MGE) network structure combining graph neural network and existing knowledge about genes is proposed to predict SL pairs, which integrates the embedding of each feature with different neural networks to predict if a pair of genes have SL relation. It has a higher prediction performance compared with existing SL prediction methods. Also, with the integration of other biological knowledge, it has the potential of interpretability.
Collapse
|
9
|
Zhang H, Yan C, Zhou S, Guan J, Zhang J. Combined cause inference: Definition, model and performance. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.06.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
10
|
Benfatto S, Serçin Ö, Dejure FR, Abdollahi A, Zenke FT, Mardin BR. Uncovering cancer vulnerabilities by machine learning prediction of synthetic lethality. Mol Cancer 2021; 20:111. [PMID: 34454516 PMCID: PMC8401190 DOI: 10.1186/s12943-021-01405-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2021] [Accepted: 08/10/2021] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Synthetic lethality describes a genetic interaction between two perturbations, leading to cell death, whereas neither event alone has a significant effect on cell viability. This concept can be exploited to specifically target tumor cells. CRISPR viability screens have been widely employed to identify cancer vulnerabilities. However, an approach to systematically infer genetic interactions from viability screens is missing. METHODS Here we describe PAn-canceR Inferred Synthetic lethalities (PARIS), a machine learning approach to identify cancer vulnerabilities. PARIS predicts synthetic lethal (SL) interactions by combining CRISPR viability screens with genomics and transcriptomics data across hundreds of cancer cell lines profiled within the Cancer Dependency Map. RESULTS Using PARIS, we predicted 15 high confidence SL interactions within 549 DNA damage repair (DDR) genes. We show experimental validation of an SL interaction between the tumor suppressor CDKN2A, thymidine phosphorylase (TYMP) and the thymidylate synthase (TYMS), which may allow stratifying patients for treatment with TYMS inhibitors. Using genome-wide mapping of SL interactions for DDR genes, we unraveled a dependency between the aldehyde dehydrogenase ALDH2 and the BRCA-interacting protein BRIP1. Our results suggest BRIP1 as a potential therapeutic target in ~ 30% of all tumors, which express low levels of ALDH2. CONCLUSIONS PARIS is an unbiased, scalable and easy to adapt platform to identify SL interactions that should aid in improving cancer therapy with increased availability of cancer genomics data.
Collapse
Affiliation(s)
- Salvatore Benfatto
- BioMed X Institute (GmbH), Im Neuenheimer Feld 583, 69120, Heidelberg, Germany
| | - Özdemirhan Serçin
- BioMed X Institute (GmbH), Im Neuenheimer Feld 583, 69120, Heidelberg, Germany
| | - Francesca R Dejure
- BioMed X Institute (GmbH), Im Neuenheimer Feld 583, 69120, Heidelberg, Germany
| | - Amir Abdollahi
- Division of Molecular and Translational Radiation Oncology, National Centre for Tumour Diseases (NCT), Heidelberg University Hospital, 69120, Heidelberg, Germany
| | - Frank T Zenke
- Translational Innovation Platform Oncology & Immuno-Oncology, Merck KGaA, Frankfurter Str. 250, 64293, Darmstadt, Germany
| | - Balca R Mardin
- BioMed X Institute (GmbH), Im Neuenheimer Feld 583, 69120, Heidelberg, Germany.
| |
Collapse
|
11
|
Chen L, Li Z, Zeng T, Zhang YH, Li H, Huang T, Cai YD. Predicting gene phenotype by multi-label multi-class model based on essential functional features. Mol Genet Genomics 2021; 296:905-918. [PMID: 33914130 DOI: 10.1007/s00438-021-01789-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2021] [Accepted: 04/13/2021] [Indexed: 12/19/2022]
Abstract
Phenotype is one of the most significant concepts in genetics, which is used to describe all the characteristics of a research object that can be observed. Considering that phenotype reflects the integrated features of genotype and environment factors, it is hard to define phenotype characteristics, even difficult to predict unknown phenotypes. Restricted by current biological techniques, it is still quite expensive and time-consuming to obtain sufficient structural information of large-scale phenotype-associated genes/proteins. Various bioinformatics methods have been presented to solve such problem, and researchers have confirmed the efficacy and prediction accuracy of functional network-based prediction. But general functional descriptions have highly complicated inner structures for phenotype prediction. To further address this issue and improve the efficacy of phenotype prediction on more than ten kinds of phenotypes, we first extract functional enrichment features from GO and KEGG, and then use node2vec to learn functional embedding features of genes from a gene-gene network. All these features are analyzed by some feature selection methods (Boruta, minimum redundancy maximum relevance) to generate a feature list. Such list is fed into the incremental feature selection, incorporating some multi-label classifiers built by RAkEL and some classic base classifiers, to build an optimum multi-label multi-class classification model for phenotype prediction. According to recent researches, our method has indeed identified many literature-supported genes/proteins and their associated phenotypes, and even some candidate genes with re-assigned new phenotypes, which provide a new computational tool for the accurate and effective phenotypic prediction.
Collapse
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai, 200444, People's Republic of China.,College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, People's Republic of China
| | - Zhandong Li
- College of Food Engineering, Jilin Engineering Normal University, Changchun, 130052, People's Republic of China
| | - Tao Zeng
- CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, People's Republic of China
| | - Yu-Hang Zhang
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Hao Li
- College of Food Engineering, Jilin Engineering Normal University, Changchun, 130052, People's Republic of China
| | - Tao Huang
- Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, 200031, People's Republic of China.
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, 200444, People's Republic of China.
| |
Collapse
|
12
|
Identifying Infliximab- (IFX-) Responsive Blood Signatures for the Treatment of Rheumatoid Arthritis. BIOMED RESEARCH INTERNATIONAL 2021. [DOI: 10.1155/2021/5556784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Rheumatoid arthritis (RA) is a severe chronic pathogenic inflammatory abnormality that damages small joints. Comprehensive diagnosis and treatment procedures for RA have been established because of its severe symptoms and relatively high morbidity. Medication and surgery are the two major therapeutic approaches. Infliximab (IFX) is a novel biological agent applied for the treatment of RA. IFX improves physical functions and benefits the achievement of clinical remission even under discontinuous medication. However, not all patients react to IFX, and distinguishing IFX-sensitive and IFX-resistant patients is quite difficult. Thus, how to predict the therapeutic effects of IFX on patients with RA is one of the urgent translational medicine problems in the clinical treatment of RA. In this study, we present a novel computational method for the identification of the applicable and substantial blood gene signatures of IFX sensitivity by liquid biopsy, which may assist in the establishment of a clinical drug sensitivity test standard for RA and contribute to the revelation of unique IFX-associated pharmacological mechanisms.
Collapse
|
13
|
Qin X, Liu M, Zhang L, Liu G. Structural protein fold recognition based on secondary structure and evolutionary information using machine learning algorithms. Comput Biol Chem 2021; 91:107456. [PMID: 33610129 DOI: 10.1016/j.compbiolchem.2021.107456] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2020] [Revised: 01/04/2021] [Accepted: 02/06/2021] [Indexed: 11/18/2022]
Abstract
Understanding the function of protein is conducive to research in advanced fields such as gene therapy of diseases, the development and design of new drugs, etc. The prerequisite for understanding the function of a protein is to determine its tertiary structure. The realization of protein structure classification is indispensable for this problem and fold recognition is a commonly used method of protein structure classification. Protein sequences of 40% identity in the ASTRAL protein classification database are used for fold recognition research in current work to predict 27 folding types which mostly belong to four protein structural classes: α, β, α/β and α + β. We extract features from primary structure of protein using methods covering DSSP, PSSM and HMM which are based on secondary structure and evolutionary information to convert protein sequences into feature vectors that can be recognized by machine learning algorithm and utilize the combination of LightGBM feature selection algorithm and incremental feature selection method (IFS) to find the optimal classifiers respectively constructed by machine learning algorithms on the basis of tree structure including Random Forest, XGBoost and LightGBM. Bayesian optimization method is used for hyper-parameter adjustment of machine learning algorithms to make the accuracy of fold recognition reach as high as 93.45% at last. The result obtained by the model we propose is outstanding in the study of protein fold recognition.
Collapse
Affiliation(s)
- Xinyi Qin
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.
| | - Min Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.
| | - Lu Zhang
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.
| | - Guangzhong Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.
| |
Collapse
|
14
|
Identifying the Immunological Gene Signatures of Immune Cell Subtypes. BIOMED RESEARCH INTERNATIONAL 2021. [DOI: 10.1155/2021/6639698] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
Abstract
The immune system is a complicated defensive system that comprises multiple functional cells and molecules acting against endogenous and exogenous pathogenic factors. Identifying immune cell subtypes and recognizing their unique immunological functions are difficult because of the complicated cellular components and immunological functions of the immune system. With the development of transcriptomics and high-throughput sequencing, the gene expression profiling of immune cells can provide a new strategy to explore the immune cell subtyping. On the basis of the new profiling data of mouse immune cell gene expression from the Immunological Genome Project (ImmGen), a novel computational pipeline was applied to identify different immune cell subtypes, including αβ T cells, B cells, γδ T cells, and innate lymphocytes. First, the profiling data was analyzed by a powerful feature selection method, Monte-Carlo Feature Selection, resulting in a feature list and some informative features. For the list, the two-stage incremental feature selection method, incorporating random forest as the classification algorithm, was applied to extract essential gene signatures and build an efficient classifier. On the other hand, a rule learning scheme was applied on the informative features to construct quantitative expression rules. A group of gene signatures was found as qualitatively related to the biological processes of four immune cell subtypes. The quantitative expression rules can efficiently cluster immune cells. This work provides a novel computational tool for immune cell quantitative subtyping and biomarker recognition.
Collapse
|
15
|
Cai R, Chen X, Fang Y, Wu M, Hao Y. Dual-dropout graph convolutional network for predicting synthetic lethality in human cancers. Bioinformatics 2021; 36:4458-4465. [PMID: 32221609 DOI: 10.1093/bioinformatics/btaa211] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2019] [Revised: 03/02/2020] [Accepted: 03/25/2020] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION Synthetic lethality (SL) is a promising form of gene interaction for cancer therapy, as it is able to identify specific genes to target at cancer cells without disrupting normal cells. As high-throughput wet-lab settings are often costly and face various challenges, computational approaches have become a practical complement. In particular, predicting SLs can be formulated as a link prediction task on a graph of interacting genes. Although matrix factorization techniques have been widely adopted in link prediction, they focus on mapping genes to latent representations in isolation, without aggregating information from neighboring genes. Graph convolutional networks (GCN) can capture such neighborhood dependency in a graph. However, it is still challenging to apply GCN for SL prediction as SL interactions are extremely sparse, which is more likely to cause overfitting. RESULTS In this article, we propose a novel dual-dropout GCN (DDGCN) for learning more robust gene representations for SL prediction. We employ both coarse-grained node dropout and fine-grained edge dropout to address the issue that standard dropout in vanilla GCN is often inadequate in reducing overfitting on sparse graphs. In particular, coarse-grained node dropout can efficiently and systematically enforce dropout at the node (gene) level, while fine-grained edge dropout can further fine-tune the dropout at the interaction (edge) level. We further present a theoretical framework to justify our model architecture. Finally, we conduct extensive experiments on human SL datasets and the results demonstrate the superior performance of our model in comparison with state-of-the-art methods. AVAILABILITY AND IMPLEMENTATION DDGCN is implemented in Python 3.7, open-source and freely available at https://github.com/CXX1113/Dual-DropoutGCN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Ruichu Cai
- School of Computer Science, Guangdong University of Technology, Guangzhou 510006, China
| | - Xuexin Chen
- School of Computer Science, Guangdong University of Technology, Guangzhou 510006, China
| | - Yuan Fang
- School of Information Systems, Singapore Management University, 178902 Singapore
| | - Min Wu
- Institute for Infocomm Research (I2R), A*STAR, 138632 Singapore
| | - Yuexing Hao
- Computer Science Department, Rutgers Univeristy New Brunswick, New Brunswick, NJ 08854, USA
| |
Collapse
|
16
|
Pan X, Li H, Zeng T, Li Z, Chen L, Huang T, Cai YD. Identification of Protein Subcellular Localization With Network and Functional Embeddings. Front Genet 2021; 11:626500. [PMID: 33584818 PMCID: PMC7873866 DOI: 10.3389/fgene.2020.626500] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2020] [Accepted: 12/21/2020] [Indexed: 01/15/2023] Open
Abstract
The functions of proteins are mainly determined by their subcellular localizations in cells. Currently, many computational methods for predicting the subcellular localization of proteins have been proposed. However, these methods require further improvement, especially when used in protein representations. In this study, we present an embedding-based method for predicting the subcellular localization of proteins. We first learn the functional embeddings of KEGG/GO terms, which are further used in representing proteins. Then, we characterize the network embeddings of proteins on a protein–protein network. The functional and network embeddings are combined as novel representations of protein locations for the construction of the final classification model. In our collected benchmark dataset with 4,861 proteins from 16 locations, the best model shows a Matthews correlation coefficient of 0.872 and is thus superior to multiple conventional methods.
Collapse
Affiliation(s)
- Xiaoyong Pan
- School of Life Sciences, Shanghai University, Shanghai, China.,Key Laboratory of System Control and Information Processing, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Ministry of Education of China, Shanghai, China
| | - Hao Li
- College of Food Engineering, Jilin Engineering Normal University, Changchun, China
| | - Tao Zeng
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Zhandong Li
- College of Food Engineering, Jilin Engineering Normal University, Changchun, China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, China
| | - Tao Huang
- Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
| |
Collapse
|
17
|
Zhang YH, Li H, Zeng T, Chen L, Li Z, Huang T, Cai YD. Identifying Transcriptomic Signatures and Rules for SARS-CoV-2 Infection. Front Cell Dev Biol 2021; 8:627302. [PMID: 33505977 PMCID: PMC7829664 DOI: 10.3389/fcell.2020.627302] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2020] [Accepted: 12/14/2020] [Indexed: 12/26/2022] Open
Abstract
The world-wide Coronavirus Disease 2019 (COVID-19) pandemic was triggered by the widespread of a new strain of coronavirus named as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Multiple studies on the pathogenesis of SARS-CoV-2 have been conducted immediately after the spread of the disease. However, the molecular pathogenesis of the virus and related diseases has still not been fully revealed. In this study, we attempted to identify new transcriptomic signatures as candidate diagnostic models for clinical testing or as therapeutic targets for vaccine design. Using the recently reported transcriptomics data of upper airway tissue with acute respiratory illnesses, we integrated multiple machine learning methods to identify effective qualitative biomarkers and quantitative rules for the distinction of SARS-CoV-2 infection from other infectious diseases. The transcriptomics data was first analyzed by Boruta so that important features were selected, which were further evaluated by the minimum redundancy maximum relevance method. A feature list was produced. This list was fed into the incremental feature selection, incorporating some classification algorithms, to extract qualitative biomarker genes and construct quantitative rules. Also, an efficient classifier was built to identify patients infected with SARS-COV-2. The findings reported in this study may help in revealing the potential pathogenic mechanisms of COVID-19 and finding new targets for vaccine design.
Collapse
Affiliation(s)
- Yu-Hang Zhang
- School of Life Sciences, Shanghai University, Shanghai, China
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
| | - Hao Li
- College of Food Engineering, Jilin Engineering Normal University, Changchun, China
| | - Tao Zeng
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, China
| | - Zhandong Li
- College of Food Engineering, Jilin Engineering Normal University, Changchun, China
| | - Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
| |
Collapse
|
18
|
Myers S, Ortega JA, Cavalli A. Synthetic Lethality through the Lens of Medicinal Chemistry. J Med Chem 2020; 63:14151-14183. [PMID: 33135887 PMCID: PMC8015234 DOI: 10.1021/acs.jmedchem.0c00766] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2020] [Indexed: 02/07/2023]
Abstract
Personalized medicine and therapies represent the goal of modern medicine, as drug discovery strives to move away from one-cure-for-all and makes use of the various targets and biomarkers within differing disease areas. This approach, especially in oncology, is often undermined when the cells make use of alternative survival pathways. As such, acquired resistance is unfortunately common. In order to combat this phenomenon, synthetic lethality is being investigated, making use of existing genetic fragilities within the cancer cell. This Perspective highlights exciting targets within synthetic lethality, (PARP, ATR, ATM, DNA-PKcs, WEE1, CDK12, RAD51, RAD52, and PD-1) and discusses the medicinal chemistry programs being used to interrogate them, the challenges these programs face, and what the future holds for this promising field.
Collapse
Affiliation(s)
- Samuel
H. Myers
- Computational
& Chemical Biology, Istituto Italiano
di Tecnologia, 16163 Genova, Italy
| | - Jose Antonio Ortega
- Computational
& Chemical Biology, Istituto Italiano
di Tecnologia, 16163 Genova, Italy
| | - Andrea Cavalli
- Computational
& Chemical Biology, Istituto Italiano
di Tecnologia, 16163 Genova, Italy
- Department
of Pharmacy and Biotechnology, University
of Bologna, 40126 Bologna, Italy
| |
Collapse
|
19
|
Wu Z, Shou L, Wang J, Huang T, Xu X. The Methylation Pattern for Knee and Hip Osteoarthritis. Front Cell Dev Biol 2020; 8:602024. [PMID: 33240895 PMCID: PMC7677303 DOI: 10.3389/fcell.2020.602024] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2020] [Accepted: 10/22/2020] [Indexed: 01/08/2023] Open
Abstract
Osteoarthritis is one of the most prevalent chronic joint diseases for middle-aged and elderly people. But in recent years, the number of young people suffering from the disease increases quickly. It is known that osteoarthritis is a common degenerative disease caused by the combination and interaction of many factors such as natural and environmental factors. DNA methylations reflect the effects of environmental factors. Several researches on DNA methylation at specific genes in OA cartilage indicated the great potential roles of DNA methylation in OA. To systematically investigate the methylation pattern in knee and hip osteoarthritis, we analyzed the methylation profiles in cartilage of 16 OA hip samples, 19 control hip samples and 62 OA knee samples. 12 discriminative methylation sites were identified using advanced minimal Redundancy Maximal Relevance (mRMR) and Incremental Feature Selection (IFS) methods. The SVM classifier of these 12 methylation sites from genes like MEIS1, GABRG3, RXRA, and EN1, can perfectly classify the OA hip samples, control hip samples and OA knee samples evaluated with LOOCV (Leave-One Out-Cross Validation). These 12 methylation sites can not only serve as biomarker, but also provide underlying mechanism of OA.
Collapse
Affiliation(s)
- Zhen Wu
- Departmemt of Orthopaedics, Tongde Hospital of Zhejiang Province, Hangzhou, China
| | - Lu Shou
- Departmemt of Pneumology, Tongde Hospital of Zhejiang Province, Hangzhou, China
| | - Jian Wang
- Departmemt of Orthopaedics, Tongde Hospital of Zhejiang Province, Hangzhou, China
| | - Tao Huang
- Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Xinwei Xu
- Departmemt of Orthopaedics, Tongde Hospital of Zhejiang Province, Hangzhou, China
| |
Collapse
|
20
|
Identification of Latent Oncogenes with a Network Embedding Method and Random Forest. BIOMED RESEARCH INTERNATIONAL 2020; 2020:5160396. [PMID: 33029511 PMCID: PMC7530476 DOI: 10.1155/2020/5160396] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/20/2020] [Revised: 09/09/2020] [Accepted: 09/14/2020] [Indexed: 12/29/2022]
Abstract
Oncogene is a special type of genes, which can promote the tumor initiation. Good study on oncogenes is helpful for understanding the cause of cancers. Experimental techniques in early time are quite popular in detecting oncogenes. However, their defects become more and more evident in recent years, such as high cost and long time. The newly proposed computational methods provide an alternative way to study oncogenes, which can provide useful clues for further investigations on candidate genes. Considering the limitations of some previous computational methods, such as lack of learning procedures and terming genes as individual subjects, a novel computational method was proposed in this study. The method adopted the features derived from multiple protein networks, viewing proteins in a system level. A classic machine learning algorithm, random forest, was applied on these features to capture the essential characteristic of oncogenes, thereby building the prediction model. All genes except validated oncogenes were ranked with a measurement yielded by the prediction model. Top genes were quite different from potential oncogenes discovered by previous methods, and they can be confirmed to become novel oncogenes. It was indicated that the newly identified genes can be essential supplements for previous results.
Collapse
|
21
|
Zhang YH, Jin M, Li J, Kong X. Identifying circulating miRNA biomarkers for early diagnosis and monitoring of lung cancer. Biochim Biophys Acta Mol Basis Dis 2020; 1866:165847. [DOI: 10.1016/j.bbadis.2020.165847] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2020] [Revised: 04/28/2020] [Accepted: 05/19/2020] [Indexed: 02/09/2023]
|
22
|
Xu Y, Zhang YH, Li J, Pan XY, Huang T, Cai YD. New Computational Tool Based on Machine-learning Algorithms for the Identification of Rhinovirus Infection-Related Genes. Comb Chem High Throughput Screen 2020; 22:665-674. [PMID: 31782358 DOI: 10.2174/1386207322666191129114741] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2019] [Revised: 05/22/2019] [Accepted: 07/09/2019] [Indexed: 12/14/2022]
Abstract
BACKGROUND Human rhinovirus has different identified serotypes and is the most common cause of cold in humans. To date, many genes have been discovered to be related to rhinovirus infection. However, the pathogenic mechanism of rhinovirus is difficult to elucidate through experimental approaches due to the high cost and consuming time. METHODS AND RESULTS In this study, we presented a novel approach that relies on machine-learning algorithms and identified two genes OTOF and SOCS1. The expression levels of these genes in the blood samples can be used to accurately distinguish virus-infected and non-infected individuals. CONCLUSION Our findings suggest the crucial roles of these two genes in rhinovirus infection and the robustness of the computational tool in dissecting pathogenic mechanisms.
Collapse
Affiliation(s)
- Yan Xu
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - Yu-Hang Zhang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - JiaRui Li
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - Xiao Y Pan
- BASF & IDLab, Ghent University, Ghent, Belgium
| | - Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| |
Collapse
|
23
|
Pan X, Zeng T, Zhang YH, Chen L, Feng K, Huang T, Cai YD. Investigation and Prediction of Human Interactome Based on Quantitative Features. Front Bioeng Biotechnol 2020; 8:730. [PMID: 32766217 PMCID: PMC7379396 DOI: 10.3389/fbioe.2020.00730] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2020] [Accepted: 06/09/2020] [Indexed: 01/27/2023] Open
Abstract
Protein is one of the most significant components of all living creatures. All significant and essential biological structures and functions relies on proteins and their respective biological functions. However, proteins cannot perform their unique biological significance independently. They have to interact with each other to realize the complicated biological processes in all living creatures including human beings. In other words, proteins depend on interactions (protein-protein interactions) to realize their significant effects. Thus, the significance comparison and quantitative contribution of candidate PPI features must be determined urgently. According to previous studies, 258 physical and chemical characteristics of proteins have been reported and confirmed to definitively affect the interaction efficiency of the related proteins. Among such features, essential physiochemical features of proteins like stoichiometric balance, protein abundance, molecular weight and charge distribution have been validated to be quite significant and irreplaceable for protein-protein interactions (PPIs). Therefore, in this study, we, on one hand, presented a novel computational framework to identify the key factors affecting PPIs with Boruta feature selection (BFS), Monte Carlo feature selection (MCFS), incremental feature selection (IFS), and on the other hand, built a quantitative decision-rule system to evaluate the potential PPIs under real conditions with random forest (RF) and RIPPER algorithms, thereby supplying several new insights into the detailed biological mechanisms of complicated PPIs. The main datasets and codes can be downloaded at https://github.com/xypan1232/Mass-PPI.
Collapse
Affiliation(s)
- Xiaoyong Pan
- School of Life Sciences, Shanghai University, Shanghai, China.,Key Laboratory of System Control and Information Processing, Ministry of Education of China, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
| | - Tao Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
| | - Yu-Hang Zhang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, China
| | - Kaiyan Feng
- Department of Computer Science, Guangdong AIB Polytechnic, Guangzhou, China
| | - Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
| |
Collapse
|
24
|
Alternative Polyadenylation Modification Patterns Reveal Essential Posttranscription Regulatory Mechanisms of Tumorigenesis in Multiple Tumor Types. BIOMED RESEARCH INTERNATIONAL 2020; 2020:6384120. [PMID: 32626751 PMCID: PMC7315320 DOI: 10.1155/2020/6384120] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/28/2020] [Revised: 05/26/2020] [Accepted: 05/30/2020] [Indexed: 12/11/2022]
Abstract
Among various risk factors for the initiation and progression of cancer, alternative polyadenylation (APA) is a remarkable endogenous contributor that directly triggers the malignant phenotype of cancer cells. APA affects biological processes at a transcriptional level in various ways. As such, APA can be involved in tumorigenesis through gene expression, protein subcellular localization, or transcription splicing pattern. The APA sites and status of different cancer types may have diverse modification patterns and regulatory mechanisms on transcripts. Potential APA sites were screened by applying several machine learning algorithms on a TCGA-APA dataset. First, a powerful feature selection method, minimum redundancy maximum relevancy, was applied on the dataset, resulting in a feature list. Then, the feature list was fed into the incremental feature selection, which incorporated the support vector machine as the classification algorithm, to extract key APA features and build a classifier. The classifier can classify cancer patients into cancer types with perfect performance. The key APA-modified genes had a potential prognosis ability because of their significant power in the survival analysis of TCGA pan-cancer data.
Collapse
|
25
|
Zhang S, Zeng T, Hu B, Zhang YH, Feng K, Chen L, Niu Z, Li J, Huang T, Cai YD. Discriminating Origin Tissues of Tumor Cell Lines by Methylation Signatures and Dys-Methylated Rules. Front Bioeng Biotechnol 2020; 8:507. [PMID: 32528944 PMCID: PMC7264161 DOI: 10.3389/fbioe.2020.00507] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2020] [Accepted: 04/30/2020] [Indexed: 12/18/2022] Open
Abstract
DNA methylation is an essential epigenetic modification for multiple biological processes. DNA methylation in mammals acts as an epigenetic mark of transcriptional repression. Aberrant levels of DNA methylation can be observed in various types of tumor cells. Thus, DNA methylation has attracted considerable attention among researchers to provide new and feasible tumor therapies. Conventional studies considered single-gene methylation or specific loci as biomarkers for tumorigenesis. However, genome-scale methylated modification has not been completely investigated. Thus, we proposed and compared two novel computational approaches based on multiple machine learning algorithms for the qualitative and quantitative analyses of methylation-associated genes and their dys-methylated patterns. This study contributes to the identification of novel effective genes and the establishment of optimal quantitative rules for aberrant methylation distinguishing tumor cells with different origin tissues.
Collapse
Affiliation(s)
- Shiqi Zhang
- School of Life Sciences, Shanghai University, Shanghai, China.,Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
| | - Tao Zeng
- Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai, China
| | - Bin Hu
- State Key Laboratory of Livestock and Poultry Breeding, Guangdong Public Laboratory of Animal Breeding and Nutrition, Guangdong Key Laboratory of Animal Breeding and Nutrition, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
| | - Yu-Hang Zhang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Kaiyan Feng
- Department of Computer Science, Guangdong AIB Polytechnic, Guangzhou, China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, China
| | - Zhibin Niu
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Jianhao Li
- State Key Laboratory of Livestock and Poultry Breeding, Guangdong Public Laboratory of Animal Breeding and Nutrition, Guangdong Key Laboratory of Animal Breeding and Nutrition, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
| | - Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
| |
Collapse
|
26
|
Che J, Chen L, Guo ZH, Wang S, Aorigele. Drug Target Group Prediction with Multiple Drug Networks. Comb Chem High Throughput Screen 2020; 23:274-284. [DOI: 10.2174/1386207322666190702103927] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2018] [Revised: 03/11/2019] [Accepted: 04/15/2019] [Indexed: 02/07/2023]
Abstract
Background:
Identification of drug-target interaction is essential in drug discovery. It is
beneficial to predict unexpected therapeutic or adverse side effects of drugs. To date, several
computational methods have been proposed to predict drug-target interactions because they are
prompt and low-cost compared with traditional wet experiments.
Methods:
In this study, we investigated this problem in a different way. According to KEGG,
drugs were classified into several groups based on their target proteins. A multi-label classification
model was presented to assign drugs into correct target groups. To make full use of the known drug
properties, five networks were constructed, each of which represented drug associations in one
property. A powerful network embedding method, Mashup, was adopted to extract drug features
from above-mentioned networks, based on which several machine learning algorithms, including
RAndom k-labELsets (RAKEL) algorithm, Label Powerset (LP) algorithm and Support Vector
Machine (SVM), were used to build the classification model.
Results and Conclusion:
Tenfold cross-validation yielded the accuracy of 0.839, exact match of
0.816 and hamming loss of 0.037, indicating good performance of the model. The contribution of
each network was also analyzed. Furthermore, the network model with multiple networks was
found to be superior to the one with a single network and classic model, indicating the superiority
of the proposed model.
Collapse
Affiliation(s)
- Jingang Che
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| | - Zi-Han Guo
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| | - Shuaiqun Wang
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| | - Aorigele
- Faculty of Engineering, University of Toyama, Toyama, Japan
| |
Collapse
|
27
|
Yuan F, Pan X, Zeng T, Zhang YH, Chen L, Gan Z, Huang T, Cai YD. Identifying Cell-Type Specific Genes and Expression Rules Based on Single-Cell Transcriptomic Atlas Data. Front Bioeng Biotechnol 2020; 8:350. [PMID: 32411685 PMCID: PMC7201067 DOI: 10.3389/fbioe.2020.00350] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2020] [Accepted: 03/30/2020] [Indexed: 01/07/2023] Open
Abstract
Single-cell sequencing technologies have emerged to address new and longstanding biological and biomedical questions. Previous studies focused on the analysis of bulk tissue samples composed of millions of cells. However, the genomes within the cells of an individual multicellular organism are not always the same. In this study, we aimed to identify the crucial and characteristically expressed genes that may play functional roles in tissue development and organogenesis, by analyzing a single-cell transcriptomic atlas of mice. We identified the most relevant gene features and decision rules classifying 18 cell categories, providing a list of genes that may perform important functions in the process of tissue development because of their tissue-specific expression patterns. These genes may serve as biomarkers to identify the origin of unknown cell subgroups so as to recognize specific cell stages/states during the dynamic process, and also be applied as potential therapy targets for developmental disorders.
Collapse
Affiliation(s)
- Fei Yuan
- School of Life Sciences, Shanghai University, Shanghai, China.,Department of Science and Technology, Binzhou Medical University Hospital, Binzhou, China
| | - XiaoYong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
| | - Tao Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
| | - Yu-Hang Zhang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, China.,Shanghai Key Laboratory of Pure Mathematics and Mathematical Practice, East China Normal University, Shanghai, China
| | - Zijun Gan
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
| |
Collapse
|
28
|
Analysis of gene expression profiles of lung cancer subtypes with machine learning algorithms. Biochim Biophys Acta Mol Basis Dis 2020; 1866:165822. [PMID: 32360590 DOI: 10.1016/j.bbadis.2020.165822] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2020] [Revised: 04/13/2020] [Accepted: 04/22/2020] [Indexed: 12/14/2022]
Abstract
Lung cancer is one of the most common cancer types worldwide and causes more than one million deaths annually. Lung adenocarcinoma (AC) and lung squamous cell cancer (SCC) are two major lung cancer subtypes and have different characteristics in several aspects. Identifying their differentially expressed genes and different gene expression patterns can deepen our understanding of these two subtypes at the transcriptomic level. In this work, we used several machine learning algorithms to investigate the gene expression profiles of lung AC and lung SCC samples retrieved from Gene Expression Omnibus. First, the profiles were analyzed by using a powerful feature selection method, namely, Monte Carlo feature selection. A feature list, ranking all features according to their importance, and some informative features were obtained. Then, the feature list was used in the incremental feature selection method to extract optimal features, which can allow the support vector machine (SVM) to yield the best performance for classifying lung AC and lung SCC samples. Some top genes (CSTA, TP63, SERPINB13, CLCA2, BICD2, PERP, FAT2, BNC1, ATP11B, FAM83B, KRT5, PARD6G, PKP1) were extensively analyzed to prove that they can be differentially expressed genes between lung AC and lung SCC. Meanwhile, a rule learning procedure was applied on informative features to construct the classification rules. These rules provide a clear procedure of classification and show some different gene expression patterns between lung AC and lung SCC.
Collapse
|
29
|
Al-Harazi O, El Allali A, Colak D. Biomolecular Databases and Subnetwork Identification Approaches of Interest to Big Data Community: An Expert Review. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2020; 23:138-151. [PMID: 30883301 DOI: 10.1089/omi.2018.0205] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Abstract
Next-generation sequencing approaches and genome-wide studies have become essential for characterizing the mechanisms of human diseases. Consequently, many researchers have applied these approaches to discover the genetic/genomic causes of common complex and rare human diseases, generating multiomics big data that span the continuum of genomics, proteomics, metabolomics, and many other system science fields. Therefore, there is a significant and unmet need for biological databases and tools that enable and empower the researchers to analyze, integrate, and make sense of big data. There are currently large number of databases that offer different types of biological information. In particular, the integration of gene expression profiles and protein-protein interaction networks provides a deeper understanding of the complex multilayered molecular architecture of human diseases. Therefore, there has been a growing interest in developing methodologies that integrate and contextualize big data from molecular interaction networks to identify biomarkers of human diseases at a subnetwork resolution as well. In this expert review, we provide a comprehensive summary of most popular biomolecular databases for molecular interactions (e.g., Biological General Repository for Interaction Datasets, Kyoto Encyclopedia of Genes and Genomes and Search Tool for The Retrieval of Interacting Genes/Proteins), gene-disease associations (e.g., Online Mendelian Inheritance in Man, Disease-Gene Network, MalaCards), and population-specific databases (e.g., Human Genetic Variation Database), and describe some examples of their usage and potential applications. We also present the most recent subnetwork identification approaches and discuss their main advantages and limitations. As the field of data science continues to emerge, the present analysis offers a deeper and contextualized understanding of the available databases in molecular biomedicine.
Collapse
Affiliation(s)
- Olfat Al-Harazi
- 1 Department of Biostatistics, Epidemiology, and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia.,2 Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
| | - Achraf El Allali
- 2 Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
| | - Dilek Colak
- 1 Department of Biostatistics, Epidemiology, and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| |
Collapse
|
30
|
Liu F, Dong H, Mei Z, Huang T. Investigation of miRNA and mRNA Co-expression Network in Ependymoma. Front Bioeng Biotechnol 2020; 8:177. [PMID: 32266223 PMCID: PMC7096354 DOI: 10.3389/fbioe.2020.00177] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2020] [Accepted: 02/20/2020] [Indexed: 12/18/2022] Open
Abstract
Ependymoma (EPN) is a rare primary tumor of the central nervous system (CNS) that affects both children and adults. Despite the definition and classification of distinct molecular subgroups, there remains a group of EPNs with a balanced genome, which makes it difficult to predict a prognosis of patients with EPN. The role of miRNA-mRNA network on EPN is still poorly understood. We assessed the involvement of miRNA-mRNA pairs in EPN by applying a weighted co-expression network analysis (WGCNA) approach. Using whole genome expression profile analysis followed by functional enrichment, we detected hub genes involved in active proliferation and DNA replication of nerve cells. Key genes including CYP11B1, KRT33B, RUNX1T1, SIK1, MAP3K4, MLANA, and SFRP5 identified in co-expression networks were regulated by miR-15a and miR-24-1. These seven miRNA-mRNA pairs were considered to influence not only pathways in cancer and tumor suppression process, but also MAPK, NF-kappaB, and WNT signaling pathways which were associated with tumorigenesis and development. This study provides a novel insight into potential diagnostic biomarkers of EPN and may have value in choosing therapeutic targets with clinical utility.
Collapse
Affiliation(s)
- Feili Liu
- Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai, China
| | - Hang Dong
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Zi Mei
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| |
Collapse
|
31
|
Wang C, Zhang J, Wang X, Han K, Guo M. Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion. Front Genet 2020; 11:5. [PMID: 32117433 PMCID: PMC7010852 DOI: 10.3389/fgene.2020.00005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2019] [Accepted: 01/06/2020] [Indexed: 12/23/2022] Open
Abstract
Complex diseases seriously affect people's physical and mental health. The discovery of disease-causing genes has become a target of research. With the emergence of bioinformatics and the rapid development of biotechnology, to overcome the inherent difficulties of the long experimental period and high cost of traditional biomedical methods, researchers have proposed many gene prioritization algorithms that use a large amount of biological data to mine pathogenic genes. However, because the currently known gene-disease association matrix is still very sparse and lacks evidence that genes and diseases are unrelated, there are limits to the predictive performance of gene prioritization algorithms. Based on the hypothesis that functionally related gene mutations may lead to similar disease phenotypes, this paper proposes a PU induction matrix completion algorithm based on heterogeneous information fusion (PUIMCHIF) to predict candidate genes involved in the pathogenicity of human diseases. On the one hand, PUIMCHIF uses different compact feature learning methods to extract features of genes and diseases from multiple data sources, making up for the lack of sparse data. On the other hand, based on the prior knowledge that most of the unknown gene-disease associations are unrelated, we use the PU-Learning strategy to treat the unknown unlabeled data as negative examples for biased learning. The experimental results of the PUIMCHIF algorithm regarding the three indexes of precision, recall, and mean percentile ranking (MPR) were significantly better than those of other algorithms. In the top 100 global prediction analysis of multiple genes and multiple diseases, the probability of recovering true gene associations using PUIMCHIF reached 50% and the MPR value was 10.94%. The PUIMCHIF algorithm has higher priority than those from other methods, such as IMC and CATAPULT.
Collapse
Affiliation(s)
- Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Jie Zhang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Xueping Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Ke Han
- School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
| | - Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
- Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing University of Civil Engineering and Architecture, Beijing, China
| |
Collapse
|
32
|
Zhang J, Hu H, Xu S, Jiang H, Zhu J, Qin E, He Z, Chen E. The Functional Effects of Key Driver KRAS Mutations on Gene Expression in Lung Cancer. Front Genet 2020; 11:17. [PMID: 32117436 PMCID: PMC7010953 DOI: 10.3389/fgene.2020.00017] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2019] [Accepted: 01/07/2020] [Indexed: 12/11/2022] Open
Abstract
Lung cancer is a common malignant cancer. Kirsten rat sarcoma oncogene (KRAS) mutations have been considered as a key driver for lung cancers. KRAS p.G12C mutations were most predominant in NSCLC which was comprised about 11–16% of lung adenocarcinomas (p.G12C accounts for 45–50% of mutant KRAS). But it is still not clear how the KRAS mutation triggers lung cancers. To study the molecular mechanisms of KRAS mutation in lung cancer. We analyzed the gene expression profiles of 156 KRAS mutation samples and other negative samples with two stage feature selection approach: (1) minimal Redundancy Maximal Relevance (mRMR) and (2) Incremental Feature Selection (IFS). At last, 41 predictive genes for KRAS mutation were identified and a KRAS mutation predictor was constructed. Its leave one out cross validation MCC was 0.879. Our results were helpful for understanding the roles of KRAS mutation in lung cancer.
Collapse
Affiliation(s)
- Jisong Zhang
- Department of Pulmonary and Critical Care Medicine, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
| | - Huihui Hu
- Department of Pulmonary and Critical Care Medicine, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
| | - Shan Xu
- Department of Pulmonary and Critical Care Medicine, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
| | - Hanliang Jiang
- Department of Pulmonary and Critical Care Medicine, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
| | - Jihong Zhu
- Department of Anesthesiology, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
| | - E Qin
- Department of Respiratory Medicine, Shaoxing People's Hospital (Shaoxing Hospital, Zhejiang University School of Medicine), Shaoxing, China
| | - Zhengfu He
- Department of Thoracic Surgery, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
| | - Enguo Chen
- Department of Pulmonary and Critical Care Medicine, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
| |
Collapse
|
33
|
Zhang S, Pan X, Zeng T, Guo W, Gan Z, Zhang YH, Chen L, Zhang Y, Huang T, Cai YD. Copy Number Variation Pattern for Discriminating MACROD2 States of Colorectal Cancer Subtypes. Front Bioeng Biotechnol 2019; 7:407. [PMID: 31921812 PMCID: PMC6930883 DOI: 10.3389/fbioe.2019.00407] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2019] [Accepted: 11/27/2019] [Indexed: 12/24/2022] Open
Abstract
Copy number variation (CNV) is a common structural variation pattern of DNA, and it features a higher mutation rate than single-nucleotide polymorphisms (SNPs) and affects a larger fragment of genomes. CNV is related with the genesis of complex diseases and can thus be used as a strategy to identify novel cancer-predisposing markers or mechanisms. In particular, the frequent deletions of mono-ADP-ribosylhydrolase 2 (MACROD2) locus in human colorectal cancer (CRC) alters DNA repair and the sensitivity to DNA damage and results in chromosomal instability. The relationship between CNV and cancer has not been explained. In this study, on the basis of the genome variation profiling by the SNP array from 651 CRC primary tumors, we computationally analyzed the CNV data to select crucial SNP sites with the most relevance to three different states of MACROD2 (heterozygous deletion, homozygous deletion, and normal state), suggesting that these CNVs may play functional roles in CRC tumorigenesis. Our study can shed new insights into the genesis of cancer based on CNV, providing reference for clinical diagnosis, and treatment prognosis of CRC.
Collapse
Affiliation(s)
- ShiQi Zhang
- School of Life Sciences, Shanghai University, Shanghai, China.,Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
| | - XiaoYong Pan
- Key Laboratory of System Control and Information Processing, Institute of Image Processing and Pattern Recognition, Ministry of Education of China, Shanghai Jiao Tong University, Shanghai, China
| | - Tao Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
| | - Wei Guo
- Institute of Health Sciences, Chinese Academy of Sciences, Shanghai Jiao Tong University School of Medicine and Shanghai Institutes for Biological Sciences, Shanghai, China
| | - Zijun Gan
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Yu-Hang Zhang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, China.,Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai, China
| | - YunHua Zhang
- Anhui Province Key Laboratory of Farmland Ecological Conservation and Pollution Prevention, School of Resources and Environment, Anhui Agricultural University, Hefei, China
| | - Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
| |
Collapse
|
34
|
Chen L, Pan X, Zeng T, Zhang YH, Zhang Y, Huang T, Cai YD. Immunosignature Screening for Multiple Cancer Subtypes Based on Expression Rule. Front Bioeng Biotechnol 2019; 7:370. [PMID: 31850330 PMCID: PMC6901955 DOI: 10.3389/fbioe.2019.00370] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2019] [Accepted: 11/13/2019] [Indexed: 12/13/2022] Open
Abstract
Liquid biopsy (i.e., fluid biopsy) involves a series of clinical examination approaches. Monitoring of cancer immunological status by the “immunosignature” of patients presents a novel method for tumor-associated liquid biopsy. The major work content and the core technological difficulties for the monitoring of cancer immunosignature are the recognition of cancer-related immune-activating antigens by high-throughput screening approaches. Currently, one key task of immunosignature-based liquid biopsy is the qualitative and quantitative identification of typical tumor-specific antigens. In this study, we reused two sets of peptide microarray data that detected the expression level of potential antigenic peptides derived from tumor tissues to avoid the detection differences induced by chip platforms. Several machine learning algorithms were applied on these two sets. First, the Monte Carlo Feature Selection (MCFS) method was used to analyze features in two sets. A feature list was obtained according to the MCFS results on each set. Second, incremental feature selection method incorporating one classification algorithm (support vector machine or random forest) followed to extract optimal features and construct optimal classifiers. On the other hand, the repeated incremental pruning to produce error reduction, a rule learning algorithm, was applied on key features yielded by the MCFS method to extract quantitative rules for accurate cancer immune monitoring and pathologic diagnosis. Finally, obtained key features and quantitative rules were extensively analyzed.
Collapse
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai, China.,College of Information Engineering, Shanghai Maritime University, Shanghai, China.,Shanghai Key Laboratory of Pure Mathematics and Mathematical Practice (PMMP), East China Normal University, Shanghai, China
| | - XiaoYong Pan
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China.,IDLab, Department for Electronics and Information Systems, Ghent University, Ghent, Belgium
| | - Tao Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
| | - Yu-Hang Zhang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - YunHua Zhang
- Anhui Province Key Laboratory of Farmland Ecological Conservation and Pollution Prevention, School of Resources and Environment, Anhui Agricultural University, Hefei, China
| | - Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
| |
Collapse
|
35
|
Identifying Methylation Pattern and Genes Associated with Breast Cancer Subtypes. Int J Mol Sci 2019; 20:ijms20174269. [PMID: 31480430 PMCID: PMC6747348 DOI: 10.3390/ijms20174269] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2019] [Revised: 08/19/2019] [Accepted: 08/29/2019] [Indexed: 12/18/2022] Open
Abstract
Breast cancer is regarded worldwide as a severe human disease. Various genetic variations, including hereditary and somatic mutations, contribute to the initiation and progression of this disease. The diagnostic parameters of breast cancer are not limited to the conventional protein content and can include newly discovered genetic variants and even genetic modification patterns such as methylation and microRNA. In addition, breast cancer detection extends to detailed breast cancer stratifications to provide subtype-specific indications for further personalized treatment. One genome-wide expression–methylation quantitative trait loci analysis confirmed that different breast cancer subtypes have various methylation patterns. However, recognizing clinically applied (methylation) biomarkers is difficult due to the large number of differentially methylated genes. In this study, we attempted to re-screen a small group of functional biomarkers for the identification and distinction of different breast cancer subtypes with advanced machine learning methods. The findings may contribute to biomarker identification for different breast cancer subtypes and provide a new perspective for differential pathogenesis in breast cancer subtypes.
Collapse
|
36
|
Inferring novel genes related to oral cancer with a network embedding method and one-class learning algorithms. Gene Ther 2019; 26:465-478. [PMID: 31455874 DOI: 10.1038/s41434-019-0099-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2019] [Revised: 06/18/2019] [Accepted: 07/15/2019] [Indexed: 12/14/2022]
Abstract
Oral cancer (OC) is one of the most common cancers threatening human lives. However, OC pathogenesis has yet to be fully uncovered, and thus designing effective treatments remains difficult. Identifying genes related to OC is an important way for achieving this purpose. In this study, we proposed three computational models for inferring novel OC-related genes. In contrast to previously proposed computational methods, which lacked the learning procedures, each proposed model adopted a one-class learning algorithm, which can provide a deep insight into features of validated OC-related genes. A network embedding algorithm (i.e., node2vec) was applied to the protein-protein interaction network to produce the representation of genes. The features of the OC-related genes were used in the training of the one-class algorithm, and the performance of the final inferring model was improved through a feature selection procedure. Then, candidate genes were produced by applying the trained inferring model to other genes. Three tests were performed to screen out the important candidate genes. Accordingly, we obtained three inferred gene sets, any two of which were different. The inferred genes were also different from previous reported genes and some of them have been included in the public Oral Cancer Gene Database. Finally, we analyzed several inferred genes to confirm whether they are novel OC-related genes.
Collapse
|
37
|
Analysis of Protein-Protein Functional Associations by Using Gene Ontology and KEGG Pathway. BIOMED RESEARCH INTERNATIONAL 2019; 2019:4963289. [PMID: 31396531 PMCID: PMC6668538 DOI: 10.1155/2019/4963289] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/04/2019] [Revised: 06/04/2019] [Accepted: 06/26/2019] [Indexed: 12/19/2022]
Abstract
Protein–protein interaction (PPI) plays an extremely remarkable role in the growth, reproduction, and metabolism of all lives. A thorough investigation of PPI can uncover the mechanism of how proteins express their functions. In this study, we used gene ontology (GO) terms and biological pathways to study an extended version of PPI (protein–protein functional associations) and subsequently identify some essential GO terms and pathways that can indicate the difference between two proteins with and without functional associations. The protein–protein functional associations validated by experiments were retrieved from STRING, a well-known database on collected associations between proteins from multiple sources, and they were termed as positive samples. The negative samples were constructed by randomly pairing two proteins. Each sample was represented by several features based on GO and KEGG pathway information of two proteins. Then, the mutual information was adopted to evaluate the importance of all features and some important ones could be accessed, from which a number of essential GO terms or KEGG pathways were identified. The final analysis of some important GO terms and one KEGG pathway can partly uncover the difference between proteins with and without functional associations.
Collapse
|
38
|
Li J, Lu L, Zhang YH, Xu Y, Liu M, Feng K, Chen L, Kong X, Huang T, Cai YD. Identification of leukemia stem cell expression signatures through Monte Carlo feature selection strategy and support vector machine. Cancer Gene Ther 2019; 27:56-69. [PMID: 31138902 DOI: 10.1038/s41417-019-0105-y] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2019] [Revised: 04/28/2019] [Accepted: 05/04/2019] [Indexed: 01/09/2023]
Abstract
Acute myeloid leukemia (AML) is a type of blood cancer characterized by the rapid growth of immature white blood cells from the bone marrow. Therapy resistance resulting from the persistence of leukemia stem cells (LSCs) are found in numerous patients. Comparative transcriptome studies have been previously conducted to analyze differentially expressed genes between LSC+ and LSC- cells. However, these studies mainly focused on a limited number of genes with the most obvious expression differences between the two cell types. We developed a computational approach incorporating several machine learning algorithms, including Monte Carlo feature selection (MCFS), incremental feature selection (IFS), support vector machine (SVM), Repeated Incremental Pruning to Produce Error Reduction (RIPPER), to identify gene expression features specific to LSCs. One thousand 0ne hudred fifty-nine features (genes) were first identified, which can be used to build the optimal SVM classifier for distinguishing LSC+ and LSC- cells. Among these 1159 genes, the top 17 genes were identified as LSC-specific biomarkers. In addition, six classification rules were produced by RIPPER algorithm. The subsequent literature review on these features/genes and the classification rules and functional enrichment analyses of the 1159 features/genes confirmed the relevance of extracted genes and rules to the characteristics of LSCs.
Collapse
Affiliation(s)
- JiaRui Li
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China.,School of Life Sciences, Shanghai University, Shanghai, 200444, P. R. China
| | - Lin Lu
- Department of Radiology, Columbia University Medical Center, New York, NY, 10032, USA
| | - Yu-Hang Zhang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China
| | - YaoChen Xu
- Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China
| | - Min Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, P. R. China
| | - KaiYan Feng
- Department of Computer Science, Guangdong AIB Polytechnic, Guangzhou, 510507, P. R. China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, P. R. China.,Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai, 200241, P. R. China
| | - XiangYin Kong
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China.
| | - Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China.
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, 200444, P. R. China.
| |
Collapse
|
39
|
Chen L, Pan X, Zhang YH, Huang T, Cai YD. Analysis of Gene Expression Differences between Different Pancreatic Cells. ACS OMEGA 2019; 4:6421-6435. [DOI: 10.1021/acsomega.8b02171] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/30/2023]
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai 200241, China
| | - Xiaoyong Pan
- Department of Medical Informatics, Erasmus MC, Rotterdam 3014ZK, Netherlands
| | - Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| |
Collapse
|
40
|
Wang T, Chen L, Zhao X. Prediction of Drug Combinations with a Network Embedding Method. Comb Chem High Throughput Screen 2019; 21:789-797. [DOI: 10.2174/1386207322666181226170140] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2018] [Revised: 11/02/2018] [Accepted: 11/28/2018] [Indexed: 01/10/2023]
Abstract
Aim and Objective:
There are several diseases having a complicated mechanism. For such
complicated diseases, a single drug cannot treat them very well because these diseases always
involve several targets and single targeted drugs cannot modulate these targets simultaneously. Drug
combination is an effective way to treat such diseases. However, determination of effective drug
combinations is time- and cost-consuming via traditional methods. It is urgent to build quick and
cheap methods in this regard. Designing effective computational methods incorporating advanced
computational techniques to predict drug combinations is an alternative and feasible way.
Method:
In this study, we proposed a novel network embedding method, which can extract
topological features of each drug combination from a drug network that was constructed using
chemical-chemical interaction information retrieved from STITCH. These topological features were
combined with individual features of drug combination reported in one previous study. Several
advanced computational methods were employed to construct an effective prediction model, such as
synthetic minority oversampling technique (SMOTE) that was used to tackle imbalanced dataset,
minimum redundancy maximum relevance (mRMR) and incremental feature selection (IFS)
methods that were adopted to analyze features and extract optimal features for building an optimal
support machine vector (SVM) classifier.
Results and Conclusion:
The constructed optimal SVM classifier yielded an MCC of 0.806, which
is superior to the classifier only using individual features with or without SMOTE. The performance
of the classifier can be improved by combining the topological features and essential features of a
drug combination.
Collapse
Affiliation(s)
- Tianyun Wang
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| | - Xian Zhao
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| |
Collapse
|
41
|
Wang S, Li J, Sun X, Zhang YH, Huang T, Cai Y. Computational Method for Identifying Malonylation Sites by Using Random Forest Algorithm. Comb Chem High Throughput Screen 2018; 23:304-312. [PMID: 30588879 DOI: 10.2174/1386207322666181227144318] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2018] [Revised: 09/03/2018] [Accepted: 12/04/2018] [Indexed: 12/12/2022]
Abstract
BACKGROUND As a newly uncovered post-translational modification on the ε-amino group of lysine residue, protein malonylation was found to be involved in metabolic pathways and certain diseases. Apart from experimental approaches, several computational methods based on machine learning algorithms were recently proposed to predict malonylation sites. However, previous methods failed to address imbalanced data sizes between positive and negative samples. OBJECTIVE In this study, we identified the significant features of malonylation sites in a novel computational method which applied machine learning algorithms and balanced data sizes by applying synthetic minority over-sampling technique. METHOD Four types of features, namely, amino acid (AA) composition, position-specific scoring matrix (PSSM), AA factor, and disorder were used to encode residues in protein segments. Then, a two-step feature selection procedure including maximum relevance minimum redundancy and incremental feature selection, together with random forest algorithm, was performed on the constructed hybrid feature vector. RESULTS An optimal classifier was built from the optimal feature subset, which featured an F1-measure of 0.356. Feature analysis was performed on several selected important features. CONCLUSION Results showed that certain types of PSSM and disorder features may be closely associated with malonylation of lysine residues. Our study contributes to the development of computational approaches for predicting malonyllysine and provides insights into molecular mechanism of malonylation.
Collapse
Affiliation(s)
- ShaoPeng Wang
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - JiaRui Li
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - Xijun Sun
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Yudong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| |
Collapse
|
42
|
Chen L, Pan X, Zhang YH, Liu M, Huang T, Cai YD. Classification of Widely and Rarely Expressed Genes with Recurrent Neural Network. Comput Struct Biotechnol J 2018; 17:49-60. [PMID: 30595815 PMCID: PMC6307323 DOI: 10.1016/j.csbj.2018.12.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2018] [Revised: 12/07/2018] [Accepted: 12/09/2018] [Indexed: 02/06/2023] Open
Abstract
A tissue-specific gene expression shapes the formation of tissues, while gene expression changes reflect the immune response of the human body to environmental stimulations or pressure, particularly in disease conditions, such as cancers. A few genes are commonly expressed across tissues or various cancers, while others are not. To investigate the functional differences between widely and rarely expressed genes, we defined the genes that were expressed in 32 normal tissues/cancers (i.e., called widely expressed genes; FPKM >1 in all samples) and those that were not detected (i.e., called rarely expressed genes; FPKM <1 in all samples) based on the large gene expression data set provided by Uhlen et al. Each gene was encoded using the gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment scores. Minimum redundancy maximum relevance (mRMR) was used to measure and rank these features on the mRMR feature list. Thereafter, we applied the incremental feature selection method with a supervised classifier recurrent neural network (RNN) to select the discriminate features for classifying widely expressed genes from rarely expressed genes and construct an optimum RNN classifier. The Youden's indexes generated by the optimum RNN classifier and evaluated using a 10-fold cross validation were 0.739 for normal tissues and 0.639 for cancers. Furthermore, the underlying mechanisms of the key discriminate GO and KEGG features were analyzed. Results can facilitate the identification of the expression landscape of genes and elucidation of how gene expression shapes tissues and the microenvironment of cancers. Some genes are widely expressed across tissues or various cancers. A number of genes are rarely expressed across tissues or various cancers. The functional differences between widely and rarely expressed genes were studied. Several GO terms and KEGG pathways were extracted and analyzed.
Collapse
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China.,College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China.,Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai 200241, People's Republic of China
| | - XiaoYong Pan
- Department of Medical Informatics, Erasmus MC, Rotterdam, the Netherlands
| | - Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
| | - Min Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China
| |
Collapse
|
43
|
Chen L, Zhang S, Pan X, Hu X, Zhang YH, Yuan F, Huang T, Cai YD. HIV infection alters the human epigenetic landscape. Gene Ther 2018; 26:29-39. [PMID: 30443044 DOI: 10.1038/s41434-018-0051-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2018] [Revised: 10/30/2018] [Accepted: 10/31/2018] [Indexed: 02/07/2023]
Abstract
Many complex diseases or traits are the results of both genetic and environmental factors. The environmental factors affect the human body by modifying its epigenetics, which controls the activity of genomes without mutating it. Viral infection is one of the common environmental factors for complex diseases. For example, the human immunodeficiency virus (HIV) infection can cause acquired immune deficiency syndrome (AIDS), HBV, and HCV infections are associated with hepatocellular carcinoma, and human papillomavirus infection is a causal factor in cervical carcinoma. In this study, to investigate how HIV infection affects DNA methylation, we analyzed the blood DNA methylation data of 485 512 sites in 44 HIV- and 142 HIV + patients. Several advanced computational methods were applied to identify the core distinctive features that were different between the HIV patients and the healthy controls. These methods can be used for differentiating HIV-infected patients from uninfected ones. These core distinctive DNA methylation features were confirmed to be functionally connected to premature aging and abnormal immune regulation, two typical pathological symptoms of HIV infection, revealing the potential regulatory mechanisms of HIV infection on the DNA methylation status of the host cells and provided novel insights on the pathogenesis of HIV infection and AIDS.
Collapse
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai, 200444, China.,Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai, 200241, China.,College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, China
| | - Shiqi Zhang
- Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
| | - Xiaoyong Pan
- Department of Medical Informatics, Erasmus MC, Rotterdam, Netherlands
| | - XiaoHua Hu
- Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai, 200438, China
| | - Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Fei Yuan
- Department of Science & Technology, Binzhou Medical University Hospital, Binzhou, 256603, Shandong, China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China.
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, 200444, China.
| |
Collapse
|
44
|
Chen L, Zhang YH, Pan X, Liu M, Wang S, Huang T, Cai YD. Tissue Expression Difference between mRNAs and lncRNAs. Int J Mol Sci 2018; 19:ijms19113416. [PMID: 30384456 PMCID: PMC6274976 DOI: 10.3390/ijms19113416] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2018] [Revised: 10/26/2018] [Accepted: 10/28/2018] [Indexed: 12/15/2022] Open
Abstract
Messenger RNA (mRNA) and long noncoding RNA (lncRNA) are two main subgroups of RNAs participating in transcription regulation. With the development of next generation sequencing, increasing lncRNAs are identified. Many hidden functions of lncRNAs are also revealed. However, the differences in lncRNAs and mRNAs are still unclear. For example, we need to determine whether lncRNAs have stronger tissue specificity than mRNAs and which tissues have more lncRNAs expressed. To investigate such tissue expression difference between mRNAs and lncRNAs, we encoded 9339 lncRNAs and 14,294 mRNAs with 71 expression features, including 69 maximum expression features for 69 types of cells, one feature for the maximum expression in all cells, and one expression specificity feature that was measured as Chao-Shen-corrected Shannon's entropy. With advanced feature selection methods, such as maximum relevance minimum redundancy, incremental feature selection methods, and random forest algorithm, 13 features presented the dissimilarity of lncRNAs and mRNAs. The 11 cell subtype features indicated which cell types of the lncRNAs and mRNAs had the largest expression difference. Such cell subtypes may be the potential cell models for lncRNA identification and function investigation. The expression specificity feature suggested that the cell types to express mRNAs and lncRNAs were different. The maximum expression feature suggested that the maximum expression levels of mRNAs and lncRNAs were different. In addition, the rule learning algorithm, repeated incremental pruning to produce error reduction algorithm, was also employed to produce effective classification rules for classifying lncRNAs and mRNAs, which gave competitive results compared with random forest and could give a clearer picture of different expression patterns between lncRNAs and mRNAs. Results not only revealed the heterogeneous expression pattern of lncRNA and mRNA, but also gave rise to the development of a new tool to identify the potential biological functions of such RNA subgroups.
Collapse
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai 200444, China.
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.
- Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai 200241, China.
| | - Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
| | - Xiaoyong Pan
- Department of Medical Informatics, Erasmus MC, 3000 CA Rotterdam, The Netherlands.
| | - Min Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.
| | - Shaopeng Wang
- School of Life Sciences, Shanghai University, Shanghai 200444, China.
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China.
| |
Collapse
|
45
|
Lu S, Zhao K, Wang X, Liu H, Ainiwaer X, Xu Y, Ye M. Use of Laplacian Heat Diffusion Algorithm to Infer Novel Genes With Functions Related to Uveitis. Front Genet 2018; 9:425. [PMID: 30349554 PMCID: PMC6186792 DOI: 10.3389/fgene.2018.00425] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2018] [Accepted: 09/10/2018] [Indexed: 12/17/2022] Open
Abstract
Uveitis is the inflammation of the uvea and is a serious eye disease that can cause blindness for middle-aged and young people. However, the pathogenesis of this disease has not been fully uncovered and thus renders difficulties in designing effective treatments. Completely identifying the genes related to this disease can help improve and accelerate the comprehension of uveitis. In this study, a new computational method was developed to infer potential related genes based on validated ones. We employed a large protein–protein interaction network reported in STRING, in which Laplacian heat diffusion algorithm was applied using validated genes as seed nodes. Except for the validated ones, all genes in the network were filtered by three tests, namely, permutation, association, and function tests, which evaluated the genes based on their specialties and associations to uveitis. Results indicated that 59 inferred genes were accessed, several of which were confirmed to be highly related to uveitis by literature review. In addition, the inferred genes were compared with those reported in a previous study, indicating that our reported genes are necessary supplements.
Collapse
Affiliation(s)
- Shiheng Lu
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
| | - Ke Zhao
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
| | - Xuefei Wang
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
| | - Hui Liu
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
| | - Xiamuxiya Ainiwaer
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
| | - Yan Xu
- School of Life Sciences, Shanghai University, Shanghai, China
| | - Min Ye
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
| |
Collapse
|
46
|
Pan X, Hu X, Zhang YH, Chen L, Zhu L, Wan S, Huang T, Cai YD. Identification of the copy number variant biomarkers for breast cancer subtypes. Mol Genet Genomics 2018; 294:95-110. [PMID: 30203254 DOI: 10.1007/s00438-018-1488-4] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2018] [Accepted: 09/03/2018] [Indexed: 01/07/2023]
Abstract
Breast cancer is a common and threatening malignant disease with multiple biological and clinical subtypes. It can be categorized into subtypes of luminal A, luminal B, Her2 positive, and basal-like. Copy number variants (CNVs) have been reported to be a potential and even better biomarker for cancer diagnosis than mRNA biomarkers, because it is considerably more stable and robust than gene expression. Thus, it is meaningful to detect CNVs of different cancers. To identify the CNV biomarker for breast cancer subtypes, we integrated the CNV data of more than 2000 samples from two large breast cancer databases, METABRIC and The Cancer Genome Atlas (TCGA). A Monte Carlo feature selection-based and incremental feature selection-based computational method was proposed and tested to identify the distinctive core CNVs in different breast cancer subtypes. We identified the CNV genes that may contribute to breast cancer tumorigenesis as well as built a set of quantitative distinctive rules for recognition of the breast cancer subtypes. The tenfold cross-validation Matthew's correlation coefficient (MCC) on METABRIC training set and the independent test on TCGA dataset were 0.515 and 0.492, respectively. The CNVs of PGAP3, GRB7, MIR4728, PNMT, STARD3, TCAP and ERBB2 were important for the accurate diagnosis of breast cancer subtypes. The findings reported in this study may further uncover the difference between different breast cancer subtypes and improve the diagnosis accuracy.
Collapse
Affiliation(s)
- Xiaoyong Pan
- College of Life Science, Shanghai University, Shanghai, 200444, People's Republic of China.,Department of Medical Informatics, Erasmus MC, Rotterdam, The Netherlands
| | - XiaoHua Hu
- Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai, 200438, People's Republic of China
| | - Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, People's Republic of China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, People's Republic of China.,Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai, 200241, People's Republic of China
| | - LiuCun Zhu
- College of Life Science, Shanghai University, Shanghai, 200444, People's Republic of China
| | - ShiBao Wan
- College of Life Science, Shanghai University, Shanghai, 200444, People's Republic of China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, People's Republic of China.
| | - Yu-Dong Cai
- College of Life Science, Shanghai University, Shanghai, 200444, People's Republic of China.
| |
Collapse
|