1
Zhang ZY, Sun ZJ, Gao D, Hao YD, Lin H, Liu F. Excavation of gene markers associated with pancreatic ductal adenocarcinoma based on interrelationships of gene expression. IET Syst Biol 2024. [PMID: 38530028 DOI: 10.1049/syb2.12090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 02/06/2024] [Accepted: 03/10/2024] [Indexed: 03/27/2024]
Abstract
Pancreatic ductal adenocarcinoma (PDAC) accounts for 95% of all pancreatic cancer cases and poses grave challenges for diagnosis and treatment. Timely diagnosis is pivotal for improving patient survival, which calls for precise biomarkers. A new approach was introduced to identify gene markers for precise PDAC detection. The core idea of the method is to find gene pairs that show both a consistent reversal of relative expression and differential co-expression between PDAC and normal samples. Reversal gene pair analysis and differential partial correlation analysis were performed to determine reversal differential partial correlation (RDC) gene pairs. Using incremental feature selection, the authors refined the selected gene set and constructed a machine-learning model for PDAC recognition. The approach identified 10 RDC gene pairs, and the model achieved an accuracy of 96.1% in cross-validation, outperforming models based on gene expression values alone. Experiments on independent validation data confirmed the model's performance. Enrichment analysis showed that these genes are involved in essential biological processes and shed light on their potential roles in PDAC pathogenesis. Overall, the findings highlight the potential of these 10 RDC gene pairs as diagnostic markers for early PDAC detection, which may help improve patient prognosis and survival.
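The reversal criterion described above can be illustrated with a minimal Python sketch. It assumes a simple frequency rule: a pair (a, b) counts as reversed when gene a is expressed above gene b in at least a chosen fraction of normal samples and below it in at least the same fraction of PDAC samples. The function name `reversal_pairs`, the 0.8 threshold, and the toy data are hypothetical; the authors' exact criteria may differ, and the differential partial correlation step is not shown.

```python
from itertools import combinations


def reversal_pairs(normal, tumour, genes, min_freq=0.8):
    """Return gene pairs whose relative expression order flips between groups.

    normal, tumour: lists of samples, each a dict mapping gene -> expression.
    A pair (a, b) is kept when a > b in at least `min_freq` of normal samples
    and b > a in at least `min_freq` of tumour samples, or the other way round.
    (Hypothetical rule for illustration, not the published method.)
    """

    def frac_greater(samples, a, b):
        # Fraction of samples in which gene a is expressed strictly above gene b
        return sum(s[a] > s[b] for s in samples) / len(samples)

    hits = []
    for a, b in combinations(genes, 2):
        a_high_in_normal = (frac_greater(normal, a, b) >= min_freq
                            and frac_greater(tumour, b, a) >= min_freq)
        a_high_in_tumour = (frac_greater(normal, b, a) >= min_freq
                            and frac_greater(tumour, a, b) >= min_freq)
        if a_high_in_normal or a_high_in_tumour:
            hits.append((a, b))
    return hits


# Toy data (hypothetical): G1 > G2 and G1 > G3 in normal tissue, reversed in tumour
normal = [{"G1": 5, "G2": 2, "G3": 1}, {"G1": 6, "G2": 3, "G3": 2}, {"G1": 4, "G2": 1, "G3": 3}]
tumour = [{"G1": 1, "G2": 4, "G3": 2}, {"G1": 2, "G2": 5, "G3": 3}, {"G1": 1, "G2": 3, "G3": 4}]
print(reversal_pairs(normal, tumour, ["G1", "G2", "G3"]))  # [('G1', 'G2'), ('G1', 'G3')]
```

Working with within-sample gene orderings rather than raw expression values is what makes such pairs robust to batch effects and normalization differences between datasets.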
Affiliation(s)
- Zhao-Yue Zhang
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
- Zi-Jie Sun
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Dong Gao
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Yu-Duo Hao
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Hao Lin
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Fen Liu
- Department of Radiation Oncology, Peking University Cancer Hospital (Inner Mongolia Campus), Affiliated Cancer Hospital of Inner Mongolia Medical University, Inner Mongolia Cancer Hospital, Hohhot, China

2
Wu Y, Xiao Q, Wang S, Xu H, Fang Y. Establishment and Analysis of an Artificial Neural Network Model for Early Detection of Polycystic Ovary Syndrome Using Machine Learning Techniques. J Inflamm Res 2023; 16:5667-5676. [PMID: 38050562 PMCID: PMC10693771 DOI: 10.2147/jir.s438838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 11/10/2023] [Indexed: 12/06/2023]
Abstract
Background: To identify novel gene combinations and develop an early diagnostic model for polycystic ovary syndrome (PCOS) by integrating artificial neural networks (ANN) and random forest (RF) methods. Methods: We retrieved and processed gene expression datasets for PCOS from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) in the training set were identified with the "limma" R package. The DEGs were analyzed for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment, and immune cell infiltration was assessed. Critical genes were then selected from the DEGs with random forests, and a new diagnostic model for PCOS was developed with artificial neural networks. Results: We identified 130 up-regulated and 132 down-regulated genes in PCOS compared with normal samples. GO analysis revealed significant enrichment in myofibrils and highlighted biological functions related to myofilament sliding, myofibril, and actin binding. The immune cell composition of PCOS samples differed from that of normal tissues. The random forest algorithm identified 10 significant genes proposed as potential PCOS-specific biomarkers. Using these genes, an ANN diagnostic model accurately distinguished PCOS from normal samples. The model was validated on an independent validation set, and the resulting area under the receiver operating characteristic curve (AUC) values were consistent with the anticipated outcomes. Conclusion: Using unique gene combinations, this study built a diagnostic model by combining random forest and artificial neural network techniques. The AUC values indicated notably strong diagnostic performance.
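The AUC used to validate this model has a simple probabilistic reading: it is the probability that a randomly chosen PCOS sample receives a higher model score than a randomly chosen normal sample, with ties counting as one half. The sketch below computes it directly from that definition; the function name `roc_auc` and the scores are hypothetical, and the pairwise loop is O(n^2), so a rank-based or library implementation would be preferable for large data sets.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    scores: model outputs (higher means more likely PCOS).
    labels: 1 for PCOS (positive), 0 for normal (negative).
    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs on a small validation set
scores = [0.9, 0.8, 0.4, 0.35, 0.6, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(scores, labels))  # 8 of 9 positive/negative pairs are ordered correctly
```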
Affiliation(s)
- Yumi Wu
- Institute of Acupuncture and Moxibustion of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
- QiWei Xiao
- Institute of Acupuncture and Moxibustion of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
- ShouDong Wang
- The Out-Patient Department of TCM of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
- Huanfang Xu
- Institute of Acupuncture and Moxibustion of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
- Acupuncture and Moxibustion Hospital of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
- YiGong Fang
- Institute of Acupuncture and Moxibustion of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
- Acupuncture and Moxibustion Hospital of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China

3
Mohamed TIA, Ezugwu AE, Fonou-Dombeu JV, Ikotun AM, Mohammed M. A bio-inspired convolution neural network architecture for automatic breast cancer detection and classification using RNA-Seq gene expression data. Sci Rep 2023; 13:14644. [PMID: 37670037 PMCID: PMC10480180 DOI: 10.1038/s41598-023-41731-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/19/2023] [Accepted: 08/30/2023] [Indexed: 09/07/2023]
Abstract
Breast cancer is one of the most significant health challenges and ranks among the most prevalent and dangerous cancers affecting women globally. Early detection and diagnosis are crucial for effective treatment and personalized therapy: they can help patients and physicians discover new treatment options, provide a better quality of life, and increase survival rates. Detecting breast cancer from gene expression is complicated by the high dimensionality and complexity of gene expression data. This paper proposes a bio-inspired CNN model for breast cancer detection using gene expression data downloaded from The Cancer Genome Atlas (TCGA). The data contain 1208 clinical samples (113 normal and 1095 cancerous) covering 19,948 genes. In the proposed model, Array-Array Intensity Correlation (AAIC) is used at the pre-processing stage for outlier removal, followed by normalization to avoid biases in the expression measures. Filtration with a threshold value of 0.25 is used for gene reduction. The pre-processed gene expression data were then converted into images, which were converted to grayscale to meet the input requirements of the model. The model hybridizes the CNN architecture with a metaheuristic algorithm, the Ebola Optimization Search Algorithm (EOSA), to enhance breast cancer detection. The classification results of the proposed model were compared with those of a traditional CNN and five hybrid algorithms: the Whale Optimization Algorithm (WOA-CNN), the Genetic Algorithm (GA-CNN), the Satin Bowerbird Optimization (SBO-CNN), the Life Choice-Based Optimization (LCBO-CNN), and the Multi-Verse Optimizer (MVO-CNN).
The proposed model achieved an accuracy of 98.3%, a precision of 99%, a recall of 99%, an F1-score of 99%, a kappa of 90.3%, a specificity of 92.8%, and a sensitivity of 98.9% for the cancerous class. These results suggest that the proposed method has the potential to be a reliable and precise approach to breast cancer detection, which is crucial for early diagnosis and personalized therapy.
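All of the measures reported above can be derived from a binary confusion matrix. The sketch below shows one common way to compute them, treating the cancerous class as positive; the function name `binary_metrics` and the counts in the example are hypothetical and are not taken from the paper, which may compute per-class or averaged values differently.

```python
def binary_metrics(tp, fp, fn, tn):
    """Common classification metrics from a 2x2 confusion matrix (positive = cancerous)."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)       # true-negative rate
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: observed agreement corrected for the agreement expected by
    # chance, where chance agreement comes from the row and column totals
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only
print(binary_metrics(tp=90, fp=10, fn=5, tn=45))
```

Kappa is included because, with a class imbalance like the 1095 cancerous versus 113 normal samples described above, accuracy alone can look high even for a classifier that favours the majority class.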
Affiliation(s)
- Tehnan I A Mohamed
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa.
- Absalom E Ezugwu
- Unit for Data Science and Computing, North-West University, Potchefstroom, South Africa.
- Jean Vincent Fonou-Dombeu
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa
- Abiodun M Ikotun
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa
- Mohanad Mohammed
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa

4
Zhou J, Jiang Z, Fu L, Qu F, Dai M, Xie N, Zhang S, Wang F. Contribution of labor related gene subtype classification on heterogeneity of polycystic ovary syndrome. PLoS One 2023; 18:e0282292. [PMID: 36857354 PMCID: PMC9977056 DOI: 10.1371/journal.pone.0282292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Accepted: 02/11/2023] [Indexed: 03/02/2023]
Abstract
OBJECTIVE As one of the most common endocrine disorders in women of reproductive age, polycystic ovary syndrome (PCOS) is highly heterogeneous, with varied clinical features and diverse gestational complications among individuals. Patients with PCOS have a 2-fold higher risk of preterm labor, which is associated with substantial infant morbidity and mortality and great socioeconomic cost. This study was designed to identify molecular subtypes and related hub genes to facilitate the assessment of susceptibility to preterm labor in women with PCOS. METHODS Four mRNA datasets (GSE84958, GSE5090, GSE43264 and GSE98421) were obtained from the Gene Expression Omnibus database. Twenty-eight candidate genes related to preterm labor or labor were drawn from published studies and our unpublished data. We then used unsupervised clustering to identify molecular subtypes in PCOS based on the expression of these candidate genes. Key modules were identified with the weighted gene co-expression network analysis (WGCNA) R package, and their hub genes were identified with CytoHubba. Probable biological functions and mechanisms were explored through Gene Ontology analysis and Kyoto Encyclopedia of Genes and Genomes pathway analysis. In addition, STRING and Cytoscape were used to construct the protein-protein interaction (PPI) network, and molecular complex detection (MCODE) was used to identify hub genes. Finally, the hub genes shared by both approaches were determined. RESULTS Two molecular subtypes were found in women with PCOS based on the expression similarity of preterm labor or labor-related genes, and two modules were highlighted. The key modules and the PPI network shared five hub genes, two of which, GTF2F2 and MYO6, were further confirmed by comparing the clustering subgroups according to hub gene expression.
CONCLUSIONS Distinct PCOS molecular subtypes were identified with preterm labor or labor-related genes, which might uncover the potential mechanism underlying heterogeneity of clinical pregnancy complications in women with PCOS.
Affiliation(s)
- Jue Zhou
- School of Food Science and Biotechnology, Zhejiang Gongshang University, Hangzhou, China
- Zhou Jiang
- Department of Obstetrics and Gynecology, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Leyi Fu
- Women’s Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Fan Qu
- Women’s Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Minchen Dai
- Women’s Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Ningning Xie
- Women’s Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Songying Zhang
- Department of Obstetrics and Gynecology, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- * E-mail: (FW); (SZ)
- Fangfang Wang
- Women’s Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- * E-mail: (FW); (SZ)

5
Wang S, Liu W, Ye Z, Xia X, Guo M. Development of a joint diagnostic model of thyroid papillary carcinoma with artificial neural network and random forest. Front Genet 2022; 13:957718. [PMCID: PMC9585230 DOI: 10.3389/fgene.2022.957718] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/31/2022] [Accepted: 09/21/2022] [Indexed: 11/13/2022]
Abstract
Objective: Papillary thyroid carcinoma (PTC) accounts for 80% of thyroid malignancies, and its incidence is increasing rapidly. This study aimed to identify novel and important gene panels and to develop an early diagnostic model for PTC by combining artificial neural network (ANN) and random forest (RF). Methods and results: Gene expression datasets (GSE27155, GSE60542, and GSE33630) were retrieved from the Gene Expression Omnibus (GEO) database and processed. GSE27155 and GSE60542 were merged into the training set, and GSE33630 was used as the validation set. Differentially expressed genes (DEGs) in the training set were obtained with the “limma” package of R software. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses, as well as immune cell infiltration analysis, were then conducted on the DEGs. Important genes were identified from the DEGs by random forest, and an artificial neural network was used to develop a diagnostic model. The diagnostic model was validated on the validation set, and the area under the receiver operating characteristic curve (AUC) was satisfactory. Conclusion: A diagnostic model based on a novel gene panel was established by combining random forest and artificial neural network. The AUC showed that the diagnostic model performed excellently.
6
Bahadory S, Sadraei J, Zibaei M, Pirestani M, Dalimi A. In vitro anti-gastrointestinal cancer activity of Toxocara canis-derived peptide: Analyzing the expression level of factors related to cell proliferation and tumor growth. Front Pharmacol 2022; 13:878724. [PMID: 36204226 PMCID: PMC9530354 DOI: 10.3389/fphar.2022.878724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2022] [Accepted: 08/01/2022] [Indexed: 11/23/2022]
Abstract
Background: A negative relationship between cancers and parasites has recently been hypothesized and investigated; some parasitic worms and their products can affect cancer cell proliferation. Given the potential anti-cancer effect of helminthic parasites, the present study used the excretory–secretory protein of the Toxocara canis (T. canis) parasite to evaluate its possible anti-cancer properties and its effect on proliferation-related genes in gastrointestinal and liver cancer cells under laboratory conditions. Methods and materials: The selected synthesized peptide fraction from the T. canis excretory–secretory Troponin protein peptide (ES TPP) was applied at concentrations of 32, 64, 128, and 256 μg/ml to three gastrointestinal cancer cell lines (AGS, HT-29, and Caco 2) and to HDF cells as a normal cell line. The MTT assay was used to evaluate cellular changes and cell viability (CV). Changes in the expression of Bcl-2, APAF1, ZEB1, VEGF, cyclin-D1, and caspase-3 were analyzed by real-time RT-PCR. Results: After 24 h of exposure of the cell lines to the peptide, CV decreased at 64 μg/ml compared with the control group. After 48 h, the CV of Caco 2 cells decreased significantly at 32 μg/ml, whereas in the other cancer cell lines concentrations above 32 μg/ml were effective. The peptide significantly altered the expression of the studied genes at a concentration of 100 μg/ml. Conclusion: Although the peptide had a statistically significant effect on cancer cells at high concentrations, it still falls well short of the standard drug; it could be optimized and may prove promising in future studies.
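MTT results such as these are commonly reported as percent viability: the blank-corrected mean absorbance of treated wells divided by that of untreated control wells. The abstract does not state the formula used, so the sketch below shows only this common convention; the function name `percent_viability` and the absorbance values are hypothetical.

```python
from statistics import mean


def percent_viability(treated_od, control_od, blank_od=0.0):
    """Percent cell viability from MTT absorbance (optical density) readings.

    treated_od, control_od: replicate absorbance readings for treated and
    untreated wells; blank_od: background absorbance of medium-only wells.
    """
    treated = mean(treated_od) - blank_od
    control = mean(control_od) - blank_od
    if control <= 0:
        raise ValueError("control absorbance must exceed the blank")
    return 100.0 * treated / control


# Hypothetical triplicate readings for one peptide concentration
print(round(percent_viability([0.42, 0.45, 0.40], [0.90, 0.88, 0.92], blank_od=0.05), 1))
```

Subtracting the blank before taking the ratio matters: otherwise background absorbance from the medium inflates both numerator and denominator and pulls the viability estimate towards 100%.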
Affiliation(s)
- Saeed Bahadory
- Department of Parasitology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- Javid Sadraei
- Department of Parasitology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- *Correspondence: Javid Sadraei,
- Mohammad Zibaei
- Department of Parasitology and Mycology, School of Medicine, Alborz University of Medical Sciences, Karaj, Iran
- Evidence-Based Phytotherapy and Complementary Medicine Research Center, Alborz University of Medical Sciences, Karaj, Iran
- Majid Pirestani
- Department of Parasitology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- Abdolhossein Dalimi
- Department of Parasitology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran

7
Shao D, Dai Y, Li N, Cao X, Zhao W, Cheng L, Rong Z, Huang L, Wang Y, Zhao J. Artificial intelligence in clinical research of cancers. Brief Bioinform 2021; 23:6470966. [PMID: 34929741 PMCID: PMC8769909 DOI: 10.1093/bib/bbab523] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/11/2021] [Revised: 11/06/2021] [Accepted: 11/13/2021] [Indexed: 12/16/2022]
Abstract
Several factors, including advances in computational algorithms, the availability of high-performance computing hardware, and the assembly of large community-based databases, have led to the extensive application of Artificial Intelligence (AI) in the biomedical domain over nearly 20 years. AI algorithms have attained expert-level performance in cancer research. However, only a few AI-based applications have been approved for real-world use. Whether AI will eventually be capable of replacing medical experts has been a hot topic. In this article, we first summarize the status of AI-based cancer research over the past two decades, including the consensus on an ideal paradigm for the AI procedure and current efforts to incorporate expertise and domain knowledge. Next, we survey the data available for AI in the biomedical domain. Then, we review the methods and applications of AI in clinical cancer research, categorized by data type: radiographic imaging, cancer genomes, medical records, drug information, and biomedical literature. Finally, we discuss the challenges in moving AI from theoretical research to real-world cancer applications and the prospects for AI's future participation in cancer treatment.
Affiliation(s)
- Dan Shao
- College of Computer Science and Technology, Key Laboratory of Human Health Status Identification and Function Enhancement of Jilin Province, Changchun University, Changchun 130022, China
- Yinfei Dai
- College of Computer Science and Technology, Key Laboratory of Human Health Status Identification and Function Enhancement of Jilin Province, Changchun University, Changchun 130022, China
- Nianfeng Li
- College of Computer Science and Technology, Key Laboratory of Human Health Status Identification and Function Enhancement of Jilin Province, Changchun University, Changchun 130022, China
- Xuqing Cao
- Department of Neurology, People's Hospital of Ningxia Hui Autonomous Region (The Affiliated people's Hospital of Ningxia Medical University and The First Affiliated Hospital of Northwest Minzu University), Yinchuan 750002, China
- Wei Zhao
- Department of Biochemistry and Molecular Biology, Ningxia Medical University, Yinchuan 750002, China
- Li Cheng
- Department of Electrical Diagnosis, Affiliated Hospital of Changchun University of Traditional Chinese Medicine, Changchun, 130021, China
- Zhuqing Rong
- School of Science, Key Laboratory of Human Health Status Identification and Function Enhancement of Jilin Province, Changchun University, Changchun 130022, China
- Lan Huang
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Yan Wang
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Jing Zhao
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, 43210, USA

8
Liu Q, Cheng B, Jin Y, Hu P. Bayesian tensor factorization-drive breast cancer subtyping by integrating multi-omics data. J Biomed Inform 2021; 125:103958. [PMID: 34839017 DOI: 10.1016/j.jbi.2021.103958] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/20/2021] [Revised: 10/13/2021] [Accepted: 11/19/2021] [Indexed: 12/12/2022]
Abstract
Breast cancer is a highly heterogeneous disease. Subtyping the disease and identifying the genomic features driving these subtypes are critical for precision oncology in breast cancer. This study focuses on developing a new computational approach for breast cancer subtyping. We proposed using Bayesian tensor factorization (BTF) to integrate multi-omics data of breast cancer, namely RNA-sequencing expression profiles, copy number variation, and DNA methylation measured in 762 breast cancer patients from The Cancer Genome Atlas. We applied a consensus clustering approach to the latent features factorized by BTF to identify breast cancer subtypes. Subtype-specific survival patterns were evaluated using Kaplan-Meier (KM) estimators. The proposed approach was compared with other state-of-the-art approaches for cancer subtyping. The BTF-subtyping analysis identified 17 optimized latent components, which were used to reveal six major breast cancer subtypes. Among all the approaches compared, only the proposed approach showed distinct survival patterns (p < 0.05). Statistical tests also showed that the distributions of the identified clusters differ significantly. Our results show that the proposed approach is a promising strategy for efficiently using publicly available multi-omics data to identify breast cancer subtypes.
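The Kaplan-Meier (product-limit) estimator used to compare subtypes multiplies, at each distinct death time t_i, the conditional survival factor 1 - d_i/n_i, where d_i is the number of deaths at t_i and n_i is the number of patients still at risk just before t_i. A minimal sketch follows; the function name `kaplan_meier` and the follow-up data are hypothetical, and the log-rank test typically used to compare curves between subtypes is not shown.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up time for each patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    Returns (time, survival probability) pairs at each distinct death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # patients (dead or censored) leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Remove everyone who died or was censored at t; patients censored at t
        # were still counted as at risk for the deaths at t
        at_risk -= leaving[t]
    return curve


# Hypothetical follow-up (months) for one subtype; 0 marks a censored patient
print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))
```

Censored patients never produce a drop in the curve themselves, but they shrink the risk set, so later deaths cause larger steps.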
Affiliation(s)
- Qian Liu
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Canada; Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada
- Bowen Cheng
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Yongwon Jin
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Canada
- Pingzhao Hu
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Canada; Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada; CancerCare Manitoba Research Institute, Winnipeg, Manitoba, Canada.

9
The Blood Gene Expression Signature for Kawasaki Disease in Children Identified with Advanced Feature Selection Methods. Biomed Res Int 2021; 2020:6062436. [PMID: 32685506 PMCID: PMC7327570 DOI: 10.1155/2020/6062436] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/26/2020] [Accepted: 06/12/2020] [Indexed: 01/22/2023]
Abstract
Kawasaki disease (KD) is an acute vasculitis that can be accompanied by coronary artery aneurysm, coronary artery dilatation, arrhythmia, and other serious cardiovascular diseases. Its etiology remains unclear, so its molecular mechanisms and related factors need to be studied. In this study, we analyzed the expression profiles of 75 DB (bacterial infection), 122 DV (viral infection), 71 HC (healthy control), and 311 KD samples. A total of 332 key genes related to KD and pathogen infections were identified using a combination of advanced feature selection methods, (1) Boruta, (2) Monte-Carlo Feature Selection (MCFS), and (3) Incremental Feature Selection (IFS), which narrowed down the number of signature genes step by step. Their functions were then examined with KEGG and GO enrichment analyses. Our results provide clues to potential molecular mechanisms of KD and may be helpful for KD detection and treatment.
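Incremental feature selection, as it is commonly described, takes features already ranked by an importance method (here MCFS), scores a classifier built on the top 1, top 2, and so on up to all ranked features, and keeps the subset with the best score. The sketch below assumes the ranking and the scoring function are supplied by the caller; `incremental_feature_selection` and the toy scorer are hypothetical stand-ins for the cross-validated classifiers used in such studies.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked: Sequence[str],
    evaluate: Callable[[List[str]], float],
    step: int = 1,
) -> Tuple[List[str], float, List[Tuple[int, float]]]:
    """Evaluate nested top-k feature subsets and return the best one.

    ranked: features ordered from most to least important (e.g. by MCFS).
    evaluate: scores a feature subset, e.g. cross-validated accuracy.
    step: subset-size increment; with step > 1 the full set is only tried
    when len(ranked) is a multiple of step.
    """
    best_score, best_subset = float("-inf"), []
    history = []
    for k in range(step, len(ranked) + 1, step):
        subset = list(ranked[:k])
        score = evaluate(subset)
        history.append((k, score))
        if score > best_score:  # strict '>' keeps the smaller subset on ties
            best_score, best_subset = score, subset
    return best_subset, best_score, history


def toy_score(subset: List[str]) -> float:
    # Hypothetical scorer peaking at three features; a real one would run cross-validation
    return 1.0 - 0.1 * abs(len(subset) - 3)


best, score, _ = incremental_feature_selection(["g1", "g2", "g3", "g4", "g5"], toy_score)
print(best, score)  # ['g1', 'g2', 'g3'] 1.0
```

Because the subsets are nested, only len(ranked)/step models need to be trained, which is what makes IFS practical after a ranking step has already discarded most genes.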
10
Li D, Lin H, Li L. Multiple Feature Selection Strategies Identified Novel Cardiac Gene Expression Signature for Heart Failure. Front Physiol 2020; 11:604241. [PMID: 33304275 PMCID: PMC7693561 DOI: 10.3389/fphys.2020.604241] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/09/2020] [Accepted: 10/15/2020] [Indexed: 02/02/2023]
Abstract
Heart failure (HF) is a serious condition in which the heart cannot pump enough blood to meet the body's demands at a normal cardiac filling pressure. Approximately 26 million patients worldwide suffer from heart failure; about 17–45% of patients admitted to a hospital with heart failure die within 1 year, and the majority die within 5 years. The molecular mechanisms underlying the progression of heart failure have been poorly studied. We compared the gene expression profiles of patients with heart failure (n = 177) and without heart failure (n = 136) using multiple feature selection strategies and identified 38 HF signature genes. A support vector machine (SVM) classifier based on these 38 genes, evaluated with leave-one-out cross validation (LOOCV), achieved a sensitivity of 0.983 and a specificity of 0.963. Network analysis suggested that the hub gene SMOC2 may play an important role in HF. Other genes, such as FCN3, HMGN2, and SERPINA3, also appear promising. Our results may facilitate the early detection of heart failure and help reveal its molecular mechanisms.
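Leave-one-out cross validation trains the classifier n times, each time holding out one sample and predicting it with a model fit on the remaining n - 1; sensitivity and specificity are then computed from the n held-out predictions. To stay dependency-free, the sketch below swaps in a nearest-centroid classifier for the SVM used in the study; all function names and the toy data are hypothetical.

```python
from math import dist


def nearest_centroid_fit(X, y):
    """Stand-in classifier (not an SVM): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    # Assign x to the class whose centroid is closest in Euclidean distance
    return min(centroids, key=lambda label: dist(centroids[label], x))


def loocv_sens_spec(X, y, positive=1):
    """Leave-one-out CV; each class needs at least two samples."""
    tp = fn = tn = fp = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = nearest_centroid_fit(X_train, y_train)
        pred = nearest_centroid_predict(model, X[i])
        if y[i] == positive:
            tp += pred == positive
            fn += pred != positive
        else:
            tn += pred != positive
            fp += pred == positive
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical two-gene profiles: 0 = without HF, 1 = with HF
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
print(loocv_sens_spec(X, y))  # (1.0, 1.0)
```

LOOCV uses almost all the data for training in every fold, which suits modest cohorts like 177 versus 136 samples, but the feature selection itself should also be repeated inside each fold to avoid optimistic estimates.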
Affiliation(s)
- Dan Li
- Department of Cardiovascular Medicine, First Hospital Affiliated to Harbin Medical University, Harbin, China
- Hong Lin
- Internal Medicine-Cardiovascular Department, Harbin Chest Hospital, Harbin, China
- Luyifei Li
- Department of Cardiovascular Medicine, First Hospital Affiliated to Harbin Medical University, Harbin, China

11
Xia Q, Shu Z, Ye T, Zhang M. Identification and Analysis of the Blood lncRNA Signature for Liver Cirrhosis and Hepatocellular Carcinoma. Front Genet 2020; 11:595699. [PMID: 33365048 PMCID: PMC7750531 DOI: 10.3389/fgene.2020.595699] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/17/2020] [Accepted: 10/13/2020] [Indexed: 12/12/2022]
Abstract
As one of the most common malignant tumors, hepatocellular carcinoma (HCC) is the fifth leading cause of cancer-associated mortality worldwide. In 90% of cases, HCC develops in the context of liver cirrhosis. Chronic hepatitis B virus (HBV) infection is an important cause of cirrhosis and HCC, accounting for 53% of all HCC cases. To understand the mechanisms underlying the dynamic progression from a healthy state to HBV infection, from HBV infection to liver cirrhosis, and from liver cirrhosis to HCC, we analyzed blood lncRNA expression profiles from 38 healthy control samples, 45 chronic hepatitis B patients, 46 liver cirrhosis patients, and 46 HCC patients. Advanced machine-learning methods, including Monte Carlo feature selection, incremental feature selection (IFS), and support vector machine (SVM), were applied to discover the signature associated with HCC progression and to construct the prediction model. A total of 171 key HCC progression-associated lncRNAs were identified, and their overall accuracy was 0.823 as evaluated with leave-one-out cross validation (LOOCV). The accuracies of the lncRNA signature for healthy control, chronic hepatitis B, liver cirrhosis, and HCC were 0.895, 0.711, 0.870, and 0.826, respectively. The 171-lncRNA signature is not only useful for early detection and intervention of HCC but also helpful for understanding the multistage tumorigenic processes of HCC.
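The per-group accuracies above are consistent with the reported overall LOOCV accuracy, because overall accuracy is the sample-weighted average of the per-group accuracies. The check below infers the number of correctly classified samples in each group by rounding accuracy times group size; these counts are an assumption for illustration, not figures reported by the authors.

```python
# Group sizes and per-group accuracies from the abstract.
groups = {
    "healthy control": (38, 0.895),
    "chronic hepatitis B": (45, 0.711),
    "liver cirrhosis": (46, 0.870),
    "HCC": (46, 0.826),
}

# Inferred (not reported) numbers of correctly classified samples per group
correct = {name: round(n * acc) for name, (n, acc) in groups.items()}
total = sum(n for n, _ in groups.values())
overall = sum(correct.values()) / total

print(correct)
print(sum(correct.values()), total, round(overall, 3))  # 144 175 0.823
```

The weighting explains why the overall figure (0.823) sits below the simple mean of the four group accuracies: the chronic hepatitis B group, with the lowest accuracy, is also one of the larger groups.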
Affiliation(s)
- Qi Xia
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, China; Key Laboratory for Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou, China; Zhejiang University, Hangzhou, China
- Zheyue Shu
- Zhejiang University, Hangzhou, China; Division of Hepatobiliary and Pancreatic Surgery, Department of Surgery, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, China; Key Laboratory of Combined Multi-Organ Transplantation, Ministry of Public Health, Hangzhou, China
- Ting Ye
- Zhejiang University, Hangzhou, China
- Min Zhang
- Zhejiang University, Hangzhou, China; Division of Hepatobiliary and Pancreatic Surgery, Department of Surgery, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, China; Key Laboratory of Combined Multi-Organ Transplantation, Ministry of Public Health, Hangzhou, China

12
Katz L, Woolman M, Tata A, Zarrine-Afsar A. Potential impact of tissue molecular heterogeneity on ambient mass spectrometry profiles: a note of caution in choosing the right disease model. Anal Bioanal Chem 2020; 413:2655-2664. [PMID: 33247337 DOI: 10.1007/s00216-020-03054-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/21/2020] [Revised: 11/02/2020] [Accepted: 11/10/2020] [Indexed: 02/07/2023]
Abstract
This review summarizes known molecular alterations in commonly used cancer models and discusses how they may affect ambient mass spectrometry (MS) profiles. Immortalized cell lines are known to accumulate mutations, and xenografts derived from cell lines are known to contain tumour microenvironment elements from the host animal. While the use of human specimens for mass spectrometry profiling studies is highly encouraged, patient-derived xenografts with low passage numbers could provide an alternative means of amplifying material for ambient MS research when needed. Similarly, the genetic preservation of patient tissue seen in some organoid models, further verified by qualitative proteomic and transcriptomic analyses, may argue in favor of organoid suitability for certain ambient profiling studies. However, pre-evaluating a model's molecular characteristics in the context of the research question(s) being asked will likely be the most appropriate strategy for choosing a model and moving research forward. This can be achieved by performing comparative ambient MS analysis of the chosen disease model against a small amount of patient tissue to verify concordance. Disease models will nonetheless continue to be useful tools for orthogonally validating the metabolic states of patient tissues through controlled genetic alterations that are not possible with patient specimens.
Affiliation(s)
- Lauren Katz
- Techna Institute for the Advancement of Technology for Health, University Health Network, 100 College Street, Toronto, ON, M5G 1P5, Canada; Department of Medical Biophysics, University of Toronto, 101 College Street, Toronto, ON, M5G 1L7, Canada
- Michael Woolman
- Techna Institute for the Advancement of Technology for Health, University Health Network, 100 College Street, Toronto, ON, M5G 1P5, Canada; Department of Medical Biophysics, University of Toronto, 101 College Street, Toronto, ON, M5G 1L7, Canada
- Alessandra Tata
- Laboratorio di Chimica Sperimentale, Istituto Zooprofilattico delle Venezie, Viale Fiume 78, 36100, Vicenza, Italy
- Arash Zarrine-Afsar
- Techna Institute for the Advancement of Technology for Health, University Health Network, 100 College Street, Toronto, ON, M5G 1P5, Canada; Department of Medical Biophysics, University of Toronto, 101 College Street, Toronto, ON, M5G 1L7, Canada; Department of Surgery, University of Toronto, 149 College Street, Toronto, ON, M5T 1P5, Canada; Keenan Research Center for Biomedical Science & the Li Ka Shing Knowledge Institute, St. Michael's Hospital, 30 Bond Street, Toronto, ON, M5B 1W8, Canada
13
Wu Z, Shou L, Wang J, Huang T, Xu X. The Methylation Pattern for Knee and Hip Osteoarthritis. Front Cell Dev Biol 2020; 8:602024. [PMID: 33240895 PMCID: PMC7677303 DOI: 10.3389/fcell.2020.602024] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/02/2020] [Accepted: 10/22/2020] [Indexed: 01/08/2023]
Abstract
Osteoarthritis (OA) is one of the most prevalent chronic joint diseases among middle-aged and elderly people, but in recent years the number of young people with the disease has increased rapidly. OA is a common degenerative disease caused by the combination and interaction of many factors, including natural and environmental factors. DNA methylation reflects the effects of environmental factors, and several studies of DNA methylation at specific genes in OA cartilage have indicated that it may play important roles in OA. To systematically investigate the methylation pattern in knee and hip OA, we analyzed the methylation profiles in cartilage of 16 OA hip samples, 19 control hip samples, and 62 OA knee samples. Twelve discriminative methylation sites were identified using the minimal Redundancy Maximal Relevance (mRMR) and Incremental Feature Selection (IFS) methods. A support vector machine (SVM) classifier built on these 12 methylation sites, from genes such as MEIS1, GABRG3, RXRA, and EN1, perfectly classified the OA hip samples, control hip samples, and OA knee samples as evaluated with leave-one-out cross validation (LOOCV). These 12 methylation sites can not only serve as biomarkers but also provide insight into the underlying mechanisms of OA.
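The mRMR ranking mentioned above can be sketched as a greedy loop that rewards relevance to the class labels and penalizes redundancy with features already chosen. This minimal version assumes discretized feature values and uses the difference form of mRMR (relevance minus mean redundancy); the variant and discretization used in the study may differ, and the feature names and toy labels are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(a, b):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log2((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in pab.items()
    )


def mrmr_score(name, features, labels, selected):
    """Relevance to the labels minus mean redundancy with already-selected features."""
    relevance = mutual_information(features[name], labels)
    if not selected:
        return relevance
    redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
    return relevance - redundancy / len(selected)


def mrmr_rank(features, labels, k):
    """Greedily select k feature names (difference form of mRMR)."""
    remaining, selected = list(features), []
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: mrmr_score(f, features, labels, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical discretized methylation levels; "site1_copy" duplicates "site1".
features = {
    "site1": [0, 0, 1, 1],
    "site1_copy": [0, 0, 1, 1],
    "site2": [0, 1, 0, 1],
}
labels = ["A", "B", "C", "D"]  # toy class labels
print(mrmr_rank(features, labels, 2))  # ['site1', 'site2']
```

The duplicate site is as relevant as the original but is skipped in the second round because its redundancy cancels its relevance; the resulting ranked list is what IFS would then scan.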
Affiliation(s)
- Zhen Wu
- Department of Orthopaedics, Tongde Hospital of Zhejiang Province, Hangzhou, China
- Lu Shou
- Department of Pneumology, Tongde Hospital of Zhejiang Province, Hangzhou, China
- Jian Wang
- Department of Orthopaedics, Tongde Hospital of Zhejiang Province, Hangzhou, China
- Tao Huang
- Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
- Xinwei Xu
- Department of Orthopaedics, Tongde Hospital of Zhejiang Province, Hangzhou, China
14
Zhu JH, Yan QL, Wang JW, Chen Y, Ye QH, Wang ZJ, Huang T. The Key Genes for Perineural Invasion in Pancreatic Ductal Adenocarcinoma Identified With Monte-Carlo Feature Selection Method. Front Genet 2020; 11:554502. [PMID: 33193628 PMCID: PMC7593847 DOI: 10.3389/fgene.2020.554502] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/14/2020] [Accepted: 08/17/2020] [Indexed: 12/20/2022]
Abstract
Background: Pancreatic ductal adenocarcinoma (PDAC) is the most aggressive form of pancreatic cancer, with a 5-year survival rate of only 3–5%. Perineural invasion (PNI) is the process by which cancer cells invade the surrounding nerves and perineural spaces, and it is considered to be associated with the poor prognosis of PDAC. About 90% of pancreatic cancer patients have PNI. This high incidence limits radical resection and promotes local recurrence, negatively affecting the quality of life and survival time of patients with pancreatic cancer. Objectives: To investigate the mechanism of PNI in pancreatic cancer, we analyzed the gene expression profiles of tumors and adjacent tissues from 50 PDAC patients, including 28 with PNI and 22 without. Method: Using Monte-Carlo feature selection and Incremental Feature Selection (IFS), we identified 26 key features, of which 15 were from tumor tissues and 11 were from adjacent tissues. Results: Our results suggested that not only the tumor tissue but also the adjacent tissue was informative for PNI prediction. The SVM classifier based on these 26 key features predicted PNI with a high accuracy of 0.94, evaluated with leave-one-out cross validation (LOOCV). Conclusion: In-depth biological analysis of key feature genes, such as TNFRSF14, XPO1, and ATF3, sheds light on perineural invasion in pancreatic ductal adenocarcinoma.
Affiliation(s)
- Jin-Hui Zhu
- Department of General Surgery, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Qiu-Liang Yan
- Department of General Surgery, Jinhua People's Hospital, Jinhua, China
- Jian-Wei Wang
- Department of Surgical Oncology, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yan Chen
- Department of General Surgery, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Qing-Huang Ye
- Department of General Surgery, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Zhi-Jiang Wang
- Department of General Surgery, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
15
Li S, Jiang L, Tang J, Gao N, Guo F. Kernel Fusion Method for Detecting Cancer Subtypes via Selecting Relevant Expression Data. Front Genet 2020; 11:979. [PMID: 33133130 PMCID: PMC7511763 DOI: 10.3389/fgene.2020.00979] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/08/2020] [Accepted: 08/03/2020] [Indexed: 12/19/2022]
Abstract
Recently, cancer has been characterized as a heterogeneous disease composed of many different subtypes. Early diagnosis of cancer subtypes is an important goal of cancer research and can be of tremendous help to patients in their subsequent treatment. In this paper, we first compile a novel dataset containing gene expression, miRNA expression, and isoform expression data for five cancers from The Cancer Genome Atlas (TCGA). Next, to reduce the effect of noise across 60,483 genes, we select a small number of genes using LASSO based on gene expression and patient survival time. Then, we construct one similarity kernel for each type of expression data using the Chebyshev distance. After that, we use SKF to fuse the three similarity matrices built from gene, isoform, and miRNA expression, and finally cluster the fused similarity matrix with spectral clustering. In our experiments, our method obtains better P-values in the Cox model than other methods on 10 cancer datasets from the Jiang dataset and the novel dataset. We plotted survival curves for the different cancers and found that some genes play key roles in cancer. For breast cancer, we find that HSPA2A, RNASE1, CLIC6, and IFITM1 are highly expressed in some specific groups. For lung cancer, we find that C4BPA, SESN3, and IRS1 are highly expressed in some specific groups. The code and all supporting data files are available from https://github.com/guofei-tju/Uncovering-Cancer-Subtypes-via-LASSO.
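To make the per-omics similarity kernel concrete, here is a minimal sketch that builds a pairwise similarity matrix from Chebyshev distances. The Gaussian-style form exp(-d^2 / (2 sigma^2)) and the bandwidth `sigma` are assumptions for illustration; the paper's exact kernel, as well as the SKF fusion and spectral clustering steps, are not reproduced here. The sample vectors are hypothetical.

```python
from math import exp


def chebyshev(u, v):
    """Chebyshev (L-infinity) distance: the largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(u, v))


def similarity_kernel(samples, sigma=1.0):
    """Pairwise similarities S[i][j] = exp(-d_ij^2 / (2 * sigma^2)) with d = Chebyshev.

    The Gaussian form and the bandwidth sigma are assumptions for illustration.
    """
    n = len(samples)
    return [
        [exp(-chebyshev(samples[i], samples[j]) ** 2 / (2 * sigma ** 2)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical expression vectors for three patients (one omics layer).
samples = [[1.0, 2.0, 0.5], [1.5, 2.0, 0.0], [4.0, 0.0, 3.0]]
S = similarity_kernel(samples)
print(round(S[0][1], 3), round(S[0][2], 3))  # 0.882 0.011
```

Because the Chebyshev distance is driven by the single largest coordinate difference, one strongly divergent gene is enough to make two patients dissimilar; in a full pipeline one such matrix would be built per data type before fusion.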
Affiliation(s)
- Shuhao Li
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Limin Jiang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jijun Tang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China; Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States
- Nan Gao
- School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China
- Fei Guo
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
16
Establishment and Analysis of a Combined Diagnostic Model of Polycystic Ovary Syndrome with Random Forest and Artificial Neural Network. BIOMED RESEARCH INTERNATIONAL 2020; 2020:2613091. [PMID: 32884937 PMCID: PMC7455828 DOI: 10.1155/2020/2613091] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 04/30/2020] [Revised: 07/27/2020] [Accepted: 08/03/2020] [Indexed: 12/14/2022]
Abstract
Polycystic ovary syndrome (PCOS) is one of the most common metabolic and reproductive endocrinopathies. However, few studies have tried to develop a diagnostic model based on gene biomarkers. In this study, we applied a computational method combining two machine learning algorithms, random forest (RF) and artificial neural network (ANN), to identify gene biomarkers and construct a diagnostic model. We collected gene expression data from the Gene Expression Omnibus (GEO) database comprising 76 PCOS samples and 57 normal samples. Five datasets were used: one for screening differentially expressed genes (DEGs), two for training, and two for validation. First, using RF, 12 key genes among 264 DEGs were identified as vital for classifying PCOS and normal samples. The weights of these key genes were then calculated using ANN with the microarray and RNA-seq training datasets, respectively. Diagnostic models for the two dataset types were developed and named neuralPCOS. Finally, the two validation datasets were used to test neuralPCOS and compare its performance with two other sets of marker genes by area under the curve (AUC). Our model achieved an AUC of 0.7273 in the microarray dataset and 0.6488 in the RNA-seq dataset. In conclusion, we uncovered gene biomarkers and developed a novel diagnostic model of PCOS, which may be helpful for diagnosis.
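The AUC figures reported above summarize how well a model's scores separate the two classes. A minimal sketch of the rank-based (Mann-Whitney) formulation follows: AUC is the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as half. The scores and labels are hypothetical and are not the study's outputs.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs: 1 = PCOS, 0 = normal.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(round(roc_auc(scores, labels), 4))  # 0.8889
```

On this scale 0.5 corresponds to chance and 1.0 to perfect separation, which is the frame in which values such as 0.7273 and 0.6488 are compared.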
17
Ren X, Wang S, Huang T. Decipher the connections between proteins and phenotypes. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2020; 1868:140503. [PMID: 32707349 DOI: 10.1016/j.bbapap.2020.140503] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/31/2020] [Revised: 06/30/2020] [Accepted: 07/16/2020] [Indexed: 10/23/2022]
Abstract
As the outermost representation of life, phenotype is the fundamental basis on which humans understand life and disease. However, with the advent of molecular and sequencing techniques, a growing portion of scientific research focuses primarily on the molecular level of life. Our understanding of molecular variations and mechanisms can only be fully utilized when it is translated to the phenotypic level. In this study, we constructed a similarity network for the phenotype ontology and then applied network analysis methods to discover phenotype/disease clusters. We then used machine learning models to predict protein-phenotype associations, characterizing each protein by the functional profiles of its interaction neighbors on the protein-protein interaction network. Our methods can not only predict protein-phenotype associations but also reveal the underlying mechanisms from protein to phenotype.
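One simple way to characterize a protein by the functional profiles of its interaction neighbors is to record, for each functional term, the fraction of its PPI neighbors annotated with that term. The sketch below assumes exactly that encoding; the study's actual feature construction may differ, and the network, protein IDs, and term names are hypothetical.

```python
def neighbor_profile(protein, ppi, annotations, terms):
    """Fraction of a protein's PPI neighbors annotated with each functional term."""
    neighbors = ppi.get(protein, set())
    if not neighbors:
        return [0.0] * len(terms)
    return [
        sum(1 for n in neighbors if term in annotations.get(n, set())) / len(neighbors)
        for term in terms
    ]


# Hypothetical PPI network and functional annotations.
ppi = {"P1": {"P2", "P3"}, "P2": {"P1"}, "P3": {"P1"}}
annotations = {"P1": {"GO:C"}, "P2": {"GO:A"}, "P3": {"GO:A", "GO:B"}}
print(neighbor_profile("P1", ppi, annotations, ["GO:A", "GO:B", "GO:C"]))  # [1.0, 0.5, 0.0]
```

Note that P1's own annotation (GO:C) does not contribute to its profile; only its neighbors do, which lets the features describe proteins whose own annotations are sparse.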
Affiliation(s)
- Xiaohui Ren
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Steven Wang
- Department of Molecular Biology, Columbia University, New York, USA
- Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
18
Pan X, Zeng T, Zhang YH, Chen L, Feng K, Huang T, Cai YD. Investigation and Prediction of Human Interactome Based on Quantitative Features. Front Bioeng Biotechnol 2020; 8:730. [PMID: 32766217 PMCID: PMC7379396 DOI: 10.3389/fbioe.2020.00730] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/19/2020] [Accepted: 06/09/2020] [Indexed: 01/27/2023]
Abstract
Proteins are among the most significant components of all living organisms, and essential biological structures and functions rely on them. However, proteins cannot fulfil their unique biological roles independently; they must interact with each other to carry out the complex biological processes of all living organisms, including humans. In other words, proteins depend on protein-protein interactions (PPIs) to exert their effects. Thus, the relative significance and quantitative contribution of candidate PPI features urgently need to be determined. According to previous studies, 258 physical and chemical characteristics of proteins have been reported and confirmed to affect the interaction efficiency of the related proteins. Among them, essential physicochemical features such as stoichiometric balance, protein abundance, molecular weight, and charge distribution have been validated as significant and irreplaceable for PPIs. Therefore, in this study, we presented a novel computational framework that identifies the key factors affecting PPIs using Boruta feature selection (BFS), Monte Carlo feature selection (MCFS), and incremental feature selection (IFS). We also built a quantitative decision-rule system with random forest (RF) and RIPPER algorithms to evaluate potential PPIs under real conditions, providing several new insights into the detailed biological mechanisms of complex PPIs. The main datasets and codes can be downloaded at https://github.com/xypan1232/Mass-PPI.
Affiliation(s)
- Xiaoyong Pan
- School of Life Sciences, Shanghai University, Shanghai, China; Key Laboratory of System Control and Information Processing, Ministry of Education of China, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
- Tao Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
- Yu-Hang Zhang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Kaiyan Feng
- Department of Computer Science, Guangdong AIB Polytechnic, Guangzhou, China
- Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
19
Fajarda O, Duarte-Pereira S, Silva RM, Oliveira JL. Merging microarray studies to identify a common gene expression signature to several structural heart diseases. BioData Min 2020; 13:8. [PMID: 32670412 PMCID: PMC7346458 DOI: 10.1186/s13040-020-00217-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/18/2020] [Accepted: 06/05/2020] [Indexed: 12/22/2022]
Abstract
BACKGROUND Heart disease is the leading cause of death worldwide. Knowing the gene expression signature of heart disease can lead to more efficient diagnostics and treatments that may prevent premature deaths. A large amount of microarray data is available in public repositories and can be used to identify differentially expressed genes. However, most microarray datasets contain a small number of samples, so several datasets have to be merged to obtain more reliable results, which is a challenging task. Differentially expressed genes are commonly identified using statistical methods. Nonetheless, these methods rely on an arbitrary threshold to select the differentially expressed genes, and there is no consensus on the values that should be used. RESULTS Nine publicly available microarray datasets from studies of different heart diseases were merged to form a dataset of 689 samples and 8354 features. The adjusted p-value and fold change were then determined, and by combining a set of adjusted p-value cutoffs with a list of fold change thresholds, 12 sets of differentially expressed genes were obtained. The random forest algorithm was used to select the set of differentially expressed genes with the best accuracy in classifying samples from patients with heart diseases versus samples from patients with no heart condition. A set of 62 differentially expressed genes with a classification accuracy of approximately 95% was identified. CONCLUSIONS We identified a gene expression signature common to different cardiac diseases and supported our findings by showing the involvement of these genes in the pathophysiology of the heart. The approach used in this study is suitable for identifying gene expression signatures and can be extended to other diseases.
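The threshold-grid step above (adjusted p-value cutoffs crossed with fold change thresholds, one candidate gene set per pair) can be sketched as follows. The sketch assumes Benjamini-Hochberg adjustment and log2 fold changes, which are common choices but are assumptions here; the gene names, statistics, and cutoff values are hypothetical, and the grid is 2 x 2 rather than the study's 12 combinations.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def deg_sets(genes, pvalues, log2_fc, p_cutoffs, fc_cutoffs):
    """One DEG set per (adjusted p-value cutoff, |log2 fold change| cutoff) pair."""
    adjusted = benjamini_hochberg(pvalues)
    return {
        (p_cut, fc_cut): {
            g for g, q, lfc in zip(genes, adjusted, log2_fc)
            if q < p_cut and abs(lfc) >= fc_cut
        }
        for p_cut in p_cutoffs
        for fc_cut in fc_cutoffs
    }


# Hypothetical per-gene statistics and threshold grid (2 x 2 = 4 candidate sets).
genes = ["G1", "G2", "G3", "G4"]
sets = deg_sets(genes, [0.01, 0.04, 0.03, 0.20], [2.0, -1.5, 0.5, 3.0], [0.05, 0.1], [1.0, 2.0])
print(sorted(sets[(0.1, 1.0)]))  # ['G1', 'G2']
```

Each resulting set would then be scored with a classifier such as random forest, and the set with the best accuracy kept.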
Affiliation(s)
- Olga Fajarda
- IEETA/DETI, University of Aveiro, Aveiro, 3810-193 Portugal
- Sara Duarte-Pereira
- IEETA/DETI, University of Aveiro, Aveiro, 3810-193 Portugal
- Department of Medical Sciences and iBiMED-Institute of Biomedicine, University of Aveiro, Aveiro, 3810-193 Portugal
- Raquel M. Silva
- IEETA/DETI, University of Aveiro, Aveiro, 3810-193 Portugal
- Department of Medical Sciences and iBiMED-Institute of Biomedicine, University of Aveiro, Aveiro, 3810-193 Portugal
- Current Address: Universidade Católica Portuguesa, Faculdade de Medicina Dentária, CIIS-Centro de Investigação Interdisciplinar em Saúde, Campus de Viseu, Viseu, 3504-505 Portugal
20
Li M, Chen F, Zhang Y, Xiong Y, Li Q, Huang H. Identification of Post-myocardial Infarction Blood Expression Signatures Using Multiple Feature Selection Strategies. Front Physiol 2020; 11:483. [PMID: 32581823 PMCID: PMC7287215 DOI: 10.3389/fphys.2020.00483] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/06/2020] [Accepted: 04/20/2020] [Indexed: 12/24/2022]
Abstract
Myocardial infarction (MI) is a serious type of heart attack in which the blood flow to the heart is suddenly interrupted, injuring the heart muscle through lack of oxygen. Although clinical diagnostic methods can identify the occurrence of MI, using changes in molecular markers or characteristic molecules in blood to characterize the early phase and later course of MI will help in choosing a more reasonable treatment plan. Previous comparative transcriptome studies focused on finding differentially expressed genes between MI patients and healthy people; however, signature molecules altered in different phases of MI have not been well characterized. We developed a set of computational approaches integrating multiple machine learning algorithms, including Monte Carlo feature selection (MCFS), incremental feature selection (IFS), and support vector machine (SVM), to identify gene expression characteristics in different phases of MI. A total of 134 genes were selected as features for building optimal SVM classifiers to distinguish acute MI from post-MI. Subsequently, functional enrichment analyses followed by protein-protein interaction analysis of these 134 genes identified several hub genes (IL1R1, TLR2, and TLR4) associated with MI progression, which can be used as new diagnostic molecules for MI.
Affiliation(s)
- Ming Li
- Department of Cardiology, Eastern Hospital, Sichuan Academy of Medical Sciences & Sichuan Provincial People's Hospital, Chengdu, China
- Fuli Chen
- Department of Cardiology, Sichuan Academy of Medical Sciences & Sichuan Provincial People's Hospital, Chengdu, China
- Yaling Zhang
- Department of Nephrology, Eastern Hospital, Sichuan Academy of Medical Sciences & Sichuan Provincial People's Hospital, Chengdu, China
- Yan Xiong
- Department of Cardiology, Sichuan Academy of Medical Sciences & Sichuan Provincial People's Hospital, Chengdu, China
- Qiyong Li
- Department of Cardiology, Sichuan Academy of Medical Sciences & Sichuan Provincial People's Hospital, Chengdu, China
- Hui Huang
- Department of Cardiology, Sichuan Academy of Medical Sciences & Sichuan Provincial People's Hospital, Chengdu, China
21
Retained or altered expression of major histocompatibility complex class I in patient-derived xenograft models in breast cancer. Immunol Res 2020; 67:469-477. [PMID: 31900802 DOI: 10.1007/s12026-019-09109-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/22/2023]
Abstract
The expression of major histocompatibility complex class I (MHC I) in tumor cells is regulated by interferon signaling, and it is an important factor in the efficacy of cytotoxic T cell-dependent immunotherapy. To determine the impact of immune cells on MHC I expression in tumor cells, we compared the expression of MHC I in tumor cells derived from primary breast cancers and from patient-derived xenograft (PDX) models. MHC I and myxovirus resistance gene A (MxA) expression were analyzed using immunohistochemistry in 23 cases of tumor tissue and the corresponding primary and secondary PDXs. The median H score of MHC I was 210 (0-300) in patient tumor tissues, 197.5 (0-300) in primary PDX tumors, and 157.5 (5-300) in secondary PDX tumors. Cases were divided into four groups based on the difference in MHC I expression between the patient tumor tissues and secondary PDXs: eleven cases constituted the high MHC I group, four the low MHC I group, six the decreased MHC I group, and two the increased MHC I group. MHC I and MxA expression were weakly correlated in patients' tumors but strongly correlated in PDX models. Retained or altered expression of MHC I in breast cancer PDXs reveals the presence of intrinsic and extrinsic interferon signaling pathways in tumor cells. Thus, MHC I expression in PDXs should be considered when using PDX models to evaluate the efficacy of cancer immunotherapy in a preclinical setting.
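The H scores above lie on a 0-300 scale. A minimal sketch of the conventional immunohistochemistry H-score (1 x % weakly stained cells + 2 x % moderately stained + 3 x % strongly stained) and of taking a median across cases follows; whether the study used exactly this scoring is an assumption, and the staining percentages are hypothetical.

```python
from statistics import median


def h_score(pct_weak, pct_moderate, pct_strong):
    """Conventional IHC H-score: 1*%weak + 2*%moderate + 3*%strong (range 0-300)."""
    pcts = (pct_weak, pct_moderate, pct_strong)
    if min(pcts) < 0 or sum(pcts) > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong


# Hypothetical staining percentages (weak, moderate, strong) for four tumors.
cases = [(10, 20, 50), (0, 0, 100), (20, 20, 20), (30, 30, 0)]
scores = [h_score(*c) for c in cases]
print(scores, median(scores))  # [200, 300, 120, 90] 160.0
```

With an even number of cases the median is the mean of the two middle scores, which is how half-point medians such as 197.5 or 157.5 can arise from integer H scores.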
22
Analysis of gene expression profiles of lung cancer subtypes with machine learning algorithms. Biochim Biophys Acta Mol Basis Dis 2020; 1866:165822. [PMID: 32360590 DOI: 10.1016/j.bbadis.2020.165822] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 02/15/2020] [Revised: 04/13/2020] [Accepted: 04/22/2020] [Indexed: 12/14/2022]
Abstract
Lung cancer is one of the most common cancer types worldwide and causes more than one million deaths annually. Lung adenocarcinoma (AC) and lung squamous cell cancer (SCC) are two major lung cancer subtypes that differ in several respects. Identifying their differentially expressed genes and different gene expression patterns can deepen our understanding of these two subtypes at the transcriptomic level. In this work, we used several machine learning algorithms to investigate the gene expression profiles of lung AC and lung SCC samples retrieved from Gene Expression Omnibus. First, the profiles were analyzed using a powerful feature selection method, Monte Carlo feature selection, which produced a feature list ranking all features by importance, together with some informative features. Then, the feature list was used in the incremental feature selection method to extract the optimal features, with which the support vector machine (SVM) yields the best performance in classifying lung AC and lung SCC samples. Some top genes (CSTA, TP63, SERPINB13, CLCA2, BICD2, PERP, FAT2, BNC1, ATP11B, FAM83B, KRT5, PARD6G, PKP1) were analyzed in depth to show that they may be differentially expressed between lung AC and lung SCC. Meanwhile, a rule learning procedure was applied to the informative features to construct classification rules. These rules provide a clear classification procedure and show different gene expression patterns between lung AC and lung SCC.
23
Xue W, Ton H, Zhang J, Xie T, Chen X, Zhou B, Guo Y, Fang J, Wang S, Zhang W. Patient-derived orthotopic xenograft glioma models fail to replicate the magnetic resonance imaging features of the original patient tumor. Oncol Rep 2020; 43:1619-1629. [PMID: 32323818 PMCID: PMC7107810 DOI: 10.3892/or.2020.7538] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/10/2019] [Accepted: 02/12/2020] [Indexed: 12/14/2022]
Abstract
Patient-derived orthotopic glioma xenograft models are important platforms for pre-clinical research on glioma. In the present study, the diagnostic ability of magnetic resonance imaging (MRI) was examined with regard to the identification of biomarkers obtained from patient-derived glioma xenografts and human tumors. Conventional MRI, diffusion-weighted imaging, and dynamic contrast-enhanced (DCE)-MRI were used to analyze seven pairs of high-grade gliomas and their corresponding xenografts obtained from non-obese diabetic-severe combined immunodeficiency nude mice. Tumor samples were collected for transcriptome sequencing and histopathological staining, and differentially expressed genes were screened between the original tumors and the corresponding xenografts. Gene Ontology (GO) analysis was performed to predict the functions of these genes. In 6 cases of xenografts with diffuse growth, the degree of enhancement was significantly lower than in the original tumors. Histopathological staining indicated that the microvascular area and microvascular diameter of the xenografts were significantly lower than those of the original tumors (P=0.009 and P=0.007, respectively). In one case, there was evidence of nodular tumor growth in the mouse. Both MRI and histopathological staining showed a clear demarcation between the transplanted tumors and the normal brain tissues. The relative apparent diffusion coefficient values of the 7 cases examined were significantly higher than those of the corresponding original tumors (P=0.001), and the transfer coefficient values derived from DCE-MRI of the tumor area were significantly lower than those of the original tumors (P=0.016). GO analysis indicated that the expression levels of extracellular matrix-associated genes, angiogenesis-associated genes, and immune function-associated genes were higher in the original tumors than in the corresponding xenografts.
In conclusion, the data demonstrated that the MRI features of patient-derived xenograft glioma models in mice differed from those of the original patient tumors. Differential gene expression may underlie the differences in MRI features between the original tumors and the corresponding xenografts. The results of the present study highlight the caution needed when extrapolating data from patient-derived xenograft studies to humans.
Affiliation(s)
- Wei Xue
- Department of Radiology, Daping Hospital, Army Medical University, Chongqing 400042, P.R. China
- Haipeng Ton
- Department of Radiology, Daping Hospital, Army Medical University, Chongqing 400042, P.R. China
- Junfeng Zhang
- Department of Radiology, Daping Hospital, Army Medical University, Chongqing 400042, P.R. China
- Tian Xie
- Department of Radiology, Daping Hospital, Army Medical University, Chongqing 400042, P.R. China
- Xiao Chen
- Department of Radiology, Daping Hospital, Army Medical University, Chongqing 400042, P.R. China
- Bo Zhou
- Department of Radiology, Daping Hospital, Army Medical University, Chongqing 400042, P.R. China
- Yu Guo
- Department of Radiology, Daping Hospital, Army Medical University, Chongqing 400042, P.R. China
- Jingqin Fang
- Department of Radiology, Daping Hospital, Army Medical University, Chongqing 400042, P.R. China
- Shunan Wang
- Department of Radiology, Daping Hospital, Army Medical University, Chongqing 400042, P.R. China
- Weiguo Zhang
- Department of Radiology, Daping Hospital, Army Medical University, Chongqing 400042, P.R. China
24
Chen L, Pan X, Guo W, Gan Z, Zhang YH, Niu Z, Huang T, Cai YD. Investigating the gene expression profiles of cells in seven embryonic stages with machine learning algorithms. Genomics 2020; 112:2524-2534. [PMID: 32045671 DOI: 10.1016/j.ygeno.2020.02.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/18/2019] [Revised: 12/26/2019] [Accepted: 02/07/2020] [Indexed: 12/15/2022]
Abstract
The development of embryonic cells involves several continuous stages, and some genes are related to embryogenesis. To date, few studies have systematically investigated changes in gene expression profiles during mammalian embryogenesis. In this study, a computational analysis using machine learning algorithms was performed on the gene expression profiles of mouse embryonic cells at seven stages. First, the profiles were analyzed with the Monte Carlo feature selection method to generate a ranked feature list. Second, incremental feature selection was applied to this list in combination with two classification algorithms: support vector machine (SVM) and repeated incremental pruning to produce error reduction (RIPPER). Using SVM, we extracted several latent gene biomarkers that indicate the stages of embryonic cells and constructed an optimal SVM classifier that classified embryonic cells nearly perfectly. Furthermore, the RIPPER algorithm yielded several interesting rules suggesting different expression patterns at different stages.
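The incremental feature selection (IFS) step that recurs in these abstracts can be illustrated with a short Python sketch. It assumes a feature list that is already ranked (for example, by MCFS) and a scoring callback that trains a classifier on a feature subset and returns a validation score; IFS then scores the nested subsets top-1, top-2, and so on, and keeps the best one. The names `incremental_feature_selection` and `loo_nearest_centroid_accuracy` are hypothetical, the toy data are invented, and the nearest-centroid classifier with leave-one-out validation is a deliberately simple stand-in for the SVM used in the paper, not the authors' implementation.

```python
from math import dist
from typing import Callable, Sequence

Row = dict[str, float]


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[list[str]], float],
) -> tuple[list[str], float, list[float]]:
    """Score nested subsets (top-1, top-2, ...) of a ranked feature list.

    Returns the best-scoring subset, its score, and the full score curve.
    """
    curve: list[float] = []
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked_features) + 1):
        score = evaluate(list(ranked_features[:k]))
        curve.append(score)
        # Strict '>' keeps the smaller subset when scores tie.
        if score > best_score:
            best_k, best_score = k, score
    return list(ranked_features[:best_k]), best_score, curve


def loo_nearest_centroid_accuracy(
    rows: list[Row], labels: list[str], features: list[str]
) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for SVM)."""
    correct = 0
    for i, (held_out, true_label) in enumerate(zip(rows, labels)):
        # Accumulate per-class feature sums over all samples except the held-out one.
        sums: dict[str, list[float]] = {}
        counts: dict[str, int] = {}
        for j, (row, label) in enumerate(zip(rows, labels)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, name in enumerate(features):
                acc[k] += row[name]
            counts[label] = counts.get(label, 0) + 1
        point = [held_out[name] for name in features]
        # Predict the class whose centroid is closest in Euclidean distance.
        predicted = min(
            sums,
            key=lambda lab: dist(point, [v / counts[lab] for v in sums[lab]]),
        )
        correct += predicted == true_label
    return correct / len(rows)


# Invented toy data: g1 separates the two classes, g2 is noise.
rows = [
    {"g1": 0.0, "g2": 5.0}, {"g1": 0.2, "g2": -5.0}, {"g1": 0.1, "g2": 4.0},
    {"g1": 1.0, "g2": -4.0}, {"g1": 1.2, "g2": 5.0}, {"g1": 0.9, "g2": -5.0},
]
labels = ["A", "A", "A", "B", "B", "B"]

best, score, curve = incremental_feature_selection(
    ["g1", "g2"], lambda feats: loo_nearest_centroid_accuracy(rows, labels, feats)
)
print(best, score, curve)
```

On this toy data the top-ranked feature alone already reaches perfect leave-one-out accuracy, so the smallest optimal subset, `["g1"]`, is kept. In a real analysis the callback would wrap an actual classifier such as an SVM together with k-fold cross-validation.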
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai 200444, China; College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China; Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai 200241, China.
- XiaoYong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China.
- Wei Guo
- Institute of Health Sciences, Shanghai Jiao Tong University School of Medicine, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
- Zijun Gan
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
- Yu-Hang Zhang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
- Zhibin Niu
- College of Intelligence and Computing, Tianjin University, Tianjin 300072, China.
- Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China.
25
Chen L, Pan X, Zeng T, Zhang YH, Zhang Y, Huang T, Cai YD. Immunosignature Screening for Multiple Cancer Subtypes Based on Expression Rule. Front Bioeng Biotechnol 2019; 7:370. [PMID: 31850330 PMCID: PMC6901955 DOI: 10.3389/fbioe.2019.00370] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/04/2019] [Accepted: 11/13/2019] [Indexed: 12/13/2022]
Abstract
Liquid biopsy (i.e., fluid biopsy) involves a series of clinical examination approaches. Monitoring the immunological status of cancer patients through their “immunosignature” is a novel method for tumor-associated liquid biopsy. The main task, and the core technical difficulty, in monitoring the cancer immunosignature is recognizing cancer-related immune-activating antigens with high-throughput screening approaches. Currently, one key task of immunosignature-based liquid biopsy is the qualitative and quantitative identification of typical tumor-specific antigens. In this study, to avoid detection differences induced by chip platforms, we reused two sets of peptide microarray data that measured the expression levels of potential antigenic peptides derived from tumor tissues. Several machine learning algorithms were applied to these two sets. First, the Monte Carlo Feature Selection (MCFS) method was used to analyze the features in each set, yielding a feature list for each. Second, the incremental feature selection method, combined with a classification algorithm (support vector machine or random forest), was used to extract optimal features and construct optimal classifiers. In addition, repeated incremental pruning to produce error reduction, a rule learning algorithm, was applied to the key features yielded by MCFS to extract quantitative rules for accurate cancer immune monitoring and pathologic diagnosis. Finally, the key features and quantitative rules obtained were analyzed extensively.
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai, China; College of Information Engineering, Shanghai Maritime University, Shanghai, China; Shanghai Key Laboratory of Pure Mathematics and Mathematical Practice (PMMP), East China Normal University, Shanghai, China
- XiaoYong Pan
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China; IDLab, Department for Electronics and Information Systems, Ghent University, Ghent, Belgium
- Tao Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
- Yu-Hang Zhang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- YunHua Zhang
- Anhui Province Key Laboratory of Farmland Ecological Conservation and Pollution Prevention, School of Resources and Environment, Anhui Agricultural University, Hefei, China
- Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
26
Li J, Wang D, Wang Y. IBI: Identification of Biomarker Genes in Individual Tumor Samples. Front Genet 2019; 10:1236. [PMID: 31850079 PMCID: PMC6902017 DOI: 10.3389/fgene.2019.01236] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/17/2019] [Accepted: 11/07/2019] [Indexed: 12/12/2022]
Abstract
Individual patient biomarkers have an important role in personalized treatment. Although various high-throughput sequencing technologies are widely used in biological experiments, these are usually conducted only once or a few times for each patient, which makes it a challenging problem to identify biomarkers in individual patients. At present, there is a lack of effective methods to identify biomarkers in individual sample data. Here, we propose a novel method, IBI, to identify biomarkers in individual tumor samples. Experimental results from several tumor data sets showed that the proposed method could effectively find biomarker genes for individual patients, including common biomarkers related to the mechanisms of the development of cancer, which can be used to predict survival and drug response in patients. In summary, these results demonstrate that the proposed method offers a new perspective for analyzing individual samples.
Affiliation(s)
- Jie Li
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Dong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
27
Identifying Methylation Pattern and Genes Associated with Breast Cancer Subtypes. Int J Mol Sci 2019; 20:ijms20174269. [PMID: 31480430 PMCID: PMC6747348 DOI: 10.3390/ijms20174269] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 08/01/2019] [Revised: 08/19/2019] [Accepted: 08/29/2019] [Indexed: 12/18/2022]
Abstract
Breast cancer is regarded worldwide as a severe human disease. Various genetic variations, including hereditary and somatic mutations, contribute to the initiation and progression of this disease. The diagnostic parameters of breast cancer are not limited to the conventional protein content and can include newly discovered genetic variants and even genetic modification patterns such as methylation and microRNA. In addition, breast cancer detection extends to detailed breast cancer stratifications to provide subtype-specific indications for further personalized treatment. One genome-wide expression–methylation quantitative trait loci analysis confirmed that different breast cancer subtypes have various methylation patterns. However, recognizing clinically applied (methylation) biomarkers is difficult due to the large number of differentially methylated genes. In this study, we attempted to re-screen a small group of functional biomarkers for the identification and distinction of different breast cancer subtypes with advanced machine learning methods. The findings may contribute to biomarker identification for different breast cancer subtypes and provide a new perspective for differential pathogenesis in breast cancer subtypes.
28
Gene selection for microarray data classification via adaptive hypergraph embedded dictionary learning. Gene 2019; 706:188-200. [DOI: 10.1016/j.gene.2019.04.060] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/27/2018] [Revised: 04/03/2019] [Accepted: 04/22/2019] [Indexed: 01/19/2023]
29
Li J, Lu L, Zhang YH, Xu Y, Liu M, Feng K, Chen L, Kong X, Huang T, Cai YD. Identification of leukemia stem cell expression signatures through Monte Carlo feature selection strategy and support vector machine. Cancer Gene Ther 2019; 27:56-69. [PMID: 31138902 DOI: 10.1038/s41417-019-0105-y] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 03/19/2019] [Revised: 04/28/2019] [Accepted: 05/04/2019] [Indexed: 01/09/2023]
Abstract
Acute myeloid leukemia (AML) is a type of blood cancer characterized by the rapid growth of immature white blood cells from the bone marrow. Therapy resistance resulting from the persistence of leukemia stem cells (LSCs) is found in numerous patients. Comparative transcriptome studies have previously analyzed differentially expressed genes between LSC+ and LSC- cells. However, these studies mainly focused on a limited number of genes with the most obvious expression differences between the two cell types. We developed a computational approach incorporating several machine learning algorithms, including Monte Carlo feature selection (MCFS), incremental feature selection (IFS), support vector machine (SVM), and Repeated Incremental Pruning to Produce Error Reduction (RIPPER), to identify gene expression features specific to LSCs. A total of 1159 features (genes) were first identified, which can be used to build the optimal SVM classifier for distinguishing LSC+ and LSC- cells. Among these 1159 genes, the top 17 were identified as LSC-specific biomarkers. In addition, the RIPPER algorithm produced six classification rules. A subsequent literature review of these features/genes and classification rules, together with functional enrichment analyses of the 1159 features/genes, confirmed the relevance of the extracted genes and rules to the characteristics of LSCs.
Affiliation(s)
- JiaRui Li
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China; School of Life Sciences, Shanghai University, Shanghai, 200444, P. R. China
- Lin Lu
- Department of Radiology, Columbia University Medical Center, New York, NY, 10032, USA
- Yu-Hang Zhang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China
- YaoChen Xu
- Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China
- Min Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, P. R. China
- KaiYan Feng
- Department of Computer Science, Guangdong AIB Polytechnic, Guangzhou, 510507, P. R. China
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, P. R. China; Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai, 200241, P. R. China
- XiangYin Kong
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China.
- Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China.
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, 200444, P. R. China.
30
Analysis of Expression Pattern of snoRNAs in Different Cancer Types with Machine Learning Algorithms. Int J Mol Sci 2019; 20:ijms20092185. [PMID: 31052553 PMCID: PMC6539089 DOI: 10.3390/ijms20092185] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 02/24/2019] [Revised: 04/29/2019] [Accepted: 04/30/2019] [Indexed: 01/17/2023]
Abstract
Small nucleolar RNAs (snoRNAs) are a new type of functional small RNA involved in the chemical modification of rRNAs, tRNAs, and small nuclear RNAs. They have been reported to play important roles in tumorigenesis via various regulatory modes: snoRNAs can both participate in the regulation of methylation and pseudouridylation and regulate the expression patterns of their host genes. This research investigated the expression patterns of snoRNAs in eight major cancer types in TCGA using several machine learning algorithms. The expression levels of snoRNAs were first analyzed with a feature selection method, Monte Carlo feature selection (MCFS), which yielded a feature list and a set of informative features. Then, incremental feature selection (IFS) was applied to the feature list to extract the optimal features/snoRNAs, with which the support vector machine (SVM) yields its best performance. The discriminative snoRNAs included HBII-52-14, HBII-336, SNORD123, HBII-85-29, HBII-420, U3, HBI-43, SNORD116, SNORA73B, SCARNA4, HBII-85-20, etc., on which the SVM achieved a Matthews correlation coefficient (MCC) of 0.881 for predicting these eight cancer types. In addition, the informative features were fed into the Johnson reducer and repeated incremental pruning to produce error reduction (RIPPER) algorithms to generate classification rules, which clearly show the different snoRNA expression patterns in different cancer types. The results indicate that the extracted discriminative snoRNAs can be important for identifying cancer samples of different types, and that the expression patterns of snoRNAs in different cancer types can be partly uncovered by quantitative recognition rules.
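The MCC of 0.881 quoted here is for an eight-class problem, where the familiar binary formula no longer applies directly. Assuming the paper used the standard multiclass generalization of MCC (the formulation that scikit-learn's `matthews_corrcoef` computes for more than two classes), the metric can be written from label counts alone, as in this Python sketch. The helper name `multiclass_mcc` and the example labels are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


def multiclass_mcc(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Matthews correlation coefficient for K >= 2 classes.

    MCC = (c*s - sum_k p_k*t_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    where s = number of samples, c = number correctly predicted,
    t_k = true count of class k, p_k = predicted count of class k.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_k, p_k = Counter(y_true), Counter(y_pred)
    classes = set(t_k) | set(p_k)
    numerator = c * s - sum(p_k[k] * t_k[k] for k in classes)
    denominator = sqrt(
        (s * s - sum(p_k[k] ** 2 for k in classes))
        * (s * s - sum(t_k[k] ** 2 for k in classes))
    )
    # Convention: return 0 when either labelling is constant (denominator is 0).
    return numerator / denominator if denominator else 0.0


print(multiclass_mcc(["A", "B", "C", "A"], ["A", "B", "C", "A"]))
print(multiclass_mcc([0, 0, 1, 1], [0, 1, 1, 1]))
```

For two classes this reduces to the usual (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) form; the second call, with one false positive out of four samples, gives 1/√3 ≈ 0.577.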
31
Chen L, Pan X, Zhang YH, Kong X, Huang T, Cai YD. Tissue differences revealed by gene expression profiles of various cell lines. J Cell Biochem 2019; 120:7068-7081. [PMID: 30368905 DOI: 10.1002/jcb.27977] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 08/17/2018] [Accepted: 10/04/2018] [Indexed: 01/24/2023]
Abstract
The mechanisms through which tissues are formed and maintained remain unknown, although they are fundamental aspects of biology. Tissue-specific gene expression is a valuable tool for studying such mechanisms. However, many biomedical studies use cell lines, rather than human body tissues, to investigate biological mechanisms. Whether cell lines maintain their tissue-specific characteristics after they are isolated and cultured outside the human body remains to be explored. In this study, we applied a novel computational method to identify core genes that contribute to the differentiation of cell lines from various tissues. Several advanced computational techniques, including the Monte Carlo feature selection method, the incremental feature selection method, and the support vector machine (SVM) algorithm, were incorporated into the proposed method, which extensively analyzed the gene expression profiles of cell lines from different tissues. As a result, we extracted a group of functional genes that indicate the differences between cell lines from different tissues and built an optimal SVM classifier for identifying cell lines from different tissues. In addition, we report a set of rules for classifying cell lines, which give a clearer picture of cell lines from different tissues, although their performance was not better than that of the optimal SVM classifier. Finally, we compared these genes with the tissue-specific genes identified by the Genotype-Tissue Expression project. The results showed that most expression patterns between tissues were retained in the derived cell lines, although some genes showed unique tissue specificity.
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai, China; College of Information Engineering, Shanghai Maritime University, Shanghai, China; Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai, China
- Xiaoyong Pan
- Department of Medical Informatics, Erasmus MC, Rotterdam, The Netherlands
- Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Xiangyin Kong
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
32
Chen L, Pan X, Zhang YH, Huang T, Cai YD. Analysis of Gene Expression Differences between Different Pancreatic Cells. ACS Omega 2019; 4:6421-6435. [DOI: 10.1021/acsomega.8b02171] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 08/30/2023]
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai 200241, China
- Xiaoyong Pan
- Department of Medical Informatics, Erasmus MC, Rotterdam 3014ZK, Netherlands
- Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China
33
Chen X, Jin Y, Feng Y. Evaluation of Plasma Extracellular Vesicle MicroRNA Signatures for Lung Adenocarcinoma and Granuloma With Monte-Carlo Feature Selection Method. Front Genet 2019; 10:367. [PMID: 31105742 PMCID: PMC6498093 DOI: 10.3389/fgene.2019.00367] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/18/2018] [Accepted: 04/05/2019] [Indexed: 12/24/2022]
Abstract
Extracellular vesicles (EVs) are a collection of secreted vesicles, including microvesicles, large oncosomes, and exosomes, and can be used in non-invasive diagnosis. MicroRNAs (miRNAs) carried by exosomes can be detected by liquid biopsy. To objectively evaluate the discriminative ability of miRNAs from whole plasma, EVs, and EV-free plasma, we analyzed the miRNA expression profiles in whole plasma, EVs, and EV-free plasma of 10 lung adenocarcinoma and 9 granuloma patients. Using the Monte-Carlo feature selection method, we identified the top discriminative miRNAs in whole plasma, EVs, and EV-free plasma, and they were quite different. Using the Repeated Incremental Pruning to Produce Error Reduction (RIPPER) method, we learned the following classification rules: in whole plasma, granuloma patients did not express hsa-miR-223-3p, whereas lung adenocarcinoma patients did; in EVs, hsa-miR-23b-3p was highly expressed in granuloma patients but not in lung adenocarcinoma patients; in EV-free plasma, hsa-miR-376a-3p was expressed in granuloma patients but barely expressed in lung adenocarcinoma patients. In terms of prediction performance, whole plasma had the highest weighted accuracy, and EVs outperformed EV-free plasma. Our results suggest that EVs can be used as lung cancer biomarkers. However, because they are less stable and not easy to detect, technological difficulties remain to be overcome.
Affiliation(s)
- Xiangbo Chen
- Key Laboratory of Molecular Epigenetics of the Ministry of Education, Northeast Normal University, Changchun, China; Hangzhou Baocheng Biotechnology Co., Ltd., Hangzhou, China
- Yunjie Jin
- Department of Oncology, Shanghai Putuo People's Hospital, Shanghai, China
- Yu Feng
- Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, China
34
The next generation personalized models to screen hidden layers of breast cancer tumorigenicity. Breast Cancer Res Treat 2019; 175:277-286. [PMID: 30810866 DOI: 10.1007/s10549-019-05159-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2018] [Accepted: 02/05/2019] [Indexed: 10/27/2022]
Abstract
BACKGROUND: Breast cancer (BC) is a challenging disease and a major cause of death among women worldwide, who die of tumor relapse or related diseases. The main complexity of BC comes from the heterogeneous nature of breast tumors, which demands customized treatment in the form of personalized medicine.
REVIEW OF THE LITERATURE AND DISCUSSION: The spatiotemporally dynamic and heterogeneous nature of BC tumors is shaped by clonal evolution and sub-clonal selection; this heterogeneity in turn shapes resistance to collective or group therapies, which drives cancer recurrence and tumor metastasis. Personalized intervention promises to administer medications that selectively target each individual patient's tumor and, further, each colonized secondary tumor. Such personalized regimens will require the creation of in vitro and in vivo models that genuinely recapitulate the characteristics of each tumor type, serving as initiating platforms for two main purposes: first, to closely monitor the tumorigenic processes that shape tumor heterogeneity and evolution, the main driving forces behind tumor chemo-resistance and relapse; and second, to establish patient-specific preventive and therapeutic measures. While the application of tumor modeling to personalized drug screening and design requires a separate review, here we discuss the personalized utility of xenograft modeling in investigating BC tumor formation and progression toward metastasis. We further elaborate on the impact of innovative technologies on personalized modeling of BC tumorigenicity at improved resolution.
CONCLUSION: The heterogeneous nature of each BC tumor requires personalized intervention, implying that modeling breast tumors is indispensable for better disease understanding, detection, and cure. Patient-derived xenografts are just the first piece of the puzzle for the ideal management of breast cancer. Emerging technologies promise to model BC in a more personalized way than before.
35
Mirza B, Wang W, Wang J, Choi H, Chung NC, Ping P. Machine Learning and Integrative Analysis of Biomedical Big Data. Genes (Basel) 2019; 10:E87. [PMID: 30696086 PMCID: PMC6410075 DOI: 10.3390/genes10020087] [Citation(s) in RCA: 142] [Impact Index Per Article: 28.4] [Received: 12/02/2018] [Revised: 01/08/2019] [Accepted: 01/21/2019] [Indexed: 12/11/2022]
Abstract
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., the genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability.
Affiliation(s)
- Bilal Mirza
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Wei Wang
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Scalable Analytics Institute (ScAi), University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Jie Wang
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Howard Choi
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Neo Christopher Chung
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland.
- Peipei Ping
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Scalable Analytics Institute (ScAi), University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Medicine (Cardiology), University of California Los Angeles, Los Angeles, CA 90095, USA.
36
Rahman MF, Rahman MR, Islam T, Zaman T, Shuvo MAH, Hossain MT, Islam MR, Karim MR, Moni MA. A bioinformatics approach to decode core genes and molecular pathways shared by breast cancer and endometrial cancer. Informatics in Medicine Unlocked 2019. [DOI: 10.1016/j.imu.2019.100274] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/16/2022]
37
Chen L, Pan X, Zhang YH, Liu M, Huang T, Cai YD. Classification of Widely and Rarely Expressed Genes with Recurrent Neural Network. Comput Struct Biotechnol J 2018; 17:49-60. [PMID: 30595815 PMCID: PMC6307323 DOI: 10.1016/j.csbj.2018.12.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 09/26/2018] [Revised: 12/07/2018] [Accepted: 12/09/2018] [Indexed: 02/06/2023]
Abstract
Tissue-specific gene expression shapes the formation of tissues, while changes in gene expression reflect the immune response of the human body to environmental stimuli or pressure, particularly in disease conditions such as cancer. Some genes are commonly expressed across tissues or various cancers, while others are not. To investigate the functional differences between widely and rarely expressed genes, we used the large gene expression data set provided by Uhlen et al. to define the genes that were expressed in all 32 normal tissues/cancers (widely expressed genes; FPKM > 1 in all samples) and those that were not detected (rarely expressed genes; FPKM < 1 in all samples). Each gene was encoded using gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment scores. Minimum redundancy maximum relevance (mRMR) was used to measure and rank these features in the mRMR feature list. Thereafter, we applied the incremental feature selection method with a recurrent neural network (RNN) as the supervised classifier to select discriminative features for separating widely expressed genes from rarely expressed genes and to construct an optimal RNN classifier. The Youden indices of the optimal RNN classifier, evaluated using 10-fold cross-validation, were 0.739 for normal tissues and 0.639 for cancers. Furthermore, the underlying mechanisms of the key discriminative GO and KEGG features were analyzed. The results can facilitate the identification of the expression landscape of genes and help elucidate how gene expression shapes tissues and the microenvironment of cancers. Some genes are widely expressed across tissues or various cancers, whereas a number of genes are rarely expressed across them. The functional differences between widely and rarely expressed genes were studied, and several GO terms and KEGG pathways were extracted and analyzed.
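The labelling rule and the evaluation metric in this abstract are simple enough to state in code. The Python sketch below applies the "FPKM > 1 in all samples" and "FPKM < 1 in all samples" rule to one gene's expression values and computes Youden's index (sensitivity + specificity - 1) from confusion-matrix counts. The function names and example numbers are hypothetical, and genes that meet neither condition are assumed to be excluded from both classes.

```python
from typing import Optional, Sequence


def expression_class(fpkm: Sequence[float], threshold: float = 1.0) -> Optional[str]:
    """Label a gene from its FPKM values across all samples (e.g., 32 tissues)."""
    if not fpkm:
        raise ValueError("need at least one FPKM value")
    if all(v > threshold for v in fpkm):
        return "widely expressed"
    if all(v < threshold for v in fpkm):
        return "rarely expressed"
    return None  # mixed pattern: assumed to belong to neither class


def youden_index(tp: int, fn: int, tn: int, fp: int) -> float:
    """Youden's J = sensitivity + specificity - 1 (ranges from -1 to 1)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


print(expression_class([2.4, 15.0, 1.3]))
print(expression_class([0.0, 0.4, 0.9]))
print(expression_class([0.5, 3.0]))
print(youden_index(tp=80, fn=20, tn=90, fp=10))
```

A Youden index of 1 means perfect separation and 0 means performance no better than chance, which puts the reported values of 0.739 and 0.639 in context.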
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China; College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China; Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai 200241, People's Republic of China
- XiaoYong Pan
- Department of Medical Informatics, Erasmus MC, Rotterdam, the Netherlands
- Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
- Min Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China
- Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China
38
Chen L, Zhang S, Pan X, Hu X, Zhang YH, Yuan F, Huang T, Cai YD. HIV infection alters the human epigenetic landscape. Gene Ther 2018; 26:29-39. [PMID: 30443044 DOI: 10.1038/s41434-018-0051-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 08/17/2018] [Revised: 10/30/2018] [Accepted: 10/31/2018] [Indexed: 02/07/2023]
Abstract
Many complex diseases or traits result from both genetic and environmental factors. Environmental factors affect the human body by modifying its epigenetics, which controls genome activity without mutating the genome. Viral infection is one of the common environmental factors for complex diseases. For example, human immunodeficiency virus (HIV) infection can cause acquired immune deficiency syndrome (AIDS); hepatitis B virus (HBV) and hepatitis C virus (HCV) infections are associated with hepatocellular carcinoma; and human papillomavirus infection is a causal factor in cervical carcinoma. In this study, to investigate how HIV infection affects DNA methylation, we analyzed blood DNA methylation data at 485,512 sites from 44 HIV-negative and 142 HIV-positive individuals. Several advanced computational methods were applied to identify the core distinctive features that differed between the HIV patients and the healthy controls. These methods can be used to distinguish HIV-infected patients from uninfected individuals. The core distinctive DNA methylation features were confirmed to be functionally connected to premature aging and abnormal immune regulation, two typical pathological symptoms of HIV infection. These findings reveal potential regulatory mechanisms by which HIV infection affects the DNA methylation status of host cells and provide novel insights into the pathogenesis of HIV infection and AIDS.
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
- Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai, 200241, China
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, China
- Shiqi Zhang
- Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
- Xiaoyong Pan
- Department of Medical Informatics, Erasmus MC, Rotterdam, Netherlands
- XiaoHua Hu
- Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai, 200438, China
- Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Fei Yuan
- Department of Science & Technology, Binzhou Medical University Hospital, Binzhou, 256603, Shandong, China
- Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
39
The early detection of asthma based on blood gene expression. Mol Biol Rep 2018; 46:217-223. [PMID: 30421126 DOI: 10.1007/s11033-018-4463-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 08/29/2018] [Accepted: 11/01/2018] [Indexed: 01/10/2023]
Abstract
Asthma is a complex, heterogeneous disorder with a hereditary tendency. The most widely used therapy is inhalation of anti-inflammatory corticosteroids, which can have systemic side effects. If chronic inflammation can be detected at an early stage, a lower corticosteroid dosage may suffice and the side effects can be avoided. Therefore, to discover early-stage blood biomarkers for asthma, we analyzed the blood gene expression profiles of 77 patients with moderate asthma and 87 healthy controls. Using advanced feature selection methods, minimal redundancy maximal relevance and incremental feature selection, we identified 31 genes, such as MYD88, ZFP36, CCR3 and CYP3A5, as the optimal asthma biomarkers. The sensitivity, specificity and accuracy of the 31-gene support vector machine predictor, evaluated with leave-one-out cross-validation, were 0.870, 0.816 and 0.841, respectively. A literature survey showed that many of the biomarker genes have asthma-associated functions. Our results provide not only easy-to-apply blood gene expression biomarkers for the early detection of asthma but also an explainable qualitative model with biological significance.
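The incremental feature selection (IFS) step with leave-one-out evaluation can be sketched as follows. The loop scores the top-1, top-2, ... subsets of a ranked feature list and keeps the best subset.

This is a hypothetical illustration, not the authors' pipeline, and it rests on three assumptions:
- To stay within the Python standard library, it swaps the study's SVM for a simple nearest-centroid classifier.
- It assumes the input feature order comes from an mRMR ranking.
- It assumes the best subset is the one with the highest LOOCV accuracy; the abstract does not say which criterion was used.

```python
import math


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the class whose mean training vector is closest to x (Euclidean).

    Stand-in for the SVM used in the study, chosen only to avoid dependencies.
    """
    centroids = {}
    for label in set(train_y):
        rows = [r for r, lab in zip(train_x, train_y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(sorted(centroids), key=lambda c: math.dist(x, centroids[c]))


def loocv_metrics(x, y, positive=1):
    """Leave-one-out CV; returns (sensitivity, specificity, accuracy)."""
    tp = tn = fp = fn = 0
    for i in range(len(x)):
        # Train on every sample except i, then predict sample i
        pred = nearest_centroid_predict(x[:i] + x[i + 1:], y[:i] + y[i + 1:], x[i])
        if y[i] == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(x)


def incremental_feature_selection(x, y, ranked):
    """Score the top-1, top-2, ... subsets of a ranked feature list; keep the best.

    Returns (selected feature indices, accuracy, sensitivity, specificity).
    """
    best = None
    for k in range(1, len(ranked) + 1):
        cols = ranked[:k]
        subset = [[row[c] for c in cols] for row in x]
        sn, sp, acc = loocv_metrics(subset, y)
        if best is None or acc > best[1]:  # strict ">" keeps the smaller subset on ties
            best = (cols, acc, sn, sp)
    return best


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise
    x = [[0.0, 5.0], [0.2, 1.0], [0.1, 3.0], [1.0, 4.0], [0.9, 2.0], [1.1, 0.0]]
    y = [0, 0, 0, 1, 1, 1]
    print(incremental_feature_selection(x, y, ranked=[0, 1]))  # ([0], 1.0, 1.0, 1.0)
```

Keeping the smaller subset on ties mirrors the usual goal of IFS, which is the most compact feature set that reaches the best performance.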
40
Chen L, Zhang YH, Pan X, Liu M, Wang S, Huang T, Cai YD. Tissue Expression Difference between mRNAs and lncRNAs. Int J Mol Sci 2018; 19:ijms19113416. [PMID: 30384456 PMCID: PMC6274976 DOI: 10.3390/ijms19113416] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 09/12/2018] [Revised: 10/26/2018] [Accepted: 10/28/2018] [Indexed: 12/15/2022]
Abstract
Messenger RNA (mRNA) and long noncoding RNA (lncRNA) are two main subgroups of RNAs participating in transcription regulation. With the development of next-generation sequencing, an increasing number of lncRNAs have been identified, and many of their previously hidden functions have been revealed. However, the differences between lncRNAs and mRNAs remain unclear. For example, it remains to be determined whether lncRNAs have stronger tissue specificity than mRNAs and which tissues express more lncRNAs. To investigate such tissue expression differences between mRNAs and lncRNAs, we encoded 9,339 lncRNAs and 14,294 mRNAs with 71 expression features. These were 69 features giving the maximum expression in each of 69 cell types, one feature giving the maximum expression across all cells, and one expression specificity feature measured as Chao-Shen-corrected Shannon entropy. Using advanced feature selection methods (maximum relevance minimum redundancy and incremental feature selection) together with the random forest algorithm, we identified 13 features that distinguished lncRNAs from mRNAs. The 11 cell subtype features indicated the cell types in which lncRNA and mRNA expression differed most; such cell subtypes may serve as potential cell models for lncRNA identification and functional investigation. The expression specificity feature suggested that mRNAs and lncRNAs are expressed in different cell types, and the maximum expression feature suggested that their maximum expression levels differ. In addition, the rule learning algorithm RIPPER (repeated incremental pruning to produce error reduction) was employed to produce effective rules for classifying lncRNAs and mRNAs. It gave results competitive with random forest and a clearer picture of the different expression patterns of lncRNAs and mRNAs.
These results not only revealed the heterogeneous expression patterns of lncRNAs and mRNAs but also gave rise to a new tool for identifying the potential biological functions of these RNA subgroups.
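The expression specificity feature above relies on the Chao-Shen estimator. This estimator corrects the plug-in Shannon entropy for unobserved categories in two steps:
- it rescales each probability by a Good-Turing estimate of sample coverage;
- it then applies a Horvitz-Thompson weighting.

The sketch below applies the estimator to a vector of nonnegative integer counts. The abstract does not describe how expression values across the 69 cell types were turned into such counts, so that input is an assumption, and the function name is hypothetical.

```python
import math


def chao_shen_entropy(counts):
    """Chao-Shen coverage-adjusted Shannon entropy (natural log) of integer counts.

    H = -sum_i (C p_i) ln(C p_i) / (1 - (1 - C p_i)^n), with p_i = n_i / n and
    C = 1 - f1 / n, where f1 is the number of categories observed exactly once.
    """
    observed = [c for c in counts if c > 0]
    n = sum(observed)
    if n == 0:
        raise ValueError("need at least one nonzero count")
    singletons = sum(1 for c in observed if c == 1)
    if singletons == n:
        # Every observation is a singleton: avoid a coverage estimate of zero
        singletons = n - 1
    coverage = 1 - singletons / n  # Good-Turing estimate of sample coverage
    h = 0.0
    for c in observed:
        pa = coverage * c / n  # coverage-adjusted probability
        # Horvitz-Thompson weight: divide by the chance the category is observed at all
        h -= pa * math.log(pa) / (1 - (1 - pa) ** n)
    return h


if __name__ == "__main__":
    # Hypothetical counts: concentrated vs evenly spread across four categories
    print(chao_shen_entropy([40, 1, 1, 0]), chao_shen_entropy([10, 10, 10, 10]))
```

Under this measure, a gene whose counts are concentrated in a few categories gets a lower entropy, that is, higher specificity, than one spread evenly.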
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai 200241, China
- Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Xiaoyong Pan
- Department of Medical Informatics, Erasmus MC, 3000 CA Rotterdam, The Netherlands
- Min Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Shaopeng Wang
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China
41
Lu S, Zhao K, Wang X, Liu H, Ainiwaer X, Xu Y, Ye M. Use of Laplacian Heat Diffusion Algorithm to Infer Novel Genes With Functions Related to Uveitis. Front Genet 2018; 9:425. [PMID: 30349554 PMCID: PMC6186792 DOI: 10.3389/fgene.2018.00425] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 08/03/2018] [Accepted: 09/10/2018] [Indexed: 12/17/2022]
Abstract
Uveitis, inflammation of the uvea, is a serious eye disease that can cause blindness in young and middle-aged people. However, its pathogenesis has not been fully uncovered, which makes it difficult to design effective treatments. Comprehensively identifying the genes related to this disease can improve and accelerate our understanding of uveitis. In this study, a new computational method was developed to infer potential uveitis-related genes from validated ones. We employed a large protein–protein interaction network reported in STRING and applied the Laplacian heat diffusion algorithm to it, using the validated genes as seed nodes. All genes in the network other than the validated ones were then filtered by three tests (permutation, association, and function tests), which evaluated the genes based on their specificity and their associations with uveitis. This procedure yielded 59 inferred genes, several of which were confirmed by literature review to be highly related to uveitis. In addition, a comparison with the genes reported in a previous study indicated that our inferred genes are a necessary supplement to them.
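Laplacian heat diffusion from seed genes can be sketched on a toy graph. The sketch assumes the common diffusion-kernel form h(t) = exp(-tL) h0, where:
- L = D - A is the Laplacian of an unweighted network;
- h0 places one unit of heat on each seed gene.

The abstract does not state the diffusion time, whether STRING confidence scores were used as edge weights, or any normalization, so those choices are assumptions. The function names are hypothetical. The matrix exponential is approximated by a truncated Taylor series, which is adequate only for small graphs.

```python
def laplacian(adj):
    """Graph Laplacian L = D - A of an undirected adjacency matrix (list of lists)."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def heat_diffusion(adj, seeds, t=1.0, terms=40):
    """Approximate h(t) = exp(-t L) h0 by a truncated Taylor series.

    adj:   symmetric adjacency matrix of the interaction network
    seeds: indices of validated genes; each starts with one unit of heat
    t:     diffusion time (a free parameter in this sketch)
    """
    n = len(adj)
    lap = laplacian(adj)
    h0 = [1.0 if i in seeds else 0.0 for i in range(n)]
    result, term = h0[:], h0[:]
    for k in range(1, terms):
        # term now holds (-t L)^k h0 / k!, built from the previous term
        term = [-t / k * sum(lap[i][j] * term[j] for j in range(n)) for i in range(n)]
        result = [r + x for r, x in zip(result, term)]
    return result


if __name__ == "__main__":
    # Toy 4-node path 0-1-2-3 with gene 0 as the only seed
    path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    heat = heat_diffusion(path, {0})
    # Rank non-seed genes by received heat, as candidates for the follow-up tests
    print(sorted((i for i in range(4) if i != 0), key=lambda i: -heat[i]))  # [1, 2, 3]
```

Each row of L sums to zero and L is symmetric, so the total heat stays equal to the number of seeds. The heat decreases with network distance from the seed. In the study, candidates ranked this way were further screened by the permutation, association, and function tests.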
Affiliation(s)
- Shiheng Lu
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
- Ke Zhao
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
- Xuefei Wang
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
- Hui Liu
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
- Xiamuxiya Ainiwaer
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
- Yan Xu
- School of Life Sciences, Shanghai University, Shanghai, China
- Min Ye
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
42
A Computational Method for Classifying Different Human Tissues with Quantitatively Tissue-Specific Expressed Genes. Genes (Basel) 2018; 9:genes9090449. [PMID: 30205473 PMCID: PMC6162521 DOI: 10.3390/genes9090449] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 08/03/2018] [Revised: 09/01/2018] [Accepted: 09/04/2018] [Indexed: 02/06/2023]
Abstract
Tissue-specific gene expression has long been recognized as key to understanding tissue development and function. In the past decade, efforts such as the Human Protein Atlas and FANTOM5 have been made to identify tissue-specific expression profiles. However, these studies mainly focused on "qualitatively tissue-specific expressed genes", which are highly enriched in one tissue or a group of tissues. They paid less attention to "quantitatively tissue-specific expressed genes", which are expressed in all or most tissues but at different levels. In this study, we applied machine learning algorithms to build a computational method for identifying "quantitatively tissue-specific expressed genes" capable of distinguishing 25 human tissues by their expression patterns. We identified the expression levels of 432 genes as the optimal features for tissue classification. A support vector machine (SVM) built on these features achieved a Matthews correlation coefficient (MCC) above 0.99. This model outperformed an SVM model using tissue-enriched genes and yielded an MCC of 0.985 on an independent test dataset, indicating good generalization ability. These 432 genes were shown to be widely expressed across multiple tissues, and a literature review of the top 23 genes found support for the discriminating power of most of them. As a complement to previous studies, the discovery of these quantitatively tissue-specific genes provides insights into tissue development and function.
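With 25 tissue classes, the MCC has to be generalized beyond the binary case. The abstract does not say which generalization was used. A common choice is Gorodkin's R_K statistic over the full K x K confusion matrix, which is also the definition behind scikit-learn's `matthews_corrcoef` for multiclass input.

A dependency-free sketch follows, under the assumption that this definition applies; the function name is hypothetical.

```python
import math


def multiclass_mcc(confusion):
    """Matthews correlation coefficient for K classes (Gorodkin's R_K statistic).

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    k = len(confusion)
    s = sum(sum(row) for row in confusion)                          # total samples
    c = sum(confusion[i][i] for i in range(k))                      # correctly classified
    t = [sum(confusion[i]) for i in range(k)]                       # true count per class
    p = [sum(confusion[i][j] for i in range(k)) for j in range(k)]  # predicted count per class
    numerator = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denominator = math.sqrt((s * s - sum(pk * pk for pk in p)) *
                            (s * s - sum(tk * tk for tk in t)))
    # Degenerate case (e.g. every sample predicted as one class): report 0
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical 3-tissue example with one misclassified sample
    print(multiclass_mcc([[4, 1, 0], [0, 5, 0], [0, 0, 5]]))
```

For two classes this reduces to the familiar binary MCC, and a perfect classifier scores 1.0. An MCC above 0.99 over 25 tissues therefore implies very few off-diagonal entries.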
43
Li J, Lan CN, Kong Y, Feng SS, Huang T. Identification and Analysis of Blood Gene Expression Signature for Osteoarthritis With Advanced Feature Selection Methods. Front Genet 2018; 9:246. [PMID: 30214455 PMCID: PMC6125376 DOI: 10.3389/fgene.2018.00246] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 05/03/2018] [Accepted: 06/22/2018] [Indexed: 12/15/2022]
Abstract
Osteoarthritis (OA) is a complex disease that affects articular joints and may cause disability. OA is extremely common, and most elderly people have its symptoms. Physiotherapy for OA is time-consuming, and the chances of full recovery are minimal. The most effective way of fighting OA is early diagnosis and early intervention, and liquid biopsy has become a popular noninvasive test. To find a blood gene expression signature for OA, we reanalyzed publicly available blood gene expression profiles of 106 patients with OA and 33 control samples. We used an automatic computational pipeline based on advanced feature selection methods, which yielded a compact 23-gene set. On the basis of these 23 genes, we constructed a support vector machine (SVM) classifier and evaluated it with leave-one-out cross-validation. Its sensitivity (Sn), specificity (Sp), accuracy (ACC), and Matthews correlation coefficient (MCC) were 0.991, 0.909, 0.971, and 0.920, respectively. This performance still needs to be validated in a large independent dataset. Nevertheless, in-depth biological analysis of the 23 biomarkers showed great promise and suggested that the mRNA surveillance pathway and multicellular organism growth play important roles in OA. Our results shed light on OA diagnosis through liquid biopsy.
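The four reported metrics are linked through a single 2x2 confusion matrix, which allows a quick consistency check. The sketch rests on two assumptions: OA is the positive class, and Sn and Sp were rounded to three decimals from integer LOOCV counts over 106 patients and 33 controls.

Under those assumptions, the sketch searches for matching counts and recomputes ACC and MCC from them. The recovered counts are an inference, not numbers reported in the abstract, and the function names are hypothetical.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from a 2x2 confusion matrix."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc


def counts_matching(n_pos, n_neg, sn, sp, ndigits=3):
    """Yield (tp, fn, tn, fp) whose rounded Sn and Sp equal the reported values."""
    for tp in range(n_pos + 1):
        if round(tp / n_pos, ndigits) != sn:
            continue
        for tn in range(n_neg + 1):
            if round(tn / n_neg, ndigits) == sp:
                yield tp, n_pos - tp, tn, n_neg - tn


if __name__ == "__main__":
    # Reported for the 23-gene SVM: Sn 0.991, Sp 0.909 (106 OA patients, 33 controls)
    for tp, fn, tn, fp in counts_matching(106, 33, 0.991, 0.909):
        sn, sp, acc, mcc = binary_metrics(tp, fn, tn, fp)
        print(tp, fn, tn, fp, round(acc, 3), round(mcc, 3))  # 105 1 30 3 0.971 0.92
```

Only one matrix fits: TP = 105, FN = 1, TN = 30, FP = 3. It reproduces ACC = 0.971 and MCC = 0.920, so the four reported figures are mutually consistent.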
Affiliation(s)
- Jing Li
- Department of Rehabilitation, The Second Xiangya Hospital, Central South University, Changsha, China
- Chun-Na Lan
- Department of Rehabilitation, The Second Xiangya Hospital, Central South University, Changsha, China
- Ying Kong
- Department of Rehabilitation, The Second Xiangya Hospital, Central South University, Changsha, China
- Song-Shan Feng
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China