1
Wang C, Liao S, Wang Y, Hu X, Xu J. Computational Identification of Guillain-Barré Syndrome-Related Genes by an mRNA Gene Expression Profile and a Protein–Protein Interaction Network. Front Mol Neurosci 2022; 15:850209. [PMID: 35370550 PMCID: PMC8968047 DOI: 10.3389/fnmol.2022.850209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2022] [Accepted: 02/24/2022] [Indexed: 11/22/2022]
Abstract
Background In the present study, we used a computational method to identify Guillain–Barré syndrome (GBS)-related genes based on (i) a gene expression profile and (ii) shortest path analysis in a protein–protein interaction (PPI) network. Materials and Methods mRNA microarray analyses were performed on the peripheral blood mononuclear cells (PBMCs) of four GBS patients and four age- and gender-matched healthy controls. Results In total, 30 GBS-related genes were identified: 20 were retrieved from PPI analysis of upregulated genes and 23 from downregulated genes (13 genes overlapped). Gene ontology (GO) and KEGG enrichment analyses were performed. Several GO terms and KEGG pathways were shared between the upregulated and downregulated analyses, including positive regulation of macromolecule metabolic process, intracellular signaling cascade, cell surface receptor linked signal transduction, intracellular non-membrane-bounded organelle, non-membrane-bounded organelle, plasma membrane, ErbB signaling pathway, focal adhesion, neurotrophin signaling pathway and Wnt signaling pathway, which indicates that these terms may play a critical role in GBS. Discussion These results provide basic information about the genetic and molecular pathogenesis of GBS, which may aid the development of effective genetic strategies for GBS treatment in the future.
Affiliation(s)
- Chunyang Wang
- Department of Neurology, Tianjin Medical University General Hospital, Tianjin, China
- Shiwei Liao
- Tianjin Key Laboratory of Cerebral Vascular and Neurodegenerative Diseases, Department of Neurorehabilitation and Neurology, Tianjin Huanhu Hospital, Tianjin Neurosurgical Institute, Tianjin, China
- Yiyi Wang
- Department of Neurology, Tianjin Haihe Hospital, Tianjin, China
- Xiaowei Hu
- Department of Neurology, Tianjin Medical University General Hospital, Tianjin, China
- Jing Xu
- Department of Neurology, Tianjin Medical University General Hospital, Tianjin, China
- *Correspondence: Jing Xu,
2
Wu Y, Sa Y, Guo Y, Li Q, Zhang N. Identification of WHO II/III gliomas by 16 prognostic-related gene signatures using machine learning methods. Curr Med Chem 2021; 29:1622-1639. [PMID: 34455959 DOI: 10.2174/0929867328666210827103049] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/08/2021] [Revised: 05/27/2021] [Accepted: 05/28/2021] [Indexed: 11/22/2022]
Abstract
BACKGROUND Clinical observation shows that prognosis differs widely among gliomas of the same grade within World Health Organization (WHO) grades II and III. Therefore, a better understanding of the genetics and molecular mechanisms underlying WHO grade II and III gliomas is required, with the aim of developing a classification scheme at the molecular level rather than the conventional pathological morphology level. METHOD We performed survival analysis combined with the Least Absolute Shrinkage and Selection Operator machine learning method, using expression datasets downloaded from the Chinese Glioma Genome Atlas as well as The Cancer Genome Atlas. Risk scores were calculated as the product of the expression levels of overall survival-related genes and their multivariate Cox proportional hazards regression coefficients. WHO grade II and III gliomas were categorized into low-risk, medium-risk, and high-risk subgroups. We used the 16 prognostic-related genes as input features to build a prognosis-based classification model using a fully connected neural network. Gene function annotations were also performed. RESULTS The 16 genes (AKNAD1, C7orf13, CDK20, CHRFAM7A, CHRNA1, EFNB1, GAS1, HIST2H2BE, KCNK3, KLHL4, LRRK2, NXPH3, PIGZ, SAMD5, ERINC2, and SIX6) related to glioma prognosis were screened. The 16 selected genes were associated with the development of gliomas and carcinogenesis. The accuracy of the fully connected neural network model on an external validation data set from the two cohorts reached 95.5%. Our method shows good potential for classifying WHO grade II and III gliomas into low-risk, medium-risk, and high-risk subgroups. The subgroups showed significant (P<0.01) differences in overall survival. CONCLUSION This resulted in the identification of 16 genes that were related to the prognosis of gliomas. Here we developed a computational method to discriminate WHO grade II and III gliomas into three subgroups with distinct prognoses. The gene expression-based method provides a reliable alternative to determine the prognosis of gliomas.
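The risk score described in this abstract can be sketched as follows. This is a minimal illustration, assuming the score is the sum over signature genes of expression level times Cox coefficient, and assuming the three subgroups are cut at score tertiles (the abstract does not state the cut-offs). The coefficients and expression values are made up for illustration, and `risk_score` and `assign_subgroups` are hypothetical names, not the authors' code.

```python
from typing import Dict, List


def risk_score(expression: Dict[str, float], coefficients: Dict[str, float]) -> float:
    """Sum of (expression level x Cox coefficient) over the signature genes."""
    return sum(expression[gene] * beta for gene, beta in coefficients.items())


def assign_subgroups(scores: List[float]) -> List[str]:
    """Label each patient by score rank; the tertile rule is an assumption."""
    names = ("low-risk", "medium-risk", "high-risk")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [""] * len(scores)
    for rank, idx in enumerate(order):
        labels[idx] = names[rank * 3 // len(scores)]
    return labels


# Hypothetical coefficients for three of the 16 genes (illustrative values only).
coefficients = {"GAS1": 0.5, "LRRK2": -0.25, "SIX6": 1.0}
patients = [
    {"GAS1": 2.0, "LRRK2": 4.0, "SIX6": 1.0},
    {"GAS1": 4.0, "LRRK2": 0.0, "SIX6": 2.0},
    {"GAS1": 0.0, "LRRK2": 4.0, "SIX6": 0.0},
]
scores = [risk_score(p, coefficients) for p in patients]
labels = assign_subgroups(scores)
```

In a real analysis the coefficients would come from the fitted multivariate Cox model, and the cut-offs would be fixed on the training cohort before being applied to validation data.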
Affiliation(s)
- YaMeng Wu
- Department of Biomedical Engineering, Tianjin Key Lab of BME Measurement, Tianjin University, Tianjin, China
- Yu Sa
- Department of Biomedical Engineering, Tianjin Key Lab of BME Measurement, Tianjin University, Tianjin, China
- Yu Guo
- Department of Biomedical Engineering, Tianjin Key Lab of BME Measurement, Tianjin University, Tianjin, China
- QiFeng Li
- Department of Biomedical Engineering, Tianjin Key Lab of BME Measurement, Tianjin University, Tianjin, China
- Ning Zhang
- Department of Biomedical Engineering, Tianjin Key Lab of BME Measurement, Tianjin University, Tianjin, China
3
Hozhabri H, Lashkari A, Razavi SM, Mohammadian A. Integration of gene expression data identifies key genes and pathways in colorectal cancer. Med Oncol 2021; 38:7. [PMID: 33411100 DOI: 10.1007/s12032-020-01448-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/23/2020] [Accepted: 11/21/2020] [Indexed: 12/16/2022]
Abstract
Colorectal cancer (CRC) is one of the most common malignant tumors and a prevalent cause of cancer-related death worldwide. In this study, we analyzed the gene expression profiles of patients with CRC with the aim of better understanding the molecular mechanisms and key genes in CRC. Four gene expression profiles, GSE9348, GSE41328, GSE41657, and GSE113513, were downloaded from the GEO database. The data were processed using the R programming language, and 319 common differentially expressed genes (DEGs), including 94 up-regulated and 225 down-regulated genes, were identified. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were conducted to find the most significantly enriched pathways in CRC. Based on the GO and KEGG pathway analyses, the most important dysregulated pathways were regulation of cell proliferation, bicarbonate transport, Wnt and IL-17 signaling pathways, and nitrogen metabolism. The protein-protein interaction (PPI) network of the DEGs was constructed using Cytoscape software, and MYC, CXCL1, CD44, MMP1, and CXCL12 were identified as the most critical hub genes. The present study enhances our understanding of the molecular mechanisms of CRC, and these genes might be applied in CRC treatment strategies as molecular targets and diagnostic biomarkers.
Affiliation(s)
- Hossein Hozhabri
- Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Ali Lashkari
- Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Seyed-Morteza Razavi
- Department of Cell and Molecular Biology, Faculty of Biological Sciences, Kharazmi University, Tehran, Iran
- Salari Institute of Cognitive and Behavioral Disorders (SICBD), Karaj, Alborz, Iran
- Systems Biology Research Lab, Bioinformatics Group, Systems Biology of Next Generation Company (SBNGC), Qom, Iran
- Ali Mohammadian
- Department of Medical Biotechnology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
4
5
Liu HC, Peng YS, Lee HC. miRDRN-miRNA disease regulatory network: a tool for exploring disease and tissue-specific microRNA regulatory networks. PeerJ 2019; 7:e7309. [PMID: 31404401 PMCID: PMC6688598 DOI: 10.7717/peerj.7309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2019] [Accepted: 06/17/2019] [Indexed: 11/20/2022]
Abstract
BACKGROUND MicroRNA (miRNA) regulates cellular processes by acting on specific target genes, and cellular processes proceed through multiple interactions, often organized into pathways, among genes and gene products. Hundreds of miRNAs and their target genes have been identified, as have many miRNA-disease associations. These, together with huge amounts of data on gene annotation, biological pathways, and protein-protein interactions, are available in public databases. Here, using such data, we built a database and web service platform, miRNA disease regulatory network (miRDRN), for users to construct disease and tissue-specific miRNA-protein regulatory networks, with which they may explore disease-related molecular and pathway associations, or find new ones, and possibly discover new modes of drug action. METHODS Data on disease-miRNA association, miRNA-target association and validation, gene-tissue association, gene-tumor association, biological pathways, human protein interaction, gene ID, gene ontology, gene annotation, and product were collected from publicly available databases and integrated. A large set of miRNA target-specific regulatory sub-pathways (RSPs) having the form (T, G1, G2) was built from the integrated data and stored, where T is a miRNA-associated target gene, G1 is a gene/protein interacting with T, and G2 is a gene/protein interacting with G1. Each sequence (T, G1, G2) was assigned a p-value weighted by the participation of the three genes in molecular interactions and reaction pathways. RESULTS A web service platform, miRDRN (http://mirdrn.ncu.edu.tw/mirdrn/), was built. The database part of miRDRN currently stores 6,973,875 p-valued RSPs associated with 116 diseases in 78 tissue types, built from 207 disease-associated miRNAs regulating 389 genes. miRDRN also provides facilities for the user to construct disease and tissue-specific miRNA regulatory networks from the RSPs it stores, and to download and/or visualize parts or all of the product. Users may use miRDRN to explore a single disease, or a disease pair to gain insights into comorbidity. As demonstrations, miRDRN was applied: to explore the single disease colorectal cancer (CRC), in which 26 novel potential CRC target genes were identified; to study the comorbidity of the disease pair Alzheimer's disease-Type 2 diabetes, in which 18 novel potential comorbid genes were identified; and to explore possible causes that may shed light on recent failures of late-phase trials of anti-AD BACE1 inhibitor drugs, in which genes downstream of BACE1 whose suppression may affect signal transduction were identified.
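The (T, G1, G2) regulatory sub-pathway construction can be illustrated with a short sketch. It assumes an undirected interaction map and assumes that the three genes in an RSP are distinct; the abstract does not say whether G2 may equal T. The node names are placeholders, `build_rsps` is a hypothetical helper, and the p-value weighting step is not reproduced here.

```python
from typing import Dict, List, Set, Tuple


def build_rsps(target: str, interactions: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """Enumerate (T, G1, G2): G1 interacts with T, and G2 interacts with G1."""
    rsps = []
    for g1 in sorted(interactions.get(target, set())):
        for g2 in sorted(interactions.get(g1, set())):
            if g2 != target:  # assumption: an RSP does not loop back to T
                rsps.append((target, g1, g2))
    return rsps


# Toy undirected interaction map with placeholder gene names.
interactions = {
    "T": {"A", "B"},
    "A": {"T", "B", "C"},
    "B": {"T", "A"},
    "C": {"A"},
}
rsps = build_rsps("T", interactions)
```

On this toy map the target T yields three sub-pathways, two through A and one through B.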
Affiliation(s)
- Hsueh-Chuan Liu
- Department of Biomedical Sciences and Engineering, National Central University, Taoyuan City, Taiwan
- Yi-Shian Peng
- Department of Biomedical Sciences and Engineering, National Central University, Taoyuan City, Taiwan
- Hoong-Chien Lee
- Department of Biomedical Sciences and Engineering, National Central University, Taoyuan City, Taiwan
- Department of Physics, Chung Yuan Christian University, Zhongli District, Taoyuan City, Taiwan
6
Li M, Guo Y, Feng YM, Zhang N. Identification of Triple-Negative Breast Cancer Genes and a Novel High-Risk Breast Cancer Prediction Model Development Based on PPI Data and Support Vector Machines. Front Genet 2019; 10:180. [PMID: 30930932 PMCID: PMC6428707 DOI: 10.3389/fgene.2019.00180] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/31/2018] [Accepted: 02/19/2019] [Indexed: 12/20/2022]
Abstract
Triple-negative breast cancer (TNBC) is a special subtype of breast cancer that is difficult to treat. It is crucial to identify breast cancer-related genes that could provide new biomarkers for breast cancer diagnosis and potential treatment targets. In the development of our new high-risk breast cancer prediction model, seven raw gene expression datasets from the NCBI Gene Expression Omnibus (GEO) database (GSE31519, GSE9574, GSE20194, GSE20271, GSE32646, GSE45255, and GSE15852) were used. Using the maximum relevance minimum redundancy (mRMR) method, we selected significant genes. Then, we mapped transcripts of the genes onto the protein-protein interaction (PPI) network from the Search Tool for the Retrieval of Interacting Genes (STRING) database and traced the shortest path between each pair of proteins. Genes with higher betweenness values were selected from the shortest-path proteins. To ensure validity and precision, a permutation test was performed: we randomly selected 248 proteins from the PPI network for shortest path tracing and repeated the procedure 100 times. We also removed genes that appeared more frequently in the randomized results. As a result, 54 genes were selected as potential TNBC-related genes. Using 14 of the 54 genes as input features to a support vector machine (SVM), a novel model was trained to predict high-risk breast cancer. The prediction accuracy for normal tissues versus TNBC tissues reached 95.394%, and that for Stage II versus Stage III TNBC reached 86.598%, indicating that such genes play important roles in distinguishing breast cancers and that the method could be promising in practical use. Some of the 54 genes we identified from the PPI network have been reported in the literature to be associated with breast cancer. Several other genes have not yet been reported but have functional resemblance to known cancer genes. These may be novel breast cancer-related genes and need further experimental validation. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed to appraise the 54 genes. The results indicated that cellular response to organic cyclic compounds has an influence on breast cancer, and that most genes may be related to viral carcinogenesis.
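The shortest-path tracing and permutation test can be sketched on a toy network. This is a simplified illustration, not the authors' pipeline: it traces a single breadth-first shortest path per seed pair and counts how often each non-seed node appears, as a stand-in for a betweenness value. The graph, the seed set, and the function names are hypothetical, and the random seed count and round count are scaled down from the 248 proteins and 100 repetitions mentioned in the abstract.

```python
import random
from collections import Counter, deque
from itertools import combinations
from typing import Dict, List, Optional, Set

Graph = Dict[str, Set[str]]


def shortest_path(graph: Graph, src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search; returns one shortest path, or None if unreachable."""
    parents: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def path_counts(graph: Graph, seeds: List[str]) -> Counter:
    """Count how often each non-seed node lies on a traced path between seed pairs."""
    counts: Counter = Counter()
    seed_set = set(seeds)
    for a, b in combinations(seeds, 2):
        path = shortest_path(graph, a, b)
        if path:
            counts.update(n for n in path[1:-1] if n not in seed_set)
    return counts


def permutation_frequency(graph: Graph, n_seeds: int, rounds: int, rng: random.Random) -> Counter:
    """Number of random seed sets for which each node appears on a traced path."""
    freq: Counter = Counter()
    nodes = sorted(graph)
    for _ in range(rounds):
        freq.update(path_counts(graph, rng.sample(nodes, n_seeds)).keys())
    return freq


# Toy undirected network: A-B, B-C, C-D, B-E (placeholder names).
graph: Graph = {"A": {"B"}, "B": {"A", "C", "E"}, "C": {"B", "D"}, "D": {"C"}, "E": {"B"}}
counts = path_counts(graph, ["A", "D", "E"])
freq = permutation_frequency(graph, n_seeds=3, rounds=100, rng=random.Random(0))
```

Nodes whose count with the real seeds is high but whose frequency under random seeds is also high are hubs that appear regardless of the input, which is the motivation for filtering them out.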
Affiliation(s)
- Ming Li
- Department of Biomedical Engineering, Tianjin Key Lab of BME Measurement, Tianjin University, Tianjin, China
- Yu Guo
- Department of Biomedical Engineering, Tianjin Key Lab of BME Measurement, Tianjin University, Tianjin, China
- Yuan-Ming Feng
- Department of Biomedical Engineering, Tianjin Key Lab of BME Measurement, Tianjin University, Tianjin, China
- Department of Radiation Oncology, Tianjin Medical University Cancer Institute and Hospital, Tianjin, China
- Ning Zhang
- Department of Biomedical Engineering, Tianjin Key Lab of BME Measurement, Tianjin University, Tianjin, China
7
Cai BH, Wu PH, Chou CK, Huang HC, Chao CC, Chung HY, Lee HY, Chen JY, Kannagi R. Synergistic activation of the NEU4 promoter by p73 and AP2 in colon cancer cells. Sci Rep 2019; 9:950. [PMID: 30700826 PMCID: PMC6353964 DOI: 10.1038/s41598-018-37521-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/29/2018] [Accepted: 12/07/2018] [Indexed: 12/22/2022]
Abstract
More than 50% of colon cancers bear mutations in p53, one of the most important tumor suppressors, and its family members p63 or p73 are expected to contribute to inhibiting the progression of colon cancers. The AP2 family also acts as a tumor suppressor. Here we found that p73 and AP2 are able to activate NEU4, a neuraminidase gene, which removes the terminal sialic acid residues from cancer-associated glycans. Under serum starvation, NEU4 was up-regulated and one of the NEU4 target glycans, sialyl Lewis X, was decreased, whereas p73 and AP2 were up-regulated. Sialyl Lewis X levels were not, however, decreased under starvation conditions in p73- or AP2-knockdown cells. p53 and AP2 underwent protein-protein interactions, exerting synergistic effects to activate p21, and interaction of p53 with AP2 was lost in cells expressing the L350P mutation of p53. The homologous residues in p63 and p73 are L423 and L377, respectively. The synergistic effect of p53/p63 with AP2 to activate genes was lost with the L350P/L423P mutation in p53/p63, but p73 bearing the L377P mutation was able to interact with AP2 and exerted its normal synergistic effects. We propose that p73 and AP2 synergistically activate the NEU4 promoter in colon cancer cells.
Affiliation(s)
- Bi-He Cai
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Department of Biology and Anatomy, National Defense Medical Center, Taipei, Taiwan
- Po-Han Wu
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Chi-Kan Chou
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Hsiang-Chi Huang
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Taiwan International Graduate Program in Molecular Medicine, National Yang-Ming University and Academia Sinica, Taipei, Taiwan
- Chia-Chun Chao
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Hsiao-Yu Chung
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Hsueh-Yi Lee
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Jang-Yi Chen
- Department of Biology and Anatomy, National Defense Medical Center, Taipei, Taiwan
- Reiji Kannagi
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
8
Zhang TM, Huang T, Wang RF. Cross talk of chromosome instability, CpG island methylator phenotype and mismatch repair in colorectal cancer. Oncol Lett 2018; 16:1736-1746. [PMID: 30008861 PMCID: PMC6036478 DOI: 10.3892/ol.2018.8860] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 12/12/2017] [Accepted: 05/22/2018] [Indexed: 12/20/2022]
Abstract
Colorectal cancer is a severe cancer associated with a high prevalence and fatality rate. There are three major mechanisms for colorectal cancer: (1) chromosome instability (CIN), (2) CpG island methylator phenotype (CIMP) and (3) mismatch repair (MMR), of which CIN is the most common type. However, these subtypes are not mutually exclusive and can overlap. To investigate their biological mechanisms and cross talk, the gene expression profiles of 585 colorectal cancer patients with CIN, CIMP and MMR status records were collected. By comparing the CIN+ and CIN- samples, CIMP+ and CIMP- samples, and MMR+ and MMR- samples with the minimal redundancy maximal relevance (mRMR) and incremental feature selection (IFS) methods, the CIN-, CIMP- and MMR-associated genes were selected. There was little direct overlap among them. To investigate their indirect interactions, downstream genes of CIN, CIMP and MMR were identified using the random walk with restart (RWR) method, which indicated a greater overlap of downstream genes. The common downstream genes were involved in biosynthetic and metabolic pathways. These findings were consistent with the clinical observation of wide-ranging metabolite aberrations in colorectal cancer. To conclude, the present study not only gave a gene-level explanation of CIN, CIMP and MMR, but also showed the network-level cross talk among them. The common genes of CIN, CIMP and MMR may be useful for cross-subtype general colorectal cancer drug development.
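The random walk with restart (RWR) step can be sketched as an iteration in which, at each step, the walker either returns to the seed genes with a fixed probability or moves to a uniformly chosen neighbor. This is a minimal sketch under stated assumptions: the network is undirected with no isolated nodes, the restart probability of 0.7 is an illustrative choice rather than the value used in the study, and `random_walk_with_restart` is a hypothetical name.

```python
from typing import Dict, Set


def random_walk_with_restart(
    graph: Dict[str, Set[str]],
    seeds: Set[str],
    restart: float = 0.7,  # assumed value, for illustration only
    tol: float = 1e-9,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change drops below tol."""
    nodes = sorted(graph)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            # Each node passes its non-restart mass equally to its neighbors.
            share = (1.0 - restart) * p[n] / len(graph[n])
            for m in graph[n]:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy path network A-B-C-D with a single seed gene A (placeholder names).
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
probs = random_walk_with_restart(graph, seeds={"A"})
```

Genes with high steady-state probability are the ones closest to the seeds in network terms, so ranking by `probs` gives candidate downstream genes for each seed set.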
Affiliation(s)
- Tian-Ming Zhang
- Department of Colorectal and Anal Surgery, Jinhua Hospital of Zhejiang University, Jinhua, Zhejiang 321000, P.R. China
- Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, P.R. China
- Rong-Fei Wang
- Department of Colorectal and Anal Surgery, Jinhua People's Hospital, Jinhua, Zhejiang 321000, P.R. China
9
Herrera-Cruz MS, Simmen T. Cancer: Untethering Mitochondria from the Endoplasmic Reticulum? Front Oncol 2017; 7:105. [PMID: 28603693 PMCID: PMC5445141 DOI: 10.3389/fonc.2017.00105] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 03/16/2017] [Accepted: 05/05/2017] [Indexed: 01/18/2023]
Abstract
Following the discovery of the mitochondria-associated membrane (MAM) as a hub for lipid metabolism in 1990 and its description as one of the first examples of membrane contact sites at the turn of the century, the past decade has seen the emergence of this structure as a potential regulator of cancer growth and metabolism. The mechanistic basis for this hypothesis is that the MAM accommodates flux of Ca2+ from the endoplasmic reticulum (ER) to mitochondria. This flux then determines mitochondrial ATP production, known to be low in many tumors as part of the Warburg effect. However, low mitochondrial Ca2+ flux also reduces the propensity of tumor cells to undergo apoptosis, another cancer hallmark. Numerous regulators of this flux have been recently identified as MAM proteins. Not surprisingly, many fall into the groups of tumor suppressors and oncogenes. Given the important role that the MAM could play in cancer, it is expected that proteins mediating its formation are particularly implicated in tumorigenesis. Examples of such proteins are mitofusin-2 and phosphofurin acidic cluster sorting protein 2, which likely act as tumor suppressors. This review discusses how these proteins that mediate or regulate ER–mitochondria tethering are (or are not) promoting or inhibiting tumorigenesis. The emerging picture of MAMs in cancer seems to indicate that, in addition to the downregulation of mitochondrial Ca2+ import, MAM defects are but one way in which cancer cells control mitochondrial metabolism and apoptosis.
Affiliation(s)
- Maria Sol Herrera-Cruz
- Faculty of Medicine and Dentistry, Department of Cell Biology, University of Alberta, Edmonton, AB, Canada
- Thomas Simmen
- Faculty of Medicine and Dentistry, Department of Cell Biology, University of Alberta, Edmonton, AB, Canada
10
Differentially expressed lncRNAs and mRNAs identified by microarray analysis in GBS patients vs healthy controls. Sci Rep 2016; 6:21819. [PMID: 26898505 PMCID: PMC4761882 DOI: 10.1038/srep21819] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 11/17/2015] [Accepted: 02/01/2016] [Indexed: 11/08/2022]
Abstract
The aim of our present study was to determine whether messenger RNAs (mRNAs) and long noncoding RNAs (lncRNAs) are expressed differentially in patients with Guillain-Barré syndrome (GBS) compared with healthy controls. The mRNA and lncRNA profiles of GBS patients and healthy controls were generated by microarray analysis. From the microarray data, we used the mRMR software to list 310 mRNAs and 114 lncRNAs that classified the samples into two groups, GBS patients and healthy controls. KEGG mapping demonstrated that the top seven signal pathways may play important roles in GBS development. Several GO terms, such as cytosol, cellular macromolecular complex assembly, cell cycle, ligase activity, and protein catabolic process, were enriched in the gene lists, suggesting a potential correlation with GBS development. Co-expression network analysis indicated that 113 lncRNAs and 303 mRNAs were included in the co-expression network. Our present study showed that these differentially expressed mRNAs and lncRNAs may play important roles in GBS development, which provides basic information for defining the mechanism(s) that promote GBS.
11
Huang T, Shu Y, Cai YD. Genetic differences among ethnic groups. BMC Genomics 2015; 16:1093. [PMID: 26690364 PMCID: PMC4687076 DOI: 10.1186/s12864-015-2328-0] [Citation(s) in RCA: 90] [Impact Index Per Article: 10.0] [Received: 05/22/2015] [Accepted: 12/15/2015] [Indexed: 12/12/2022]
Abstract
BACKGROUND Many differences between ethnic groups have been observed, such as skin color, eye color, height, susceptibility to some diseases, and response to certain drugs. However, the genetic bases of such differences have been under-investigated. Since the HapMap project, large-scale genotype data from Caucasian, African and Asian population samples have been available. The project found that these populations were located in different areas of the PCA (Principal Component Analysis) plot. However, as an unsupervised method, PCA does not measure the differences in each single nucleotide polymorphism (SNP) among populations. RESULTS We applied an advanced mutual information-based feature selection method to detect associations between SNP status and ethnic groups using the latest HapMap Phase 3 release version 3, which included more sub-populations. A total of 299 SNPs were identified, and they accurately predicted the ethnicity of all HapMap populations. The 10-fold cross-validation accuracy of the SMO (sequential minimal optimization) model on the training dataset was 0.901, and the accuracy on the independent test dataset was 0.895. CONCLUSIONS In-depth functional analysis of these SNPs and their nearby genes revealed the genetic bases of skin and eye color differences among populations.
Affiliation(s)
- Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China
- Yang Shu
- State Key Laboratory of Biotherapy, Sichuan University, Sichuan, 610041, P. R. China
- Yu-Dong Cai
- College of Life Science, Shanghai University, Shanghai, 200444, P. R. China
12
Prediction of aptamer-target interacting pairs with pseudo-amino acid composition. PLoS One 2014; 9:e86729. [PMID: 24466214 PMCID: PMC3899287 DOI: 10.1371/journal.pone.0086729] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 06/10/2013] [Accepted: 12/15/2013] [Indexed: 11/19/2022]
Abstract
Aptamers are oligonucleic acid or peptide molecules that bind to specific target molecules. As a novel and powerful class of ligands, aptamers are thought to have excellent potential for applications in the fields of biosensing, diagnostics and therapeutics. In this study, a new method for predicting aptamer-target interacting pairs was proposed by integrating features derived from both aptamers and their targets. Nucleotide composition features were used to represent aptamers, and traditional amino acid composition and pseudo amino acid composition features were used to represent targets. The predictor was constructed based on Random Forest, and the optimal features were selected by using the maximum relevance minimum redundancy (mRMR) method and the incremental feature selection (IFS) method. As a result, 81.34% accuracy and 0.4612 MCC were obtained for the training dataset, and 77.41% accuracy and 0.3717 MCC were achieved for the testing dataset. An optimal feature set of 220 features was selected; these features were considered to contribute significantly to the prediction of interacting aptamer-target pairs. Analysis of the optimal feature set indicated several important factors in determining aptamer-target interactions. It is anticipated that our prediction method may become a useful tool for identifying aptamer-target pairs, and that the features selected and analyzed in this study may provide useful insights into the mechanism of interactions between aptamers and targets.
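The mRMR ranking followed by incremental feature selection (IFS), used here and in several other entries in this list, can be sketched as follows. This is a simplified illustration, assuming discrete features and the difference form of the mRMR criterion (relevance minus mean redundancy); it is not the mRMR software used in these studies. The feature vectors are toy data, and the `toy_accuracy` lookup is a made-up placeholder for the cross-validated classifier performance (for example, from Random Forest) that IFS would evaluate in practice.

```python
from collections import Counter
from math import log
from typing import Callable, Dict, List, Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information (in nats) between two discrete variables."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr_rank(features: Dict[str, Sequence[int]], labels: Sequence[int]) -> List[str]:
    """Greedy ranking: maximize relevance to labels minus mean redundancy with picks."""
    remaining = list(features)
    ranked: List[str] = []

    def score(name: str) -> float:
        relevance = mutual_information(features[name], labels)
        if not ranked:
            return relevance
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in ranked
        ) / len(ranked)
        return relevance - redundancy

    while remaining:
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def incremental_feature_selection(
    ranked: List[str], evaluate: Callable[[List[str]], float]
) -> List[str]:
    """Evaluate growing prefixes of the ranking and keep the best-scoring one."""
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked) + 1):
        current = evaluate(ranked[:k])
        if current > best_score:
            best_subset, best_score = ranked[:k], current
    return best_subset


# Toy data: f2 duplicates f1, f3 carries weaker but less redundant information.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f1": [0, 0, 0, 0, 1, 1, 1, 0],
    "f2": [0, 0, 0, 0, 1, 1, 1, 0],
    "f3": [0, 0, 1, 1, 1, 1, 1, 1],
}
ranked = mrmr_rank(features, labels)

# Placeholder scores keyed by subset size; a real run would train a classifier.
toy_accuracy = {1: 0.70, 2: 0.90, 3: 0.80}
selected = incremental_feature_selection(ranked, lambda subset: toy_accuracy[len(subset)])
```

The duplicate feature `f2` is pushed to the end of the ranking because its redundancy with `f1` outweighs its relevance, which is the behavior that distinguishes mRMR from ranking by relevance alone.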
13
Li BQ, Feng KY, Ding J, Cai YD. Predicting DNA-binding sites of proteins based on sequential and 3D structural information. Mol Genet Genomics 2014; 289:489-99. [PMID: 24448651 DOI: 10.1007/s00438-014-0812-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 06/17/2013] [Accepted: 01/04/2014] [Indexed: 11/26/2022]
Abstract
Protein-DNA interactions play important roles in many biological processes. To understand the molecular mechanisms of protein-DNA interaction, it is necessary to identify the DNA-binding sites in DNA-binding proteins. In the last decade, computational approaches have been developed to predict protein-DNA-binding sites based solely on protein sequences. In this study, we developed a novel predictor based on support vector machine algorithm coupled with the maximum relevance minimum redundancy method followed by incremental feature selection. We incorporated not only features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure, solvent accessibility, but also five three-dimensional (3D) structural features calculated from PDB data to predict the protein-DNA interaction sites. Feature analysis showed that 3D structural features indeed contributed to the prediction of DNA-binding site and it was demonstrated that the prediction performance was better with 3D structural features than without them. It was also shown via analysis of features from each site that the features of DNA-binding site itself contribute the most to the prediction. Our prediction method may become a useful tool for identifying the DNA-binding sites and the feature analysis described in this paper may provide useful insights for in-depth investigations into the mechanisms of protein-DNA interaction.
Affiliation(s)
- Bi-Qing Li
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, People's Republic of China
14
Prediction and analysis of retinoblastoma related genes through gene ontology and KEGG. BIOMED RESEARCH INTERNATIONAL 2013; 2013:304029. [PMID: 23998122 PMCID: PMC3755425 DOI: 10.1155/2013/304029] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 06/17/2013] [Accepted: 07/16/2013] [Indexed: 11/18/2022]
Abstract
One of the most important and challenging problems in biomedicine is how to predict cancer-related genes. Retinoblastoma (RB) is the most common primary intraocular malignancy, usually occurring in childhood. Early detection of RB could reduce morbidity and improve the probability of disease-free survival. Therefore, it is of great importance to identify RB genes. In this study, we developed a computational method to predict RB-related genes based on Dagging, with the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). A total of 119 RB genes were compiled from two previous RB-related studies, while 5,500 non-RB genes were randomly selected from Ensembl genes. Ten datasets were constructed based on all these RB and non-RB genes. Each gene was encoded with a 13,126-dimensional vector including 12,887 Gene Ontology enrichment scores and 239 KEGG enrichment scores. Finally, an optimal feature set including 1,061 GO terms and 8 KEGG pathways was obtained. Analysis showed that these features were closely related to RB. It is anticipated that the method can be applied to predict other cancer-related genes as well.