51
Integration of protein interaction and gene co-expression information for identification of melanoma candidate genes. Melanoma Res 2019; 29:126-133. [PMID: 30451788 DOI: 10.1097/cmr.0000000000000525] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/15/2022]
Abstract
Cutaneous melanoma is an aggressive form of skin cancer that causes death worldwide. Although much has been learned about the molecular basis of melanoma genesis and progression, there is also increasing appreciation for the continuing discovery of melanoma genes to improve the genetic understanding of this malignancy. In the present study, melanoma candidate genes were identified by analysis of the common network from cancer type-specific RNA-Seq co-expression data and protein-protein interaction profiles. Then, an integrated network containing the known melanoma-related genes represented as seed genes and the putative genes represented as linker genes was generated using the subnetwork extraction algorithm. According to the network topology property of the putative genes, we selected seven key genes (CREB1, XPO1, SP3, TNFRSF1B, CD40LG, UBR1, and ZNF484) as candidate genes of melanoma. Subsequent analysis showed that six of these genes are melanoma-associated genes and one (ZNF484) is a cancer-associated gene on the basis of the existing literature. A signature comprising these seven key genes was developed and an overall survival analysis of 461 cutaneous melanoma cases was carried out. This seven-gene signature can accurately determine the risk profile for cutaneous melanoma tumors (log-rank P=3.27E-05) and be validated on an independent clinical cohort (log-rank P=0.028). The presented seven genes might serve as candidates for studying the molecular mechanisms and help improve the prognostic risk assessment, which have clinical implications for melanoma patients.
52
Zhang Q, Shen Z, Huang DS. Modeling in-vivo protein-DNA binding by combining multiple-instance learning with a hybrid deep neural network. Sci Rep 2019; 9:8484. [PMID: 31186519 PMCID: PMC6559991 DOI: 10.1038/s41598-019-44966-x] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 10/19/2018] [Accepted: 05/15/2019] [Indexed: 01/26/2023]
Abstract
Modeling in-vivo protein-DNA binding is not only fundamental for further understanding of the regulatory mechanisms, but also a challenging task in computational biology. Deep-learning based methods have succeeded in modeling in-vivo protein-DNA binding, but they often (1) follow the fully supervised learning framework and overlook the weakly supervised information of genomic sequences, namely that a bound DNA sequence may have multiple TFBSs, and (2) use one-hot encoding to encode DNA sequences and ignore the dependencies among nucleotides. In this paper, we propose a weakly supervised framework, which combines multiple-instance learning with a hybrid deep neural network and uses k-mer encoding to transform DNA sequences, for modeling in-vivo protein-DNA binding. Firstly, this framework segments sequences into multiple overlapping instances using a sliding window, and then encodes all instances into image-like inputs of high-order dependencies using k-mer encoding. Secondly, it separately computes a score for all instances in the same bag using a hybrid deep neural network that integrates convolutional and recurrent neural networks. Finally, it integrates the predicted values of all instances as the final prediction of this bag using the Noisy-and method. The experimental results on in-vivo datasets demonstrate the superior performance of the proposed framework. In addition, we also explore the performance of the proposed framework when using k-mer encoding, and demonstrate the performance of the Noisy-and method by comparing it with other fusion methods, and find that adding recurrent layers can improve the performance of the proposed framework.
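The segmentation and fusion steps named in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the window and stride arguments, and the parameters `a` and `b` are assumptions made for the example, and the pooling function uses the commonly cited Noisy-and form (the mean instance probability passed through a shifted, rescaled sigmoid). The per-instance scores would come from the hybrid network, which is not modeled here.

```python
import math


def segment(seq, window, stride):
    """Split one sequence (the bag) into overlapping instances."""
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, stride)]


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def noisy_and(instance_probs, a=10.0, b=0.5):
    """Fuse instance probabilities into a single bag probability.

    a (slope) and b (threshold in [0, 1]) are illustrative values only;
    in a trained model b would typically be learned.
    """
    mean_p = sum(instance_probs) / len(instance_probs)
    low = _sigmoid(-a * b)            # value of the sigmoid at mean_p = 0
    high = _sigmoid(a * (1.0 - b))    # value of the sigmoid at mean_p = 1
    # Rescale so the output is exactly 0 at mean_p = 0 and 1 at mean_p = 1.
    return (_sigmoid(a * (mean_p - b)) - low) / (high - low)
```

With this form, a bag is scored as positive only when the average instance score exceeds the threshold `b`, and `a` controls how sharp that transition is.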
Affiliation(s)
- Qinhu Zhang
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, 201804, P.R. China
- Zhen Shen
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, 201804, P.R. China
- De-Shuang Huang
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, 201804, P.R. China.
53
Kori M, Gov E, Arga KY. Novel Genomic Biomarker Candidates for Cervical Cancer As Identified by Differential Co-Expression Network Analysis. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2019; 23:261-273. [PMID: 31038390 DOI: 10.1089/omi.2019.0025] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 02/06/2023]
Abstract
Cervical cancer is the second most common malignancy and the third leading cause of mortality among women in developing countries. Although infection by the oncogenic human papilloma viruses is a major cause, genomic contributors are still largely unknown. Network analyses, compared with candidate gene studies, offer greater promise to map the interactions among genomic loci contributing to cervical cancer risk. We report here a differential co-expression network analysis in five gene expression datasets (GSE7803, GSE9750, GSE39001, GSE52903, and GSE63514, from the Gene Expression Omnibus) in patients with cervical cancer and healthy controls. Kaplan-Meier survival and principal component analyses were employed to evaluate prognostic and diagnostic performances of biomarker candidates, respectively. As a result, seven distinct co-expressed gene modules were identified. Among these, five modules (with sizes of 9-45 genes) presented high prognostic and diagnostic capabilities with hazard ratios of 2.28-11.3, and diagnostic odds ratios of 85.2-548.8. Moreover, these modules were associated with several key biological processes such as cell cycle regulation, keratinization, neutrophil degranulation, and the phospholipase D signaling pathway. In addition, transcription factors ETS1 and GATA2 were noted as common regulatory elements. These genomic biomarker candidates identified by differential co-expression network analysis offer new prospects for translational cancer research, not to mention personalized medicine to forecast cervical cancer susceptibility and prognosis. Looking into the future, we also suggest that the search for a molecular basis of common complex diseases should be complemented by differential co-expression analyses to obtain a systems-level understanding of disease phenotype variability.
Affiliation(s)
- Medi Kori
- 1 Department of Bioengineering, Faculty of Engineering, Marmara University, Istanbul, Turkey
- Esra Gov
- 2 Department of Bioengineering, Faculty of Engineering, Adana Alparslan Türkeş Science and Technology University, Adana, Turkey
- Kazım Yalçın Arga
- 1 Department of Bioengineering, Faculty of Engineering, Marmara University, Istanbul, Turkey
54
Gao YC, Zhou XH, Zhang W. An Ensemble Strategy to Predict Prognosis in Ovarian Cancer Based on Gene Modules. Front Genet 2019; 10:366. [PMID: 31068972 PMCID: PMC6491874 DOI: 10.3389/fgene.2019.00366] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/11/2019] [Accepted: 04/05/2019] [Indexed: 12/15/2022]
Abstract
Due to the high heterogeneity and complexity of cancer, it is still a challenge to predict the prognosis of cancer patients. In this work, we used a clustering algorithm to divide patients into different subtypes in order to reduce the heterogeneity of the cancer patients in each subtype. Based on the hypothesis that the gene co-expression network may reveal relationships among genes, some communities in the network could influence the prognosis of cancer patients and all the prognosis-related communities could fully reveal the prognosis of cancer patients. To predict the prognosis for cancer patients in each subtype, we adopted an ensemble classifier based on the gene co-expression network of the corresponding subtype. Using the gene expression data of ovarian cancer patients in TCGA (The Cancer Genome Atlas), three subtypes were identified. Survival analysis showed that patients in different subtypes had different survival risks. Three ensemble classifiers were constructed for each subtype. Leave-one-out and independent validation showed that our method outperformed control and literature methods. Furthermore, the function annotation of the communities in each subtype showed that some communities were cancer-related. Finally, we found that the current drug targets can partially support our method.
Affiliation(s)
- Xiong-Hui Zhou
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Wen Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
55
Yuan L, Huang DS. A Network-guided Association Mapping Approach from DNA Methylation to Disease. Sci Rep 2019; 9:5601. [PMID: 30944378 PMCID: PMC6447594 DOI: 10.1038/s41598-019-42010-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/31/2018] [Accepted: 03/12/2019] [Indexed: 01/11/2023]
Abstract
Aberrant DNA methylation may contribute to development of cancer. However, understanding the associations between DNA methylation and cancer remains a challenge because of the complex mechanisms involved in the associations and insufficient sample sizes. The unprecedented wealth of DNA methylation, gene expression and disease status data give us a new opportunity to design machine learning methods to investigate the underlying associated mechanisms. In this paper, we propose a network-guided association mapping approach from DNA methylation to disease (NAMDD). Compared with existing methods, NAMDD finds methylation-disease path associations by integrating analysis of multiple data combined with a stability selection strategy, thereby mining more information in the datasets and improving the quality of resultant methylation sites. The experimental results on both synthetic and real ovarian cancer data show that NAMDD substantially outperforms former disease-related methylation site research methods (including NsRRR and PCLOGIT) under false positive control. Furthermore, we applied NAMDD to ovarian cancer data, identified significant path associations and provided hypothetical biological path associations to explain our findings.
Affiliation(s)
- Lin Yuan
- Institute of Machine Learning and Systems Biology, College of Electronic and Information Engineering, Tongji University, Shanghai, 201804, P.R. China
- De-Shuang Huang
- Institute of Machine Learning and Systems Biology, College of Electronic and Information Engineering, Tongji University, Shanghai, 201804, P.R. China.
56
Xu W, Zhu L, Huang DS. DCDE: An Efficient Deep Convolutional Divergence Encoding Method for Human Promoter Recognition. IEEE Trans Nanobioscience 2019; 18:136-145. [PMID: 30624223 DOI: 10.1109/tnb.2019.2891239] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 11/09/2022]
Abstract
Efficient human promoter feature extraction is still a major challenge in genome analysis as it can better understand human gene regulation and will be useful for experimental guidance. Although many machine learning algorithms have been developed for eukaryotic gene recognition, performance on promoters is unsatisfactory due to the diverse nature. To extract discriminative features from human promoters, an efficient deep convolutional divergence encoding method (DCDE) is proposed based on statistical divergence (SD) and convolutional neural network (CNN). SD can help optimize kmer feature extraction for human promoters. CNN can also be used to automatically extract features in gene analysis. In DCDE, we first perform informative kmers settlement to encode original gene sequences. A series of SD methods can optimize the most discriminative kmers distributions while maintaining important positional information. Then, CNN is utilized to extract lower dimensional deep features by secondary encoding. Finally, we construct a hybrid recognition architecture with multiple support vector machines and a bilayer decision method. It is flexible to add new features or new models and can be extended to identify other genomic functional elements. The extensive experiments demonstrate that DCDE is effective in promoter encoding and can significantly improve the performance of promoter recognition.
57
Identification of cancer subtypes by integrating multiple types of transcriptomics data with deep learning in breast cancer. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.03.072] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Indexed: 12/22/2022]
58
Zhang L, Yu G, Xia D, Wang J. Protein–protein interactions prediction based on ensemble deep neural networks. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.02.097] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Indexed: 01/01/2023]
59
You ZH, Huang W, Zhang S, Huang YA, Yu CQ, Li LP. An Efficient Ensemble Learning Approach for Predicting Protein-Protein Interactions by Integrating Protein Primary Sequence and Evolutionary Information. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 16:809-817. [PMID: 30475726 DOI: 10.1109/tcbb.2018.2882423] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 06/09/2023]
Abstract
Protein-protein interactions (PPIs) perform a very important function in many cellular processes, including signal transduction, post-translational modifications, apoptosis, and cell growth. Deregulation of PPIs results in many diseases, including cancer and pernicious anemia. Although many high-throughput methods have been applied to generate a large amount of PPI data, they are generally expensive, inefficient and labor-intensive. Hence, there is an urgent need for developing a computational method to accurately and rapidly detect PPIs. In this article, we proposed a highly efficient approach to predict PPIs by integrating a new protein sequence substitution matrix feature representation and an ensemble weighted sparse representation model classifier. The proposed method is demonstrated on a Saccharomyces cerevisiae dataset and achieved 99.26% prediction accuracy with 98.53% sensitivity at a precision of 100%, which is much higher predictive accuracy than current state-of-the-art algorithms. Extensive experiments on benchmark data sets from Human and Helicobacter pylori show that the proposed method achieves markedly better success rates than other existing approaches to this problem. Experimental results illustrate that our proposed method presents an economical approach for computational building of PPI networks, which can be a helpful supplementary method for future proteomics research.
60
Recurrent Neural Network for Predicting Transcription Factor Binding Sites. Sci Rep 2018; 8:15270. [PMID: 30323198 PMCID: PMC6189047 DOI: 10.1038/s41598-018-33321-1] [Citation(s) in RCA: 104] [Impact Index Per Article: 17.3] [Received: 04/27/2018] [Accepted: 09/25/2018] [Indexed: 12/23/2022]
Abstract
It is well known that a DNA sequence contains a certain number of transcription factor (TF) binding sites, and only part of them are identified through biological experiments. However, these experiments are expensive and time-consuming. To overcome these problems, some computational methods, based on k-mer features or convolutional neural networks, have been proposed to identify TF binding sites from DNA sequences. Although these methods have good performance, the context information that relates to TF binding sites is still lacking. Research indicates that standard recurrent neural networks (RNN) and their variants have better performance on time-series data compared with other models. In this study, we propose a model, named KEGRU, to identify TF binding sites by combining a Bidirectional Gated Recurrent Unit (GRU) network with k-mer embedding. Firstly, DNA sequences are divided into k-mer sequences with a specified length and stride window. Secondly, we treat each k-mer as a word and pre-train a word representation model through the word2vec algorithm. Thirdly, we construct a deep bidirectional GRU model for feature learning and classification. Experimental results have shown that our method has better performance compared with some state-of-the-art methods. Additional experiments on the embedding strategy show that k-mer embedding helps to enhance model performance. The robustness of KEGRU is demonstrated by experiments with different k-mer lengths, stride windows and embedding vector dimensions.
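The first step of this pipeline, dividing a DNA sequence into k-mers with a given length and stride, can be sketched as below. This is only an illustration under assumed names and default values (`kmer_tokens`, `k=5`, `stride=2` are not taken from the paper), and it covers tokenization only; the word2vec pre-training and the bidirectional GRU are not shown.

```python
def kmer_tokens(seq, k=5, stride=2):
    """Split a DNA sequence into k-mers of length k, moving `stride` bases each step.

    Each returned k-mer can then be treated as a "word" by an embedding model.
    Trailing bases that do not fill a complete k-mer are dropped.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


# Example: a sequence becomes a "sentence" of overlapping k-mer words.
sentence = kmer_tokens("acgtacgt", k=3, stride=2)
```

A stride smaller than `k` yields overlapping k-mers, which preserves more local context at the cost of a longer token sequence.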
61
Deng SP, Guo WL. Identifying Key Genes of Liver Cancer by Networking of Multiple Data Sets. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 16:792-800. [PMID: 30296239 DOI: 10.1109/tcbb.2018.2874238] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 06/08/2023]
Abstract
Liver cancer is one of the deadliest cancers in the world. To find effective therapies for this cancer, it is indispensable to identify key genes, which may play critical roles in the incidence of liver cancer. To identify key genes of liver cancer with high accuracy, we integrated multiple microarray gene expression data sets to compute common differentially expressed genes, which are more accurate than those from an individual data set. To find the main functions or pathways that these genes are involved in, several enrichment analyses were performed, including functional enrichment analysis, pathway enrichment analysis, and disease association study. Based on these genes, a protein-protein interaction network was constructed and analyzed to identify key genes of liver cancer by combining the local and global influence of nodes in the network. The identified key genes, such as TOP2A, ESR1, and KMO, have been demonstrated to be key biomarkers of liver cancer in many publications. All the results suggest that our method can effectively identify key genes of liver cancer. Moreover, our method can be applied to other types of data sets to select key genes of other complex diseases.
62
Russo G, Pennisi M, Boscarino R, Pappalardo F. Continuous Petri Nets and microRNA Analysis in Melanoma. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1492-1499. [PMID: 28767374 DOI: 10.1109/tcbb.2017.2733529] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 06/07/2023]
Abstract
Personalized target therapies represent one of the possible treatment strategies to fight the ongoing battle against cancer. New treatment interventions are still needed for an effective and successful cancer therapy. In this scenario, we simulated and analyzed the dynamics of BRAF V600E melanoma patients treated with BRAF inhibitors in order to find potentially interesting targets that may make standard treatments more effective in particularly aggressive tumors that may not respond to selective inhibitor drugs. To this aim, we developed a continuous Petri Net model that simulates fundamental signalling cascades involved in melanoma development, such as MAPK and PI3K/AKT, in order to deeply analyze these complex kinase cascades and predict new crucial nodes involved in melanomagenesis. The model pointed out that some microRNAs, like hsa-mir-132, downregulates expression levels of p120RasGAP: under high concentrations of p120RasGAP, MAPK pathway activation is significantly decreased and consequently also PI3K/PDK1/AKT activation. Furthermore, our analysis carried out through the Genomic Data Commons (GDC) Data Portal shows the evidence that hsa-mir-132 is significantly associated with clinical outcome in melanoma cancer genomic data sets of BRAF-mutated patients. In conclusion, targeting miRNAs through antisense oligonucleotides technology may suggest the way to enhance the action of BRAF-inhibitors.
63
Deng SP, Hu W, Calhoun VD, Wang YP. Integrating Imaging Genomic Data in the Quest for Biomarkers of Schizophrenia Disease. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1480-1491. [PMID: 28880187 PMCID: PMC6207076 DOI: 10.1109/tcbb.2017.2748944] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 05/06/2023]
Abstract
It's increasingly important but difficult to determine potential biomarkers of schizophrenia (SCZ) disease, owing to the complex pathophysiology of this disease. In this study, a network-fusion based framework was proposed to identify genetic biomarkers of the SCZ disease. A three-step feature selection was applied to single nucleotide polymorphisms (SNPs), DNA methylation, and functional magnetic resonance imaging (fMRI) data to select important features, which were then used to construct two gene networks in different states for the SNPs and DNA methylation data, respectively. Two health networks (one is for SNP data and the other is for DNA methylation data) were combined into one health network from which health minimum spanning trees (MSTs) were extracted. Two disease networks also followed the same procedures. Those genes with significant changes were determined as SCZ biomarkers by comparing MSTs in two different states and they were finally validated from five aspects. The effectiveness of the proposed discovery framework was also demonstrated by comparing with other network-based discovery methods. In summary, our approach provides a general framework for discovering gene biomarkers of the complex diseases by integrating imaging genomic data, which can be applied to the diagnosis of the complex diseases in the future.
Affiliation(s)
- Su-Ping Deng
- Department of Biomedical Engineering, School of Science and Engineering, Tulane University, New Orleans, LA 70118, USA
- Wenxing Hu
- Department of Biomedical Engineering, School of Science and Engineering, Tulane University, New Orleans, LA 70118, USA
- Yu-Ping Wang
- Department of Biomedical Engineering, School of Science and Engineering, Tulane University, New Orleans, LA 70118, USA. Telephone: (504)865-5867, Fax: (504)862-8779
64
Lin X, Zhang X. Prediction of Hot Regions in PPIs Based on Improved Local Community Structure Detecting. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1470-1479. [PMID: 29994749 DOI: 10.1109/tcbb.2018.2793858] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 06/08/2023]
Abstract
The hot regions in PPIs are assembly regions composed of tightly packed HotSpots. The discovery of hot regions helps to understand life activities and has very important value for biological applications. The identification of hot regions is the basis for protein design and cancer prevention. The existing algorithms for predicting hot regions often have some defects, such as low accuracy and instability. This paper proposes a novel hot region prediction method based on diverse biological characteristics. First, feature evaluation is performed using an improved mRMR method. Then, SVM is adopted to create a classification model based on the features selected. In addition, a new clustering algorithm, namely LCSD (Local community structure detecting), is developed to detect and analyze the conformation of hot regions. In the clustering process, the link similarity of protein residues is introduced to handle the boundary nodes. This algorithm can effectively deal with the missing residue nodes and control the local community boundaries. The results indicate that the spatial structure of hot regions can be obtained more effectively, and that our method is more effective than previous methods for precise identification of hot regions.
65
Liu J, Cheng Y, Wang X, Cui X, Kong Y, Du J. Low Rank Subspace Clustering via Discrete Constraint and Hypergraph Regularization for Tumor Molecular Pattern Discovery. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1500-1512. [PMID: 29993749 DOI: 10.1109/tcbb.2018.2834371] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 06/08/2023]
Abstract
Tumor clustering is a powerful approach for cancer class discovery, which is crucial to the effective treatment of cancer. Many traditional clustering methods, such as NMF-based models, have been widely used to identify tumors. However, they cannot achieve satisfactory results. Recently, subspace clustering approaches have been proposed to improve the performance by dividing the original space into multiple low-dimensional subspaces. Among them, low rank representation is becoming a popular approach to attain subspace clustering. In this paper, we propose a novel Low Rank Subspace Clustering model via Discrete Constraint and Hypergraph Regularization (DHLRS). The proposed method learns the cluster indicators directly by using a discrete constraint, which makes the clustering task simple. For each subspace, we adopt the Schatten p-norm to better approximate the low rank constraint. Moreover, Hypergraph Regularization is adopted to infer the complex relationship between genes and the intrinsic geometrical structure of gene expression data in each subspace. Finally, the molecular pattern of tumor gene expression data sets is discovered according to the optimized cluster indicators. Experiments on both synthetic data and real tumor gene expression data sets demonstrate the effectiveness of the proposed DHLRS.
66
Bao W, Yuan CA, Zhang Y, Han K, Nandi AK, Honig B, Huang DS. Mutli-Features Prediction of Protein Translational Modification Sites. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1453-1460. [PMID: 28961121 DOI: 10.1109/tcbb.2017.2752703] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 06/07/2023]
Abstract
Post translational modification plays a significant role in biological processing. The potential post translational modification is composed of the center sites and the adjacent amino acid residues, which are fundamental protein sequence residues. It can be helpful to perform their biological functions and contribute to understanding the molecular mechanisms that are the foundations of protein design and drug design. The existing algorithms for predicting modified sites often have some shortcomings, such as lower stability and accuracy. In this paper, a combination of physical, chemical, statistical, and biological properties of a protein has been utilized as the features, and a novel framework is proposed to predict a protein's post translational modification sites. The multi-layer neural network and support vector machine are invoked to predict the potential modified sites with the selected features, which include the compositions of amino acid residues, the E-H description of protein segments, and several properties from the AAIndex database. Being aware of the possible redundant information, feature selection is applied in the preprocessing step in this research. The experimental results show that the proposed method has the ability to improve the accuracy in this classification issue.
67
Kamal MS, Trivdedi MC, Alam JB, Dey N, Ashour AS, Shi F, Tavares JMR. Big DNA datasets analysis under push down automata. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2018. [DOI: 10.3233/jifs-169695] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Affiliation(s)
- Md. S. Kamal
- Department of Computer Science and Engineering, East West University Bangladesh, Bangladesh
- Munesh C. Trivdedi
- Department of Information Technology and Engineering, REC, Azamgarh, UP, India
- Jannat B. Alam
- Department of Computer Science and Engineering, East West University Bangladesh, Bangladesh
- Nilanjan Dey
- Department of Information Technology, Techno India College of Technology, West Bengal, India
- Amira S. Ashour
- Department of Electronics and Electrical Communications Engineering, Faculty of Engineering, Tanta University, Egypt
- Fuqian Shi
- College of Information and Engineering, Wenzhou Medical University, Wenzhou, PR China
- João Manuel R.S. Tavares
- Instituto de Ciência e Inovação em Engenharia Mecânica e Engenharia Industrial, Departamento de Engenharia Mecânica, Faculdade de Engenharia, Universidade do Porto, Portugal
68
Yuan L, Guo LH, Yuan CA, Zhang YH, Han K, Nandi A, Honig B, Huang DS. Integration of Multi-omics Data for Gene Regulatory Network Inference and Application to Breast Cancer. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 16:782-791. [PMID: 30137012 DOI: 10.1109/tcbb.2018.2866836] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Indexed: 06/08/2023]
Abstract
Underlying a cancer phenotype is a specific gene regulatory network that represents the complex regulatory relationships between genes. However, it remains a challenge to find cancer-related gene regulatory networks because of insufficient sample sizes and complex regulatory mechanisms in which a gene is influenced by not only other genes but also other biological factors. The development of high-throughput technologies and the unprecedented wealth of multi-omics data give us a new opportunity to design machine learning methods to investigate the underlying gene regulatory network. In this paper, we propose an approach that uses biweight midcorrelation to measure the correlation between factors and makes use of nonconvex penalty based sparse regression for gene regulatory network inference (BMNPGRN). BMNPGRN incorporates multi-omics data (including DNA methylation and copy number variation) and their interactions in the gene regulatory network model. The experimental results on synthetic datasets show that BMNPGRN outperforms popular and state-of-the-art methods (including DCGRN, ARACNE and CLR) under false positive control. Furthermore, we applied BMNPGRN to breast cancer (BRCA) data from The Cancer Genome Atlas database and provided the resulting gene regulatory network.
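For readers unfamiliar with the correlation measure named here, biweight midcorrelation is a median-based, outlier-resistant alternative to Pearson correlation. The sketch below follows the standard textbook definition (deviations from the median, scaled by 9 times the median absolute deviation, with points beyond that range given zero weight). It is an illustration only: the function names are made up for this example, and it says nothing about how BMNPGRN itself applies the measure or the sparse regression step.

```python
from statistics import median


def _bicor_terms(values):
    """Return the normalized, biweight-weighted deviations from the median."""
    med = median(values)
    mad = median([abs(v - med) for v in values])  # median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; biweight midcorrelation is undefined")
    terms = []
    for v in values:
        u = (v - med) / (9.0 * mad)
        # Points far from the median (|u| >= 1) receive zero weight.
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        terms.append((v - med) * w)
    norm = sum(t * t for t in terms) ** 0.5
    return [t / norm for t in terms]


def bicor(x, y):
    """Biweight midcorrelation between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(a * b for a, b in zip(_bicor_terms(x), _bicor_terms(y)))
```

Because extreme observations are down-weighted to zero, a single aberrant sample has limited influence on the estimate, which is one reason the measure is popular for expression data with small sample sizes.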
|
69
|
Wassan JT, Wang H, Browne F, Zheng H. A Comprehensive Study on Predicting Functional Role of Metagenomes Using Machine Learning Methods. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 16:751-763. [PMID: 30040657 DOI: 10.1109/tcbb.2018.2858808] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 06/08/2023]
Abstract
"Metagenomics" is the study of genomic sequences obtained directly from environmental microbial communities with the aim of linking their structures with functional roles. The field has advanced at an unprecedented pace through high-throughput omics data sequencing. The outcomes of sequencing are biologically rich data sets. Metagenomic data, in which microbial species outnumber microbial samples, lead to the "curse of dimensionality". Hence the focus in metagenomics studies has moved towards developing efficient computational models using Machine Learning (ML), reducing the computational cost. In this paper, we comprehensively assessed various ML approaches to classifying high-dimensional human microbiota effectively into their functional phenotypes. We propose the application of embedded feature selection methods, namely, Extreme Gradient Boosting and Penalized Logistic Regression, to determine important species. The resultant feature set enhanced the performance of one of the most popular state-of-the-art methods, Random Forest (RF), over metagenomic studies. Experimental results indicate that the proposed method achieved the best results in terms of accuracy and area under the Receiver Operating Characteristic curve (ROC-AUC), and a major improvement in processing time. It outperformed other feature selection methods of filters or wrappers over RF and classifiers such as Support Vector Machine (SVM), Extreme Learning Machine (ELM), and k-Nearest Neighbors (k-NN).
|
70
|
|
71
|
Gao L, Bao W, Zhang H, Yuan CA, Huang DS. Fast sequence analysis based on diamond sampling. PLoS One 2018; 13:e0198922. [PMID: 29953448 PMCID: PMC6023231 DOI: 10.1371/journal.pone.0198922] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/27/2018] [Accepted: 05/29/2018] [Indexed: 12/02/2022]
Abstract
In both DNA and protein contexts, an important method for modelling motifs in biological sequences is the position weight matrix (PWM). With the development of genome sequencing technology, the quantity of sequence data is increasing explosively, so faster search algorithms that can meet this growing need are desirable. In this paper, we propose a method for speeding up the search for candidate transcription factor binding sites (TFBS); users can specify a p threshold to obtain the desired trade-off between speed and sensitivity for a particular sequence analysis. Moreover, the proposed method can be generalized to large-scale annotation and sequencing projects.
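For orientation, the exhaustive PWM scan that faster methods try to accelerate can be sketched as follows. This is a plain sliding-window baseline, not the diamond-sampling method of the paper; the function names, the uniform background of 0.25, the pseudocount, and the use of a log-odds score cut-off (rather than the paper's p threshold) are all assumptions made for illustration.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, background=0.25, pseudocount=1.0):
    """Convert per-position base counts into a log2-odds weight matrix.

    counts: list of dicts mapping base -> observed count, one dict per position.
    """
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Score every window of the sequence; return (start, score) for hits."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous symbols such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits
```

The cost of this baseline grows with sequence length times motif width, which is why sampling-based shortcuts become attractive on genome-scale data. Raising `threshold` trades sensitivity for fewer reported candidate sites.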
Affiliation(s)
- Liangxin Gao
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China
| | - Wenzhen Bao
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China
| | - Hongbo Zhang
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China
| | - Chang-An Yuan
- Science Computing and Intelligent Information Processing of GuangXi Higher Education Key Laboratory, Guangxi Teachers Education University, Nanning, Guangxi, China
| | - De-Shuang Huang
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China
| |
|
72
|
Pennisi M, Russo G, Ravalli S, Pappalardo F. Combining agent based-models and virtual screening techniques to predict the best citrus-derived vaccine adjuvants against human papilloma virus. BMC Bioinformatics 2017; 18:544. [PMID: 29297294 PMCID: PMC5751416 DOI: 10.1186/s12859-017-1961-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 01/04/2023]
Abstract
Background: Human papillomavirus (HPV) infection is a global social burden that, every year, leads to thousands of new cancer diagnoses. The introduction of an immunization protocol with the Gardasil and Cervarix vaccines has radically changed how easily this infection spreads among people. Even though vaccination is only preventive and not therapeutic, it is a strong tool capable of preventing the consequences that this pathogen could cause. The Gardasil vaccine is not free from side effects, and the duration of immunity is not always well determined. This work aims to enhance the effects of vaccination by using a new class of adjuvants and a different administration protocol. Owing to their minimal side effects, easy extraction, low production costs and proven immune-stimulating activity, citrus-derived molecules are valid candidates to be administered as adjuvants in a vaccine formulation against HPV. Results: To obtain a stronger immune response against HPV infection, we built an in silico model that predicts the best adjuvants and the optimal means of administration. Simulations predicted that the use of Neohesperidin would elicit a strong immune response, which was then validated in vivo. Conclusions: We built a computational infrastructure consisting of a virtual screening approach able to preselect promising citrus-derived compounds and an agent-based model that reproduces HPV dynamics subject to vaccine stimulation. This integrated methodology was able to predict the best protocol, one that confers a very good immune response against HPV infection. We finally tested the in silico results through in vivo experiments on mice, finding good agreement.
Affiliation(s)
- Marzio Pennisi
- Department of Mathematics and Computer Science, University of Catania, 95125, Catania, Italy
| | - Giulia Russo
- Department of Biomedical and Biotechnological Sciences, University of Catania, 95123, Catania, Italy
| | - Silvia Ravalli
- Department of Drug Sciences, University of Catania, 95125, Catania, Italy
| | | |
|
73
|
Yue Z, Li HT, Yang Y, Hussain S, Zheng CH, Xia J, Chen Y. Identification of breast cancer candidate genes using gene co-expression and protein-protein interaction information. Oncotarget 2017; 7:36092-36100. [PMID: 27150055 PMCID: PMC5094985 DOI: 10.18632/oncotarget.9132] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 10/31/2015] [Accepted: 04/16/2016] [Indexed: 01/18/2023]
Abstract
Breast cancer (BC) is one of the most common malignancies threatening female health. As the molecular mechanism of BC has not yet been completely discovered, identification of genes related to this disease is an important area of research that could provide new insights into gene function as well as potential treatment targets. Here we used subnetwork extraction algorithms to identify novel BC-related genes based on the known BC genes (seed genes), gene co-expression profiles and a protein-protein interaction network. We computationally predicted seven key genes (EPHX2, GHRH, PPYR1, ALPP, KNG1, GSK3A and TRIT1) as putative genes of BC. Further analysis shows that six of these have been reported as breast cancer-associated genes, and one (PPYR1) as a cancer-associated gene. Lastly, we developed an expression signature using these seven key genes which significantly stratified 1660 BC patients according to relapse-free survival (hazard ratio [HR], 0.55; 95% confidence interval [CI], 0.46–0.65; Logrank p = 5.5e−13). The 7-gene signature could be established as a useful predictor of disease prognosis in BC patients. Overall, the identified seven genes might be useful prognostic and predictive molecular markers of the clinical outcome of BC patients.
Affiliation(s)
- Zhenyu Yue
- School of Life Sciences, Anhui University, Hefei, Anhui 230601, China.,Institute of Health Sciences, Anhui University, Hefei, Anhui 230601, China
| | - Hai-Tao Li
- College of Electrical Engineering and Automation, Anhui University, Hefei, Anhui 230601, China
| | - Yabing Yang
- School of Life Sciences, Anhui University, Hefei, Anhui 230601, China
| | - Sajid Hussain
- School of Life Sciences, Anhui University, Hefei, Anhui 230601, China
| | - Chun-Hou Zheng
- College of Electrical Engineering and Automation, Anhui University, Hefei, Anhui 230601, China
| | - Junfeng Xia
- Institute of Health Sciences, Anhui University, Hefei, Anhui 230601, China
| | - Yan Chen
- School of Life Sciences, Anhui University, Hefei, Anhui 230601, China
| |
|
74
|
Liu J, Wang X, Cheng Y, Zhang L. Tumor gene expression data classification via sample expansion-based deep learning. Oncotarget 2017; 8:109646-109660. [PMID: 29312636 PMCID: PMC5752549 DOI: 10.18632/oncotarget.22762] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 08/31/2017] [Accepted: 10/29/2017] [Indexed: 12/15/2022]
Abstract
Since tumors are seriously harmful to human health, effective diagnostic measures are urgently needed for tumor therapy. Early detection of tumors is particularly important for better treatment of patients. A notable issue is how to effectively discriminate tumor samples from normal ones. Many classification methods, such as Support Vector Machines (SVMs), have been proposed for tumor classification. Recently, deep learning has achieved satisfactory performance in classification tasks in many areas. However, the application of deep learning is rare in tumor classification because gene expression data provide insufficient training samples. In this paper, a Sample Expansion method is proposed to address the problem. Inspired by the idea of the Denoising Autoencoder (DAE), a large number of samples are obtained by randomly cleaning partially corrupted input many times. The expanded samples not only maintain the merits of corrupted data in DAE but also alleviate, to a certain extent, the problem of insufficient training samples of gene expression data. Since Stacked Autoencoder (SAE) and Convolutional Neural Network (CNN) models show excellent performance in classification tasks, the applicability of SAE and 1-dimensional CNN (1DCNN) to gene expression data is analyzed. Finally, two deep learning models, Sample Expansion-Based SAE (SESAE) and Sample Expansion-Based 1DCNN (SE1DCNN), are designed to carry out tumor gene expression data classification using the expanded samples. Experimental studies indicate that SESAE and SE1DCNN are very effective in tumor classification.
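One way to picture DAE-style sample expansion is masking noise: each expression profile is copied several times, and in every copy a random subset of entries is set to zero. The sketch below assumes this masking scheme; the abstract does not specify the corruption procedure, so the function name `expand_samples`, the `corruption_rate` of 0.2 and the number of copies are all hypothetical choices rather than the authors' settings.

```python
import random


def expand_samples(samples, labels, copies=10, corruption_rate=0.2, seed=0):
    """Create `copies` randomly masked versions of every labelled sample.

    samples: list of expression profiles (lists of floats).
    labels:  class label for each profile, in the same order.
    """
    rng = random.Random(seed)  # fixed seed so the expansion is reproducible
    expanded, expanded_labels = [], []
    for profile, label in zip(samples, labels):
        for _ in range(copies):
            # Masking noise: each entry is zeroed with probability corruption_rate.
            noisy = [0.0 if rng.random() < corruption_rate else value
                     for value in profile]
            expanded.append(noisy)
            expanded_labels.append(label)  # corrupted copies keep the original label
    return expanded, expanded_labels
```

With `copies=10`, a data set of 60 tumor and normal profiles would grow to 600 training examples, which is the kind of enlargement that makes training a deep classifier on small expression cohorts more feasible.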
Affiliation(s)
- Jian Liu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
| | - Xuesong Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
| | - Yuhu Cheng
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
| | - Lin Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
| |
|
75
|
Nimmy SF, Kamal MS, Hossain MI, Dey N, Ashour AS, Shi F. Neural Skyline Filtering for Imbalance Features Classification. International Journal of Computational Intelligence and Applications 2017. [DOI: 10.1142/s1469026817500195] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022]
Abstract
In the current digitalized era, large datasets play a vital role in feature extraction, information processing, knowledge mining and management. Sometimes, existing mining approaches are not sufficient to handle large volumes of data. Biological data processing suffers from the same issue. In the present work, a classification process is carried out on a large volume of exons and introns from a set of raw data. The proposed work is divided into two parts: pre-processing and mapping-based classification. For pre-processing, three filtering techniques have been used. However, these traditional filtering techniques face difficulties with large datasets because of the long processing time and the large memory size required. In this regard, a mapping-based neural skyline filtering approach is designed. A randomized algorithm performs the mapping for large volumes of data based on an objective function. The objective function determines the randomized size of the datasets according to their homogeneity. Around 200 million DNA base pairs have been used for experimental analysis. Experimental results show that mapping-centric filtering outperforms other filtering techniques during large data processing.
Affiliation(s)
- Sonia Farhana Nimmy
- Department of Computer Science and Engineering, Notre Dame University Bangladesh, Bangladesh
| | - Md. Sarwar Kamal
- Department of Computer Science and Engineering, East West University Bangladesh, Bangladesh
| | - Muhammad Iqbal Hossain
- Department of Computer Science and Engineering, BGC Trust University Bangladesh, Bangladesh
| | - Nilanjan Dey
- Department of Information Technology, Techno India College of Technology, India
| | - Amira S. Ashour
- Department of Electronics and Electrical, Communications Engineering Tanta University, Egypt
| | - Fuqian Shi
- College of Information and Engineering, Wenzhou Medical University, Wenzhou, P. R. China
| |
|
76
|
Li JQ, You ZH, Li X, Ming Z, Chen X. PSPEL: In Silico Prediction of Self-Interacting Proteins from Amino Acids Sequences Using Ensemble Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:1165-1172. [PMID: 28092572 DOI: 10.1109/tcbb.2017.2649529] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Indexed: 06/06/2023]
Abstract
Self-interacting proteins (SIPs) play an important role in various aspects of the structural and functional organization of the cell. Detecting SIPs is one of the most important issues in current molecular biology. Although a large amount of SIP data has been generated by experimental methods, wet laboratory approaches are both time-consuming and costly. In addition, they yield high false negative and false positive rates. Thus, there is a great need for in silico methods to predict SIPs accurately and efficiently. In this study, a new sequence-based method is proposed to predict SIPs. The evolutionary information contained in the Position-Specific Scoring Matrix (PSSM) is extracted from proteins with known sequences. Then, the features are fed to an ensemble classifier to distinguish self-interacting from non-self-interacting proteins. When applied to Saccharomyces cerevisiae and human SIP data sets, the proposed method achieves high accuracies of 86.86 and 91.30 percent, respectively. Our method also performs well when compared with the SVM classifier and previous methods. Consequently, the proposed method can be considered a promising new tool to predict SIPs.
|
77
|
Chen Q, Lan C, Chen B, Wang L, Li J, Zhang C. Exploring Consensus RNA Substructural Patterns Using Subgraph Mining. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:1134-1146. [PMID: 28026781 DOI: 10.1109/tcbb.2016.2645202] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
Frequently recurring RNA structural motifs play important roles in the RNA folding process and in interactions with other molecules. Traditional index-based and shape-based schemas are useful in modeling RNA secondary structures but ignore the structural discrepancies among individual RNA family members. Further, in-depth analysis of the underlying substructure patterns is insufficient because substructure data are varied and unnormalized. This prevents us from understanding RNA functions and their inherent synergistic regulation networks. This article thus proposes a novel labeled graph-based algorithm, RnaGraph, to uncover frequent RNA substructure patterns. Attribute data and graph data are combined to characterize diverse substructures and their correlations, respectively. Further, a top-k graph pattern mining algorithm is developed to extract interesting substructure motifs by integrating frequency and similarity. The experimental results show that our methods assist not only in modelling complex RNA secondary structures but also in identifying hidden but interesting RNA substructure patterns.
|
78
|
Yuan L, Zhu L, Guo WL, Zhou X, Zhang Y, Huang Z, Huang DS. Nonconvex Penalty Based Low-Rank Representation and Sparse Regression for eQTL Mapping. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:1154-1164. [PMID: 28114074 DOI: 10.1109/tcbb.2016.2609420] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 06/06/2023]
Abstract
This paper addresses the problem of accounting for confounding factors and expression quantitative trait loci (eQTL) mapping in the study of SNP-gene associations. The existing convex penalty based algorithm has limited capacity to keep the main information of a matrix while reducing the matrix rank. We present an algorithm that uses nonconvex penalty based low-rank representation to account for confounding factors and sparse regression for eQTL mapping (NCLRS). The efficiency of the presented algorithm is evaluated by comparing the results of 18 synthetic datasets given by NCLRS and presented algorithm, respectively. The experimental results on a biological dataset show that our approach is a more effective tool to account for non-genetic effects than currently existing methods.
|
79
|
Kamal MS, Sarowar MG, Dey N, Ashour AS, Ripon SH, Panigrahi BK, Tavares JMRS. Self-organizing mapping based swarm intelligence for secondary and tertiary proteins classification. Int J Mach Learn Cyb 2017. [DOI: 10.1007/s13042-017-0710-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/13/2023]
|
80
|
Zhang T, Wang X, Yue Z. Identification of candidate genes related to pancreatic cancer based on analysis of gene co-expression and protein-protein interaction network. Oncotarget 2017; 8:71105-71116. [PMID: 29050346 PMCID: PMC5642621 DOI: 10.18632/oncotarget.20537] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 05/14/2017] [Accepted: 07/29/2017] [Indexed: 12/11/2022]
Abstract
Pancreatic cancer (PC) is one of the most common causes of cancer mortality worldwide. As the genetic mechanism of this complex disease has not been clearly uncovered, identifying PC-related genes is of great significance and could provide new insights into gene function as well as potential therapy targets. In this study, we applied an integrated network method to discover PC candidate genes based on known PC-related genes. Utilizing the subnetwork extraction algorithm with gene co-expression profiles and protein-protein interaction data, we obtained an integrated network comprising the known PC-related genes (denoted as seed genes) and the putative genes (denoted as linker genes). We then prioritized the linker genes based on their network information and inferred six key genes (KRT19, BARD1, MST1R, S100A14, LGALS1 and RNF168) as candidate genes of PC. Further analysis indicated that all of these genes have been reported as pancreatic cancer-associated genes. Finally, we developed an expression signature using these six key genes which significantly stratified PC patients according to overall survival (Logrank p = 0.003) and was validated on an independent clinical cohort (Logrank p = 0.03). Overall, the identified six genes might offer helpful prognostic stratification information and be suitable for transfer to clinical use in PC patients.
Affiliation(s)
- Tiejun Zhang
- GMU-GIBH Joint School of Life Sciences, Guangzhou Medical University, Guangzhou, Guangdong 511436, China
| | - Xiaojuan Wang
- Institute of Health Sciences, School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
| | - Zhenyu Yue
- Institute of Health Sciences, School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
| |
|
81
|
Robust Significance Analysis of Microarrays by Minimum β-Divergence Method. BioMed Research International 2017; 2017:5310198. [PMID: 28819626 PMCID: PMC5551475 DOI: 10.1155/2017/5310198] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 03/29/2017] [Accepted: 05/28/2017] [Indexed: 11/18/2022]
Abstract
Identification of differentially expressed (DE) genes across two or more conditions is an important task for the discovery of a few biomarker genes. Significance Analysis of Microarrays (SAM) is a popular statistical approach for identifying DE genes in both small- and large-sample cases. However, it is sensitive to outlying gene expressions and has low power in the presence of outliers. Therefore, in this paper, an attempt is made to robustify the SAM approach by using minimum β-divergence estimators instead of maximum likelihood estimators of the parameters. We demonstrated the performance of the proposed method in comparison with other popular statistical methods such as ANOVA, SAM, LIMMA, KW, EBarrays, GaGa, and BRIDGE using both simulated and real gene expression datasets. We observe that all methods show good and almost equal performance in the absence of outliers for the large-sample cases, while in the small-sample cases only three methods (SAM, LIMMA, and the proposed method) show almost equal and better performance than the others with two or more conditions. However, in the presence of outliers, on average, only the proposed method performs better than the others for both small- and large-sample cases with each condition.
|
82
|
Lu H, Yang L, Yan K, Xue Y, Gao Z. A cost-sensitive rotation forest algorithm for gene expression data classification. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.09.077] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Indexed: 11/26/2022]
|
83
|
Puig-Butille JA, Gimenez-Xavier P, Visconti A, Nsengimana J, Garcia-García F, Tell-Marti G, Escamez MJ, Newton-Bishop J, Bataille V, del Río M, Dopazo J, Falchi M, Puig S. Genomic expression differences between cutaneous cells from red hair color individuals and black hair color individuals based on bioinformatic analysis. Oncotarget 2017; 8:11589-11599. [PMID: 28030792 PMCID: PMC5355288 DOI: 10.18632/oncotarget.14140] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/28/2016] [Accepted: 11/21/2016] [Indexed: 12/11/2022]
Abstract
The MC1R gene plays a crucial role in pigmentation synthesis. Loss-of-function MC1R variants, which impair protein function, are associated with the red hair color (RHC) phenotype and increased skin cancer risk. Cultured cutaneous cells bearing loss-of-function MC1R variants show a distinct gene expression profile compared to wild-type MC1R cultured cutaneous cells. We analysed the gene signature associated with RHC co-cultured melanocytes and keratinocytes by protein-protein interaction (PPI) network analysis to identify genes related to non-functional MC1R variants. From two detected networks, we selected 23 nodes as hub genes based on topological parameters. Differential expression of hub genes was then evaluated in healthy skin biopsies from RHC and black hair color (BHC) individuals. We also compared gene expression in melanoma tumors from individuals with RHC versus BHC. Gene expression in normal skin from RHC cutaneous cells showed dysregulation in 8 out of 23 hub genes (CLN3, ATG10, WIPI2, SNX2, GABARAPL2, YWHA, PCNA and GBAS). Hub genes did not differ between melanoma tumors in RHC versus BHC individuals. The study suggests that healthy skin cells from RHC individuals present a constitutive genomic deregulation associated with the red hair phenotype, and it identifies novel genes involved in melanocyte biology.
Affiliation(s)
- Joan Anton Puig-Butille
- Biochemistry and Molecular Genetics Department, Melanoma Unit, Hospital Clinic & IDIBAPS, CIBER de Enfermedades Raras (CIBERER), Barcelona, Spain
| | - Pol Gimenez-Xavier
- Dermatology Department, Melanoma Unit, Hospital Clinic & IDIBAPS, CIBER de Enfermedades Raras (CIBERER), Barcelona, Spain
| | - Alessia Visconti
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
| | - Jérémie Nsengimana
- Section of Epidemiology and Biostatistics, Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
| | - Francisco Garcia-García
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
| | - Gemma Tell-Marti
- Dermatology Department, Melanoma Unit, Hospital Clinic & IDIBAPS, CIBER de Enfermedades Raras (CIBERER), Barcelona, Spain
| | - Maria José Escamez
- Departamento de Bioingeniería, Universidad Carlos III de Madrid, CIEMAT, IIS-Fundación Jiménez Díaz, CIBER de Enfermedades Raras (CIBERER), Madrid, Spain
| | - Julia Newton-Bishop
- Section of Epidemiology and Biostatistics, Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
| | - Veronique Bataille
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
| | - Marcela del Río
- Departamento de Bioingeniería, Universidad Carlos III de Madrid, CIEMAT, IIS-Fundación Jiménez Díaz, CIBER de Enfermedades Raras (CIBERER), Madrid, Spain
| | - Joaquín Dopazo
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Functional Genomics Node, (INB) at CIPF, Valencia, Spain
- CIBER de Enfermedades Raras (CIBERER), Valencia, Spain
| | - Mario Falchi
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
| | - Susana Puig
- Dermatology Department, Melanoma Unit, Hospital Clinic & IDIBAPS, CIBER de Enfermedades Raras (CIBERER), Barcelona, Spain
| |
|
84
|
Differentially Coexpressed Disease Gene Identification Based on Gene Coexpression Network. BioMed Research International 2016; 2016:3962761. [PMID: 28042568 PMCID: PMC5155124 DOI: 10.1155/2016/3962761] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/23/2016] [Accepted: 10/26/2016] [Indexed: 11/17/2022]
Abstract
Screening disease-related genes by analyzing gene expression data has become a popular theme. Traditional disease-related gene selection methods focus on identifying differentially expressed genes between case samples and a control group. These traditional methods may not fully consider the changes in interactions between genes at different cell states or the dynamic processes of gene expression levels during disease progression. However, to understand the mechanism of disease, it is important to explore the dynamic changes in interactions between genes in biological networks at different cell states. In this study, we designed a novel framework to identify disease-related genes and developed a differentially coexpressed disease-related gene identification method based on the gene coexpression network (DCGN) to screen differentially coexpressed genes. We first constructed phase-specific gene coexpression networks using time-series gene expression data and defined the concept of differential coexpression of genes in a coexpression network. Then, we designed two metrics to measure the value of gene differential coexpression according to the change in local topological structures between different phase-specific networks. Finally, we conducted a meta-analysis of gene differential coexpression based on the rank-product method. Experimental results on real-world gene expression data sets demonstrated the feasibility and effectiveness of DCGN and its superior performance over other popular disease-related gene selection methods.
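The rank-product step mentioned at the end of this abstract combines several per-gene score lists by taking the geometric mean of each gene's ranks. The following is a minimal sketch of that general idea, assuming each metric or dataset yields one score per gene and that larger scores mean stronger differential coexpression; the function names and the tie-handling are illustrative choices, not taken from the DCGN paper.

```python
import math


def rank_descending(scores):
    """Assign rank 1 to the largest score; ties are broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def rank_product(score_lists):
    """Geometric mean of each gene's ranks across all score lists.

    score_lists: one list of per-gene scores per metric or dataset,
    all in the same gene order. Smaller results mean consistently high ranks.
    """
    n_genes = len(score_lists[0])
    n_lists = len(score_lists)
    all_ranks = [rank_descending(scores) for scores in score_lists]
    return [
        math.exp(sum(math.log(ranks[g]) for ranks in all_ranks) / n_lists)
        for g in range(n_genes)
    ]
```

Genes can then be sorted by ascending rank product, so that a gene ranked near the top by every metric surfaces first while a gene that scores highly on only one metric is pushed down.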
|
85
|
|