1
Predicting Protein–Protein Interactions Based on Ensemble Learning-Based Model from Protein Sequence. BIOLOGY 2022; 11:biology11070995. [PMID: 36101379 PMCID: PMC9311754 DOI: 10.3390/biology11070995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2022] [Revised: 05/27/2022] [Accepted: 06/29/2022] [Indexed: 11/17/2022]
Abstract
Simple Summary: Most traditional high-throughput experiments for identifying potential protein–protein interactions are tedious and laborious. To improve the accuracy of protein–protein interaction prediction, we proposed a novel computational method that can identify unknown protein–protein interactions efficiently. We hope this method can provide a helpful idea and tool for proteomics research.
Abstract: Protein–protein interactions (PPIs) play an essential role in many biological cellular functions. However, it is still tedious and time-consuming to identify protein–protein interactions through traditional experimental methods. For this reason, it is necessary to develop a computational method for predicting PPIs efficiently. This paper explores a novel computational method for detecting PPIs from protein sequences. The approach mainly adopts Locality Preserving Projections (LPP) for feature extraction and Rotation Forest (RF) as the classifier. Specifically, we first employ the Position Specific Scoring Matrix (PSSM), which retains biological evolutionary information, to represent protein sequences efficiently. Then, the LPP descriptor is applied to extract feature vectors from the PSSM. The feature vectors are fed into the RF to obtain the final results. The proposed method was applied to two datasets, Yeast and H. pylori, and obtained average accuracies of 92.81% and 92.56%, respectively. We also compare it with K nearest neighbors (KNN) and support vector machine (SVM) classifiers to better evaluate the performance of the proposed method. In summary, all experimental results indicate that the proposed approach is stable and robust for predicting PPIs and is promising as a useful tool for proteomics research.
2
Comprehensive assessment of differential ChIP-seq tools guides optimal algorithm selection. Genome Biol 2022; 23:119. [PMID: 35606795 PMCID: PMC9128273 DOI: 10.1186/s13059-022-02686-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2021] [Accepted: 05/09/2022] [Indexed: 11/21/2022]
Abstract
Background: The analysis of chromatin binding patterns of proteins in different biological states is a main application of chromatin immunoprecipitation followed by sequencing (ChIP-seq). A large number of algorithms and computational tools for quantitative comparison of ChIP-seq datasets exist, but their performance is strongly dependent on the parameters of the biological system under investigation. Thus, a systematic assessment of available computational tools for differential ChIP-seq analysis is required to guide the optimal selection of analysis tools based on the present biological scenario.
Results: We created standardized reference datasets by in silico simulation and sub-sampling of genuine ChIP-seq data to represent different biological scenarios and binding profiles. Using these data, we evaluated the performance of 33 computational tools and approaches for differential ChIP-seq analysis. Tool performance was strongly dependent on peak size and shape as well as on the scenario of biological regulation.
Conclusions: Our analysis provides unbiased guidelines for the optimized choice of software tools in differential ChIP-seq analysis.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-022-02686-y.
3
St Germain C, Zhao H, Sinha V, Sanz LA, Chédin F, Barlow J. OUP accepted manuscript. Nucleic Acids Res 2022; 50:2051-2073. [PMID: 35100392 PMCID: PMC8887484 DOI: 10.1093/nar/gkac035] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/05/2021] [Revised: 01/05/2022] [Accepted: 01/14/2022] [Indexed: 11/13/2022]
Abstract
Conflicts between transcription and replication machinery are a potent source of replication stress and genome instability; however, no technique currently exists to identify endogenous genomic locations prone to transcription–replication interactions. Here, we report a novel method to identify genomic loci prone to transcription–replication interactions termed transcription–replication immunoprecipitation on nascent DNA sequencing, TRIPn-Seq. TRIPn-Seq employs the sequential immunoprecipitation of RNA polymerase 2 phosphorylated at serine 5 (RNAP2s5) followed by enrichment of nascent DNA previously labeled with bromodeoxyuridine. Using TRIPn-Seq, we mapped 1009 unique transcription–replication interactions (TRIs) in mouse primary B cells characterized by a bimodal pattern of RNAP2s5, bidirectional transcription, an enrichment of RNA:DNA hybrids, and a high probability of forming G-quadruplexes. TRIs are highly enriched at transcription start sites and map to early replicating regions. TRIs exhibit enhanced Replication Protein A association and TRI-associated genes exhibit higher replication fork termination than control transcription start sites, two marks of replication stress. TRIs colocalize with double-strand DNA breaks, are enriched for deletions, and accumulate mutations in tumors. We propose that replication stress at TRIs induces mutations potentially contributing to age-related disease, as well as tumor formation and development.
Affiliation(s)
- Commodore P St Germain
  - Department of Microbiology and Molecular Genetics, University of California Davis, One Shields Avenue, Davis, CA 95616, USA
  - School of Mathematics and Science, Solano Community College, 4000 Suisun Valley Road, Fairfield, CA 94534, USA
- Hongchang Zhao
  - Department of Microbiology and Molecular Genetics, University of California Davis, One Shields Avenue, Davis, CA 95616, USA
- Vrishti Sinha
  - Department of Microbiology and Molecular Genetics, University of California Davis, One Shields Avenue, Davis, CA 95616, USA
- Lionel A Sanz
  - Department of Molecular and Cellular Biology, University of California Davis, One Shields Avenue, Davis, CA 95616, USA
- Frédéric Chédin
  - Department of Molecular and Cellular Biology, University of California Davis, One Shields Avenue, Davis, CA 95616, USA
- Jacqueline H Barlow
  - To whom correspondence should be addressed. Tel: +1 530 752 9529; Fax: +1 530 752 9014;
4
Taguchi YH, Turki T. Unsupervised tensor decomposition-based method to extract candidate transcription factors as histone modification bookmarks in post-mitotic transcriptional reactivation. PLoS One 2021; 16:e0251032. [PMID: 34032804 PMCID: PMC8148352 DOI: 10.1371/journal.pone.0251032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2021] [Accepted: 04/17/2021] [Indexed: 11/25/2022]
Abstract
The histone group added to a gene sequence must be removed during mitosis to halt transcription during the DNA replication stage of the cell cycle. However, the detailed mechanism of this transcription regulation remains unclear. In particular, it is not realistic to reconstruct all appropriate histone modifications throughout the genome from scratch after mitosis. Thus, it is reasonable to assume that there might be a type of “bookmark” that retains the positions of histone modifications, which can be readily restored after mitosis. We developed a novel computational approach comprising tensor decomposition (TD)-based unsupervised feature extraction (FE) to identify transcription factors (TFs) that bind to genes associated with reactivated histone modifications as candidate histone bookmarks. To the best of our knowledge, this is the first application of TD-based unsupervised FE to the cell division context and phases pertaining to the cell cycle in general. The candidate TFs identified with this approach were functionally related to cell division, suggesting the suitability of this method and the potential of the identified TFs as bookmarks for histone modification during mitosis.
Affiliation(s)
- Y-h. Taguchi
  - Department of Physics, Chuo University, Tokyo, Japan
- Turki Turki
  - Department of Computer Science, King Abdulaziz University, Jeddah, Saudi Arabia
5
Dickson BM, Tiedemann RL, Chomiak AA, Cornett EM, Vaughan RM, Rothbart SB. A physical basis for quantitative ChIP-sequencing. J Biol Chem 2020; 295:15826-15837. [PMID: 32994221 PMCID: PMC7681007 DOI: 10.1074/jbc.ra120.015353] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/22/2020] [Revised: 09/09/2020] [Indexed: 01/28/2023]
Abstract
ChIP followed by next-generation sequencing (ChIP-Seq) is a key technique for mapping the distribution of histone posttranslational modifications (PTMs) and chromatin-associated factors across genomes. There is a perceived challenge to define a quantitative scale for ChIP-Seq data, and as such, several approaches making use of exogenous additives, or "spike-ins," have recently been developed. Herein, we report on the development of a quantitative, physical model defining ChIP-Seq. The quantitative scale on which ChIP-Seq results should be compared emerges from the model. To test the model and demonstrate the quantitative scale, we examine the impacts of an EZH2 inhibitor through the lens of ChIP-Seq. We report a significant increase in immunoprecipitation of presumed off-target histone PTMs after inhibitor treatment, a trend predicted by the model but contrary to spike-in-based indications. Our work also identifies a sensitivity issue in spike-in normalization that has not been considered in the literature, placing limitations on its utility and trustworthiness. We call our new approach the sans-spike-in method for quantitative ChIP-sequencing (siQ-ChIP). A number of changes in community practice of ChIP-Seq, data reporting, and analysis are motivated by this work.
Affiliation(s)
- Bradley M Dickson
  - Center for Epigenetics, Van Andel Research Institute, Grand Rapids, Michigan, USA
- Rochelle L Tiedemann
  - Center for Epigenetics, Van Andel Research Institute, Grand Rapids, Michigan, USA
- Alison A Chomiak
  - Center for Epigenetics, Van Andel Research Institute, Grand Rapids, Michigan, USA
- Evan M Cornett
  - Center for Epigenetics, Van Andel Research Institute, Grand Rapids, Michigan, USA
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Robert M Vaughan
  - Center for Epigenetics, Van Andel Research Institute, Grand Rapids, Michigan, USA
- Scott B Rothbart
  - Center for Epigenetics, Van Andel Research Institute, Grand Rapids, Michigan, USA
6
Zhan XK, You ZH, Li LP, Li Y, Wang Z, Pan J. Using Random Forest Model Combined With Gabor Feature to Predict Protein-Protein Interaction From Protein Sequence. Evol Bioinform Online 2020; 16:1176934320934498. [PMID: 32655275 PMCID: PMC7328357 DOI: 10.1177/1176934320934498] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/05/2020] [Accepted: 05/20/2020] [Indexed: 12/12/2022]
Abstract
Protein-protein interactions (PPIs) play a crucial role in the life cycles of living cells. Thus, it is important to understand the underlying mechanisms of PPIs. Although many high-throughput technologies have generated large amounts of PPI data in different organisms, the experiments for detecting PPIs are still costly and time-consuming. Therefore, novel computational methods are urgently needed for predicting PPIs. For this reason, developing a new computational method for predicting PPIs is drawing more and more attention. In this study, we proposed a novel computational method based on the texture features of protein sequences for predicting PPIs. Specifically, the Gabor feature is used to extract texture features and protein evolutionary information from the Position-Specific Scoring Matrix, which is generated by the Position-Specific Iterated Basic Local Alignment Search Tool. Then, random forest–based classifiers are used to infer the protein interactions. When applied to PPI data sets of yeast, human, and Helicobacter pylori, the method obtained good results, with average accuracies of 92.10%, 97.03%, and 86.45%, respectively. To better evaluate the proposed method, we compared the Gabor feature, Discrete Cosine Transform, and Local Phase Quantization descriptors. Our results show that the proposed method is both feasible and stable and that the Gabor feature descriptor is reliable in extracting protein sequence information. Furthermore, additional experiments have been conducted to predict PPIs on data sets of 4 other species. The promising results indicate that our proposed method is both powerful and robust.
Affiliation(s)
- Xin-Ke Zhan
  - School of Information Engineering, Xijing University, Xi'an, China
- Zhu-Hong You
  - School of Information Engineering, Xijing University, Xi'an, China
- Li-Ping Li
  - School of Information Engineering, Xijing University, Xi'an, China
- Yang Li
  - School of Information Engineering, Xijing University, Xi'an, China
- Zheng Wang
  - School of Information Engineering, Xijing University, Xi'an, China
- Jie Pan
  - School of Information Engineering, Xijing University, Xi'an, China
7
Li CC, Liu B. MotifCNN-fold: protein fold recognition based on fold-specific features extracted by motif-based convolutional neural networks. Brief Bioinform 2019; 21:2133-2141. [PMID: 31774907 DOI: 10.1093/bib/bbz133] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 07/19/2019] [Revised: 09/16/2019] [Accepted: 09/17/2019] [Indexed: 12/31/2022]
Abstract
Protein fold recognition is one of the most critical tasks to explore the structures and functions of the proteins based on their primary sequence information. The existing protein fold recognition approaches rely on features reflecting the characteristics of protein folds. However, the feature extraction methods are still the bottleneck of the performance improvement of these methods. In this paper, we proposed two new feature extraction methods called MotifCNN and MotifDCNN to extract more discriminative fold-specific features based on structural motif kernels to construct the motif-based convolutional neural networks (CNNs). The pairwise sequence similarity scores calculated based on fold-specific features are then fed into support vector machines to construct the predictor for fold recognition, and a predictor called MotifCNN-fold has been proposed. Experimental results on the benchmark dataset showed that MotifCNN-fold obviously outperformed all the other competing methods. In particular, the fold-specific features extracted by MotifCNN and MotifDCNN are more discriminative than the fold-specific features extracted by other deep learning techniques, indicating that incorporating the structural motifs into the CNN is able to capture the characteristics of protein folds.
Affiliation(s)
- Chen-Chen Li
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
8
Wang L, Wang HF, Liu SR, Yan X, Song KJ. Predicting Protein-Protein Interactions from Matrix-Based Protein Sequence Using Convolution Neural Network and Feature-Selective Rotation Forest. Sci Rep 2019; 9:9848. [PMID: 31285519 PMCID: PMC6614364 DOI: 10.1038/s41598-019-46369-4] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 03/05/2019] [Accepted: 06/10/2019] [Indexed: 01/09/2023]
Abstract
Protein is an essential component of the living organism. The prediction of protein-protein interactions (PPIs) has important implications for understanding the behavioral processes of life, preventing diseases, and developing new drugs. Although the development of high-throughput technology makes it possible to identify PPIs in large-scale biological experiments, the extensive use of experimental methods is restricted by constraints of time, cost, false positive rate, and other conditions. Therefore, there is an urgent need for computational methods as a supplement to experimental methods to predict PPIs rapidly and accurately. In this paper, we propose a novel approach, namely CNN-FSRF, for predicting PPIs based on protein sequence by combining a deep learning Convolution Neural Network (CNN) with Feature-Selective Rotation Forest (FSRF). The proposed method first converts the protein sequence into the Position-Specific Scoring Matrix (PSSM) containing biological evolution information, then uses the CNN to objectively and efficiently extract the deeply hidden features of the protein, and finally removes the redundant noise information by FSRF and gives the prediction results. When applied to the PPI datasets Yeast and Helicobacter pylori, CNN-FSRF achieved prediction accuracies of 97.75% and 88.96%, respectively. To further evaluate the prediction performance, we compared CNN-FSRF with SVM and other existing methods. In addition, we also verified the performance of CNN-FSRF on independent datasets. Excellent experimental results indicate that CNN-FSRF can be used as a useful complement to biological experiments to identify protein interactions.
Affiliation(s)
- Lei Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong, 277100, P.R. China
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, P.R. China
- Hai-Feng Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong, 277100, P.R. China
- San-Rong Liu
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong, 277100, P.R. China
- Xin Yan
  - School of Foreign Languages, Zaozhuang University, Zaozhuang, Shandong, 277100, P.R. China
- Ke-Jian Song
  - School of information engineering, JiangXi University of Science and Technology, Ganzhou, Jiangxi, 341000, P.R. China
9
Using Two-dimensional Principal Component Analysis and Rotation Forest for Prediction of Protein-Protein Interactions. Sci Rep 2018; 8:12874. [PMID: 30150728 PMCID: PMC6110764 DOI: 10.1038/s41598-018-30694-1] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 03/15/2018] [Accepted: 07/17/2018] [Indexed: 11/09/2022]
Abstract
The interaction among proteins is essential in all life activities, and it is the basis of all the metabolic activities of the cells. By studying protein-protein interactions (PPIs), people can better interpret the function of proteins and decode the phenomena of life, which has great practical value, especially in the design of new drugs. Although many high-throughput techniques have been devised for large-scale detection of PPIs, these methods are still expensive and time-consuming. For this reason, there is a pressing need to develop computational methods for predicting PPIs at the entire proteome scale. In this article, we propose a new approach to predict PPIs using a Rotation Forest (RF) classifier combined with a matrix-based protein sequence representation. We apply the Position-Specific Scoring Matrix (PSSM), which contains biological evolution information, to represent protein sequences and extract the features through the two-dimensional Principal Component Analysis (2DPCA) algorithm. The descriptors are then sent to the rotation forest classifier for classification. We obtained 97.43% prediction accuracy with 94.92% sensitivity at a precision of 99.93% when the proposed method was applied to the PPI data of yeast. To evaluate the performance of the proposed method, we compared it with other methods on the same dataset and validated it on independent datasets. The results obtained show that the proposed method is an appropriate and promising method for predicting PPIs.
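The 2DPCA feature-extraction step named in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes every input matrix has the same shape (real PSSMs vary with sequence length, and the abstract does not say how that is handled), and the function names `fit_2dpca` and `transform_2dpca` are hypothetical.

```python
import numpy as np


def fit_2dpca(matrices, n_components):
    """Learn a 2DPCA projection from equally shaped matrices (assumption)."""
    stack = np.asarray(matrices, dtype=float)      # shape (N, m, n)
    centered = stack - stack.mean(axis=0)          # subtract the mean matrix
    # Image scatter matrix G = (1/N) * sum_k (A_k - mean)^T (A_k - mean), shape (n, n)
    scatter = np.einsum("kij,kil->jl", centered, centered) / stack.shape[0]
    eigvals, eigvecs = np.linalg.eigh(scatter)     # symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]                       # shape (n, n_components)


def transform_2dpca(matrix, projection):
    """Project one matrix onto the learned axes; the result has shape (m, n_components)."""
    return np.asarray(matrix, dtype=float) @ projection


# Toy data standing in for fixed-size PSSM-like matrices (purely illustrative).
rng = np.random.default_rng(0)
toy_matrices = rng.normal(size=(5, 6, 4))
projection = fit_2dpca(toy_matrices, n_components=2)
features = transform_2dpca(toy_matrices[0], projection)
```

In a pipeline like the one described, the projected matrix would typically be flattened into a vector before being passed to a classifier; that step is omitted here.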
10
Liu Q, Bonneville R, Li T, Jin VX. Transcription factor-associated combinatorial epigenetic pattern reveals higher transcriptional activity of TCF7L2-regulated intragenic enhancers. BMC Genomics 2017; 18:375. [PMID: 28499350 PMCID: PMC5429574 DOI: 10.1186/s12864-017-3764-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 01/17/2017] [Accepted: 05/03/2017] [Indexed: 01/24/2023]
Abstract
Background: Recent studies have suggested that combinations of multiple epigenetic modifications are essential for controlling gene expression. Although numerous computational approaches have been developed to decipher the combinatorial epigenetic patterns or “epigenetic code”, none of them has explicitly addressed the relationship between a specific transcription factor (TF) and the patterns.
Methods: Here, we developed a novel computational method, T-cep, for annotating chromatin states associated with a specific TF. T-cep is composed of three key consecutive modules: (i) Data preprocessing, (ii) HMM training, and (iii) Potential TF-states calling.
Results: We evaluated T-cep on TCF7L2-omics data. Unexpectedly, our method uncovered a novel set of TCF7L2-regulated intragenic enhancers missed by other software tools, where the associated genes exhibit the highest gene expression. We further used siRNA knockdown, Co-transfection, RT-qPCR and Luciferase Reporter Assay not only to validate the accuracy and efficiency of prediction by T-cep, but also to confirm the functionality of TCF7L2-regulated enhancers in both MCF7 and PANC1 cells.
Conclusions: Our study for the first time at a genome-wide scale reveals the enhanced transcriptional activity of cell-type-specific TCF7L2 intragenic enhancers in regulating gene expression.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-3764-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Qi Liu
  - Department of Molecular Medicine, University of Texas Health Science Center, 8403 Floyd Curl, San Antonio, TX, 78229, USA
  - College of Life Science, Jilin University, Changchun, 130012, China
- Russell Bonneville
  - Biomedical Sciences Graduate Program, The Ohio State University, Columbus, OH, 43210, USA
- Tianbao Li
  - Department of Molecular Medicine, University of Texas Health Science Center, 8403 Floyd Curl, San Antonio, TX, 78229, USA
  - College of Life Science, Jilin University, Changchun, 130012, China
- Victor X Jin
  - Department of Molecular Medicine, University of Texas Health Science Center, 8403 Floyd Curl, San Antonio, TX, 78229, USA
11
12
Wang L, You ZH, Chen X, Li JQ, Yan X, Zhang W, Huang YA. An ensemble approach for large-scale identification of protein-protein interactions using the alignments of multiple sequences. Oncotarget 2017; 8:5149-5159. [PMID: 28029645 PMCID: PMC5354898 DOI: 10.18632/oncotarget.14103] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 10/11/2016] [Accepted: 11/15/2016] [Indexed: 11/25/2022]
Abstract
Protein-Protein Interactions (PPIs) are not only a critical component of various biological processes in cells, but also the key to understanding the mechanisms leading to healthy and diseased states in organisms. However, it is time-consuming and cost-intensive to identify the interactions among proteins using biological experiments. Hence, developing more efficient computational methods rapidly became an attractive topic in the post-genomic era. In this paper, we propose a novel method for inferring protein-protein interactions from protein amino acid sequences only. Specifically, the protein amino acid sequence is first transformed into a Position-Specific Scoring Matrix (PSSM) generated by multiple sequence alignments; then the Pseudo PSSM is used to extract feature descriptors. Finally, an ensemble Rotation Forest (RF) learning system is trained to predict and recognize PPIs based solely on protein sequence features. When the proposed method was applied to three benchmark data sets (Yeast, H. pylori, and an independent dataset) for predicting PPIs, it achieved average accuracies of 98.38%, 89.75%, and 96.25%, respectively. In order to further evaluate the prediction performance, we also compare the proposed method with other methods using the same benchmark data sets. The experimental results demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Therefore, our method is effective and robust and can be taken as a useful tool in exploring and discovering new relationships between proteins. A web server is made publicly available at the URL http://202.119.201.126:8888/PsePSSM/ for academic use.
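As a rough illustration of the Pseudo PSSM descriptor mentioned in this abstract, the sketch below follows a commonly used formulation: the column means of the PSSM, followed by the mean squared difference between rows a fixed lag apart, for each lag. It assumes the PSSM has already been normalized, and the function name `pse_pssm` and the `max_lag` parameter are hypothetical; the paper's exact settings are not given here.

```python
import numpy as np


def pse_pssm(pssm, max_lag):
    """Fixed-length descriptor from an L x C PSSM (assumed already normalized)."""
    pssm = np.asarray(pssm, dtype=float)
    length = pssm.shape[0]
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must satisfy 1 <= max_lag < sequence length")
    parts = [pssm.mean(axis=0)]                    # composition part: column means
    for lag in range(1, max_lag + 1):
        diff = pssm[:-lag] - pssm[lag:]            # rows i and i + lag
        parts.append((diff ** 2).mean(axis=0))     # averaged over the L - lag row pairs
    return np.concatenate(parts)                   # length C * (1 + max_lag)


# Tiny 3 x 2 matrix used only to make the arithmetic easy to follow.
toy_pssm = [[1.0, 0.0], [3.0, 0.0], [5.0, 0.0]]
descriptor = pse_pssm(toy_pssm, max_lag=2)
```

Because the output length depends only on the number of columns and `max_lag`, proteins of different lengths map to vectors of the same size, which is what a downstream classifier needs.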
Affiliation(s)
- Lei Wang
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong 277100, China
- Zhu-Hong You
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- Xing Chen
  - School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Jian-Qiang Li
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
- Xin Yan
  - School of Foreign Languages, Zaozhuang University, Zaozhuang, Shandong 277100, China
- Wei Zhang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong 277100, China
- Yu-An Huang
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
13
Qin Z, Li B, Conneely KN, Wu H, Hu M, Ayyala D, Park Y, Jin VX, Zhang F, Zhang H, Li L, Lin S. Statistical challenges in analyzing methylation and long-range chromosomal interaction data. STATISTICS IN BIOSCIENCES 2016; 8:284-309. [PMID: 28008337 PMCID: PMC5167536 DOI: 10.1007/s12561-016-9145-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 11/23/2015] [Revised: 02/22/2016] [Accepted: 02/22/2016] [Indexed: 12/21/2022]
Abstract
With the rapid development of high throughput technologies such as array and next generation sequencing (NGS), genome-wide, nucleotide-resolution epigenomic data are increasingly available. In recent years, there has been particular interest in data on DNA methylation and 3-dimensional (3D) chromosomal organization, which are believed to hold keys to understand biological mechanisms, such as transcription regulation, that are closely linked to human health and diseases. However, small sample size, complicated correlation structure, substantial noise, biases, and uncertainties, all present difficulties for performing statistical inference. In this review, we present an overview of the new technologies that are frequently utilized in studying DNA methylation and 3D chromosomal organization. We focus on reviewing recent developments in statistical methodologies designed for better interrogating epigenomic data, pointing out statistical challenges facing the field whenever appropriate.
Affiliation(s)
- Zhaohui Qin
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Ben Li
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Karen N Conneely
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Hao Wu
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Ming Hu
  - Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY 10016, USA
- Deepak Ayyala
  - Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
- Yongseok Park
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261 USA
- Victor X Jin
  - Department of Molecular Medicine, The University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Fangyuan Zhang
  - Department of Mathematics & Statistics, Texas Tech University, Lubbock, TX 79409, USA
- Han Zhang
  - Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
- Li Li
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Shili Lin
  - Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
14
Ens-PPI: A Novel Ensemble Classifier for Predicting the Interactions of Proteins Using Autocovariance Transformation from PSSM. BIOMED RESEARCH INTERNATIONAL 2016; 2016:4563524. [PMID: 27437399 PMCID: PMC4942601 DOI: 10.1155/2016/4563524] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 04/09/2016] [Accepted: 05/08/2016] [Indexed: 11/17/2022]
Abstract
Protein-Protein Interactions (PPIs) play vital roles in most biological activities. Although the development of high-throughput biological technologies has generated considerable PPI data for various organisms, many problems are still far from being solved. A number of computational methods based on machine learning have been developed to facilitate the identification of novel PPIs. In this study, a novel predictor was designed using the Rotation Forest (RF) algorithm combined with Autocovariance (AC) features extracted from the Position-Specific Scoring Matrix (PSSM). More specifically, the PSSMs are generated using the information of protein amino acids sequence. Then, an effective sequence-based features representation, Autocovariance, is employed to extract features from PSSMs. Finally, the RF model is used as a classifier to distinguish between the interacting and noninteracting protein pairs. The proposed method achieves promising prediction performance when performed on the PPIs of Yeast, H. pylori, and independent datasets. The good results show that the proposed model is suitable for PPIs prediction and could also provide a useful supplementary tool for solving other bioinformatics problems.
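The autocovariance step described in this abstract can be sketched as follows, using the standard definition: for each PSSM column, the average product of mean-centered scores at positions a fixed lag apart. This is an illustration only; the function name `pssm_autocovariance` and the `max_lag` parameter are hypothetical, and the paper's actual lag setting is not stated here.

```python
import numpy as np


def pssm_autocovariance(pssm, max_lag):
    """Autocovariance features from an L x C PSSM for lags 1..max_lag."""
    pssm = np.asarray(pssm, dtype=float)
    length = pssm.shape[0]
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must satisfy 1 <= max_lag < sequence length")
    centered = pssm - pssm.mean(axis=0)            # subtract each column's mean
    features = []
    for lag in range(1, max_lag + 1):
        products = centered[:-lag] * centered[lag:]  # positions i and i + lag
        features.append(products.sum(axis=0) / (length - lag))
    return np.concatenate(features)                # length C * max_lag


# Tiny 3 x 2 matrix used only to make the arithmetic easy to follow.
toy_pssm = [[1.0, 0.0], [3.0, 0.0], [5.0, 0.0]]
ac_features = pssm_autocovariance(toy_pssm, max_lag=2)
```

For a protein pair, the two proteins' feature vectors would typically be concatenated before classification; that pairing step is left out of the sketch.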
|
15
|
Chen J, Liu B, Huang D. Protein Remote Homology Detection Based on an Ensemble Learning Approach. BIOMED RESEARCH INTERNATIONAL 2016; 2016:5813645. [PMID: 27294123 PMCID: PMC4875977 DOI: 10.1155/2016/5813645] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 01/29/2016] [Accepted: 02/21/2016] [Indexed: 12/15/2022]
Abstract
Protein remote homology detection is one of the central problems in bioinformatics. Although some computational methods have been proposed, the problem is still far from being solved. In this paper, an ensemble classifier for protein remote homology detection, called SVM-Ensemble, was proposed with a weighted voting strategy. SVM-Ensemble combined three basic classifiers based on different feature spaces, including Kmer, ACC, and SC-PseAAC. These features consider the characteristics of proteins from various perspectives, incorporating both the sequence composition and the sequence-order information along the protein sequences. Experimental results on a widely used benchmark dataset showed that the proposed SVM-Ensemble clearly improves predictive performance for protein remote homology detection. Moreover, it achieved the best performance and outperformed other state-of-the-art methods.
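A weighted voting strategy over several base classifiers can be sketched as below. The abstract does not say how SVM-Ensemble derives its weights or its decision threshold, so the weights, the threshold of 0.5, and the function name `weighted_vote` are assumptions made purely for illustration.

```python
def weighted_vote(scores, weights, threshold=0.5):
    """Combine per-classifier positive-class scores into one decision.

    scores  : one score in [0, 1] per base classifier
    weights : one non-negative weight per base classifier (hypothetical values)
    Returns the weighted mean score and the resulting boolean prediction.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    combined = sum(s * w for s, w in zip(scores, weights)) / total_weight
    return combined, combined >= threshold


# Three base classifiers (for example, one per feature space) with made-up scores.
combined_score, is_positive = weighted_vote([0.9, 0.4, 0.2], [0.5, 0.3, 0.2])
```

Dividing by the total weight keeps the combined score in the same range as the inputs, so the threshold keeps its meaning even when the weights are not normalized.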
Affiliation(s)
- Junjie Chen
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Bingquan Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Dong Huang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
|
16
|
Liu B, Fang L. WITHDRAWN: Identification of microRNA precursor based on gapped n-tuple structure status composition kernel. Comput Biol Chem 2016:S1476-9271(16)30036-6. [PMID: 26935400 DOI: 10.1016/j.compbiolchem.2016.02.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2016] [Accepted: 02/01/2016] [Indexed: 10/22/2022]
Abstract
This article has been withdrawn at the request of the author(s) and/or editor. The Publisher apologizes for any inconvenience this may cause. The full Elsevier Policy on Article Withdrawal can be found at http://www.elsevier.com/locate/withdrawalpolicy.
Affiliation(s)
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China.
- Longyun Fang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China.
|
17
|
Steinhauser S, Kurzawa N, Eils R, Herrmann C. A comprehensive comparison of tools for differential ChIP-seq analysis. Brief Bioinform 2016; 17:953-966. [PMID: 26764273 PMCID: PMC5142015 DOI: 10.1093/bib/bbv110] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 10/09/2015] [Revised: 11/21/2015] [Indexed: 11/13/2022] Open
Abstract
ChIP-seq has become a widely adopted genomic assay in recent years to determine binding sites for transcription factors or enrichments for specific histone modifications. Besides detection of enriched or bound regions, an important question is to determine differences between conditions. While this is a common analysis for gene expression, for which a large number of computational approaches have been validated, the same question for ChIP-seq is particularly challenging owing to the complexity of ChIP-seq data in terms of noisiness and variability. Many different tools have been developed and published in recent years. However, a comprehensive comparison and review of these tools is still missing. Here, we have reviewed 14 tools, which have been developed to determine differential enrichment between two conditions. They differ in their algorithmic setups, and also in the range of applicability. Hence, we have benchmarked these tools on real data sets for transcription factors and histone modifications, as well as on simulated data sets to quantitatively evaluate their performance. Overall, there is a great variety in the type of signal detected by these tools with a surprisingly low level of agreement. Depending on the type of analysis performed, the choice of method will crucially impact the outcome.
|
18
|
Cheng L, Lo LY, Tang NLS, Wang D, Leung KS. CrossNorm: a novel normalization strategy for microarray data in cancers. Sci Rep 2016; 6:18898. [PMID: 26732145 PMCID: PMC4702063 DOI: 10.1038/srep18898] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 01/08/2015] [Accepted: 11/27/2015] [Indexed: 01/17/2023] Open
Abstract
Normalization is essential to get rid of biases in microarray data for their accurate analysis. Existing normalization methods for microarray gene expression data commonly assume a similar global expression pattern among samples being studied. However, scenarios of global shifts in gene expression are dominant in cancers, making the assumption invalid. To alleviate the problem, here we propose and develop a novel normalization strategy, Cross Normalization (CrossNorm), for microarray data with unbalanced transcript levels among samples. Conventional procedures, such as RMA and LOESS, arbitrarily flatten the difference between case and control groups, leading to biased gene expression estimates. Notably, applying these methods under the CrossNorm strategy, which makes use of the overall statistics of the original signals, significantly improved robustness and accuracy in estimating transcript level dynamics for a series of publicly available datasets, including a titration experiment, simulated data, spike-in data and several real-life microarray datasets across various types of cancers. The results have important implications for past and future cancer studies based on microarray samples with non-negligible differences. Moreover, the strategy can also be applied to other sorts of high-throughput data as long as the experiments have global expression variations between conditions.
Affiliation(s)
- Lixin Cheng
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Leung-Yau Lo
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Nelson L S Tang
- Department of Chemical Pathology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Dong Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Kwong-Sak Leung
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
|
19
|
Huang YA, You ZH, Gao X, Wong L, Wang L. Using Weighted Sparse Representation Model Combined with Discrete Cosine Transformation to Predict Protein-Protein Interactions from Protein Sequence. BIOMED RESEARCH INTERNATIONAL 2015; 2015:902198. [PMID: 26634213 PMCID: PMC4641304 DOI: 10.1155/2015/902198] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Received: 08/13/2015] [Accepted: 10/04/2015] [Indexed: 01/08/2023]
Abstract
Increasing demand for knowledge about protein-protein interactions (PPIs) is promoting the development of methods for predicting protein interaction networks. Although high-throughput technologies have generated considerable PPI data for various organisms, they have inevitable drawbacks such as high cost, time consumption, and an inherently high false positive rate. For this reason, computational methods for predicting PPIs are drawing more and more attention. In this study, we report a computational method for predicting PPIs using the information of protein sequences. The main improvements come from adopting a novel protein sequence representation, which applies the discrete cosine transform (DCT) to the substitution matrix representation (SMR), and from using a weighted sparse representation based classifier (WSRC). When applied to the PPI datasets of Yeast, Human, and H. pylori, the method achieved average accuracies of 96.28%, 96.30%, and 86.74%, respectively, significantly better than previous methods. These promising results indicate that the proposed method is feasible, robust, and powerful. To further evaluate the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier. Extensive experiments were also performed in which we used Yeast PPI samples as the training set to predict the PPIs of five other species datasets.
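The DCT-based representation mentioned in this abstract can be illustrated with a small sketch. It assumes a plain, unnormalized two-dimensional DCT-II applied to a numeric matrix and keeps the top-left (low-frequency) block as a fixed-length descriptor. How the SMR matrix is built and how many coefficients the authors retain are not given in the abstract, so the `size` parameter and all function names here are hypothetical.

```python
import math


def dct_1d(values):
    """Unnormalized DCT-II of a sequence of numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(matrix):
    """Separable 2D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(col) for col in zip(*rows)]
    # Transpose back so the result has the same orientation as the input.
    return [list(r) for r in zip(*cols)]


def low_frequency_block(matrix, size):
    """Flatten the top-left size x size block of DCT coefficients."""
    coeffs = dct_2d(matrix)
    return [value for row in coeffs[:size] for value in row[:size]]


# A constant matrix concentrates all of its energy in the first coefficient.
constant_features = low_frequency_block([[1, 1], [1, 1]], 1)
small_features = low_frequency_block([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 2)
```

Keeping only the low-frequency block is a common way to turn matrices with a variable number of rows into vectors of equal length, because most of the signal energy usually sits in those coefficients.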
Affiliation(s)
- Yu-An Huang
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
- Zhu-Hong You
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
- Xin Gao
- Department of Medical Imaging, Suzhou Institute of Biomedical Engineering and Technology, Suzhou, Jiangsu 215163, China
- Leon Wong
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
- Lirong Wang
- School of Electronic and Information Engineering, Soochow University, Suzhou, Jiangsu 215123, China
|
20
|
Veazey KJ, Parnell SE, Miranda RC, Golding MC. Dose-dependent alcohol-induced alterations in chromatin structure persist beyond the window of exposure and correlate with fetal alcohol syndrome birth defects. Epigenetics Chromatin 2015; 8:39. [PMID: 26421061 PMCID: PMC4587584 DOI: 10.1186/s13072-015-0031-7] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 07/30/2015] [Accepted: 09/15/2015] [Indexed: 01/16/2023] Open
Abstract
Background In recent years, we have come to recognize that a multitude of in utero exposures have the capacity to induce the development of congenital and metabolic defects. As most of these encounters manifest their effects beyond the window of exposure, deciphering the mechanisms of teratogenesis is incredibly difficult. For many agents, altered epigenetic programming has become suspect in transmitting the lasting signature of exposure leading to dysgenesis. However, while several chemicals can perturb chromatin structure acutely, for many agents (particularly alcohol) it remains unclear if these modifications represent transient responses to exposure or heritable lesions leading to pathology. Results Here, we report that mice encountering an acute exposure to alcohol on gestational Day-7 exhibit significant alterations in chromatin structure (histone 3 lysine 9 dimethylation, lysine 9 acetylation, and lysine 27 trimethylation) at Day-17, and that these changes strongly correlate with the development of craniofacial and central nervous system defects. Using a neural cortical stem cell model, we find that the epigenetic changes arising as a consequence of alcohol exposure are heavily dependent on the gene under investigation, the dose of alcohol encountered, and that the signatures arising acutely differ significantly from those observed after a 4-day recovery period. Importantly, the changes observed post-recovery are consistent with those modeled in vivo, and associate with alterations in transcripts encoding multiple homeobox genes directing neurogenesis. Unexpectedly, we do not observe a correlation between alcohol-induced changes in chromatin structure and alterations in transcription. Interestingly, the majority of epigenetic changes observed occur in marks associated with repressive chromatin structure, and we identify correlative disruptions in transcripts encoding Dnmt1, Eed, Ehmt2 (G9a), EzH2, Kdm1a, Kdm4c, Setdb1, Sod3, Tet1 and Uhrf1. 
Conclusions These observations suggest that the immediate and long-term impacts of alcohol exposure on chromatin structure are distinct, and hint at the existence of a possible coordinated epigenetic response to ethanol during development. Collectively, our results indicate that alcohol-induced modifications to chromatin structure persist beyond the window of exposure, and likely contribute to the development of fetal alcohol syndrome-associated congenital abnormalities. Electronic supplementary material The online version of this article (doi:10.1186/s13072-015-0031-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Kylee J Veazey
- Room 338 VMA, 4466 TAMU, Department of Veterinary Physiology, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX 77843-4466 USA
- Scott E Parnell
- Bowles Center for Alcohol Studies and Department of Cell Biology and Physiology, School of Medicine, CB# 7178, University of North Carolina, Chapel Hill, NC 27599 USA
- Rajesh C Miranda
- Texas A&M Health Sciences Center, Texas A&M University, 8441 State Highway 47, Clinical Building 1, Suite 3100, Bryan, TX 77807 USA
- Michael C Golding
- Room 338 VMA, 4466 TAMU, Department of Veterinary Physiology, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX 77843-4466 USA
|
21
|
Survey of Programs Used to Detect Alternative Splicing Isoforms from Deep Sequencing Data In Silico. BIOMED RESEARCH INTERNATIONAL 2015; 2015:831352. [PMID: 26421304 PMCID: PMC4573434 DOI: 10.1155/2015/831352] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 11/26/2014] [Revised: 02/17/2015] [Accepted: 03/02/2015] [Indexed: 11/29/2022]
Abstract
Next-generation sequencing techniques have been rapidly emerging. However, the massive sequencing reads hide a great deal of unknown important information. Advances have enabled researchers to discover alternative splicing (AS) sites and isoforms using computational approaches instead of molecular experiments. Given the importance of AS for gene expression and protein diversity in eukaryotes, detecting alternative splicing and isoforms represents a hot topic in systems biology and epigenetics research. The computational methods applied to AS prediction have improved since the emergence of next-generation sequencing. In this study, we introduce state-of-the-art research on AS and then compare the research methods and software tools available for AS based on next-generation sequencing reads. Finally, we discuss the prospects of computational methods related to AS.
|
22
|
Grzybowski AT, Chen Z, Ruthenburg AJ. Calibrating ChIP-Seq with Nucleosomal Internal Standards to Measure Histone Modification Density Genome Wide. Mol Cell 2015; 58:886-99. [PMID: 26004229 DOI: 10.1016/j.molcel.2015.04.022] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Received: 09/02/2014] [Revised: 03/09/2015] [Accepted: 03/25/2015] [Indexed: 02/01/2023]
Abstract
Chromatin immunoprecipitation (ChIP) serves as a central experimental technique in epigenetics research, yet there are serious drawbacks: it is a relative measurement, which untethered to any external scale obscures fair comparison among experiments; it employs antibody reagents that have differing affinities and specificities for target epitopes that vary in abundance; and it is frequently not reproducible. To address these problems, we developed Internal Standard Calibrated ChIP (ICeChIP), wherein a native chromatin sample is spiked with nucleosomes reconstituted from recombinant and semisynthetic histones on barcoded DNA prior to immunoprecipitation. ICeChIP measures local histone modification densities on a biologically meaningful scale, enabling unbiased trans-experimental comparisons, and reveals unique insight into the nature of bivalent domains. This technology provides in situ assessment of the immunoprecipitation step, accommodating for many experimental pitfalls as well as providing a critical examination of untested assumptions inherent to conventional ChIP.
Affiliation(s)
- Adrian T Grzybowski
- Department of Molecular Genetics and Cell Biology, The University of Chicago, 920 East 58th Street, Chicago, IL 60637, USA
- Zhonglei Chen
- Department of Chemistry, The University of Chicago, 920 East 58th Street, Chicago, IL 60637, USA
- Alexander J Ruthenburg
- Department of Molecular Genetics and Cell Biology, The University of Chicago, 920 East 58th Street, Chicago, IL 60637, USA; Department of Biochemistry and Molecular Biology, The University of Chicago, 920 East 58th Street, Chicago, IL 60637, USA.
|
23
|
Xu R, Zhou J, Wang H, He Y, Wang X, Liu B. Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation. BMC SYSTEMS BIOLOGY 2015; 9 Suppl 1:S10. [PMID: 25708928 PMCID: PMC4331676 DOI: 10.1186/1752-0509-9-s1-s10] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Indexed: 02/05/2023]
Abstract
BACKGROUND DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation. Several computational methods have been proposed in the literature for DNA-binding protein identification. However, most of them cannot provide an invaluable knowledge base for our understanding of DNA-protein interactions. RESULTS We first presented a new protein sequence encoding method called PSSM Distance Transformation, and then constructed a DNA-binding protein identification method (SVM-PSSM-DT) by combining PSSM Distance Transformation with a support vector machine (SVM). First, the PSSM profiles are generated by using the PSI-BLAST program to search the non-redundant (NR) database. Next, the PSSM profiles are transformed into uniform numeric representations by the distance transformation scheme. Lastly, the resulting uniform numeric representations are fed into an SVM classifier for prediction. Thus whether a sequence can bind to DNA or not can be determined. In a benchmark test on 525 DNA-binding and 550 non DNA-binding proteins using jackknife validation, the present model achieved an ACC of 79.96%, MCC of 0.622 and AUC of 86.50%. This performance is considerably better than most of the existing state-of-the-art predictive methods. When tested on a recently constructed independent dataset PDB186, SVM-PSSM-DT also achieved the best performance with ACC of 80.00%, MCC of 0.647 and AUC of 87.40%, and outperformed some existing state-of-the-art methods. CONCLUSIONS The experimental results demonstrate that PSSM Distance Transformation is a viable protein sequence encoding method and SVM-PSSM-DT is a useful tool for identifying DNA-binding proteins. A user-friendly web-server of SVM-PSSM-DT was constructed, which is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/PSSM-DT/.
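The ACC and MCC figures quoted in this abstract are standard confusion-matrix metrics. As a brief sketch of how they are computed, the function below takes true/false positive and negative counts; the counts in the example are invented for illustration and are not taken from the paper.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, Matthews correlation coefficient) from confusion counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    acc = (tp + tn) / total

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Hypothetical counts for a binary DNA-binding / non-binding classifier.
example_acc, example_mcc = accuracy_and_mcc(tp=40, tn=45, fp=5, fn=10)
```

MCC is usually reported alongside ACC because it stays informative when the two classes are imbalanced, whereas accuracy alone can look high for a classifier that mostly predicts the majority class.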
Affiliation(s)
- Ruifeng Xu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Jiyun Zhou
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Hongpeng Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Yulan He
- School of Engineering & Applied Science, Aston University, Birmingham, UK
- Xiaolong Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
|
24
|
Ganjtabesh M, Montaseri S, Zare-Mirakabad F. Using temperature effects to predict the interactions between two RNAs. J Theor Biol 2015; 364:98-102. [PMID: 25218429 DOI: 10.1016/j.jtbi.2014.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2014] [Revised: 08/30/2014] [Accepted: 09/02/2014] [Indexed: 10/24/2022]
Abstract
MOTIVATION The interaction of two RNA molecules is considered an important factor in the post-transcriptional regulation of gene expression. Most ncRNAs prevent the translation of their target mRNA(s) by forming stable bindings with them. Although several computational methods have been proposed to predict the interactions between two RNAs, none of them can produce reliable and accurate results. RESULTS In this paper, a new approach entitled tempRNAs is presented to accurately predict the interaction structure between two RNAs based on a gradual temperature decrease. For each specified temperature, our algorithm contains two main steps. First, the secondary structure of each RNA is determined with the previous base pairs as constraints. Second, the two RNAs are concatenated and the interaction between them is calculated according to the previous base pairs. The secondary structures are determined based on the minimum free energy model. The proposed algorithm is evaluated on a set of known interacting RNA pairs. The results show that the proposed method is more accurate than other state-of-the-art algorithms, namely inRNAs and RactIP.
Affiliation(s)
- Mohammad Ganjtabesh
- Department of Computer Science, School of Mathematics, Statistics, and Computer Science, University of Tehran, Tehran, Iran.
- Soheila Montaseri
- Department of Computer Science, School of Mathematics, Statistics, and Computer Science, University of Tehran, Tehran, Iran.
- Fatemeh Zare-Mirakabad
- Department of Computer Science, Faculty of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran.
|
25
|
Orlando DA, Chen MW, Brown VE, Solanki S, Choi YJ, Olson ER, Fritz CC, Bradner JE, Guenther MG. Quantitative ChIP-Seq normalization reveals global modulation of the epigenome. Cell Rep 2014; 9:1163-70. [PMID: 25437568 DOI: 10.1016/j.celrep.2014.10.018] [Citation(s) in RCA: 332] [Impact Index Per Article: 33.2] [Received: 08/19/2014] [Revised: 09/24/2014] [Accepted: 10/09/2014] [Indexed: 10/24/2022] Open
Abstract
Epigenomic profiling by chromatin immunoprecipitation coupled with massively parallel DNA sequencing (ChIP-seq) is a prevailing methodology used to investigate chromatin-based regulation in biological systems such as human disease, but the lack of an empirical methodology to enable normalization among experiments has limited the precision and usefulness of this technique. Here, we describe a method called ChIP with reference exogenous genome (ChIP-Rx) that allows one to perform genome-wide quantitative comparisons of histone modification status across cell populations using defined quantities of a reference epigenome. ChIP-Rx enables the discovery and quantification of dynamic epigenomic profiles across mammalian cells that would otherwise remain hidden using traditional normalization methods. We demonstrate the utility of this method for measuring epigenomic changes following chemical perturbations and show how reference normalization of ChIP-seq experiments enables the discovery of disease-relevant changes in histone modification occupancy.
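The idea of normalizing against a spiked-in reference genome can be sketched as follows. The abstract does not give the formula, so this assumes one common formulation in which each sample's read counts are scaled by the reciprocal of its reference-genome read count in millions. The function names and the numbers are hypothetical.

```python
def reference_scale_factor(reference_reads):
    """Scale factor assumed here: one million divided by reference-genome reads."""
    if reference_reads <= 0:
        raise ValueError("reference_reads must be positive")
    return 1_000_000 / reference_reads


def normalize_counts(bin_counts, reference_reads):
    """Scale per-bin read counts of one sample by its reference-derived factor."""
    factor = reference_scale_factor(reference_reads)
    return [count * factor for count in bin_counts]


# Two made-up samples with identical raw counts but different spike-in recovery.
sample_a = normalize_counts([10, 20], reference_reads=2_000_000)
sample_b = normalize_counts([10, 20], reference_reads=500_000)
```

Under this assumption, a sample that recovers more reference reads for the same raw signal is scaled down. That is how a global change in modification level, invisible to ordinary read-depth normalization, would become measurable.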
Affiliation(s)
- David A Orlando
- Syros Pharmaceuticals, 480 Arsenal Street, Watertown, MA 02472, USA.
- Mei Wei Chen
- Syros Pharmaceuticals, 480 Arsenal Street, Watertown, MA 02472, USA
- Victoria E Brown
- Syros Pharmaceuticals, 480 Arsenal Street, Watertown, MA 02472, USA
- Yoon J Choi
- Syros Pharmaceuticals, 480 Arsenal Street, Watertown, MA 02472, USA
- Eric R Olson
- Syros Pharmaceuticals, 480 Arsenal Street, Watertown, MA 02472, USA
- James E Bradner
- Department of Medical Oncology, Dana-Farber Cancer Institute; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
|
26
|
Liu B, Xu J, Fan S, Xu R, Zhou J, Wang X. PseDNA-Pro: DNA-Binding Protein Identification by Combining Chou’s PseAAC and Physicochemical Distance Transformation. Mol Inform 2014; 34:8-17. [DOI: 10.1002/minf.201400025] [Citation(s) in RCA: 135] [Impact Index Per Article: 13.5] [Received: 02/19/2014] [Accepted: 05/27/2014] [Indexed: 11/06/2022]
|
27
|
Ibrahim MM, Lacadie SA, Ohler U. JAMM: a peak finder for joint analysis of NGS replicates. Bioinformatics 2014; 31:48-55. [PMID: 25223640 DOI: 10.1093/bioinformatics/btu568] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Indexed: 01/09/2023]
Abstract
MOTIVATION Although peak finding in next-generation sequencing (NGS) datasets has been addressed extensively, there is no consensus on how to analyze and process biological replicates. Furthermore, most peak finders do not focus on accurate determination of enrichment site widths and are not widely applicable to different types of datasets. RESULTS We developed JAMM (Joint Analysis of NGS replicates via Mixture Model clustering): a peak finder that can integrate information from biological replicates, determine enrichment site widths accurately and resolve neighboring narrow peaks. JAMM is a universal peak finder that is applicable to different types of datasets. We show that JAMM is among the best performing peak finders in terms of site detection accuracy and in terms of accurate determination of enrichment site widths. In addition, JAMM's replicate integration improves peak spatial resolution, sorting and peak finding accuracy. AVAILABILITY AND IMPLEMENTATION JAMM is available for free and can run on Linux machines through the command line: http://code.google.com/p/jamm-peak-finder.
Affiliation(s)
- Mahmoud M Ibrahim
- Department of Biology, Humboldt University, Invalidenstrasse 43, D-10115 Berlin, Germany and The Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine Berlin-Buch, Robert Rössle Str. 10, Berlin 13125, Germany
- Scott A Lacadie
- Department of Biology, Humboldt University, Invalidenstrasse 43, D-10115 Berlin, Germany and The Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine Berlin-Buch, Robert Rössle Str. 10, Berlin 13125, Germany
- Uwe Ohler
- Department of Biology, Humboldt University, Invalidenstrasse 43, D-10115 Berlin, Germany and The Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine Berlin-Buch, Robert Rössle Str. 10, Berlin 13125, Germany
|
28
|
Protein binding site prediction by combining hidden Markov support vector machine and profile-based propensities. ScientificWorldJournal 2014; 2014:464093. [PMID: 25133234 PMCID: PMC4122092 DOI: 10.1155/2014/464093] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 06/04/2014] [Accepted: 07/01/2014] [Indexed: 11/22/2022] Open
Abstract
Identification of protein binding sites is critical for studying the function of proteins. In this paper, we proposed a method for protein binding site prediction, which combined the order profile propensities and hidden Markov support vector machine (HM-SVM). This method applied the sequential labeling technique to the field of protein binding site prediction. The input features of HM-SVM include the profile-based propensities, the Position-Specific Score Matrix (PSSM), and Accessible Surface Area (ASA). When tested on different data sets, the proposed method showed promising results, and outperformed some closely related methods by more than 10% in terms of AUC.
|
29
|
enDNA-Prot: identification of DNA-binding proteins by applying ensemble learning. BIOMED RESEARCH INTERNATIONAL 2014; 2014:294279. [PMID: 24977146 PMCID: PMC4058174 DOI: 10.1155/2014/294279] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 02/28/2014] [Revised: 05/05/2014] [Accepted: 05/05/2014] [Indexed: 12/03/2022]
Abstract
DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Up to now, many methods have been proposed, but most of them focus on only one classifier and cannot make full use of the large number of negative samples to improve prediction performance. This study proposed a predictor called enDNA-Prot for DNA-binding protein identification by employing the ensemble learning technique. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot with performance improvement in the range of 3.97–9.52% in ACC and 0.08–0.19 in MCC. Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83–16.63% in terms of ACC and 0.02–0.16 in terms of MCC. This indicates that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of the vast majority of experimental scientists, we developed a user-friendly web-server for enDNA-Prot which is freely accessible to the public.
|
30
|
Zhang B, Huang Y, McDermott JE, Posey RH, Xu H, Zhao Z. Interdisciplinary dialogue for education, collaboration, and innovation: intelligent Biology and Medicine in and beyond 2013. BMC Genomics 2013; 14 Suppl 8:S1. [PMID: 24564388 PMCID: PMC4042234 DOI: 10.1186/1471-2164-14-s8-s1] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Indexed: 12/15/2022] Open
Abstract
The 2013 International Conference on Intelligent Biology and Medicine (ICIBM 2013) was held on August 11-13, 2013 in Nashville, Tennessee, USA. The conference included six scientific sessions, two tutorial sessions, one workshop, two poster sessions, and four keynote presentations that covered cutting-edge research topics in bioinformatics, systems biology, computational medicine, and intelligent computing. Here, we present a summary of the conference and an editorial report of the supplements to BMC Genomics and BMC Systems Biology that include 19 research papers selected from ICIBM 2013.
|