1. P P, Riyaz A, Choudhury A, Choudhury PR, Pradhan N, Singh A, Nakul M, Dudeja C, Yadav A, Nath SK, Khanna V, Sharma T, Pradhan G, Takkar S, Rawal K. DNASCANNER v2: A Web-Based Tool to Analyze the Characteristic Properties of Nucleotide Sequences. J Comput Biol 2024; 31:651-669. [PMID: 38662479 DOI: 10.1089/cmb.2023.0227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/18/2024] Open
Abstract
Throughout the process of evolution, DNA undergoes the accumulation of distinct mutations, which can often result in highly organized patterns that serve various essential biological functions. These patterns encompass various genomic elements and provide valuable insights into the regulatory and functional aspects of DNA. The physicochemical, mechanical, thermodynamic, and structural properties of DNA sequences play a crucial role in the formation of specific patterns. These properties contribute to the three-dimensional structure of DNA and influence their interactions with proteins, regulatory elements, and other molecules. In this study, we introduce DNASCANNER v2, an advanced version of our previously published algorithm DNASCANNER for analyzing DNA properties. The current tool is built using the FLASK framework in Python language. Featuring a user-friendly interface tailored for nonspecialized researchers, it offers an extensive analysis of 158 DNA properties, including mono/di/trinucleotide frequencies, structural, physicochemical, thermodynamics, and mechanical properties of DNA sequences. The tool provides downloadable results and offers interactive plots for easy interpretation and comparison between different features. We also demonstrate the utility of DNASCANNER v2 in analyzing splice-site junctions, casposon insertion sequences, and transposon insertion sites (TIS) within the bacterial and human genomes, respectively. We also developed a deep learning module for the prediction of potential TIS in a given nucleotide sequence. In the future, we aim to optimize the performance of this prediction model through extensive training on larger data sets.
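The mono/di/trinucleotide frequencies mentioned in this abstract can be illustrated with a minimal sketch. This is not the DNASCANNER v2 implementation; the function name `kmer_frequencies` and the choice to skip windows containing ambiguous bases are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of every overlapping k-mer over the A/C/G/T alphabet."""
    seq = seq.upper()
    # Count overlapping windows, skipping any that contain ambiguous bases (e.g. N).
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    # Report all 4**k possible k-mers so vectors are comparable across sequences.
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product("ACGT", repeat=k)
    }


# Hypothetical toy sequence, not taken from the paper.
mono = kmer_frequencies("ATGCGCGTTA", 1)
di = kmer_frequencies("ATGCGCGTTA", 2)
tri = kmer_frequencies("ATGCGCGTTA", 3)
```

Returning every possible k-mer, including those with zero count, keeps the feature vectors the same length for all input sequences, which makes them easier to compare or plot side by side.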
Affiliation(s)
- Preeti P
  - Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Azeen Riyaz
  - Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Alakto Choudhury
  - Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Priyanka Ray Choudhury
  - Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Nischal Pradhan
  - Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Abhishek Singh
  - Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Mihir Nakul
  - Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Chhavi Dudeja
  - Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Abhijeet Yadav
  - Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Swarsat Kaushik Nath
  - Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Vrinda Khanna
  - Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Trapti Sharma
  - Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Gayatri Pradhan
  - Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Simran Takkar
  - Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Kamal Rawal
  - Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
2. Santoni D. The impact of flanking sequence features on DNA CpG methylation. Comput Biol Chem 2021; 92:107480. [PMID: 33826970 DOI: 10.1016/j.compbiolchem.2021.107480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2020] [Revised: 01/18/2021] [Accepted: 03/24/2021] [Indexed: 10/21/2022]
Abstract
Epigenetics and DNA methylation play a pivotal role in many cellular processes, and aberrant methylation patterns often characterize pathologies. In this work we investigate the role that the flanking sequences of CGs play in the methylation process in humans. We built four different CG datasets: methylated, unmethylated, and two randomly extracted ones. We evaluated features associated with the flanking sequences of those CG sets, for different window sizes around the CG, through five measures accounting for different aspects of sequence composition complexity and structure. The analysis performed with those measures revealed clearly different behaviors between methylated and unmethylated probe sets. Major differences were observed for GC content and CG dinucleotide frequency in a window of 300-400 bp and for CG self-attraction at 3 kbp. Remarkably, the effect associated with a methylated CG extends much farther from the CG than expected.
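Two of the measures named in this abstract, GC content and CG dinucleotide frequency within a window around a CG site, can be sketched as follows. The function name `flank_features`, the 0-based position convention, and the decision to include the central CG in the counts are all assumptions for illustration; the abstract does not specify how the study handled these details.

```python
def flank_features(seq: str, cg_pos: int, window: int) -> dict[str, float]:
    """GC content and CG dinucleotide frequency in a window centred on a CG site.

    cg_pos is the 0-based index of the C in the CG; window is the total
    number of flanking bases (half on each side of the CG).
    """
    seq = seq.upper()
    half = window // 2
    # Clip the window at the sequence ends.
    start = max(0, cg_pos - half)
    end = min(len(seq), cg_pos + 2 + half)
    region = seq[start:end]
    if len(region) < 2:
        return {"gc_content": 0.0, "cg_freq": 0.0}
    gc_content = sum(base in "GC" for base in region) / len(region)
    # Fraction of overlapping dinucleotide positions that read CG
    # (the central CG itself is included in this sketch).
    n_cg = sum(region[i:i + 2] == "CG" for i in range(len(region) - 1))
    cg_freq = n_cg / (len(region) - 1)
    return {"gc_content": gc_content, "cg_freq": cg_freq}


# Hypothetical toy sequence; the study used windows of hundreds of bp.
features = flank_features("TTACGCGATCGTTAA", cg_pos=3, window=6)
```

Sweeping `window` over a range of sizes and comparing the resulting values between methylated and unmethylated sets is the kind of comparison the abstract describes.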
Affiliation(s)
- Daniele Santoni
  - Institute for System Analysis and Computer Science "Antonio Ruberti", National Research Council of Italy, Via dei Taurini 19, 00185 Rome, Italy
3. Wang M, Ngo V, Wang W. Deciphering the genetic code of DNA methylation. Brief Bioinform 2021; 22:6082840. [PMID: 33432324 DOI: 10.1093/bib/bbaa424] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 09/24/2020] [Revised: 12/03/2020] [Accepted: 12/22/2020] [Indexed: 12/17/2022] Open
Abstract
DNA methylation plays crucial roles in many biological processes and abnormal DNA methylation patterns are often observed in diseases. Recent studies have shed light on cis-acting DNA elements that regulate locus-specific DNA methylation, which involves transcription factors, histone modification and DNA secondary structures. In addition, several recent studies have surveyed DNA motifs that regulate DNA methylation and suggest potential applications in diagnosis and prognosis. Here, we discuss the current biological foundation for the cis-acting genetic code that regulates DNA methylation. We review the computational models that predict DNA methylation with genetic features and discuss the biological insights revealed from these models. We also provide an in-depth discussion on how to leverage such knowledge in clinical applications, particularly in the context of liquid biopsy for early cancer diagnosis and treatment.
Affiliation(s)
- Mengchi Wang
  - Bioinformatics and Systems Biology at University of California, USA
- Vu Ngo
  - Bioinformatics and Systems Biology at University of California, USA
- Wei Wang
  - Bioinformatics and Systems Biology, Department of Chemistry and Biochemistry, and Department of Cellular and Molecular Medicine at University of California, USA
4. Chou KC. An Insightful 10-year Recollection Since the Emergence of the 5-steps Rule. Curr Pharm Des 2020; 25:4223-4234. [PMID: 31782354 DOI: 10.2174/1381612825666191129164042] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/18/2019] [Accepted: 11/25/2019] [Indexed: 11/22/2022]
Abstract
OBJECTIVE: One of the most challenging problems is how to represent a biological sequence as a vector while still retaining considerable sequence-order information. METHODS: To address this problem, the approach of Pseudo Amino Acid Components, or PseAAC, has been developed. RESULTS AND CONCLUSION: The 10-year recollection makes it increasingly clear that this proposal has indeed been very powerful.
Affiliation(s)
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, Massachusetts 02478, United States
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
5. Identification of CpG Islands in DNA Sequences Using Short-Time Fourier Transform. Interdiscip Sci 2020; 12:355-367. [PMID: 32394270 DOI: 10.1007/s12539-020-00370-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/18/2019] [Revised: 04/07/2020] [Accepted: 04/17/2020] [Indexed: 10/24/2022]
Abstract
In the era of big data, genomic data analysis is needed to extract the hidden information present in DNA sequences. One important hidden feature of DNA sequences is CpG islands. CpG islands matter because they are used as gene markers and are associated with cancer, among other conditions. Various methods have therefore been reported for identifying CpG islands in DNA sequences. The key contributions of this work are (i) extraction of the periodicity feature associated with CpG islands using the short-time Fourier transform and (ii) a short-time Fourier transform-based algorithm for identifying CpG islands in DNA sequences. The results indicate that the proposed algorithm performs better than other reported methods for CpG island detection.
6. Tahir RA, Zheng D, Nazir A, Qing H. A review of computational algorithms for CpG islands detection. J Biosci 2019. [DOI: 10.1007/s12038-019-9961-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/31/2022]
7. Nomura Y, Hara ES, Yoshioka Y, Nguyen HT, Nosho S, Komori T, Ishibashi K, Oohashi T, Ono M, Kuboki T. DNA Methylation-Based Regulation of Human Bone Marrow-Derived Mesenchymal Stem/Progenitor Cell Chondrogenic Differentiation. Cells Tissues Organs 2019; 207:115-126. [PMID: 31574516 DOI: 10.1159/000502885] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/29/2019] [Accepted: 08/22/2019] [Indexed: 11/19/2022] Open
Abstract
Stem cells have essential applications in in vitro tissue engineering or regenerative medicine. However, there is still a need to understand more deeply the mechanisms of stem cell differentiation and to optimize the methods to control stem cell function. In this study, we first investigated the activity of DNA methyltransferases (DNMTs) during chondrogenic differentiation of human bone marrow-derived mesenchymal stem/progenitor cells (hBMSCs) and found that DNMT3A and DNMT3B were markedly upregulated during hBMSC chondrogenic differentiation. In an attempt to understand the effect of DNMT3A and DNMT3B on the chondrogenic differentiation of hBMSCs, we transiently transfected the cells with expression vectors for the two enzymes. Interestingly, DNMT3A overexpression strongly enhanced the chondrogenesis of hBMSCs, by increasing the gene expression of the mature chondrocyte marker, collagen type II, more than 200-fold. Analysis of the methylation condition in the cells revealed that DNMT3A and DNMT3B methylated the promoter sequence of early stem cell markers, NANOG and POU5F1(OCT-4). Conversely, the suppression of chondrogenic differentiation and the increase in stem cell markers of hBMSCs were obtained by chemical stimulation with the demethylating agent, 5-azacitidine. Loss-of-function assays with siRNAs targeting DNMT3A also significantly suppressed the chondrogenic differentiation of hBMSCs. Together, these results not only show the critical roles of DNMTs in regulating the chondrogenic differentiation of hBMSCs, but also suggest that manipulation of DNMT activity can be important tools to enhance the differentiation of hBMSCs towards chondrogenesis for potential application in cartilage tissue engineering or cartilage regeneration.
Affiliation(s)
- Yu Nomura
  - Department of Oral Rehabilitation and Regenerative Medicine, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
- Emilio Satoshi Hara
  - Department of Biomaterials, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
- Yuya Yoshioka
  - Department of Oral Rehabilitation and Regenerative Medicine, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
- Há Thi Nguyen
  - Department of Oral Rehabilitation and Regenerative Medicine, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
  - Department of Molecular Biology and Biochemistry, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
- Shuji Nosho
  - Department of Oral Rehabilitation and Regenerative Medicine, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
- Taishi Komori
  - Department of Oral Rehabilitation and Regenerative Medicine, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
- Kei Ishibashi
  - Department of Oral Rehabilitation and Regenerative Medicine, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
  - Department of Molecular Biology and Biochemistry, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
- Toshitaka Oohashi
  - Department of Molecular Biology and Biochemistry, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
- Mitsuaki Ono
  - Department of Molecular Biology and Biochemistry, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
- Takuo Kuboki
  - Department of Oral Rehabilitation and Regenerative Medicine, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
8. Wang M, Zhang K, Ngo V, Liu C, Fan S, Whitaker JW, Chen Y, Ai R, Chen Z, Wang J, Zheng L, Wang W. Identification of DNA motifs that regulate DNA methylation. Nucleic Acids Res 2019; 47:6753-6768. [PMID: 31334813 PMCID: PMC6649826 DOI: 10.1093/nar/gkz483] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 03/11/2019] [Revised: 05/14/2019] [Accepted: 06/20/2019] [Indexed: 01/11/2023] Open
Abstract
DNA methylation is an important epigenetic mark but how its locus-specificity is decided in relation to DNA sequence is not fully understood. Here, we have analyzed 34 diverse whole-genome bisulfite sequencing datasets in human and identified 313 motifs, including 92 and 221 associated with methylation (methylation motifs, MMs) and unmethylation (unmethylation motifs, UMs), respectively. The functionality of these motifs is supported by multiple lines of evidence. First, the methylation levels at the MM and UM motifs are respectively higher and lower than the genomic background. Second, these motifs are enriched at the binding sites of methylation modifying enzymes including DNMT3A and TET1, indicating their possible roles of recruiting these enzymes. Third, these motifs significantly overlap with "somatic QTLs" (quantitative trait loci) of methylation and expression. Fourth, disruption of these motifs by mutation is associated with significantly altered methylation level of the CpGs in the neighbor regions. Furthermore, these motifs together with somatic mutations are predictive of cancer subtypes and patient survival. We revealed some of these motifs were also associated with histone modifications, suggesting a possible interplay between the two types of epigenetic modifications. We also found some motifs form feed forward loops to contribute to DNA methylation dynamics.
Affiliation(s)
- Mengchi Wang
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA
- Kai Zhang
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA
- Vu Ngo
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA
- Chengyu Liu
  - Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA, USA
- Shicai Fan
  - Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA, USA
  - School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China
- John W Whitaker
  - Department of Genomics, Denovo Biopharma, 10240 Science Center Dr., San Diego, CA, USA
- Yue Chen
  - Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA, USA
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Rizi Ai
  - Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA, USA
- Zhao Chen
  - Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA, USA
- Jun Wang
  - Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA, USA
- Lina Zheng
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA
- Wei Wang
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA
  - Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA, USA
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
9. Zhang L, Kong L. iRSpot-PDI: Identification of recombination spots by incorporating dinucleotide property diversity information into Chou's pseudo components. Genomics 2019; 111:457-464. [DOI: 10.1016/j.ygeno.2018.03.003] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 10/22/2017] [Revised: 02/27/2018] [Accepted: 03/03/2018] [Indexed: 12/11/2022]
10. Sabooh MF, Iqbal N, Khan M, Khan M, Maqbool HF. Identifying 5-methylcytosine sites in RNA sequence using composite encoding feature into Chou's PseKNC. J Theor Biol 2018; 452:1-9. [PMID: 29727634 DOI: 10.1016/j.jtbi.2018.04.037] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Received: 02/03/2018] [Revised: 04/24/2018] [Accepted: 04/27/2018] [Indexed: 02/02/2023]
Abstract
This study examines an accurate and efficient computational method for identifying 5-methylcytosine sites in RNA. The occurrence of 5-methylcytosine (m5C) plays a vital role in a number of biological processes. To better understand its biological functions and mechanisms, m5C sites in RNA must be recognized precisely. Laboratory techniques are available to identify m5C sites in RNA, but they require a lot of time and resources. This study develops a new computational method for extracting features from RNA sequences. First, the RNA sequence is encoded as a composite feature vector, and the minimum-redundancy-maximum-relevance algorithm is used to select discriminative features. Second, classification is performed with a support vector machine evaluated by the jackknife cross-validation test. The suggested method distinguishes m5C sites from non-m5C sites with an accuracy of 93.33%, a sensitivity of 90.0%, and a specificity of 96.66% on benchmark datasets. The results show that the proposed algorithm achieves significantly better identification performance than existing computational techniques. This study extends knowledge about the sites of RNA modification, which paves the way for better comprehension of its biological uses and mechanisms.
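The three figures reported in this abstract (93.33 overall, 90.0 sensitivity, 96.66 specificity) are related through the confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical, chosen only because a balanced set of 120 positives and 120 negatives roughly reproduces the reported values, and they are not taken from the paper.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Sensitivity, specificity, and overall accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true sites recovered
    specificity = tn / (tn + fp)  # fraction of non-sites correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


# Hypothetical counts (assumption): 120 positive and 120 negative samples.
m = classification_metrics(tp=108, fn=12, tn=116, fp=4)
```

With equally sized classes, accuracy is the mean of sensitivity and specificity, which is why 90.0 and 96.66 are consistent with an overall figure of 93.33.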
Affiliation(s)
- M Fazli Sabooh
  - Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
- Nadeem Iqbal
  - Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
- Mukhtaj Khan
  - Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
- Muslim Khan
  - Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
- H F Maqbool
  - University of Engineering & Technology Lahore, Pakistan
11. Zhang L, Kong L. iRSpot-ADPM: Identify recombination spots by incorporating the associated dinucleotide product model into Chou's pseudo components. J Theor Biol 2018; 441:1-8. [PMID: 29305179 DOI: 10.1016/j.jtbi.2017.12.025] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 09/19/2017] [Revised: 12/18/2017] [Accepted: 12/24/2017] [Indexed: 10/18/2022]
Abstract
Gene recombination is a key process that produces hereditary differences. Recombination spot identification plays an important role in revealing genome evolution and promoting the study of DNA function. However, traditional experiments cope poorly with the huge amounts of DNA sequence generated by sequencing. Some machine learning methods have been proposed to speed up the identification process, but they often ignore the correlations between nucleotide pairs at different positions along the DNA sequence, which carry important sequence-order information. To address this, the study proposes a novel feature extraction method, called iRSpot-ADPM, based on DNA properties in a given DNA sequence. Eighty-five features are selected from the original feature set according to the weights calculated by a support vector machine. Five-fold cross-validation tests on two widely used benchmark datasets indicate that the proposed method outperforms its existing counterparts in specificity (Spec), Matthews correlation coefficient (MCC), and overall accuracy (OA). The experimental results show that the proposed method is effective for accurate recombination spot identification. Moreover, it is anticipated that the proposed method could be extended to other biological sequences and be helpful in future research. The datasets and Matlab source code can be downloaded from http://stxy.neuq.edu.cn/info/1095/1157.htm.
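The Matthews correlation coefficient (MCC) named in this abstract has a standard closed form in terms of confusion-matrix counts. A minimal sketch follows; returning 0.0 when a row or column of the confusion matrix is empty is a common convention adopted here as an assumption, not something stated in the source.

```python
import math


def matthews_corrcoef(tp: int, fn: int, tn: int, fp: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): undefined MCC is reported as 0.0.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts for illustration only.
perfect = matthews_corrcoef(tp=5, fn=0, tn=5, fp=0)
inverted = matthews_corrcoef(tp=0, fn=5, tn=0, fp=5)
degenerate = matthews_corrcoef(tp=5, fn=0, tn=0, fp=5)
```

Unlike overall accuracy, MCC stays informative when the positive and negative classes differ greatly in size, which is one reason it is often reported alongside specificity and accuracy.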
Affiliation(s)
- Lichao Zhang
  - School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao 066004, PR China
- Liang Kong
  - School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao 066004, PR China
12. Wei L, Tang J, Zou Q. SkipCPP-Pred: an improved and promising sequence-based predictor for predicting cell-penetrating peptides. BMC Genomics 2017. [PMID: 29513192 PMCID: PMC5657092 DOI: 10.1186/s12864-017-4128-1] [Citation(s) in RCA: 76] [Impact Index Per Article: 10.9] [Indexed: 02/08/2023] Open
Abstract
Background: Cell-penetrating peptides (CPPs) are short peptides (5–30 amino acids) that can enter almost any cell without significant damage. On account of their high delivery efficiency, CPPs are promising candidates for gene therapy and cancer treatment. Accordingly, techniques that correctly predict CPPs are anticipated to accelerate CPP applications in future therapeutics. Recently, computational methods have reportedly been successful in predicting CPPs. Unfortunately, the predictive performance of existing methods is not yet satisfactory or reliable enough to identify CPPs accurately. Results: In this study, we propose a novel computational predictor called SkipCPP-Pred to further improve the predictive performance. The novelty of the proposed predictor is a sequence-based feature representation algorithm called adaptive k-skip-n-gram that sufficiently captures the intrinsic correlation information of residues. By fusing the proposed adaptive skip features with a random forest (RF) classifier, we successfully construct the prediction model of SkipCPP-Pred. The jackknife results demonstrate that the proposed SkipCPP-Pred is 3.6% higher than state-of-the-art CPP predictors in terms of accuracy. Moreover, we construct a high-quality benchmark dataset by reducing the data redundancy and enhancing the similarity between the positive and negative classes. Using this dataset to build prediction models, we can avoid the performance bias present in existing methods and yield a promising predictive model. Conclusions: The proposed SkipCPP-Pred is a simple and fast sequence-based predictor featuring the adaptive k-skip-n-gram model for the improved prediction of CPPs. Currently, SkipCPP-Pred is publicly available from an online webserver (http://server.malab.cn/SkipCPP-Pred/Index.html).
Electronic supplementary material The online version of this article (10.1186/s12864-017-4128-1) contains supplementary material, which is available to authorized users.
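The general idea behind k-skip-n-gram features can be sketched for the simplest case, n = 2. The abstract does not describe how the "adaptive" variant chooses its parameters, so the code below is the plain, non-adaptive k-skip-bigram count and should not be read as the SkipCPP-Pred algorithm; the function name and the toy peptide are hypothetical.

```python
from collections import Counter


def k_skip_bigrams(seq: str, k: int) -> Counter:
    """Count ordered residue pairs separated by at most k skipped residues."""
    counts: Counter = Counter()
    for i in range(len(seq)):
        # gap = number of residues skipped between the two members of the pair
        for gap in range(k + 1):
            j = i + gap + 1
            if j >= len(seq):
                break
            counts[seq[i] + seq[j]] += 1
    return counts


# Hypothetical toy peptide for illustration only.
adjacent_only = k_skip_bigrams("RKKRR", k=0)
with_one_skip = k_skip_bigrams("RKKRR", k=1)
```

Allowing skips lets the feature vector capture correlations between residues that are near each other but not adjacent, which ordinary bigram counts miss. Dividing each count by the total gives a length-independent vector suitable for a classifier such as a random forest.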
Affiliation(s)
- Leyi Wei
  - School of Computer Science and Technology, Tianjin University, Tianjin, 30050, China
  - State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin, 300074, China
- Jijun Tang
  - School of Computer Science and Technology, Tianjin University, Tianjin, 30050, China
- Quan Zou
  - School of Computer Science and Technology, Tianjin University, Tianjin, 30050, China
13. Wei L, Bowen Z, Zhiyong C, Gao X, Liao M. Exploring local discriminative information from evolutionary profiles for cytokine–receptor interaction prediction. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.02.078] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 01/08/2023]
14. Predicting the Organelle Location of Noncoding RNAs Using Pseudo Nucleotide Compositions. Interdiscip Sci 2016; 9:540-544. [PMID: 27739055 DOI: 10.1007/s12539-016-0193-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 05/20/2016] [Revised: 09/28/2016] [Accepted: 10/06/2016] [Indexed: 11/27/2022]
Abstract
Noncoding RNAs (ncRNAs) are implicated in various biological processes. Recent findings have demonstrated that the function of ncRNAs correlates with their provenance. Therefore, recognizing ncRNAs from different organelle genomes will help in understanding their molecular functions. However, the limitations of experimental techniques hinder progress in studying organellar ncRNAs and their functional relevance. As a complement to experiments, computational methods offer an important alternative for identifying ncRNAs in different organelles. Thus, a computational model was developed to identify ncRNAs from kinetoplast and mitochondrion organelle genomes. In this model, RNA sequences are encoded by "pseudo dinucleotide composition." The jackknife test showed that the overall success rate achieved by the proposed model was 90.08%. We hope that the proposed method will be helpful in predicting ncRNA organellar locations.
15. Chen W, Feng P, Ding H, Lin H. PAI: Predicting adenosine to inosine editing sites by using pseudo nucleotide compositions. Sci Rep 2016; 6:35123. [PMID: 27725762 PMCID: PMC5057124 DOI: 10.1038/srep35123] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 08/23/2016] [Accepted: 09/20/2016] [Indexed: 12/24/2022] Open
Abstract
Adenosine to inosine (A-to-I) editing is the most prevalent kind of RNA editing and is involved in many biological processes. Accurate identification of A-to-I editing sites is invaluable for better understanding their biological functions. Because of the limitations of experimental methods, the present study proposes a support vector machine-based model, called PAI, to identify A-to-I editing sites in D. melanogaster. In this model, RNA sequences are encoded by "pseudo dinucleotide composition", into which six RNA physicochemical properties were incorporated. PAI achieves promising performance in the jackknife test and the independent dataset test, indicating that it holds very high potential to become a useful tool for identifying A-to-I editing sites. For the convenience of experimental scientists, a web server was constructed for PAI, and it is freely accessible at http://lin.uestc.edu.cn/server/PAI.
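Pseudo dinucleotide composition is commonly described as 16 normalised dinucleotide frequencies followed by a small number of sequence-order correlation terms computed from dinucleotide property values. The sketch below follows that widely used formulation; it is not the PAI code. The parameters `lam` (number of correlation tiers) and `w` (weight), and in particular the placeholder property table, are assumptions: a real application would substitute the standardized physicochemical values used by the authors.

```python
from itertools import product

DINUCS = ["".join(p) for p in product("ACGU", repeat=2)]


def pse_dnc(seq: str, props: dict[str, dict[str, float]],
            lam: int = 2, w: float = 0.5) -> list[float]:
    """Return a (16 + lam)-dimensional pseudo dinucleotide composition vector.

    props maps a property name to a table of values for all 16 dinucleotides.
    The sequence is expected to contain only A, C, G, and U.
    """
    dinucs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    n = len(dinucs)
    if lam >= n:
        raise ValueError("sequence too short for the requested lam")
    # Part 1: normalised occurrence frequency of each dinucleotide.
    freqs = [dinucs.count(d) / n for d in DINUCS]
    # Part 2: theta_j, the mean squared property difference between
    # dinucleotides that are j positions apart (sequence-order information).
    thetas = []
    for j in range(1, lam + 1):
        total = 0.0
        for i in range(n - j):
            a, b = dinucs[i], dinucs[i + j]
            total += sum((p[a] - p[b]) ** 2 for p in props.values()) / len(props)
        thetas.append(total / (n - j))
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# Placeholder property (assumption): number of G/C bases per dinucleotide.
# It stands in for real physicochemical values, which are not given here.
toy_props = {"gc_count": {d: float(sum(b in "GC" for b in d)) for d in DINUCS}}
vec = pse_dnc("AUGCGCUAAG", toy_props, lam=2, w=0.5)
```

The first 16 components describe local composition, and the trailing `lam` components carry longer-range order information; the shared denominator makes the whole vector sum to one.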
Affiliation(s)
- Wei Chen
  - Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, 063000, China
- Pengmian Feng
  - School of Public Health, North China University of Science and Technology, Tangshan, 063000, China
- Hui Ding
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
16. Chen W, Feng P, Tang H, Ding H, Lin H. RAMPred: identifying the N(1)-methyladenosine sites in eukaryotic transcriptomes. Sci Rep 2016; 6:31080. [PMID: 27511610 PMCID: PMC4980636 DOI: 10.1038/srep31080] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 05/18/2016] [Accepted: 07/12/2016] [Indexed: 12/23/2022] Open
Abstract
N(1)-methyladenosine (m(1)A) is a prominent RNA modification involved in many biological processes. Accurate identification of m(1)A sites is invaluable for better understanding the biological functions of m(1)A. However, limitations in experimental methods hinder progress towards the identification of m(1)A sites. As a complement to experimental methods, a support vector machine-based method called RAMPred is proposed to identify, for the first time, m(1)A sites in H. sapiens, M. musculus and S. cerevisiae genomes. In this method, RNA sequences are encoded using nucleotide chemical properties and nucleotide compositions. RAMPred achieves promising performance in jackknife tests, cross-cell-line tests and cross-species tests, indicating that it holds very high potential to become a useful tool for identifying m(1)A sites. For the convenience of experimental scientists, a web server based on the proposed model was constructed and is freely accessible at http://lin.uestc.edu.cn/server/RAMPred.
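One commonly used way to encode RNA by nucleotide chemical property assigns each base three binary values (ring structure, functional group, hydrogen bonding) and appends the running density of that base in the sequence prefix. The sketch below illustrates that scheme; whether RAMPred uses exactly this combination is an assumption, since the abstract names the feature types without giving details.

```python
# (ring structure, functional group, hydrogen bonding):
# purine = 1 / pyrimidine = 0; amino = 1 / keto = 0; weak H-bond = 1 / strong = 0.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode(seq: str) -> list[tuple[int, int, int, float]]:
    """Encode each position as three chemical-property bits plus prefix density."""
    seen: dict[str, int] = {}
    rows = []
    for i, base in enumerate(seq.upper(), start=1):
        seen[base] = seen.get(base, 0) + 1
        # Density: how often this base has occurred in the first i positions.
        rows.append((*NCP[base], seen[base] / i))
    return rows


# Hypothetical toy sequence for illustration only.
rows = encode("AUGA")
```

Flattening the rows gives a fixed-length vector for fixed-length sequence windows, which can be passed to a classifier such as a support vector machine.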
Affiliation(s)
- Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063000, China
| | - Pengmian Feng
- School of Public Health, North China University of Science and Technology, Tangshan, 063000, China
| | - Hua Tang
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics and Center for Information in Biomedicine, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics and Center for Information in Biomedicine, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
17
|
Improving N(6)-methyladenosine site prediction with heuristic selection of nucleotide physical-chemical properties. Anal Biochem 2016; 508:104-13. [PMID: 27293216 DOI: 10.1016/j.ab.2016.06.001] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 03/27/2016] [Revised: 05/31/2016] [Accepted: 06/01/2016] [Indexed: 12/28/2022]
Abstract
N(6)-methyladenosine (m(6)A) is one of the most common and abundant post-transcriptional RNA modifications found in viruses and most eukaryotes. m(6)A plays an essential role in many vital biological processes to regulate gene expression. Because of its widespread distribution across the genomes, the identification of m(6)A sites from RNA sequences is of significant importance for better understanding the regulatory mechanism of m(6)A. Although progress has been achieved in m(6)A site prediction, challenges remain. This article aims to further improve the performance of m(6)A site prediction by introducing a new heuristic nucleotide physical-chemical property selection (HPCS) algorithm. The proposed HPCS algorithm can effectively extract an optimized subset of nucleotide physical-chemical properties under the prescribed feature representation for encoding an RNA sequence into a feature vector. We demonstrate the efficacy of the proposed HPCS algorithm under different feature representations, including pseudo dinucleotide composition (PseDNC), auto-covariance (AC), and cross-covariance (CC). Based on the proposed HPCS algorithm, we implemented an m(6)A site predictor, called M6A-HPCS, which is freely available at http://csbio.njust.edu.cn/bioinf/M6A-HPCS. Experimental results over rigorous jackknife tests on benchmark datasets demonstrated that the proposed M6A-HPCS achieves higher success rates and outperforms existing state-of-the-art sequence-based m(6)A site predictors.
|
18
|
Chen W, Feng P, Tang H, Ding H, Lin H. Identifying 2'-O-methylationation sites by integrating nucleotide chemical properties and nucleotide compositions. Genomics 2016; 107:255-8. [PMID: 27191866 DOI: 10.1016/j.ygeno.2016.05.003] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 02/18/2016] [Revised: 05/04/2016] [Accepted: 05/13/2016] [Indexed: 10/21/2022]
Abstract
2'-O-methylation is an important post-transcriptional modification that plays important roles in many biological processes. Although experimental technologies have been proposed to detect 2'-O-methylation sites, they are not cost-effective. As complements to experimental techniques, computational methods will facilitate the identification of 2'-O-methylation sites. In the present study, we proposed a support vector machine-based method to identify 2'-O-methylation sites. In this method, RNA sequences were formulated by nucleotide chemical properties and nucleotide compositions. In the jackknife cross-validation test, the proposed method obtained an accuracy of 95.58% for identifying 2'-O-methylation sites in the human genome. Moreover, the model was also validated by identifying 2'-O-methylation sites in the Mus musculus and Saccharomyces cerevisiae genomes, and the obtained accuracies are also satisfactory. These results indicate that the proposed method will become a useful tool for research on 2'-O-methylation.
Affiliation(s)
- Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063000, China.
| | - Pengmian Feng
- School of Public Health, North China University of Science and Technology, Tangshan 063000, China
| | - Hua Tang
- Department of Pathophysiology, Sichuan Medical University, Luzhou 646000, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics and Center for Information in Biomedicine, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics and Center for Information in Biomedicine, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| |
|
19
|
Chen W, Tang H, Lin H. MethyRNA: a web server for identification of N6-methyladenosine sites. J Biomol Struct Dyn 2016; 35:683-687. [DOI: 10.1080/07391102.2016.1157761] [Citation(s) in RCA: 74] [Impact Index Per Article: 9.3] [Indexed: 10/22/2022]
Affiliation(s)
- Wei Chen
- Department of Physics, School of Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063009, China
| | - Hua Tang
- Department of Pathophysiology, Sichuan Medical University, Luzhou 646000, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
20
|
Che Y, Ju Y, Xuan P, Long R, Xing F. Identification of Multi-Functional Enzyme with Multi-Label Classifier. PLoS One 2016; 11:e0153503. [PMID: 27078147 PMCID: PMC4831692 DOI: 10.1371/journal.pone.0153503] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 02/22/2016] [Accepted: 03/30/2016] [Indexed: 11/23/2022] Open
Abstract
Enzymes are important and effective biological catalyst proteins participating in almost all active cell processes. Identification of multi-functional enzymes is essential in understanding the function of enzymes. Machine learning methods perform better in protein structure and function prediction than traditional biological wet experiments. Thus, in this study, we explore an efficient and effective machine learning method to categorize enzymes according to their function. Multi-functional enzymes are predicted with a special machine learning strategy, namely, multi-label classifier. Sequence features are extracted from a position-specific scoring matrix with autocross-covariance transformation. Experiment results show that the proposed method obtains an accuracy rate of 94.1% in classifying six main functional classes through five cross-validation tests and outperforms state-of-the-art methods. In addition, 91.25% accuracy is achieved in multi-functional enzyme prediction, which is often ignored in other enzyme function prediction studies. The online prediction server and datasets can be accessed from the link http://server.malab.cn/MEC/.
Affiliation(s)
- Yuxin Che
- School of Information Science and Technology, Xiamen University, Xiamen, Fujian 361005, China
| | - Ying Ju
- School of Information Science and Technology, Xiamen University, Xiamen, Fujian 361005, China
| | - Ping Xuan
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
| | - Ren Long
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
| | - Fei Xing
- School of Aerospace Engineering, Xiamen University, Xiamen, Fujian 361005, China
| |
|
21
|
Predicting CpG methylation levels by integrating Infinium HumanMethylation450 BeadChip array data. Genomics 2016; 107:132-7. [DOI: 10.1016/j.ygeno.2016.02.005] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 12/21/2015] [Revised: 02/19/2016] [Accepted: 02/22/2016] [Indexed: 12/23/2022]
|
22
|
Liu B, Fang L. WITHDRAWN: Identification of microRNA precursor based on gapped n-tuple structure status composition kernel. Comput Biol Chem 2016:S1476-9271(16)30036-6. [PMID: 26935400 DOI: 10.1016/j.compbiolchem.2016.02.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2016] [Accepted: 02/01/2016] [Indexed: 10/22/2022]
Abstract
This article has been withdrawn at the request of the author(s) and/or editor. The Publisher apologizes for any inconvenience this may cause. The full Elsevier Policy on Article Withdrawal can be found at http://www.elsevier.com/locate/withdrawalpolicy.
Affiliation(s)
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China.
| | - Longyun Fang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China.
| |
|
23
|
Fan S, Li C, Ai R, Wang M, Firestein GS, Wang W. Computationally expanding infinium HumanMethylation450 BeadChip array data to reveal distinct DNA methylation patterns of rheumatoid arthritis. Bioinformatics 2016; 32:1773-8. [PMID: 26883487 DOI: 10.1093/bioinformatics/btw089] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 11/17/2015] [Accepted: 02/11/2016] [Indexed: 12/31/2022] Open
Abstract
MOTIVATION: DNA methylation signatures in rheumatoid arthritis (RA) have been identified in fibroblast-like synoviocytes (FLS) with the Illumina HumanMethylation450 array. Since <2% of CpG sites are covered by the Illumina 450K array and whole genome bisulfite sequencing is still too expensive for many samples, computationally predicting DNA methylation levels based on 450K data would be valuable for discovering more RA-related genes. RESULTS: We developed a computational model that is trained on 14 tissues with both whole genome bisulfite sequencing and 450K array data. This model integrates information derived from the similarity of local methylation patterns between tissues, the methylation information of flanking CpG sites and the methylation tendency of flanking DNA sequences. The predicted and measured methylation values were highly correlated, with a Pearson correlation coefficient of 0.9 in leave-one-tissue-out cross-validations. Importantly, the majority (76%) of the top 10% differentially methylated loci among the 14 tissues were correctly detected using the predicted methylation values. Applying this model to 450K data of RA, osteoarthritis and normal FLS, we expanded the coverage of CpG sites 18.5-fold, to about 30% of all the CpGs in the human genome. Through an integrative omics study, we identified genes and pathways tightly related to RA pathogenesis, among which 12 genes were supported by three lines of evidence, including 6 genes already known to perform specific roles in RA and 6 genes as new potential therapeutic targets. AVAILABILITY AND IMPLEMENTATION: The source code, required data for prediction, and demo data for testing are freely available at: http://wanglab.ucsd.edu/star/LR450K/. CONTACT: wei-wang@ucsd.edu or gfirestein@ucsd.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shicai Fan
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China; Department of Chemistry and Biochemistry
| | - Chengzhe Li
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China
| | - Rizi Ai
- Department of Chemistry and Biochemistry
| | | | - Gary S Firestein
- Division of Rheumatology, Allergy and Immunology, University of California San Diego, La Jolla, CA, USA
| | - Wei Wang
- Department of Chemistry and Biochemistry
| |
|
24
|
Feng P, Ding H, Chen W, Lin H. Identifying RNA 5-methylcytosine sites via pseudo nucleotide compositions. Mol Biosyst 2016; 12:3307-3311. [DOI: 10.1039/c6mb00471g] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Indexed: 12/28/2022]
Abstract
RNA 5-methylcytosine (m5C), whose formation is catalyzed by RNA methyltransferases, has been found in organisms from archaea to eukaryotes.
Affiliation(s)
- Pengmian Feng
- School of Public Health, North China University of Science and Technology, Tangshan, China
| | - Hui Ding
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Wei Chen
- Department of Physics, School of Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063009
| | - Hao Lin
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
25
|
Tang H, Chen W, Lin H. Identification of immunoglobulins using Chou's pseudo amino acid composition with feature selection technique. Mol Biosyst 2016; 12:1269-75. [DOI: 10.1039/c5mb00883b] [Citation(s) in RCA: 147] [Impact Index Per Article: 18.4] [Indexed: 01/14/2023]
Abstract
Immunoglobulins, also called antibodies, are a group of cell surface proteins which are produced by the immune system in response to the presence of a foreign substance (called antigen).
Affiliation(s)
- Hua Tang
- Department of Pathophysiology, Sichuan Medical University, Luzhou 646000, China
| | - Wei Chen
- Department of Physics, School of Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063009
| | - Hao Lin
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
26
|
Using weighted features to predict recombination hotspots in Saccharomyces cerevisiae. J Theor Biol 2015; 382:15-22. [DOI: 10.1016/j.jtbi.2015.06.030] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 02/17/2015] [Revised: 06/04/2015] [Accepted: 06/20/2015] [Indexed: 01/06/2023]
|
27
|
Chen W, Feng P, Ding H, Lin H, Chou KC. Benchmark data for identifying N(6)-methyladenosine sites in the Saccharomyces cerevisiae genome. Data Brief 2015; 5:376-8. [PMID: 26958595 PMCID: PMC4773366 DOI: 10.1016/j.dib.2015.09.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 08/25/2015] [Revised: 08/30/2015] [Accepted: 09/10/2015] [Indexed: 11/19/2022] Open
Abstract
This data article contains the benchmark dataset for training and testing iRNA-Methyl, a web-server predictor for identifying N(6)-methyladenosine sites in RNA (Chen et al., 2015 [15]). It can also be used to develop other predictors for identifying N(6)-methyladenosine sites in the Saccharomyces cerevisiae genome.
Affiliation(s)
- Wei Chen
- Department of Physics, School of Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063009, China; Gordon Life Science Institute, Belmont, MA, United States
| | - Pengmian Feng
- School of Public Health, North China University of Science and Technology, Tangshan 063000, China
| | - Hui Ding
- School of Public Health, North China University of Science and Technology, Tangshan 063000, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics and Center for Information in Biomedicine, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China; Gordon Life Science Institute, Belmont, MA, United States
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Belmont, MA, United States; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
| |
|
28
|
Survey of Programs Used to Detect Alternative Splicing Isoforms from Deep Sequencing Data In Silico. Biomed Res Int 2015; 2015:831352. [PMID: 26421304 PMCID: PMC4573434 DOI: 10.1155/2015/831352] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 11/26/2014] [Revised: 02/17/2015] [Accepted: 03/02/2015] [Indexed: 11/29/2022]
Abstract
Next-generation sequencing techniques have been rapidly emerging. However, the massive sequencing reads hide a great deal of unknown important information. Advances have enabled researchers to discover alternative splicing (AS) sites and isoforms using computational approaches instead of molecular experiments. Given the importance of AS for gene expression and protein diversity in eukaryotes, detecting alternative splicing and isoforms represents a hot topic in systems biology and epigenetics research. The computational methods applied to AS prediction have improved since the emergence of next-generation sequencing. In this study, we introduce state-of-the-art research on AS and then compare the research methods and software tools available for AS based on next-generation sequencing reads. Finally, we discuss the prospects of computational methods related to AS.
|
29
|
Wei L, Liao M, Gao X, Zou Q. Enhanced Protein Fold Prediction Method Through a Novel Feature Extraction Technique. IEEE Trans Nanobioscience 2015; 14:649-59. [DOI: 10.1109/tnb.2015.2450233] [Citation(s) in RCA: 81] [Impact Index Per Article: 9.0] [Indexed: 11/06/2022]
|
30
|
Chen W, Lin H, Chou KC. Pseudo nucleotide composition or PseKNC: an effective formulation for analyzing genomic sequences. Mol Biosyst 2015; 11:2620-34. [DOI: 10.1039/c5mb00155b] [Citation(s) in RCA: 262] [Impact Index Per Article: 29.1] [Indexed: 11/21/2022]
Abstract
With the avalanche of DNA/RNA sequences generated in the post-genomic age, it is urgent to develop automated methods for analyzing the relationship between the sequences and their functions.
Affiliation(s)
- Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000
| | - Hao Lin
- Gordon Life Science Institute, Boston, USA; Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics
| | - Kuo-Chen Chou
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000
| |
|