1
Alatrany AS, Khan W, Hussain AJ, Mustafina J, Al-Jumeily D. Transfer Learning for Classification of Alzheimer's Disease Based on Genome Wide Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:2700-2711. [PMID: 37018274 DOI: 10.1109/tcbb.2022.3233869] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/19/2023]
Abstract
Alzheimer's disease (AD) is a degenerative brain disorder whose symptoms worsen over time. Single nucleotide polymorphisms (SNPs) have been identified as relevant biomarkers for this condition. This study aims to identify SNP biomarkers associated with AD in order to classify AD reliably. In contrast to existing related work, we use deep transfer learning with varying experimental configurations. For this purpose, a convolutional neural network (CNN) is first trained on the genome-wide association study (GWAS) dataset requested from the Alzheimer's Disease Neuroimaging Initiative. We then apply deep transfer learning to further train the CNN (as the base model) on a different AD GWAS dataset and extract the final set of features. The extracted features are then fed into a support vector machine for classification of AD. Detailed experiments are performed using multiple datasets and varying experimental configurations. The results indicate an accuracy of 89%, which is a significant improvement over existing related work.
2
Cui Z, Chen ZH, Zhang QH, Gribova V, Filaretov VF, Huang DS. RMSCNN: A Random Multi-Scale Convolutional Neural Network for Marine Microbial Bacteriocins Identification. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3663-3672. [PMID: 34699364 DOI: 10.1109/tcbb.2021.3122183] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
The abuse of traditional antibiotics has led to an increase in the resistance of bacteria and viruses. Similar in function to antibacterial peptides, bacteriocins are peptides produced by bacteria that have bactericidal or bacterial effects. The marine environment is one of the most abundant resources for extracting marine microbial bacteriocins (MMBs), and identifying bacteriocins from marine microorganisms is a common goal in the development of new drugs. Effective use of MMBs would greatly alleviate the current antibiotic abuse problem. In this work, deep learning is used to identify meaningful MMBs. We propose a random multi-scale convolutional neural network method in which a random model updates the scale value randomly. This scale selection method can reduce the contingency introduced by manual settings under certain conditions, thereby making the method more widely applicable. The results show that the classification performance of the proposed method is better than that of state-of-the-art classification methods. In addition, some potential MMBs are predicted, and several sequence analyses are performed on these candidates. Notably, after sequence analysis, the HNH endonucleases of different marine bacteria are considered potential bacteriocins.
3
Lu X, Li J, Zhu Z, Yuan Y, Chen G, He K. Predicting miRNA-Disease Associations via Combining Probability Matrix Feature Decomposition With Neighbor Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3160-3170. [PMID: 34260356 DOI: 10.1109/tcbb.2021.3097037] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/13/2023]
Abstract
Predicting the associations between miRNAs and diseases may uncover the causes of various diseases. Many methods are emerging to tackle sparse and unbalanced disease-related miRNA prediction. Here, we propose a probabilistic matrix decomposition combined with neighbor learning to identify miRNA-disease associations utilizing heterogeneous data (PMDA). First, we build similarity networks for diseases and miRNAs, respectively, by integrating semantic information and functional interactions. Second, we construct a neighbor learning model in which the neighbor information of an individual miRNA or disease is used to enhance the association relationship and tackle the sparsity problem. Third, we predict the potential associations between miRNAs and diseases via probability matrix decomposition. The experimental results show that PMDA is superior to five other methods on sparse and unbalanced data. The case study shows that the new miRNA-disease interactions predicted by PMDA are effective and that the performance of PMDA is superior to that of other methods.
4
Zhang Q, Zhang Y, Wang S, Chen ZH, Gribova V, Filaretov VF, Huang DS. Predicting In-Vitro DNA-Protein Binding With a Spatially Aligned Fusion of Sequence and Shape. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3144-3153. [PMID: 34882561 DOI: 10.1109/tcbb.2021.3133869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Discovery of transcription factor binding sites (TFBSs) is of primary importance for understanding the underlying binding mechanism and gene regulation process. Growing evidence indicates that, apart from the primary DNA sequence, the DNA shape landscape has a significant influence on transcription factor binding preference. To effectively model the co-influence of sequence and shape features, we emphasize the importance of the position information of sequence motifs and shape patterns. In this paper, we propose a novel deep learning-based architecture, named hybridShape eDeepCNN, for TFBS prediction, which integrates DNA sequence and shape information in a spatially aligned manner. Our model uses a multi-layer convolutional neural network and constructs an independent subnetwork to adapt to the distinct data distributions of heterogeneous features. In addition, we explore the use of continuous embedding vectors as the representation of DNA sequences. Based on experiments on 20 in-vitro datasets derived from universal protein binding microarrays (uPBMs), we demonstrate the superiority of our proposed method and validate the underlying design logic.
5
Wei L, Chen J, Song C, Zhang Y, Zhang Y, Xu M, Feng C, Gao Y, Qian F, Wang Q, Shang D, Zhou X, Zhu J, Wang X, Jia Y, Liu J, Zhu Y, Li C. Cancer CRC: A Comprehensive Cancer Core Transcriptional Regulatory Circuit Resource and Analysis Platform. Front Oncol 2021; 11:761700. [PMID: 34712617 PMCID: PMC8546348 DOI: 10.3389/fonc.2021.761700] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/20/2021] [Accepted: 09/24/2021] [Indexed: 11/30/2022] Open
Abstract
A core transcriptional regulatory circuit (CRC) is a group of interconnected auto-regulating transcription factors (TFs) that form loops and can be identified by super-enhancers (SEs). Studies have indicated that CRCs play an important role in defining cellular identity and determining cellular fate. Additionally, core TFs in CRCs are regulators of cell-type-specific transcriptional regulation. However, a global view of CRC properties across various cancer types has not been generated. Thus, we integrated paired cancer ATAC-seq and H3K27ac ChIP-seq data for specific cell lines to develop the Cancer CRC (http://bio.liclab.net/Cancer_crc/index.html). This platform documented 94,108 cancer CRCs, including 325 core TFs. The cancer CRC also provided the “SE active core TFs analysis” and “TF enrichment analysis” tools to identify potentially key TFs in cancer. In addition, we performed a comprehensive analysis of core TFs in various cancer types to reveal conserved and cancer-specific TFs.
Affiliation(s)
- Ling Wei
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China; The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, China; Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, The First Affiliated Hospital, Hengyang Medical School, University of South China, Hengyang, China
- Jiaxin Chen
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Chao Song
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China; The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, China; Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, The First Affiliated Hospital, Hengyang Medical School, University of South China, Hengyang, China
- Yuexin Zhang
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Yimeng Zhang
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Mingcong Xu
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Chenchen Feng
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Yu Gao
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Fengcui Qian
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China; The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, China; Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, The First Affiliated Hospital, Hengyang Medical School, University of South China, Hengyang, China
- Qiuyu Wang
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China; The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, China; Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, The First Affiliated Hospital, Hengyang Medical School, University of South China, Hengyang, China; School of Computer, University of South China, Hengyang, China; Hunan Provincial Base for Scientific and Technological Innovation Cooperation, University of South China, Hengyang, China
- Desi Shang
  - The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, China; Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, The First Affiliated Hospital, Hengyang Medical School, University of South China, Hengyang, China; School of Computer, University of South China, Hengyang, China; Hunan Provincial Base for Scientific and Technological Innovation Cooperation, University of South China, Hengyang, China
- Xinyuan Zhou
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Jiang Zhu
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Xiaopeng Wang
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Yijie Jia
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Jiaqi Liu
  - The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, China; Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, The First Affiliated Hospital, Hengyang Medical School, University of South China, Hengyang, China; School of Computer, University of South China, Hengyang, China; Hunan Provincial Base for Scientific and Technological Innovation Cooperation, University of South China, Hengyang, China
- Yanbing Zhu
  - Experimental and Translational Research Center, Beijing Friendship Hospital, Capital Medical University, Beijing, China; Beijing Clinical Research Institute, Beijing, China
- Chunquan Li
  - The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, China; School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China; Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, The First Affiliated Hospital, Hengyang Medical School, University of South China, Hengyang, China; School of Computer, University of South China, Hengyang, China; Hunan Provincial Base for Scientific and Technological Innovation Cooperation, University of South China, Hengyang, China; General Surgery Department, Beijing Friendship Hospital, Capital Medical University, Beijing, China; Guangxi Key Laboratory of Diabetic Systems Medicine, Guilin Medical University, Guilin, China
6
Sun S, Yu X, Sun F, Tang Y, Zhao J, Zeng T. Dynamically characterizing individual clinical change by the steady state of disease-associated pathway. BMC Bioinformatics 2019; 20:697. [PMID: 31874621 PMCID: PMC6929545 DOI: 10.1186/s12859-019-3271-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/23/2022] Open
Abstract
Background: With the development of precision medicine, individual heterogeneity is attracting more and more attention in clinical research and application. Although the biomolecular reactions appear to vary when different individuals suffer from the same disease (e.g. virus infection), the final outcomes of individuals can mainly be described by two clinical categories, i.e. symptomatic and asymptomatic. It therefore remains a great challenge to characterize the individual-specific intrinsic regulatory convergence during dynamic gene regulation and expression. Besides individual heterogeneity, sampling time also increases expression diversity, so capturing similar steady biological states is key to characterizing individual dynamic biological processes. Results: Assuming that similar biological functions (e.g. pathways) are better suited to detecting consistent signals than individual, noisy genes, we design and implement a new computational framework (ABP: Attractor analysis of Boolean network of Pathway). ABP aims to identify dynamic phenotype-associated pathways in a state-transition manner, using network attractors to model and quantify the steady pathway states that characterize the final steady biological state of individuals (e.g. normal or disease). By analyzing multiple temporal gene expression datasets of virus infections, ABP has shown its effectiveness in identifying key pathways associated with phenotype change, inferring the consensus functional cascade among key pathways, and grouping pathway activity states corresponding to disease states. Conclusions: Collectively, ABP can detect key pathways and infer their consensus functional cascade during a dynamic process (e.g. virus infection), and can also categorize individuals by disease state well, which is helpful for disease classification and prediction.
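To make the attractor idea in this abstract concrete, the following minimal sketch finds the attractors of a small synchronous Boolean network. It is only an illustration under stated assumptions: the three-node network and its update rules are hypothetical, and nothing here reproduces the ABP implementation or its pathway data.

```python
from itertools import product

# Hypothetical 3-node pathway (assumption, not taken from the paper):
# node 0 is an external signal, node 1 follows node 0,
# node 2 turns on only when nodes 0 and 1 are both on.
rules = [
    lambda s: s[0],
    lambda s: s[0],
    lambda s: s[0] & s[1],
]

def step(state, rules):
    """Synchronously update every node of the Boolean network."""
    return tuple(rule(state) for rule in rules)

def find_attractor(state, rules):
    """Iterate from `state` until a state repeats; return the repeating cycle."""
    seen = {}
    trajectory = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state, rules)
    return trajectory[seen[state]:]

def canonical(cycle):
    """Rotate a cycle so its smallest state comes first (for deduplication)."""
    i = cycle.index(min(cycle))
    return tuple(cycle[i:] + cycle[:i])

# Enumerate all 2^3 initial states and collect the distinct attractors.
attractors = {
    canonical(find_attractor(s, rules)) for s in product((0, 1), repeat=3)
}
print(attractors)
```

In this toy network every trajectory ends in one of two fixed points, which loosely mirrors the abstract's picture of steady pathway states corresponding to two final phenotypes.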
Affiliation(s)
- Shaoyan Sun
  - School of Mathematics and Statistics Science, Ludong University, Yantai, 264025, China
- Xiangtian Yu
  - Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, 200233, China; Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy Science, Shanghai, 200031, China
- Fengnan Sun
  - Medical Laboratory, Yantaishan Hospital, Yantai, 264001, China
- Ying Tang
  - Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy Science, Shanghai, 200031, China
- Juan Zhao
  - Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy Science, Shanghai, 200031, China
- Tao Zeng
  - Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy Science, Shanghai, 200031, China; Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai, 201210, China
7
Zhao J, Feng H, Zhu D, Zhang C, Xu Y. DTA-SiST: de novo transcriptome assembly by using simplified suffix trees. BMC Bioinformatics 2019; 20:698. [PMID: 31874618 PMCID: PMC6929406 DOI: 10.1186/s12859-019-3272-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022] Open
Abstract
Background: Alternative splicing allows the pre-mRNAs of a gene to be spliced into various mRNAs, which greatly increases the diversity of proteins. High-throughput sequencing of mRNAs has revolutionized our ability to reconstruct transcripts. However, the massive volume of short reads makes de novo transcript assembly an algorithmic challenge. Results: We develop a novel framework, called DTA-SiST, for de novo transcriptome assembly based on suffix trees. DTA-SiST first extends contigs with the reads that have the longest overlaps with the contigs' termini. These reads can be found in time linear in the lengths of the reads through a well-designed suffix tree structure. Then, DTA-SiST constructs splicing graphs based on the contigs for each gene locus. Finally, DTA-SiST proposes two strategies to extract transcript-representing paths: a depth-first enumeration strategy and a hybrid strategy based on length and coverage. We implemented these two strategies and compared them with state-of-the-art de novo assemblers on both simulated and real datasets. Experimental results showed that the depth-first enumeration strategy always performs better in recall, and also in precision for smaller datasets, while the hybrid strategy leads in precision for big datasets. Conclusions: DTA-SiST is more competitive than the other compared de novo assemblers, especially in precision, owing to the read-based contig extension strategy and the transcript extraction rules.
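The contig-extension step described above (extend a contig with the read that has the longest overlap with its terminus) can be sketched as follows. This is a deliberately naive illustration: it scans reads directly instead of using the suffix tree structure the abstract relies on for linear-time lookup, it extends only the right-hand end, and the function names, the `min_len` threshold, and the toy reads are all assumptions.

```python
def overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0

def extend_contig(contig, reads, min_len=3):
    """Greedily append the read with the longest overlap at the contig's end."""
    remaining = list(reads)
    while remaining:
        best = max(remaining, key=lambda r: overlap(contig, r, min_len))
        k = overlap(contig, best, min_len)
        if k == 0:
            break  # no remaining read overlaps the current terminus
        contig += best[k:]  # append only the non-overlapping tail
        remaining.remove(best)
    return contig

# Toy reads (hypothetical): two chain onto the contig, one is unrelated.
assembled = extend_contig("ACGTAC", ["TACGGA", "GGATTC", "CCCC"])
print(assembled)
```

The direct scan costs time proportional to the number of reads times their length for every extension, which is the cost an index such as a suffix tree is meant to avoid.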
Affiliation(s)
- Jin Zhao
  - School of Computer Science and Technology, Shandong University, Binhai Road, Qingdao, Shandong, People's Republic of China
- Haodi Feng
  - School of Computer Science and Technology, Shandong University, Binhai Road, Qingdao, Shandong, People's Republic of China
- Daming Zhu
  - School of Computer Science and Technology, Shandong University, Binhai Road, Qingdao, Shandong, People's Republic of China
- Chi Zhang
  - Department of Medical and Molecular Genetics and Center for Computational Biology and Bioinformatics, Indiana University, Indianapolis, IN, USA
- Ying Xu
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
8
Han Z, Wang T, Tian R, Zhou W, Wang P, Ren P, Zong J, Hu Y, Jin S, Jiang Q. BIN1 rs744373 variant shows different association with Alzheimer's disease in Caucasian and Asian populations. BMC Bioinformatics 2019; 20:691. [PMID: 31874619 PMCID: PMC6929404 DOI: 10.1186/s12859-019-3264-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 01/19/2023] Open
Abstract
Background: The association between the BIN1 rs744373 variant and Alzheimer's disease (AD) has been identified by genome-wide association studies (GWASs) as well as candidate gene studies in Caucasian populations. In East Asian populations, however, association studies have reported both positive and negative results. Considering the smaller sample sizes of the studies in East Asian populations, we believe that those results lacked sufficient statistical power. Results: We conducted a meta-analysis with 71,168 samples (22,395 AD cases and 48,773 controls, from 37 studies in 19 articles). Based on the additive model, we observed significant genetic heterogeneity in the pooled populations as well as in Caucasians and East Asians. We identified a significant association between the rs744373 polymorphism and AD in the pooled populations (P = 5 × 10⁻⁷, odds ratio (OR) = 1.12, 95% confidence interval (CI) 1.07-1.17) and in Caucasian populations (P = 3.38 × 10⁻⁸, OR = 1.16, 95% CI 1.10-1.22). In the East Asian populations, no association was identified (P = 0.393, OR = 1.057, 95% CI 0.95-1.15). In addition, the regression analysis suggested no significant publication bias. The results of the sensitivity analysis, as well as of the meta-analysis under the dominant and recessive models, remained consistent, which demonstrates the reliability of our finding. Conclusions: This large-scale meta-analysis highlights a significant association between the rs744373 polymorphism and AD risk in Caucasian populations but not in East Asian populations.
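As a rough sanity check on how an odds ratio, its confidence interval, and its P value relate, the sketch below backs out the standard error and z statistic from the pooled estimate quoted above (OR = 1.12, 95% CI 1.07-1.17). It assumes a Wald-type interval that is symmetric on the log scale and a two-sided normal test; the abstract does not state which test the authors used, and the reported figures are rounded, so only order-of-magnitude agreement with the reported P value should be expected.

```python
import math

def z_and_p_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Back out the standard error, z statistic and two-sided p-value
    from an odds ratio and its 95% confidence interval."""
    # A Wald interval is symmetric on the log scale: log(OR) +/- z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Pooled-population values as reported in the abstract.
se, z, p = z_and_p_from_ci(1.12, 1.07, 1.17)
print(f"SE = {se:.4f}, z = {z:.2f}, p = {p:.1e}")
```

Because the inputs are rounded to two decimals, the recovered p-value is sensitive to small changes in the interval bounds and should not be read as a recomputation of the study's result.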
Affiliation(s)
- Zhifa Han
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Tao Wang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Rui Tian
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Wenyang Zhou
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Pingping Wang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Peng Ren
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Jian Zong
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Yang Hu
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Shuilin Jin
  - Department of Mathematics, Harbin Institute of Technology, Harbin, China
- Qinghua Jiang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
9
Zou B, Chen C, Zhao R, Ouyang P, Zhu C, Chen Q, Duan X. A novel glaucomatous representation method based on Radon and wavelet transform. BMC Bioinformatics 2019; 20:693. [PMID: 31874641 PMCID: PMC6929399 DOI: 10.1186/s12859-019-3267-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/02/2022] Open
Abstract
Background: Glaucoma is an irreversible eye disease caused by optic nerve injury, and it usually changes the structure of the optic nerve head (ONH). Clinically, ONH assessment based on fundus images is one of the most useful ways to detect glaucoma. However, finding an effective representation for ONH assessment is challenging because the structural changes produce complex and mixed visual patterns. Method: We propose a novel feature representation based on the Radon and wavelet transforms to capture these visual patterns. First, the Radon transform (RT) is used to map the fundus image into the Radon domain, in which the spatial radial variations of the ONH are converted to a discrete signal that describes the structural features of the image. Second, the discrete wavelet transform (DWT) is used to capture differences and obtain a quantitative representation. Finally, principal component analysis (PCA) and a support vector machine (SVM) are used for dimensionality reduction and glaucoma detection. Results: The proposed method achieves state-of-the-art detection performance on the RIMONE-r2 dataset, with an accuracy of 0.861 and an area under the curve (AUC) of 0.906. Conclusion: We showed that the proposed method can serve as an effective tool for large-scale glaucoma screening and can provide a reference for the clinical diagnosis of glaucoma.
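The wavelet step of the pipeline above can be illustrated with a single-level Haar DWT on a one-dimensional signal. This is only a sketch: the abstract does not say which wavelet family or decomposition depth the authors used, so the Haar wavelet is an assumption, and the short `radial_signal` list is a made-up stand-in for a signal obtained from the Radon domain.

```python
import math

def haar_dwt(signal):
    """One level of the orthonormal Haar DWT on an even-length sequence.

    Returns (approximation, detail) coefficient lists, each half as long
    as the input.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / s for a, b in pairs]  # local averages (low-pass)
    detail = [(a - b) / s for a, b in pairs]  # local differences (high-pass)
    return approx, detail

def energy(coeffs):
    """Sum of squared coefficients, a common scalar sub-band feature."""
    return sum(c * c for c in coeffs)

# Hypothetical discrete signal standing in for a Radon-domain projection.
radial_signal = [4.0, 4.0, 2.0, 0.0]
approx, detail = haar_dwt(radial_signal)
features = [energy(approx), energy(detail)]
print(features)
```

Sub-band energies like these are one simple way to turn wavelet coefficients into a fixed-length feature vector that could then be passed to PCA and an SVM; the feature design actually used in the paper may differ.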
Affiliation(s)
- Beiji Zou
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China; Hunan Province Engineering Technology Research Center of Computer Vision and Intelligent Medical Treatment, Changsha, 410083, China
- Changlong Chen
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China; Hunan Province Engineering Technology Research Center of Computer Vision and Intelligent Medical Treatment, Changsha, 410083, China
- Rongchang Zhao
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China; Hunan Province Engineering Technology Research Center of Computer Vision and Intelligent Medical Treatment, Changsha, 410083, China
- Pingbo Ouyang
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China; The Second Xiangya Hospital of Central South University, Changsha, 410011, China
- Chengzhang Zhu
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China; Hunan Province Engineering Technology Research Center of Computer Vision and Intelligent Medical Treatment, Changsha, 410083, China
- Qilin Chen
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China; Hunan Province Engineering Technology Research Center of Computer Vision and Intelligent Medical Treatment, Changsha, 410083, China
- Xuanchu Duan
  - The Second Xiangya Hospital of Central South University, Changsha, 410011, China
10
Wang S, Wang X. Prediction of protein structural classes by different feature expressions based on 2-D wavelet denoising and fusion. BMC Bioinformatics 2019; 20:701. [PMID: 31874617 PMCID: PMC6929547 DOI: 10.1186/s12859-019-3276-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 01/07/2023] Open
Abstract
Background: Protein structural class prediction is a heavily researched subject in bioinformatics that plays a vital role in protein functional analysis, protein folding recognition, rational drug design and other related fields. However, when traditional feature expression methods are adopted, the features usually contain considerable redundant information, which leads to a very low recognition rate for protein structural classes. Results: We constructed a prediction model based on wavelet denoising using different feature expression methods. A new fusion idea, first fuse and then denoise, is proposed in this article. Two types of pseudo amino acid composition are used to extract feature vectors. Then, a two-dimensional (2-D) wavelet denoising algorithm is used to remove the redundant information from the two extracted feature vectors. The two feature vectors based on parallel 2-D wavelet denoising are fused, which is known as PWD-FU-PseAAC. The related source code is available at https://github.com/Xiaoheng-Wang12/Wang-xiaoheng/tree/master. Conclusions: Experimental verification on three low-similarity datasets suggests that the proposed model achieves notably good results in the prediction of protein structural classes.
Affiliation(s)
- Shunfang Wang
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, People's Republic of China
- Xiaoheng Wang
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, People's Republic of China
11
Zhang Y, Qiao S, Ji S, Li Y. DeepSite: bidirectional LSTM and CNN models for predicting DNA–protein binding. Int J Mach Learn Cyb 2019. [DOI: 10.1007/s13042-019-00990-x] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 12/19/2022]
12
Hameed SS, Hassan R, Muhammad FF. Selection and classification of gene expression in autism disorder: Use of a combination of statistical filters and a GBPSO-SVM algorithm. PLoS One 2017; 12:e0187371. [PMID: 29095904 PMCID: PMC5667738 DOI: 10.1371/journal.pone.0187371] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 07/23/2017] [Accepted: 10/18/2017] [Indexed: 11/30/2022] Open
Abstract
In this work, gene expression in autism spectrum disorder (ASD) is analyzed with the goal of selecting the most attributed genes and performing classification. The objective was achieved by utilizing a combination of various statistical filters and a wrapper-based geometric binary particle swarm optimization-support vector machine (GBPSO-SVM) algorithm. The utilization of different filters was accentuated by incorporating a mean and median ratio criterion to remove very similar genes. The results showed that the most discriminative genes that were identified in the first and last selection steps included the presence of a repetitive gene (CAPS2), which was assigned as the gene most highly related to ASD risk. The merged gene subset that was selected by the GBPSO-SVM algorithm was able to enhance the classification accuracy.
Affiliation(s)
- Shilan S. Hameed
  - Department of Computer Science, Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru, Malaysia
  - Department of Software and Informatics Engineering, College of Engineering, Salahaddin University, Erbil, Kurdistan Region, Iraq
- Rohayanti Hassan
  - Department of Software Engineering, Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru, Malaysia
- Fahmi F. Muhammad
  - Department of Physics, Faculty of Science & Health, Koya University, Koya, Kurdistan Region, Iraq
13
Mathematical and Computational Modeling in Complex Biological Systems. BioMed Research International 2017; 2017:5958321. [PMID: 28386558 PMCID: PMC5366773 DOI: 10.1155/2017/5958321] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 10/16/2016] [Revised: 12/20/2016] [Accepted: 01/16/2017] [Indexed: 12/22/2022]
Abstract
The biological processes and molecular functions involved in cancer progression remain difficult for biologists and clinicians to understand. Recent developments in high-throughput technologies push systems biology to achieve more precise models of complex diseases. Computational and mathematical models are increasingly being used to help us understand the omics data produced by high-throughput experimental techniques. The use of computational models in systems biology allows us to explore the pathogenesis of complex diseases, improve our understanding of the latent molecular mechanisms, and promote treatment strategy optimization and new drug discovery. Currently, there is an urgent need to bridge the gap between the development of high-throughput technologies and the systemic modeling of biological processes in cancer research. In this review, we first survey several typical mathematical modeling approaches for biological systems at different scales and analyze in depth their characteristics, advantages, applications, and limitations. Next, three potential research directions in systems modeling are summarized. To conclude, this review provides an update on important solutions that use computational modeling approaches in systems biology.