1
Acute cytotoxicity test of PM2.5, NNK and BPDE in human normal bronchial epithelial cells: A comparison of a co-culture model containing macrophages and a mono-culture model. Toxicol In Vitro 2022; 85:105480. [PMID: 36152786 DOI: 10.1016/j.tiv.2022.105480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Revised: 09/09/2022] [Accepted: 09/18/2022] [Indexed: 11/23/2022]
Abstract
BACKGROUND Given the extensive in vitro research on the cytotoxicity of exogenous compounds, it is essential to develop a cell model that better mimics the in vivo environment to explore their cytotoxic mechanisms. METHODS A co-culture system was established using a transwell system with Beas-2B and U937 cells. Cells were treated with fine particulate matter (PM2.5; 25, 50 and 100 μg/mL), nicotine-derived nitrosamine ketone (NNK; 50, 100 and 200 μg/mL) and benzo(a)pyrene diol epoxide (BPDE; 0.5, 2 and 8 μM) for 24 h. Cell proliferation was measured by CCK-8 and EdU assays, apoptosis and cell cycle by flow cytometry, and DNA damage by comet assay. Differentially expressed transcripts and cytokine concentrations were determined by transcriptome sequencing and Cytokine Array, respectively. RESULTS Compared with mono-culture, cell proliferation increased, apoptosis decreased, and DNA damage decreased in a dose-response relationship in co-culture. The gene expression profile was significantly different in co-culture, with significantly increased expression levels of 48 cytokines. CONCLUSION Cytotoxic damage to Beas-2B cells induced by exogenous carcinogens, including PM2.5, NNK and BPDE, was significantly reduced in a co-culture system compared with a mono-culture system. The mechanism may be related to changes in expression of cytokines, such as LIF, and activation of related pathways, such as the TNF signaling pathway. These findings provide new insights into cytotoxicity and an experimental basis for safety evaluations of exogenous carcinogens.
2
Tan H, Sun Q, Li G, Xiao Q, Ding P, Luo J, Liang C. Multiview Consensus Graph Learning for lncRNA-Disease Association Prediction. Front Genet 2020; 11:89. [PMID: 32153646 PMCID: PMC7047769 DOI: 10.3389/fgene.2020.00089] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/15/2019] [Accepted: 01/27/2020] [Indexed: 12/11/2022]
Abstract
Long noncoding RNAs (lncRNAs) are a class of noncoding RNA molecules longer than 200 nucleotides. Recent studies have uncovered their functional roles in diverse cellular processes and tumorigenesis. Therefore, identifying novel disease-related lncRNAs might deepen our understanding of disease etiology. However, due to the relatively small number of verified associations between lncRNAs and diseases, it remains a challenging task to reliably and effectively predict the associated lncRNAs for given diseases. In this paper, we propose a novel multiview consensus graph learning method to infer potential disease-related lncRNAs. Specifically, we first construct a set of similarity matrices for lncRNAs and diseases by taking advantage of the known associations. We then iteratively learn a consensus graph from the multiple input matrices and simultaneously optimize the predicted association probability based on a multi-label learning framework. To convey the utility of our method, three state-of-the-art methods are compared with our method on three widely used datasets. The experiment results illustrate that our method could obtain the best prediction performance under different cross validation schemes. The case study analysis implemented for uterine cervical neoplasms further confirmed the utility of our method in identifying lncRNAs as potential prognostic biomarkers in practice.
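The first step of the abstract above, building several similarity matrices from known associations, can be illustrated with a small sketch. It is not the authors' method: the two similarity measures (cosine and Jaccard), the toy association matrix, and the unweighted average used in place of the iteratively learned consensus graph are all assumptions chosen for illustration.

```python
import math

def cosine_similarity(profiles):
    """Pairwise cosine similarity between rows of a binary association matrix."""
    n = len(profiles)
    norms = [math.sqrt(sum(v * v for v in row)) for row in profiles]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(profiles[i], profiles[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim

def jaccard_similarity(profiles):
    """Pairwise Jaccard similarity between the sets of associated diseases."""
    n = len(profiles)
    sets = [{k for k, v in enumerate(row) if v} for row in profiles]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = sets[i] | sets[j]
            if union:
                sim[i][j] = len(sets[i] & sets[j]) / len(union)
    return sim

def average_views(views):
    """Naive stand-in for a consensus graph: the element-wise mean of all views."""
    n = len(views[0])
    return [[sum(view[i][j] for view in views) / len(views) for j in range(n)]
            for i in range(n)]

# Hypothetical toy data: rows are lncRNAs, columns are diseases (1 = known association).
associations = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
]

views = [cosine_similarity(associations), jaccard_similarity(associations)]
consensus = average_views(views)
```

A learned consensus graph would weight the views and refine the graph jointly with the prediction objective; the plain average here only shows how multiple views of the same entities can be combined into one matrix.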
Affiliation(s)
- Haojiang Tan
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Quanmeng Sun
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Guanghui Li
  - School of Information Engineering, East China Jiaotong University, Nanchang, China
- Qiu Xiao
  - College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Pingjian Ding
  - School of Computer Science, University of South China, Hengyang, China
- Jiawei Luo
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Cheng Liang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
3
Cai J, Wang D, Chen R, Niu Y, Ye X, Su R, Xiao G, Wei L. A Bioinformatics Tool for the Prediction of DNA N6-Methyladenine Modifications Based on Feature Fusion and Optimization Protocol. Front Bioeng Biotechnol 2020; 8:502. [PMID: 32582654 PMCID: PMC7287168 DOI: 10.3389/fbioe.2020.00502] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/17/2020] [Accepted: 04/29/2020] [Indexed: 01/04/2023]
Abstract
DNA N6-methyladenine (6mA) is closely involved in various biological processes. Identifying the distribution of 6mA modifications at genome scale is of great significance for understanding their functions in depth. In recent years, various experimental and computational methods have been proposed for this purpose. Unfortunately, existing methods cannot provide accurate and fast 6mA prediction. In this study, we present 6mAPred-FO, a bioinformatics tool that enables researchers to make predictions based on sequences only. To sufficiently capture the characteristics of 6mA sites, we integrate the sequence-order information with nucleotide positional specificity information for feature encoding, and further improve the feature representation capacity with an analysis of variance-based feature optimization protocol. The experimental results show that using this feature protocol, we can significantly improve the predictive performance. Via further feature analysis, we found that the sequence-order information and positional specificity information are complementary to each other, contributing to the performance improvement. The improvement is also due to the use of the feature optimization protocol, which is capable of effectively capturing the most informative features from the original feature space. Moreover, benchmarking comparison results demonstrate that our 6mAPred-FO outperforms several existing predictors. Finally, we establish a web server that implements the proposed method for the convenience of researchers, which is currently available at http://server.malab.cn/6mAPred-FO.
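The analysis of variance-based feature optimization mentioned in the abstract can be illustrated with a generic one-way ANOVA F-score per feature. This is a minimal sketch under stated assumptions, not the 6mAPred-FO implementation: the function names, the toy binary features, and the ranking step are hypothetical.

```python
def anova_f_score(groups):
    """One-way ANOVA F statistic for a single feature.

    groups: list of lists, one list of feature values per class.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-class variance: how far the class means sit from the grand mean.
    between = sum(len(g) * (m - grand_mean) ** 2
                  for g, m in zip(groups, means)) / (k - 1)
    # Within-class variance: spread of values around their own class mean.
    within = sum((x - m) ** 2
                 for g, m in zip(groups, means) for x in g) / (n - k)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within

def rank_features(positives, negatives):
    """Return feature indices sorted by decreasing F-score, plus the scores."""
    n_features = len(positives[0])
    scores = []
    for j in range(n_features):
        pos_values = [row[j] for row in positives]
        neg_values = [row[j] for row in negatives]
        scores.append(anova_f_score([pos_values, neg_values]))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores

# Hypothetical toy data: each row is a sample, each column a binary feature.
positives = [[1, 1], [1, 0], [0, 1], [1, 0]]
negatives = [[0, 1], [0, 0], [1, 1], [0, 0]]

ranking, f_scores = rank_features(positives, negatives)
```

Features with a high F-score separate the two classes well; a selection protocol would typically keep the top-ranked features and evaluate the classifier as features are added in rank order.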
Affiliation(s)
- Jianhua Cai
  - Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou, China
  - College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China
- Donghua Wang
  - Department of General Surgery, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Riqing Chen
  - College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
- Yuzhen Niu
  - Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou, China
- Xiucai Ye
  - Department of Computer Science, University of Tsukuba, Tsukuba, Japan
- Ran Su
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Guobao Xiao
  - Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou, China
  - *Correspondence: Guobao Xiao
- Leyi Wei
  - Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou, China
  - School of Software, Shandong University, Jinan, China
  - Leyi Wei
4
Shan X, Wang X, Li CD, Chu Y, Zhang Y, Xiong Y, Wei DQ. Prediction of CYP450 Enzyme–Substrate Selectivity Based on the Network-Based Label Space Division Method. J Chem Inf Model 2019; 59:4577-4586. [DOI: 10.1021/acs.jcim.9b00749] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Indexed: 01/11/2023]
Affiliation(s)
- Xiaoqi Shan
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
- Xiangeng Wang
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
- Cheng-dong Li
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
- Yanyi Chu
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
- Yufang Zhang
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
- Dong-Qing Wei
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
  - Peng Cheng Laboratory, Vanke Cloud City Phase I Building 8, Xili Street, Nanshan District, Shenzhen, Guangdong 518055, China
5
Wang X, Wang Y, Xu Z, Xiong Y, Wei DQ. ATC-NLSP: Prediction of the Classes of Anatomical Therapeutic Chemicals Using a Network-Based Label Space Partition Method. Front Pharmacol 2019; 10:971. [PMID: 31543820 PMCID: PMC6739564 DOI: 10.3389/fphar.2019.00971] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 06/10/2019] [Accepted: 07/29/2019] [Indexed: 01/12/2023]
Abstract
The Anatomical Therapeutic Chemical (ATC) classification system proposed by the World Health Organization is a widely accepted drug classification scheme in both academic and industrial realms. It is a multilabel system that categorizes drugs into multiple classes according to their therapeutic, pharmacological, and chemical attributes. In this study, we adopted a data-driven network-based label space partition (NLSP) method for prediction of ATC classes of a given compound within the multilabel learning framework. The proposed method, ATC-NLSP, is trained on similarity-based features such as chemical–chemical interaction and structural and fingerprint similarities of a compound to other compounds belonging to the different ATC categories. The NLSP method trains predictors for each label cluster (possibly intersecting) detected by community detection algorithms and takes the ensemble labels for a compound as the final prediction. Experimental evaluation based on the jackknife test on the benchmark dataset demonstrated that our method boosted the absolute true rate, which is the most stringent evaluation metric in this study, from 0.6330 to 0.7497 in comparison to the state-of-the-art approaches. Moreover, the community structures of the label relation graph were detected through the label propagation method. The advantage of multilabel learning over single-label models was shown by label-wise analysis. Our study indicated that the proposed method ATC-NLSP, which adopts ideas from the network research community and captures the correlation of labels in a data-driven manner, is the top-performing model in the ATC prediction task. We believe that the power of NLSP remains to be unleashed for multilabel learning tasks in drug discovery. The source code is freely available at https://github.com/dqwei-lab/ATC.
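The absolute true rate reported above counts a multilabel prediction as correct only when the predicted label set matches the true label set exactly, which is why it is the most stringent metric. A minimal sketch of that computation follows; the function name and the toy label sets are hypothetical and are not taken from the ATC-NLSP source code.

```python
def absolute_true_rate(true_label_sets, predicted_label_sets):
    """Fraction of samples whose predicted label set equals the true set exactly."""
    if len(true_label_sets) != len(predicted_label_sets):
        raise ValueError("both inputs must contain one label set per sample")
    if not true_label_sets:
        return 0.0
    exact_matches = sum(
        1 for truth, predicted in zip(true_label_sets, predicted_label_sets)
        if set(truth) == set(predicted)
    )
    return exact_matches / len(true_label_sets)

# Hypothetical toy data: letters stand in for first-level ATC classes.
truth = [{"A", "C"}, {"N"}, {"C", "D"}, {"J"}]
predicted = [{"A", "C"}, {"N"}, {"C"}, {"J", "L"}]

rate = absolute_true_rate(truth, predicted)
```

In this toy example the third prediction misses one label and the fourth adds a spurious one, so both count as errors even though each is partly right.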
Affiliation(s)
- Xiangeng Wang
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Yanjing Wang
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Zhenyu Xu
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Dong-Qing Wei
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
6
PredLnc-GFStack: A Global Sequence Feature Based on a Stacked Ensemble Learning Method for Predicting lncRNAs from Transcripts. Genes (Basel) 2019; 10:genes10090672. [PMID: 31484412 PMCID: PMC6770532 DOI: 10.3390/genes10090672] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 07/14/2019] [Revised: 08/05/2019] [Accepted: 08/28/2019] [Indexed: 11/16/2022]
Abstract
Long non-coding RNAs (lncRNAs) are a class of RNAs longer than 200 base pairs (bps) that do not encode proteins. Nevertheless, lncRNAs have many vital biological functions. A large number of novel transcripts have been discovered as a result of the development of high-throughput sequencing technology, so computational methods for lncRNA prediction are in great demand. In this paper, we consider global sequence features and propose a stacked ensemble learning-based method to predict lncRNAs from transcripts, abbreviated as PredLnc-GFStack. We extract the critical features from the candidate feature list using the genetic algorithm (GA) and then employ the stacked ensemble learning method to construct the PredLnc-GFStack model. Computational experimental results show that PredLnc-GFStack outperforms several state-of-the-art methods for lncRNA prediction. Furthermore, PredLnc-GFStack demonstrates an outstanding ability for cross-species ncRNA prediction.
7
Zhang W, Jing K, Huang F, Chen Y, Li B, Li J, Gong J. SFLLN: A sparse feature learning ensemble method with linear neighborhood regularization for predicting drug–drug interactions. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.05.017] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 12/16/2022]
8
Gupta AK, Murthy T, Paul KV, Ramirez O, Fisher JB, Rao S, Rosenberg AB, Seelig G, Minella AC, Pillai MM. Degenerate minigene library analysis enables identification of altered branch point utilization by mutant splicing factor 3B1 (SF3B1). Nucleic Acids Res 2019; 47:970-980. [PMID: 30462273 PMCID: PMC6344872 DOI: 10.1093/nar/gky1161] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/21/2018] [Accepted: 10/31/2018] [Indexed: 12/13/2022]
Abstract
Cancer-associated mutations of the core splicing factor 3 B1 (SF3B1) result in selection of novel 3′ splice sites (3′SS), but precise molecular mechanisms of oncogenesis remain unclear. SF3B1 stabilizes the interaction between U2 snRNP and branch point (BP) on the pre-mRNA. It has hence been speculated that a change in BP selection is the basis for novel 3′SS selection. Direct quantitative determination of BP utilization is however technically challenging. To define BP utilization by SF3B1-mutant spliceosomes, we used an overexpression approach in human cells as well as a complementary strategy using isogenic murine embryonic stem cells with monoallelic K700E mutations constructed via CRISPR/Cas9-based genome editing and a dual vector homology-directed repair methodology. A synthetic minigene library with degenerate regions in 3′ intronic regions (3.4 million individual minigenes) was used to compare BP usage of SF3B1K700E and SF3B1WT. Using this model, we show that SF3B1K700E spliceosomes utilize non-canonical sequence variants (at position −1 relative to BP adenosine) more frequently than wild-type spliceosomes. These predictions were confirmed using minigene splicing assays. Our results suggest a model of BP utilization by mutant SF3B1 wherein it is able to utilize non-consensus alternative BP sequences by stabilizing weaker U2-BP interactions.
Affiliation(s)
- Tushar Murthy
  - Driskill Graduate Program, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Kiran V Paul
  - Section of Hematology, Yale Cancer Center, New Haven, CT, USA
- Oscar Ramirez
  - Section of Hematology, Yale Cancer Center, New Haven, CT, USA
- Joseph B Fisher
  - Blood Research Institute, BloodCenter of Wisconsin, Milwaukee, WI, USA
- Sridhar Rao
  - Blood Research Institute, BloodCenter of Wisconsin, Milwaukee, WI, USA
- Georg Seelig
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Alex C Minella
  - Blood Research Institute, BloodCenter of Wisconsin, Milwaukee, WI, USA
  - Correspondence may also be addressed to Alex C. Minella. Tel: +1 414 937 6238;
- Manoj M Pillai
  - Section of Hematology, Yale Cancer Center, New Haven, CT, USA
  - To whom correspondence should be addressed. Tel: +1 203 737 6403;
9
Xiong Y, Qiao Y, Kihara D, Zhang HY, Zhu X, Wei DQ. Survey of Machine Learning Techniques for Prediction of the Isoform Specificity of Cytochrome P450 Substrates. Curr Drug Metab 2019; 20:229-235. [PMID: 30338736 DOI: 10.2174/1389200219666181019094526] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/19/2018] [Revised: 08/05/2018] [Accepted: 08/06/2018] [Indexed: 12/23/2022]
Abstract
Background: Determination or prediction of the Absorption, Distribution, Metabolism, and Excretion (ADME) properties of drug candidates and of drug-induced toxicity plays a crucial role in drug discovery and development. Metabolism is one of the most complicated pharmacokinetic properties to understand and predict. However, experimental determination of the substrate binding, selectivity, sites and rates of metabolism is time- and resource-consuming. In the phase I metabolism of foreign compounds (i.e., most drugs), cytochrome P450 enzymes play a key role. To help develop drugs with proper ADME properties, computational models are highly desired to predict the ADME properties of drug candidates, particularly for drugs binding to cytochrome P450. Objective: This narrative review aims to briefly summarize machine learning techniques used in the prediction of the cytochrome P450 isoform specificity of drug candidates. Results: Both single-label and multi-label classification methods have demonstrated good performance in modelling and predicting the isoform specificity of substrates based on their quantitative descriptors. Conclusion: This review provides a guide for researchers developing machine learning-based methods to predict the cytochrome P450 isoform specificity of drug candidates.
Affiliation(s)
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yanhua Qiao
  - School of Life Sciences, Anhui University, Hefei, Anhui 230601, China
- Daisuke Kihara
  - Department of Biological Science, Purdue University, West Lafayette, IN 47907, United States
- Hui-Yuan Zhang
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Xiaolei Zhu
  - School of Life Sciences, Anhui University, Hefei, Anhui 230601, China
- Dong-Qing Wei
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
10
Tang G, Shi J, Wu W, Yue X, Zhang W. Sequence-based bacterial small RNAs prediction using ensemble learning strategies. BMC Bioinformatics 2018; 19:503. [PMID: 30577759 PMCID: PMC6302447 DOI: 10.1186/s12859-018-2535-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 01/08/2023]
Abstract
Background Bacterial small non-coding RNAs (sRNAs) have emerged as important elements in diverse physiological processes, including growth, development, cell proliferation, differentiation, metabolic reactions and carbon metabolism, and have attracted great attention. Accurate prediction of sRNAs is important and challenging, and helps to explore the functions and mechanisms of sRNAs. Results In this paper, we utilize a variety of sRNA sequence-derived features to develop ensemble learning methods for sRNA prediction. First, we compile a balanced dataset and four imbalanced datasets. Then, we investigate various sRNA sequence-derived features, such as spectrum profile, mismatch profile, reverse complement k-mer and pseudo nucleotide composition. Finally, we consider two ensemble learning strategies to integrate all features for building ensemble learning models for sRNA prediction. One is the weighted average ensemble method (WAEM), which uses the linear weighted sum of outputs from the individual feature-based predictors to predict sRNAs. The other is the neural network ensemble method (NNEM), which trains a deep neural network by combining diverse features. In the computational experiments, we evaluate our methods on these five datasets by using 5-fold cross validation. WAEM and NNEM can produce better results than existing state-of-the-art sRNA prediction methods. Conclusions WAEM and NNEM have great potential for sRNA prediction, and are helpful for understanding the biological mechanisms of bacteria.
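The weighted average ensemble described above combines the outputs of the individual feature-based predictors as a linear weighted sum. The sketch below shows that combination step only. The weights, scores, and 0.5 decision threshold are hypothetical, and how the authors choose the weights is not stated in the abstract.

```python
def weighted_average_ensemble(predictor_scores, weights):
    """Combine per-predictor scores into one score per sample.

    predictor_scores: one list of scores per predictor, aligned by sample.
    weights: one non-negative weight per predictor (normalized to sum to 1 here).
    """
    if len(predictor_scores) != len(weights):
        raise ValueError("need exactly one weight per predictor")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normalized = [w / total for w in weights]
    n_samples = len(predictor_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(normalized, predictor_scores))
        for i in range(n_samples)
    ]

# Hypothetical outputs of three feature-based predictors for two sequences.
scores_by_feature = [
    [0.9, 0.2],  # e.g. a spectrum-profile predictor
    [0.6, 0.5],  # e.g. a mismatch-profile predictor
    [0.4, 0.1],  # e.g. a pseudo-nucleotide-composition predictor
]
weights = [0.5, 0.3, 0.2]

ensemble_scores = weighted_average_ensemble(scores_by_feature, weights)
predicted_srna = [score >= 0.5 for score in ensemble_scores]
```

Normalizing the weights keeps the combined score on the same scale as the inputs, so a single threshold can still be applied to the ensemble output.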
Affiliation(s)
- Guifeng Tang
  - School of Computer Science, Wuhan University, Wuhan, 430072, China
- Jingwen Shi
  - School of Mathematics and Statistics, Wuhan University, Wuhan, 430072, China
- Wenjian Wu
  - Electronic Information School, Wuhan University, Wuhan, 430072, China
- Xiang Yue
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, 43210, USA
- Wen Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
11
Manifold regularized matrix factorization for drug-drug interaction prediction. J Biomed Inform 2018; 88:90-97. [DOI: 10.1016/j.jbi.2018.11.005] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Received: 11/28/2017] [Revised: 11/03/2018] [Accepted: 11/11/2018] [Indexed: 12/20/2022]
12
Shen Y, Tang J, Guo F. Identification of protein subcellular localization via integrating evolutionary and physicochemical information into Chou's general PseAAC. J Theor Biol 2018; 462:230-239. [PMID: 30452958 DOI: 10.1016/j.jtbi.2018.11.012] [Citation(s) in RCA: 101] [Impact Index Per Article: 16.8] [Received: 09/17/2018] [Revised: 11/07/2018] [Accepted: 11/15/2018] [Indexed: 01/07/2023]
Abstract
Identifying the location of proteins in a cell plays an important role in understanding their functions and supports applications such as drug design, therapeutic target discovery and biological research. However, traditional subcellular localization experiments are time-consuming, laborious and small in scale. With the development of next-generation sequencing technology, the number of known proteins has grown exponentially, which lays the foundation for computational methods for identifying protein subcellular localization. Although many methods for predicting subcellular localization of proteins have been proposed, most of them are limited to single-location proteins. In this paper, we propose a multi-kernel support vector machine (SVM) to predict the subcellular localization of both multi-location and single-location proteins. First, we make use of the evolutionary information extracted from the position specific scoring matrix (PSSM) and the physicochemical properties of proteins, by Chou's general PseAAC and other efficient functions. Then, we propose a multi-kernel SVM model to identify multi-label protein subcellular localization. Our method performs well in predicting subcellular localization of proteins, achieving an average precision of 0.7065 and 0.6889 on two human datasets, respectively. All results are higher than those achieved by other existing methods. Therefore, we provide an efficient system via a novel perspective to study protein subcellular localization.
Affiliation(s)
- Yinan Shen
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Yaguan Road, Jinnan District, Tianjin, PR China
- Jijun Tang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Yaguan Road, Jinnan District, Tianjin, PR China
  - School of Computational Science and Engineering, University of South Carolina, Columbia, USA
- Fei Guo
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Yaguan Road, Jinnan District, Tianjin, PR China
13
He W, Ju Y, Zeng X, Liu X, Zou Q. Sc-ncDNAPred: A Sequence-Based Predictor for Identifying Non-coding DNA in Saccharomyces cerevisiae. Front Microbiol 2018; 9:2174. [PMID: 30258427 PMCID: PMC6144933 DOI: 10.3389/fmicb.2018.02174] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 07/24/2018] [Accepted: 08/24/2018] [Indexed: 12/22/2022]
Abstract
With the rapid development of high-speed sequencing technologies and the implementation of many whole genome sequencing projects, research in genomics is advancing from genome sequencing to genome synthesis. Synthetic biology technologies such as DNA-based molecular assemblies, genome editing, directed evolution and DNA storage, along with other cutting-edge technologies, are emerging in succession. In particular, the rapid growth and development of DNA assembly technology may greatly advance the success of artificial life. Meanwhile, DNA assembly technology needs a large number of target sequences with known information as data support. Non-coding DNA (ncDNA) sequences occupy most of an organism's genome, so recognizing them accurately is necessary. Although experimental methods have been proposed to detect ncDNA sequences, they are expensive for genome-wide detection. Thus, it is necessary to develop machine-learning methods for predicting non-coding DNA sequences. In this study, we collected an ncDNA benchmark dataset of Saccharomyces cerevisiae and reported a support vector machine-based predictor, called Sc-ncDNAPred, for predicting ncDNA sequences. The optimal feature extraction strategy was selected from a group that included mononucleotide, dimer, trimer, tetramer, pentamer, and hexamer compositions, using the support vector machine learning method. Sc-ncDNAPred achieved an overall accuracy of 0.98. For the convenience of users, an online web server has been built at: http://server.malab.cn/Sc_ncDNAPred/index.jsp.
Affiliation(s)
- Wenying He
  - School of Computer Science and Technology, Tianjin University, Tianjin, China
- Ying Ju
  - School of Information Science and Technology, Xiamen University, Xiamen, China
- Xiangxiang Zeng
  - School of Information Science and Technology, Xiamen University, Xiamen, China
- Xiangrong Liu
  - School of Information Science and Technology, Xiamen University, Xiamen, China
- Quan Zou
  - School of Computer Science and Technology, Tianjin University, Tianjin, China
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
14
Wang K, Hoeksema J, Liang C. piRNN: deep learning algorithm for piRNA prediction. PeerJ 2018; 6:e5429. [PMID: 30083483 PMCID: PMC6078063 DOI: 10.7717/peerj.5429] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 04/25/2018] [Accepted: 07/19/2018] [Indexed: 12/22/2022]
Abstract
Piwi-interacting RNAs (piRNAs) are the largest class of small non-coding RNAs discovered in germ cells. Identifying piRNAs from small RNA data is a challenging task due to the lack of conserved sequences and structural features of piRNAs. Many programs have been developed to identify piRNA from small RNA data. However, these programs have limitations. They either rely on extracting complicated features, or only demonstrate strong performance on transposon related piRNAs. Here we proposed a new program called piRNN for piRNA identification. For our software, we applied a convolutional neural network classifier that was trained on the datasets from four different species (Caenorhabditis elegans, Drosophila melanogaster, rat and human). A matrix of k-mer frequency values was used to represent each sequence. piRNN has great usability and shows better performance in comparison with other programs. It is freely available at https://github.com/bioinfolabmu/piRNN.
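The k-mer frequency representation mentioned in the abstract can be sketched as follows. This is an illustrative encoding under stated assumptions, not the piRNN code: the DNA alphabet `ACGT`, the choice to skip windows containing other characters, and the example sequence are all hypothetical.

```python
from itertools import product

def kmer_frequencies(sequence, k, alphabet="ACGT"):
    """Return all possible k-mers and their relative frequencies in the sequence."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    # Slide a window of width k over the sequence, one position at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows with ambiguous bases such as N are skipped
            counts[window] += 1
            total += 1
    if total == 0:
        return kmers, [0.0] * len(kmers)
    return kmers, [counts[m] / total for m in kmers]

# Hypothetical short read; real piRNAs are typically around 24 to 32 nucleotides long.
example = "ACGTAC"
dimer_names, dimer_freqs = kmer_frequencies(example, k=2)

# One frequency vector per k could be stacked to form a per-sequence feature set.
features_by_k = {k: kmer_frequencies(example, k)[1] for k in (1, 2, 3)}
```

Because the vector length depends only on k and the alphabet (4^k entries for DNA), sequences of different lengths map to inputs of a fixed size, which is what a neural network classifier needs.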
Affiliation(s)
- Kai Wang
  - Department of Biology, Miami University, Oxford, OH, USA
- Joshua Hoeksema
  - Department of Computer Science & Software Engineering, Miami University, Oxford, OH, USA
- Chun Liang
  - Department of Biology, Miami University, Oxford, OH, USA
15
Niu M, Li Y, Wang C, Han K. RFAmyloid: A Web Server for Predicting Amyloid Proteins. Int J Mol Sci 2018; 19:ijms19072071. [PMID: 30013015 PMCID: PMC6073578 DOI: 10.3390/ijms19072071] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 06/12/2018] [Revised: 07/10/2018] [Accepted: 07/12/2018] [Indexed: 12/22/2022]
Abstract
Amyloid is an insoluble fibrous protein whose mis-aggregation can lead to diseases such as Alzheimer’s disease and Creutzfeldt–Jakob disease. The identification of amyloid is therefore essential for the discovery and understanding of disease. We established a novel random forest-based predictor, RFAmy, to identify amyloid. It employs the SVMProt 188-D feature extraction method, based on protein composition and physicochemical properties, and the pse-in-one feature extraction method, based on amino acid composition, autocorrelation pseudo acid composition, profile-based features and predicted structure features. In ten-fold cross-validation, RFAmy’s overall accuracy was 89.19% and its F-measure was 0.891. Comparison experiments with other features, classifiers, and existing methods show the effectiveness of RFAmy in predicting amyloid proteins. RFAmy can be accessed at http://server.malab.cn/RFAmyloid/.
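For readers unfamiliar with the two metrics reported in this abstract, the following sketch shows how accuracy and F-measure are commonly computed from confusion-matrix counts. It assumes the binary (positive-class) F1 definition; the abstract does not say whether the reported F-measure was computed this way or as a class-weighted average, and the function name is hypothetical.

```python
def accuracy_and_f_measure(tp, fp, tn, fn):
    """Compute accuracy and binary F1 from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Illustrative helper; not taken from the RFAmy paper.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f_measure
```

In a ten-fold cross-validation setting, the counts would typically be accumulated over the ten held-out folds before calling the function.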
Affiliation(s)
- Mengting Niu, School of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Yanjuan Li, School of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Chunyu Wang, School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150040, China
- Ke Han, School of Computer and Information Engineering, Harbin University of Commerce, Harbin 150040, China
16
Wei L, Chen H, Su R. M6APred-EL: A Sequence-Based Predictor for Identifying N6-methyladenosine Sites Using Ensemble Learning. Mol Ther Nucleic Acids 2018; 12:635-644. [PMID: 30081234 PMCID: PMC6082921 DOI: 10.1016/j.omtn.2018.07.004] [Citation(s) in RCA: 136] [Impact Index Per Article: 22.7] [Received: 03/31/2018] [Revised: 07/03/2018] [Accepted: 07/03/2018] [Indexed: 12/28/2022]
Abstract
N6-methyladenosine (m6A) modification is the most abundant RNA methylation modification and is involved in various biological processes, such as RNA splicing and degradation. Recent studies have demonstrated the feasibility of identifying m6A peaks using high-throughput sequencing techniques. However, such techniques cannot accurately identify specific methylated sites, which is important for a better understanding of m6A functions. In this study, we develop a novel machine learning-based predictor called M6APred-EL for the identification of m6A sites. To predict m6A sites accurately within genomic sequences, we trained an ensemble of three support vector machine classifiers that exploit position-specific and physical-chemical information from three feature types: position-specific k-mer nucleotide propensity, physical-chemical properties, and ring-function-hydrogen-chemical properties. We compared the performance of our predictor with other state-of-the-art methods on benchmark datasets. Comparative results showed that M6APred-EL identified m6A sites more accurately. Moreover, a user-friendly web server that implements M6APred-EL is available at http://server.malab.cn/M6APred-EL/. It is expected to be a practical and effective tool for the investigation of m6A functional mechanisms.
Affiliation(s)
- Leyi Wei, School of Computer Science and Technology, Tianjin University, Tianjin, China; State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin, China
- Huangrong Chen, School of Computer Science and Technology, Tianjin University, Tianjin, China
- Ran Su, School of Computer Software, Tianjin University, Tianjin, China; State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin, China
17
Zhang W, Yue X, Lin W, Wu W, Liu R, Huang F, Liu F. Predicting drug-disease associations by using similarity constrained matrix factorization. BMC Bioinformatics 2018; 19:233. [PMID: 29914348 PMCID: PMC6006580 DOI: 10.1186/s12859-018-2220-4] [Citation(s) in RCA: 133] [Impact Index Per Article: 22.2] [Received: 10/11/2017] [Accepted: 05/28/2018] [Indexed: 02/06/2023]
Abstract
BACKGROUND Drug-disease associations provide important information for drug discovery. Wet experiments that identify drug-disease associations are time-consuming and expensive, and many drug-disease associations are still unobserved or unknown. Developing computational methods for predicting unobserved drug-disease associations is therefore an important and urgent task. RESULTS In this paper, we propose a similarity constrained matrix factorization method for drug-disease association prediction (SCMFDD), which makes use of known drug-disease associations, drug features and disease semantic information. SCMFDD projects the drug-disease association relationship into two low-rank spaces, which uncover latent features for drugs and diseases, and then introduces drug feature-based similarities and disease semantic similarity as constraints for drugs and diseases in the low-rank spaces. Unlike the classic matrix factorization technique, SCMFDD takes the biological context of the problem into account. In computational experiments, the proposed method achieves high accuracy on benchmark datasets and outperforms existing state-of-the-art prediction methods when evaluated by five-fold cross validation and independent testing. CONCLUSION We developed a user-friendly web server using known associations collected from the CTD database, available at http://www.bioinfotech.cn/SCMFDD/. The case studies show that the server can find novel associations that are not included in the CTD database.
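The general idea of a similarity-constrained factorization can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the objective below (squared reconstruction error, ridge penalty weighted by `lam`, and a similarity smoothness penalty weighted by `mu`), the gradient-descent solver, and all function and parameter names are choices made for this example and are not taken from the SCMFDD paper. The similarity matrices are assumed to be symmetric.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def scmf_loss(A, X, Y, S_drug, S_dis, lam, mu):
    """Objective assumed in this sketch (similarity matrices must be symmetric)."""
    n, m = len(A), len(A[0])
    fit = 0.5 * sum((A[i][j] - dot(X[i], Y[j])) ** 2
                    for i in range(n) for j in range(m))
    ridge = 0.5 * lam * (sum(dot(x, x) for x in X) + sum(dot(y, y) for y in Y))

    def smooth(F, S):
        # Sum over unordered pairs: similar items should get nearby latent vectors.
        total = 0.0
        for a in range(len(F)):
            for b in range(a + 1, len(F)):
                diff = [p - q for p, q in zip(F[a], F[b])]
                total += S[a][b] * dot(diff, diff)
        return 0.5 * mu * total

    return fit + ridge + smooth(X, S_drug) + smooth(Y, S_dis)


def scmf(A, S_drug, S_dis, rank=2, lam=0.1, mu=0.1, lr=0.02, epochs=1000, seed=0):
    """Factor A (drugs x diseases) as X Y^T by full-batch gradient descent."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    X = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n)]
    Y = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(m)]
    for _ in range(epochs):
        # Residuals of the current reconstruction.
        R = [[A[i][j] - dot(X[i], Y[j]) for j in range(m)] for i in range(n)]
        # Gradients of the three terms: fit, ridge, similarity smoothness.
        gX = [[-sum(R[i][j] * Y[j][f] for j in range(m))
               + lam * X[i][f]
               + mu * sum(S_drug[i][k] * (X[i][f] - X[k][f]) for k in range(n))
               for f in range(rank)] for i in range(n)]
        gY = [[-sum(R[i][j] * X[i][f] for i in range(n))
               + lam * Y[j][f]
               + mu * sum(S_dis[j][t] * (Y[j][f] - Y[t][f]) for t in range(m))
               for f in range(rank)] for j in range(m)]
        # Update both factor matrices simultaneously.
        X = [[X[i][f] - lr * gX[i][f] for f in range(rank)] for i in range(n)]
        Y = [[Y[j][f] - lr * gY[j][f] for f in range(rank)] for j in range(m)]
    return X, Y
```

After fitting, `dot(X[i], Y[j])` serves as the predicted association score for drug `i` and disease `j`; unobserved pairs with high scores would be the candidate novel associations.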
Affiliation(s)
- Wen Zhang, School of Computer Science, Wuhan University, Wuhan, 430072, China
- Xiang Yue, School of Computer Science, Wuhan University, Wuhan, 430072, China
- Weiran Lin, School of Computer Science, Wuhan University, Wuhan, 430072, China
- Wenjian Wu, School of Electronic Information, Wuhan University, Wuhan, 430072, China
- Ruoqi Liu, School of Computer Science, Wuhan University, Wuhan, 430072, China
- Feng Huang, School of Computer Science, Wuhan University, Wuhan, 430072, China
- Feng Liu, School of Computer Science, Wuhan University, Wuhan, 430072, China