1
Sohrawordi M, Hossain MA. Prediction of lysine formylation sites using support vector machine based on the sample selection from majority classes and synthetic minority over-sampling techniques. Biochimie 2021; 192:125-135. [PMID: 34627982 DOI: 10.1016/j.biochi.2021.10.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/15/2021] [Revised: 10/03/2021] [Accepted: 10/05/2021] [Indexed: 12/22/2022]
Abstract
Lysine formylation is a recently discovered post-translational modification (PTM) of considerable interest. It is generally found on core and linker histone proteins of prokaryotes and eukaryotes and plays important roles in the regulation of various cellular mechanisms. Properly identifying formylation sites in proteins is therefore important for understanding the molecular mechanism of formylation and for developing drugs for related diseases. Because experimental identification of formylation sites with traditional methods is expensive and time-consuming, a simple and fast computational model that accurately predicts lysine formylation sites is highly desirable. This study proposes a computational model named PLF_SVM, which predicts formylated and non-formylated lysine sites using binary encoding (BE), amino acid composition (AAC), reverse position relative incidence matrix (RPRIM), position relative incidence matrix (PRIM), and position specific amino acid propensity (PSAAP) feature generation methods. The Synthetic Minority Oversampling Technique (SMOTE) and a proposed sample selection strategy named EnSVM are applied to handle the imbalanced training dataset. The optimal number of features is then selected by the F-score method to train the model. PLF_SVM outperforms the state-of-the-art approaches in validation and independent tests, with accuracies of 98.61% and 98.77%, respectively. A user-friendly web tool for identifying formylation sites is also available at https://plf-svm.herokuapp.com/. The proposed method may therefore be a helpful guide for the analysis and prediction of formylated lysine and for understanding the process of cellular regulation.
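The F-score feature ranking mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used two-class F-score (squared distance of each class mean from the overall mean, divided by the sum of the within-class sample variances); the abstract does not state which variant PLF_SVM uses. The function names and the toy data are hypothetical.

```python
from statistics import mean

def f_score(pos, neg):
    """F-score of one feature from its values in the positive and negative class."""
    all_mean = mean(pos + neg)
    pos_mean, neg_mean = mean(pos), mean(neg)
    between = (pos_mean - all_mean) ** 2 + (neg_mean - all_mean) ** 2
    # Within-class sample variances (n - 1 in the denominator).
    within = (
        sum((x - pos_mean) ** 2 for x in pos) / (len(pos) - 1)
        + sum((x - neg_mean) ** 2 for x in neg) / (len(neg) - 1)
    )
    if within == 0:
        # Constant within both classes: perfectly separating or uninformative.
        return float("inf") if between > 0 else 0.0
    return between / within

def rank_features(X, y):
    """Return feature indices sorted from highest to lowest F-score."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.9, 0.2], [1.0, 0.8], [1.1, 0.5], [0.1, 0.3], [0.0, 0.7], [0.2, 0.5]]
y = [1, 1, 1, 0, 0, 0]
ranking = rank_features(X, y)
```

In a pipeline of this kind, the top-ranked features would then be passed to the classifier; how many to keep is a separate tuning decision.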
Affiliation(s)
- Md Sohrawordi
  - Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh; Dept. of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh.
- Md Ali Hossain
  - Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
2
A Real-Time Artificial Intelligence-Assisted System to Predict Weaning from Ventilator Immediately after Lung Resection Surgery. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:ijerph18052713. [PMID: 33800239 PMCID: PMC7967444 DOI: 10.3390/ijerph18052713] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 02/11/2021] [Revised: 03/01/2021] [Accepted: 03/03/2021] [Indexed: 12/11/2022]
Abstract
Assessment of risk before lung resection surgery can provide anesthesiologists with information about whether a patient can be weaned from the ventilator immediately after surgery. However, it is difficult for anesthesiologists to perform a complete integrated risk assessment in a time-limited pre-anesthetic clinic. We retrospectively collected the electronic medical records of 709 patients who underwent lung resection between 1 January 2017 and 31 July 2019. We used the obtained data to construct an artificial intelligence (AI) prediction model with seven supervised machine learning algorithms to predict whether patients could be weaned immediately after lung resection surgery. The AI model with Naïve Bayes Classifier algorithm had the best testing result and was therefore used to develop an application to evaluate risk based on patients' previous medical data, to assist anesthesiologists, and to predict patient outcomes in pre-anesthetic clinics. The individualization and digitalization characteristics of this AI application could improve the effectiveness of risk explanations and physician-patient communication to achieve better patient comprehension.
3
Jing XY, Li FM. Predicting Cell Wall Lytic Enzymes Using Combined Features. Front Bioeng Biotechnol 2021; 8:627335. [PMID: 33585423 PMCID: PMC7874139 DOI: 10.3389/fbioe.2020.627335] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/09/2020] [Accepted: 12/04/2020] [Indexed: 11/13/2022]
Abstract
Due to the overuse of antibiotics, people are worried that existing antibiotics will become ineffective against pathogens with the rapid rise of antibiotic-resistant strains. The use of cell wall lytic enzymes to destroy bacteria has become a viable alternative to avoid the crisis of antimicrobial resistance. In this paper, an improved method for cell wall lytic enzymes prediction was proposed and the amino acid composition (AAC), the dipeptide composition (DC), the position-specific score matrix auto-covariance (PSSM-AC), and the auto-covariance average chemical shift (acACS) were selected to predict the cell wall lytic enzymes with support vector machine (SVM). In order to overcome the imbalanced data classification problems and remove redundant or irrelevant features, the synthetic minority over-sampling technique (SMOTE) was used to balance the dataset. The F-score was used to select features. The Sn, Sp, MCC, and Acc were 99.35%, 99.02%, 0.98, and 99.19% with jackknife test using the optimized combination feature AAC+DC+acACS+PSSM-AC. The Sn, Sp, MCC, and Acc of cell wall lytic enzymes in our predictive model were higher than those in existing methods. This improved method may be helpful for protein function prediction.
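The core idea of SMOTE, which this abstract uses to balance the dataset, is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified illustration of that idea only, not the authors' implementation; the function name, `k`, the seed, and the toy points are assumptions.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base sample, excluding itself.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class feature vectors.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6)
```

Because every synthetic point lies on a segment between two existing minority samples, oversampling should be applied to the training folds only, so that no interpolated copy of a test sample leaks into training.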
Affiliation(s)
- Xiao-Yang Jing
  - College of Science, Inner Mongolia Agricultural University, Hohhot, China
- Feng-Min Li
  - College of Science, Inner Mongolia Agricultural University, Hohhot, China
4
Zhu L, Yang X, Zhu R, Yu L. Identifying Discriminative Biological Function Features and Rules for Cancer-Related Long Non-coding RNAs. Front Genet 2021; 11:598773. [PMID: 33391350 PMCID: PMC7772407 DOI: 10.3389/fgene.2020.598773] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/25/2020] [Accepted: 11/23/2020] [Indexed: 01/17/2023]
Abstract
Cancer has been a major public health problem worldwide for many centuries. Cancer is a complex disease associated with accumulative genetic mutations, epigenetic aberrations, chromosomal instability, and expression alteration. Increasing lines of evidence suggest that many non-coding transcripts, which are termed as non-coding RNAs, have important regulatory roles in cancer. In particular, long non-coding RNAs (lncRNAs) play crucial roles in tumorigenesis. Cancer-related lncRNAs serve as oncogenic factors or tumor suppressors. Although many lncRNAs are identified as potential regulators in tumorigenesis by using traditional experimental methods, they are time consuming and expensive considering the tremendous amount of lncRNAs needed. Thus, effective and fast approaches to recognize tumor-related lncRNAs should be developed. The proposed approach should help us understand not only the mechanisms of lncRNAs that participate in tumorigenesis but also their satisfactory performance in distinguishing cancer-related lncRNAs. In this study, we utilized a decision tree (DT), a type of rule learning algorithm, to investigate cancer-related lncRNAs with functional annotation contents [gene ontology (GO) terms and KEGG pathways] of their co-expressed genes. Cancer-related and other lncRNAs encoded by the key enrichment features of GO and KEGG filtered by feature selection methods were used to build an informative DT, which further induced several decision rules. The rules provided not only a new tool for identifying cancer-related lncRNAs but also connected the lncRNAs and cancers with the combinations of GO terms. Results provided new directions for understanding cancer-related lncRNAs.
Affiliation(s)
- Liucun Zhu
  - School of Life Sciences, Shanghai University, Shanghai, China
- Xin Yang
  - School of Life Sciences, Shanghai University, Shanghai, China
- Rui Zhu
  - School of Life Sciences, Shanghai University, Shanghai, China
- Lei Yu
  - Department of Medical Oncology, Shanghai Concord Medical Cancer Center, Shanghai, China
5
Vuttipittayamongkol P, Elyan E, Petrovski A. On the class overlap problem in imbalanced data classification. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106631] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Indexed: 12/26/2022]
6
Zhu JH, Yan QL, Wang JW, Chen Y, Ye QH, Wang ZJ, Huang T. The Key Genes for Perineural Invasion in Pancreatic Ductal Adenocarcinoma Identified With Monte-Carlo Feature Selection Method. Front Genet 2020; 11:554502. [PMID: 33193628 PMCID: PMC7593847 DOI: 10.3389/fgene.2020.554502] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/14/2020] [Accepted: 08/17/2020] [Indexed: 12/20/2022]
Abstract
Background: Pancreatic ductal adenocarcinoma (PDAC) is the most aggressive form of pancreatic cancer. Its 5-year survival rate is only 3–5%. Perineural invasion (PNI) is a process of cancer cells invading the surrounding nerves and perineural spaces. It is considered to be associated with the poor prognosis of PDAC. About 90% of pancreatic cancer patients have PNI. The high incidence of PNI in pancreatic cancer limits radical resection and promotes local recurrence, which negatively affects life quality and survival time of the patients with pancreatic cancer. Objectives: To investigate the mechanism of PNI in pancreatic cancer, we analyzed the gene expression profiles of tumors and adjacent tissues from 50 PDAC patients, which included 28 patients with perineural invasion and 22 patients without perineural invasion. Method: Using Monte-Carlo feature selection and the Incremental Feature Selection (IFS) method, we identified 26 key features, of which 15 features were from tumor tissues and 11 features were from adjacent tissues. Results: Our results suggested that not only the tumor tissue, but also the adjacent tissue, was informative for perineural invasion prediction. The SVM classifier based on these 26 key features can predict perineural invasion accurately, with a high accuracy of 0.94 evaluated with leave-one-out cross validation (LOOCV). Conclusion: The in-depth biological analysis of key feature genes, such as TNFRSF14, XPO1, and ATF3, shed light on the understanding of perineural invasion in pancreatic ductal adenocarcinoma.
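Incremental feature selection evaluated with LOOCV, as described in this abstract, can be sketched as follows. The sketch assumes a feature ranking is already available (for example from Monte-Carlo feature selection, which is not reproduced here) and swaps in a nearest-centroid classifier for the SVM so that the example stays dependency-free. All names and the toy data are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (stand-in for an SVM)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, l in zip(train_X, train_y) if l == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_accuracy(X, y):
    """Leave-one-out cross validation: hold out each sample once."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return correct / len(X)

def incremental_feature_selection(X, y, ranked):
    """Try the top-1, top-2, ... ranked features and keep the best-scoring prefix."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked) + 1):
        subset = [[row[j] for j in ranked[:k]] for row in X]
        acc = loocv_accuracy(subset, y)
        if acc > best_acc:  # strict comparison keeps the smaller set on ties
            best_k, best_acc = k, acc
    return ranked[:best_k], best_acc

# Hypothetical toy data: feature 0 is informative, feature 1 is large-scale noise.
X = [[1.0, 5.0], [1.1, -4.0], [0.9, 0.0], [0.0, 4.0], [0.1, -5.0], [-0.1, 0.5]]
y = [1, 1, 1, 0, 0, 0]
selected, accuracy = incremental_feature_selection(X, y, ranked=[0, 1])
```

Note that choosing the feature count by the same LOOCV accuracy that is later reported tends to give an optimistic estimate; a nested validation loop is the usual safeguard.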
Affiliation(s)
- Jin-Hui Zhu
  - Department of General Surgery, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Qiu-Liang Yan
  - Department of General Surgery, Jinhua People's Hospital, Jinhua, China
- Jian-Wei Wang
  - Department of Surgical Oncology, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yan Chen
  - Department of General Surgery, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Qing-Huang Ye
  - Department of General Surgery, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Zhi-Jiang Wang
  - Department of General Surgery, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Tao Huang
  - Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
7
Identification of Latent Oncogenes with a Network Embedding Method and Random Forest. BIOMED RESEARCH INTERNATIONAL 2020; 2020:5160396. [PMID: 33029511 PMCID: PMC7530476 DOI: 10.1155/2020/5160396] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/20/2020] [Revised: 09/09/2020] [Accepted: 09/14/2020] [Indexed: 12/29/2022]
Abstract
An oncogene is a gene that can promote tumor initiation. Studying oncogenes helps in understanding the causes of cancer. Experimental techniques were long the main way to detect oncogenes, but their drawbacks, such as high cost and long duration, have become increasingly evident in recent years. Newly proposed computational methods provide an alternative way to study oncogenes and can provide useful clues for further investigation of candidate genes. Considering the limitations of some previous computational methods, such as the lack of learning procedures and the treatment of genes as individual subjects, a novel computational method was proposed in this study. The method adopted features derived from multiple protein networks, viewing proteins at a system level. A classic machine learning algorithm, random forest, was applied to these features to capture the essential characteristics of oncogenes, thereby building the prediction model. All genes except validated oncogenes were ranked by a measurement yielded by the prediction model. The top genes were quite different from the potential oncogenes discovered by previous methods, and they may be confirmed as novel oncogenes. The newly identified genes can therefore be essential supplements to previous results.
8
Ren X, Wang S, Huang T. Decipher the connections between proteins and phenotypes. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2020; 1868:140503. [PMID: 32707349 DOI: 10.1016/j.bbapap.2020.140503] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/31/2020] [Revised: 06/30/2020] [Accepted: 07/16/2020] [Indexed: 10/23/2022]
Abstract
As the outward-most representation of life, phenotype is the fundamental basis with which humans understand life and disease. But with the advent of molecular and sequencing technique and research, a growing portion of science research focuses primarily on the molecular level of life. Our understanding in molecular variations and mechanisms can only be fully utilized when they are translated into the phenotypic level. In this study, we constructed similarity network for phenotype ontology, and then applied network analysis methods to discover phenotype/disease clusters. Then, we used machine learning models to predict protein-phenotype associations. Each protein was characterized by the functional profiles of its interaction neighbors on the protein-protein interaction network. Our methods can not only predict protein-phenotype associations, but also reveal the underlying mechanisms from protein to phenotype.
Affiliation(s)
- Xiaohui Ren
  - Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Steven Wang
  - Department of Molecular Biology, Columbia University, New York, USA
- Tao Huang
  - Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.
9
Tao X, Wu X, Huang T, Mu D. Identification and Analysis of Dysfunctional Genes and Pathways in CD8+ T Cells of Non-Small Cell Lung Cancer Based on RNA Sequencing. Front Genet 2020; 11:352. [PMID: 32457792 PMCID: PMC7227791 DOI: 10.3389/fgene.2020.00352] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/02/2020] [Accepted: 03/23/2020] [Indexed: 12/26/2022]
Abstract
Lung cancer, the most common of malignant tumors, is typically of the non-small cell (NSCLC) type. T-cell-based immunotherapies are a promising and powerful approach to treating NSCLCs. To characterize the CD8+ T cells of non-small cell lung cancer, we re-analyzed the published RNA-Seq gene expression profiles of 36 CD8+ T cell isolated from tumor (TIL) samples and 32 adjacent uninvolved lung (NTIL) samples. With an advanced Monte Carlo method of feature selection, we identified the CD8+ TIL specific expression patterns. These patterns revealed the key dysfunctional genes and pathways in CD8+ TIL and shed light on the molecular mechanisms of immunity and use of immunotherapy.
Affiliation(s)
- Xuefang Tao
  - Affiliated Hospital of Shaoxing University, Shaoxing, China
- Xiaotang Wu
  - Shanghai Engineering Research Center of Pharmaceutical Translation, Shanghai, China
- Tao Huang
  - Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Deguang Mu
  - Department of Respiratory Medicine, Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, Hangzhou, China
10
Shin S, Hong JH, Na Y, Lee M, Qian WJ, Kim VN, Kim JS. Development of Multiplexed Immuno-N-Terminomics to Reveal the Landscape of Proteolytic Processing in Early Embryogenesis of Drosophila melanogaster. Anal Chem 2020; 92:4926-4934. [DOI: 10.1021/acs.analchem.9b05035] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/23/2022]
Affiliation(s)
- Sanghee Shin
  - Center for RNA Research, Institute for Basic Science, Seoul 08826, Korea
  - School of Biological Sciences, Seoul National University, Seoul 08826, Korea
- Ji Hye Hong
  - Center for RNA Research, Institute for Basic Science, Seoul 08826, Korea
  - School of Biological Sciences, Seoul National University, Seoul 08826, Korea
- Yongwoo Na
  - Center for RNA Research, Institute for Basic Science, Seoul 08826, Korea
  - School of Biological Sciences, Seoul National University, Seoul 08826, Korea
- Mihye Lee
  - Soonchunhyang Institute of Medi-bio Science, Soonchunhyang University, Cheonan-si, Chungcheongnam-do 31151, Korea
- Wei-Jun Qian
  - Integrative Omics, Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- V. Narry Kim
  - Center for RNA Research, Institute for Basic Science, Seoul 08826, Korea
  - School of Biological Sciences, Seoul National University, Seoul 08826, Korea
- Jong-Seo Kim
  - Center for RNA Research, Institute for Basic Science, Seoul 08826, Korea
  - School of Biological Sciences, Seoul National University, Seoul 08826, Korea
11
Chen L, Pan X, Guo W, Gan Z, Zhang YH, Niu Z, Huang T, Cai YD. Investigating the gene expression profiles of cells in seven embryonic stages with machine learning algorithms. Genomics 2020; 112:2524-2534. [PMID: 32045671 DOI: 10.1016/j.ygeno.2020.02.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/18/2019] [Revised: 12/26/2019] [Accepted: 02/07/2020] [Indexed: 12/15/2022]
Abstract
The development of embryonic cells involves several continuous stages, and some genes are related to embryogenesis. To date, few studies have systematically investigated changes in gene expression profiles during mammalian embryogenesis. In this study, a computational analysis using machine learning algorithms was performed on the gene expression profiles of mouse embryonic cells at seven stages. First, the profiles were analyzed through a powerful Monte Carlo feature selection method for the generation of a feature list. Second, increment feature selection was applied on the list by incorporating two classification algorithms: support vector machine (SVM) and repeated incremental pruning to produce error reduction (RIPPER). Through SVM, we extracted several latent gene biomarkers, indicating the stages of embryonic cells, and constructed an optimal SVM classifier that produced a nearly perfect classification of embryonic cells. Furthermore, some interesting rules were accessed by the RIPPER algorithm, suggesting different expression patterns for different stages.
Affiliation(s)
- Lei Chen
  - School of Life Sciences, Shanghai University, Shanghai 200444, China; College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China; Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai 200241, China.
- XiaoYong Pan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China.
- Wei Guo
  - Institute of Health Sciences, Shanghai Jiao Tong University School of Medicine, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
- Zijun Gan
  - Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
- Yu-Hang Zhang
  - Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
- Zhibin Niu
  - College of Intelligence and Computing, Tianjin University, Tianjin 300072, China.
- Tao Huang
  - Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai 200444, China.
12
Analysis of Protein-Protein Functional Associations by Using Gene Ontology and KEGG Pathway. BIOMED RESEARCH INTERNATIONAL 2019; 2019:4963289. [PMID: 31396531 PMCID: PMC6668538 DOI: 10.1155/2019/4963289] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/04/2019] [Revised: 06/04/2019] [Accepted: 06/26/2019] [Indexed: 12/19/2022]
Abstract
Protein–protein interaction (PPI) plays an extremely remarkable role in the growth, reproduction, and metabolism of all lives. A thorough investigation of PPI can uncover the mechanism of how proteins express their functions. In this study, we used gene ontology (GO) terms and biological pathways to study an extended version of PPI (protein–protein functional associations) and subsequently identify some essential GO terms and pathways that can indicate the difference between two proteins with and without functional associations. The protein–protein functional associations validated by experiments were retrieved from STRING, a well-known database on collected associations between proteins from multiple sources, and they were termed as positive samples. The negative samples were constructed by randomly pairing two proteins. Each sample was represented by several features based on GO and KEGG pathway information of two proteins. Then, the mutual information was adopted to evaluate the importance of all features and some important ones could be accessed, from which a number of essential GO terms or KEGG pathways were identified. The final analysis of some important GO terms and one KEGG pathway can partly uncover the difference between proteins with and without functional associations.
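The mutual-information ranking described in this abstract can be illustrated for the simplest case of discrete features. The sketch below estimates I(X; Y) in bits from empirical frequencies; it assumes binary features (for example, whether a GO term is shared by both proteins in a pair), which is a simplification and not necessarily the encoding the authors used. The function name and toy vectors are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs only.
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Hypothetical labels: 1 = functionally associated pair, 0 = random pair.
labels = [1, 1, 0, 0]
informative = [1, 1, 0, 0]    # feature mirrors the label
uninformative = [1, 0, 1, 0]  # feature is independent of the label

mi_informative = mutual_information(informative, labels)
mi_uninformative = mutual_information(uninformative, labels)
```

Features would then be sorted by this value, with higher mutual information indicating a stronger statistical dependence on the class label.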
13
Chen X, Jin Y, Feng Y. Evaluation of Plasma Extracellular Vesicle MicroRNA Signatures for Lung Adenocarcinoma and Granuloma With Monte-Carlo Feature Selection Method. Front Genet 2019; 10:367. [PMID: 31105742 PMCID: PMC6498093 DOI: 10.3389/fgene.2019.00367] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/18/2018] [Accepted: 04/05/2019] [Indexed: 12/24/2022]
Abstract
Extracellular Vesicle (EV) is a compilation of secreted vesicles, including micro vesicles, large oncosomes, and exosomes. It can be used in non-invasive diagnosis. MicroRNAs (miRNAs) processed by exosomes can be detected by liquid biopsy. To objectively evaluate the discriminative ability of miRNAs from whole plasma, EV and EV-free plasma, we analyzed the miRNA expression profiles in whole plasma, EV and EV-free plasma of 10 lung adenocarcinoma and 9 granuloma patients. With Monte-Carlo feature selection method, the top discriminative miRNAs in whole plasma, EV and EV-free plasma were identified, and they were quite different. Using the Repeated Incremental Pruning to Produce Error Reduction (RIPPER) method, we learned the classification rules: in whole plasma, granuloma patients did not express hsa-miR-223-3p while the lung adenocarcinoma patients expressed hsa-miR-223-3p; in EV, the hsa-miR-23b-3p was highly expressed in granuloma patients but not lung adenocarcinoma patients; in EV-free plasma, hsa-miR-376a-3p was expressed in granuloma patients but barely expressed in lung adenocarcinoma patients. For prediction performance, whole plasma had the highest weighted accuracy and EV outperformed EV-free plasma. Our results suggested that EV can be used as lung cancer biomarker. However, since it is less stable and not easy to detect, there are still technological difficulties to overcome.
Affiliation(s)
- Xiangbo Chen
  - Key Laboratory of Molecular Epigenetics of the Ministry of Education, Northeast Normal University, Changchun, China; Hangzhou Baocheng Biotechnology Co., Ltd., Hangzhou, China
- Yunjie Jin
  - Department of Oncology, Shanghai Putuo People's Hospital, Shanghai, China
- Yu Feng
  - Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, China
14
Wang T, Chen L, Zhao X. Prediction of Drug Combinations with a Network Embedding Method. Comb Chem High Throughput Screen 2019; 21:789-797. [DOI: 10.2174/1386207322666181226170140] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/26/2018] [Revised: 11/02/2018] [Accepted: 11/28/2018] [Indexed: 01/10/2023]
Abstract
Aim and Objective:
Several diseases have complicated mechanisms. A single drug cannot treat such diseases well because they always involve several targets, and single-targeted drugs cannot modulate these targets simultaneously. Drug combination is an effective way to treat such diseases. However, determining effective drug combinations via traditional methods is time-consuming and costly, so quick and inexpensive methods are urgently needed. Designing effective computational methods that incorporate advanced computational techniques to predict drug combinations is an alternative and feasible way.
Method:
In this study, we proposed a novel network embedding method, which can extract topological features of each drug combination from a drug network that was constructed using chemical-chemical interaction information retrieved from STITCH. These topological features were combined with individual features of drug combinations reported in one previous study. Several advanced computational methods were employed to construct an effective prediction model: the synthetic minority oversampling technique (SMOTE) was used to tackle the imbalanced dataset, and the minimum redundancy maximum relevance (mRMR) and incremental feature selection (IFS) methods were adopted to analyze features and extract optimal features for building an optimal support vector machine (SVM) classifier.
Results and Conclusion:
The constructed optimal SVM classifier yielded an MCC of 0.806, which is superior to the classifier only using individual features with or without SMOTE. The performance of the classifier can be improved by combining the topological features and essential features of a drug combination.
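For reference, the Matthews correlation coefficient (MCC) reported in this abstract is computed from the four cells of a binary confusion matrix. The sketch below shows the standard formula; the function name is hypothetical and the counts in the usage lines are made up for illustration, not taken from the paper.

```python
from math import sqrt

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

perfect = mcc(tp=50, tn=50, fp=0, fn=0)     # every prediction correct
chance = mcc(tp=25, tn=25, fp=25, fn=25)    # no better than random
inverted = mcc(tp=0, tn=0, fp=50, fn=50)    # every prediction wrong
```

MCC ranges from -1 to 1 and uses all four counts, which makes it more informative than accuracy on imbalanced datasets such as the one described here.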
Affiliation(s)
- Tianyun Wang
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Xian Zhao
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
15
Abstract
Signal peptidases are the membrane-bound enzymes that cleave off the amino-terminal signal peptide from secretory preproteins. There are two types of bacterial signal peptidases. Type I signal peptidase utilizes a serine/lysine catalytic dyad mechanism and is the major signal peptidase in most bacteria. Type II signal peptidase is an aspartic protease specific for prolipoproteins. This chapter will review what is known about the structure, function and mechanism of these unique enzymes.
Affiliation(s)
- Mark Paetzel
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, South Science Building 8888 University Drive, Burnaby, BC, V5A 1S6, Canada.
16
AL-barakati HJ, Saigo H, Newman RH, KC DB. RF-GlutarySite: a random forest based predictor for glutarylation sites. Mol Omics 2019; 15:189-204. [DOI: 10.1039/c9mo00028c] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 12/12/2022]
Abstract
Glutarylation, which is a newly identified posttranslational modification that occurs on lysine residues, has recently emerged as an important regulator of several metabolic and mitochondrial processes. Here, we describe the development of RF-GlutarySite, a random forest-based predictor designed to predict glutarylation sites based on protein primary amino acid sequence.
Affiliation(s)
- Hussam J. AL-barakati
  - Department of Computational Science and Engineering, North Carolina Agricultural & Technical State University, Greensboro, USA
- Hiroto Saigo
  - Department of Informatics, Kyushu University, Fukuoka 819-0395, Japan
- Robert H. Newman
  - Department of Biology, North Carolina Agricultural & Technical State University, Greensboro, USA
- Dukka B. KC
  - Department of Computational Science and Engineering, North Carolina Agricultural & Technical State University, Greensboro, USA
17
Chen L, Pan X, Zhang YH, Liu M, Huang T, Cai YD. Classification of Widely and Rarely Expressed Genes with Recurrent Neural Network. Comput Struct Biotechnol J 2018; 17:49-60. [PMID: 30595815 PMCID: PMC6307323 DOI: 10.1016/j.csbj.2018.12.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 09/26/2018] [Revised: 12/07/2018] [Accepted: 12/09/2018] [Indexed: 02/06/2023]
Abstract
A tissue-specific gene expression shapes the formation of tissues, while gene expression changes reflect the immune response of the human body to environmental stimulations or pressure, particularly in disease conditions, such as cancers. A few genes are commonly expressed across tissues or various cancers, while others are not. To investigate the functional differences between widely and rarely expressed genes, we defined the genes that were expressed in 32 normal tissues/cancers (i.e., called widely expressed genes; FPKM >1 in all samples) and those that were not detected (i.e., called rarely expressed genes; FPKM <1 in all samples) based on the large gene expression data set provided by Uhlen et al. Each gene was encoded using the gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment scores. Minimum redundancy maximum relevance (mRMR) was used to measure and rank these features on the mRMR feature list. Thereafter, we applied the incremental feature selection method with a supervised classifier recurrent neural network (RNN) to select the discriminate features for classifying widely expressed genes from rarely expressed genes and construct an optimum RNN classifier. The Youden's indexes generated by the optimum RNN classifier and evaluated using a 10-fold cross validation were 0.739 for normal tissues and 0.639 for cancers. Furthermore, the underlying mechanisms of the key discriminate GO and KEGG features were analyzed. Results can facilitate the identification of the expression landscape of genes and elucidation of how gene expression shapes tissues and the microenvironment of cancers. Some genes are widely expressed across tissues or various cancers. A number of genes are rarely expressed across tissues or various cancers. The functional differences between widely and rarely expressed genes were studied. Several GO terms and KEGG pathways were extracted and analyzed.
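Youden's index, the metric reported in this abstract, is sensitivity plus specificity minus one. The short sketch below shows the calculation from confusion-matrix counts; the function name is hypothetical and the counts are made up for illustration, not taken from the paper.

```python
def youden_index(tp, tn, fp, fn):
    """Youden's J statistic: sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity + specificity - 1

# Hypothetical counts: sensitivity 0.9 and specificity 0.8.
j = youden_index(tp=90, tn=80, fp=20, fn=10)
```

A value of 0 corresponds to a classifier that performs no better than chance, and 1 to a perfect classifier, so the reported 0.739 and 0.639 sit well above chance.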
Affiliation(s)
- Lei Chen
  - School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China; College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China; Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai 200241, People's Republic of China
- XiaoYong Pan
  - Department of Medical Informatics, Erasmus MC, Rotterdam, the Netherlands
- Yu-Hang Zhang
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
- Min Liu
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China
- Tao Huang
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China