1
Denger A, Helms V. Optimized Data Set and Feature Construction for Substrate Prediction of Membrane Transporters. J Chem Inf Model 2022; 62:6242-6257. [PMID: 36454173 DOI: 10.1021/acs.jcim.2c00850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]
Abstract
α-Helical transmembrane proteins termed membrane transporters mediate the passage of small hydrophilic substrate molecules across biological lipid bilayer membranes. Annotating the specific substrates of the dozens to hundreds of individual transporters of an organism is an important task. In the past, machine learning classifiers have been successfully trained on pan-organism data sets to predict putative substrates of transporters. Here, we critically examine the selection of an optimal data set of protein sequence features for the classification task. We focus on membrane transporters of the three model organisms Escherichia coli, Arabidopsis thaliana, and Saccharomyces cerevisiae, as well as human. We show that organism-specific classifiers can be robustly trained if at least 20 samples are available for each substrate class. If information from position-specific scoring matrices is included, such classifiers have F1 scores between 0.85 and 1.00. For the largest data set (A. thaliana), a 4-class classifier yielded an F-score of 0.97. On a pan-organism data set composed of transporters of all four organisms, amino acid and sugar transporters were predicted with an F1 score of 0.91.
Affiliation(s)
- Andreas Denger
- Center for Bioinformatics, Saarland University, D-66123 Saarbrücken, Germany
- Volkhard Helms
- Center for Bioinformatics, Saarland University, D-66123 Saarbrücken, Germany
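
The minimum class size mentioned in the abstract above (at least 20 samples per substrate class) can be illustrated with a small filtering step. This is only a sketch under stated assumptions: the function name `keep_well_populated_classes` and the `(protein_id, substrate_class)` tuple layout are hypothetical and are not taken from the paper.

```python
from collections import Counter


def keep_well_populated_classes(samples, min_samples=20):
    """Drop every sample whose substrate class has fewer than min_samples members.

    samples is a list of (protein_id, substrate_class) tuples (hypothetical layout).
    """
    counts = Counter(label for _, label in samples)
    return [(pid, label) for pid, label in samples if counts[label] >= min_samples]
```

Filtering before training keeps the class set consistent across cross-validation folds; the threshold of 20 is the value quoted in the abstract, everything else here is illustrative.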

2
Alballa M, Aplop F, Butler G. TranCEP: Predicting the substrate class of transmembrane transport proteins using compositional, evolutionary, and positional information. PLoS One 2020; 15:e0227683. [PMID: 31935244 PMCID: PMC6959595 DOI: 10.1371/journal.pone.0227683] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/12/2019] [Accepted: 12/26/2019] [Indexed: 11/24/2022]
Abstract
Transporters mediate the movement of compounds across the membranes that separate the cell from its environment and across the inner membranes surrounding cellular compartments. It is estimated that one third of a proteome consists of membrane proteins, and many of these are transport proteins. Given the increase in the number of genomes being sequenced, there is a need for computational tools that predict the substrates that are transported by the transmembrane transport proteins. In this paper, we present TranCEP, a predictor of the type of substrate transported by a transmembrane transport protein. TranCEP combines the traditional use of the amino acid composition of the protein, with evolutionary information captured in a multiple sequence alignment (MSA), and restriction to important positions of the alignment that play a role in determining the specificity of the protein. Our experimental results show that TranCEP significantly outperforms the state-of-the-art predictors. The results quantify the contribution made by each type of information used.
Affiliation(s)
- Munira Alballa
- Department of Computer Science and Software Engineering, Concordia University, Montréal, Québec, Canada
- College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Faizah Aplop
- School of Informatics and Applied Mathematics, Universiti Malaysia Terengganu, Malaysia
- Gregory Butler
- Department of Computer Science and Software Engineering, Concordia University, Montréal, Québec, Canada
- Centre for Structural and Functional Genomics, Concordia University, Montréal, Québec, Canada

3
Using word embedding technique to efficiently represent protein sequences for identifying substrate specificities of transporters. Anal Biochem 2019; 577:73-81. [PMID: 31022378 DOI: 10.1016/j.ab.2019.04.011] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 01/24/2019] [Revised: 04/02/2019] [Accepted: 04/12/2019] [Indexed: 02/08/2023]
Abstract
Membrane transport proteins and their substrate specificities play crucial roles in various cellular functions. Identifying the substrate specificities of membrane transport proteins is closely related to protein-target interaction prediction, drug design, membrane recruitment, and dysregulation analysis, which makes it an important problem for bioinformatics researchers. In this study, we applied the word embedding approach, a main driver of the recent breakthroughs in natural language processing, to protein sequences of transporters. We defined each protein sequence based on the word embeddings and frequencies of its biological words. The protein features were then fed into machine learning models for prediction. We also varied the length of the biological words that make up a protein sequence to find the optimal length, which generated the most discriminative feature set. Compared to four other feature types created from protein sequences, our proposed features help prediction models yield superior performance. Our best models reach an average area under the curve of 0.96 on 5-fold cross-validation and 0.99 on the independent test. With this result, our study can help biologists identify transporters based on substrate specificities and provides a basis for further research on applying natural language processing techniques in bioinformatics.
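
The notion of "biological words" in the abstract above can be sketched in a few lines. This is a minimal illustration, assuming the words are overlapping k-mers of a fixed length; the abstract does not state the tokenization or the embedding model, and the names `split_into_words` and `word_frequencies` are hypothetical.

```python
from collections import Counter


def split_into_words(sequence: str, k: int = 3) -> list[str]:
    """Split a protein sequence into overlapping k-mers (assumed word definition)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def word_frequencies(sequence: str, k: int = 3) -> dict[str, float]:
    """Relative frequency of each k-mer; empty if the sequence is shorter than k."""
    words = split_into_words(sequence, k)
    counts = Counter(words)
    total = len(words)
    return {word: n / total for word, n in counts.items()} if total else {}
```

Varying `k` corresponds to varying the word length mentioned in the abstract; the frequencies would then be combined with per-word embedding vectors, a step that is not shown here.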

4
Dias O, Gomes D, Vilaca P, Cardoso J, Rocha M, Ferreira EC, Rocha I. Genome-Wide Semi-Automated Annotation of Transporter Systems. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:443-456. [PMID: 26887005 DOI: 10.1109/tcbb.2016.2527647] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 06/05/2023]
Abstract
Usually, transport reactions are added to genome-scale metabolic models (GSMMs) based on experimental data and literature. This approach does not allow associating specific genes with transport reactions, which impairs the ability of the model to predict effects of gene deletions. Novel methods for systematic genome-wide transporter functional annotation and their integration into GSMMs are therefore necessary. In this work, an automatic system to detect and classify all potential membrane transport proteins for a given genome and integrate the related reactions into GSMMs is proposed, based on the identification and classification of genes that encode transmembrane proteins. The Transport Reactions Annotation and Generation (TRIAGE) tool identifies the metabolites transported by each transmembrane protein and its transporter family. The localization of the carriers is also predicted and, consequently, their action is confined to a given membrane. The integration of the data provided by TRIAGE with highly curated models allowed the identification of new transport reactions. TRIAGE is included in the new release of merlin, a software tool previously developed by the authors, which expedites the GSMM reconstruction processes.

5
McDermott JE, Bruillard P, Overall CC, Gosink L, Lindemann SR. Prediction of multi-drug resistance transporters using a novel sequence analysis method. F1000Res 2015; 4:60. [PMID: 26913187 PMCID: PMC4743146 DOI: 10.12688/f1000research.6200.2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Accepted: 05/18/2015] [Indexed: 11/20/2022]
Abstract
There are many examples of groups of proteins that have similar function, but the determinants of functional specificity may be hidden by lack of sequence similarity, or by large groups of similar sequences with different functions. Transporters are one such protein group in that the general function, transport, can be easily inferred from the sequence, but the substrate specificity can be impossible to predict from sequence with current methods. In this paper we describe a linguistic-based approach to identify functional patterns from groups of unaligned protein sequences and its application to predict multi-drug resistance transporters (MDRs) from bacteria. We first show that our method can recreate known patterns from PROSITE for several motifs from unaligned sequences. We then show that the method, MDRpred, can predict MDRs with greater accuracy and positive predictive value than a collection of currently available family-based models from the Pfam database. Finally, we apply MDRpred to a large collection of protein sequences from an environmental microbiome study to make novel predictions about drug resistance in a potential environmental reservoir.
Affiliation(s)
- Jason E. McDermott
- Biological Sciences, Pacific Northwest National Laboratory, Washington, WA, 99352, USA
- Department of Molecular Microbiology and Immunology, Oregon Health & Science University, Portland, OR, 97239, USA
- Paul Bruillard
- National Security Divisions, Pacific Northwest National Laboratory, Washington, WA, 99352, USA
- Luke Gosink
- National Security Divisions, Pacific Northwest National Laboratory, Washington, WA, 99352, USA
- Stephen R. Lindemann
- Biological Sciences, Pacific Northwest National Laboratory, Washington, WA, 99352, USA

6
McDermott JE, Bruillard P, Overall CC, Gosink L, Lindemann SR. Prediction of multi-drug resistance transporters using a novel sequence analysis method. F1000Res 2015; 4:60. [PMID: 26913187 PMCID: PMC4743146 DOI: 10.12688/f1000research.6200.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Accepted: 03/05/2015] [Indexed: 03/26/2024]

7
Hu Y, Guo Y, Shi Y, Li M, Pu X. A consensus subunit-specific model for annotation of substrate specificity for ABC transporters. RSC Adv 2015. [DOI: 10.1039/c5ra05304h] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
Abstract
A consensus classification model was built by considering three subunit proteins individually to predict the substrate specificity of ABC transporters.
Affiliation(s)
- Yayun Hu
- College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Yinan Shi
- College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Menglong Li
- College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China

8
Mishra NK, Chang J, Zhao PX. Prediction of membrane transport proteins and their substrate specificities using primary sequence information. PLoS One 2014; 9:e100278. [PMID: 24968309 PMCID: PMC4072671 DOI: 10.1371/journal.pone.0100278] [Citation(s) in RCA: 74] [Impact Index Per Article: 7.4] [Received: 03/03/2014] [Accepted: 05/23/2014] [Indexed: 11/18/2022]
Abstract
Background: Membrane transport proteins (transporters) move hydrophilic substrates across hydrophobic membranes and play vital roles in most cellular functions. Transporters represent a diverse group of proteins that differ in topology, energy coupling mechanism, and substrate specificity as well as sequence similarity. Among the functional annotations of transporters, information about their transporting substrates is especially important. The experimental identification and characterization of transporters is currently costly and time-consuming. The development of robust bioinformatics-based methods for the prediction of membrane transport proteins and their substrate specificities is therefore an important and urgent task. Results: Support vector machine (SVM)-based computational models, which comprehensively utilize integrative protein sequence features such as amino acid composition, dipeptide composition, physico-chemical composition, biochemical composition, and position-specific scoring matrices (PSSM), were developed to predict the substrate specificity of seven transporter classes: amino acid, anion, cation, electron, protein/mRNA, sugar, and other transporters. An additional model to differentiate transporters from non-transporters was also developed. Among the developed models, the biochemical composition and PSSM hybrid model outperformed other models and achieved an overall average prediction accuracy of 76.69% with a Matthews correlation coefficient (MCC) of 0.49 and a receiver operating characteristic area under the curve (AUC) of 0.833 on our main dataset. This model also achieved an overall average prediction accuracy of 78.88% and MCC of 0.41 on an independent dataset. Conclusions: Our analyses suggest that evolutionary information (i.e., the PSSM) and the AAIndex are key features for the substrate specificity prediction of transport proteins. In comparison, similarity-based methods such as BLAST, PSI-BLAST, and hidden Markov models do not provide accurate predictions for the substrate specificity of membrane transport proteins. TrSSP: The Transporter Substrate Specificity Prediction Server, a web server that implements the SVM models developed in this paper, is freely available at http://bioinfo.noble.org/TrSSP.
Affiliation(s)
- Nitish K. Mishra
- Plant Biology Division, The Samuel Roberts Noble Foundation, Ardmore, Oklahoma, United States of America
- Junil Chang
- Plant Biology Division, The Samuel Roberts Noble Foundation, Ardmore, Oklahoma, United States of America
- Patrick X. Zhao
- Plant Biology Division, The Samuel Roberts Noble Foundation, Ardmore, Oklahoma, United States of America
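
The abstract above reports both accuracy and the Matthews correlation coefficient (MCC). The sketch below shows how the two are computed from a binary confusion matrix. It assumes a one-vs-rest (binary) setting for each substrate class; the abstract does not say how the per-class values were averaged, and the function names are illustrative.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC for a binary confusion matrix; returns 0.0 when it is undefined."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0
```

On an imbalanced class, accuracy can stay high while MCC is noticeably lower, because MCC uses all four cells of the confusion matrix. For example, `accuracy(5, 90, 0, 5)` is 0.95, whereas `matthews_corrcoef(5, 90, 0, 5)` is roughly 0.69.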

9
Barghash A, Helms V. Transferring functional annotations of membrane transporters on the basis of sequence similarity and sequence motifs. BMC Bioinformatics 2013; 14:343. [PMID: 24283849 PMCID: PMC4219331 DOI: 10.1186/1471-2105-14-343] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 07/29/2013] [Accepted: 11/19/2013] [Indexed: 11/30/2022]
Abstract
Background: Membrane transporters catalyze the transport of small solute molecules across biological barriers such as lipid bilayer membranes. Experimental identification of the transported substrates is very tedious. Once a particular transport mechanism has been identified in one organism, it is thus highly desirable to transfer this information to related transporter sequences in different organisms based on bioinformatics evidence. Results: We present a thorough benchmark of the level of sequence identity at which membrane transporters from Escherichia coli, Saccharomyces cerevisiae, and Arabidopsis thaliana belong to the same families of the Transporter Classification (TC) system, and of the level at which these membrane transporters mediate the transport of the same substrate. We found that two membrane transporter sequences from different organisms that are aligned with a normalized BLAST expectation value better than E-value 1e-8 are highly likely to belong to the same TC family (F-measure around 90%). Enriched sequence motifs identified by MEME at thresholds below 1e-12 support accurate classification into TC families for about two thirds of the sequences (F-measure 80% and higher). For the comparison of transported substrates, we focused on the four largest substrate classes of amino acids, sugars, metal ions, and phosphate. At similar identity thresholds, the nature of the transported substrates was more divergent (F-measure 40-75% at the same thresholds) than the TC family membership. Conclusions: We suggest an acceptable threshold of 1e-8 for BLAST and HMMER where at least three quarters of the sequences are classified according to the TC system with a reasonably high accuracy. Researchers who wish to apply these thresholds in their studies should multiply these thresholds by the size of the database they search against. Our findings should be useful to those who wish to transfer transporter functional annotations across species.
Affiliation(s)
- Ahmad Barghash
- Center for Bioinformatics, Saarland University, Postfach 15 11 50, 66041 Saarbrücken, Germany.
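
The abstract's advice to multiply the normalized threshold by the size of the searched database amounts to a one-line calculation. The sketch below assumes that "size" means the number of sequences in the database, which the abstract does not specify; the function names are hypothetical.

```python
def scaled_evalue_threshold(normalized_threshold: float, database_size: int) -> float:
    """Scale a normalized E-value threshold to a specific database.

    database_size is assumed to be the number of sequences searched against.
    """
    if database_size <= 0:
        raise ValueError("database_size must be positive")
    return normalized_threshold * database_size


def passes_threshold(evalue: float, normalized_threshold: float, database_size: int) -> bool:
    """True if a reported E-value is better (smaller) than the scaled threshold."""
    return evalue < scaled_evalue_threshold(normalized_threshold, database_size)
```

With the suggested normalized threshold of 1e-8 and a database of 1000 sequences, the working cutoff under this assumption would be about 1e-5.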

10
Gromiha MM, Ou YY. Bioinformatics approaches for functional annotation of membrane proteins. Brief Bioinform 2013; 15:155-68. [DOI: 10.1093/bib/bbt015] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 11/14/2022]

11
Schaadt NS, Helms V. Functional classification of membrane transporters and channels based on filtered TM/non-TM amino acid composition. Biopolymers 2012; 97:558-67. [PMID: 22492257 DOI: 10.1002/bip.22043] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022]
Abstract
Membrane transporters catalyze the transport of small solute molecules across biological barriers such as lipid bilayer membranes. As the experimental annotation of which proteins transport which substrates is incomplete it is highly desirable to develop computational methods that can assist in the classification and substrate annotation of putative membrane transport proteins. Here, we determined the similarity of membrane transporter sequences annotated in the Transport Classification Database (Saier et al., Nucleic Acids Res 2006, 34, D181-D186) and Arabidopsis thaliana membrane transporters annotated in the database Aramemnon (Schwacke et al., Plant Physiol 2003, 131, 16-26). The similarity measure was based on the amino acid composition either considering the full sequences or separately in the transmembrane (TM) and external parts of the sequences. We considered four different substrate sets and three different subfamilies and tried to classify the given proteins into these classes. Family or substrate prediction based on the simple amino acid frequency had an average accuracy of 76%. The differentiation between TM and non-TM regions led to an improved accuracy of 80% on average.
Affiliation(s)
- N S Schaadt
- Department of Natural Sciences and Technology III, Center for Bioinformatics, Saarland University, Im Stadtwald, 66123 Saarbrücken, Germany
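
The separate treatment of transmembrane (TM) and non-TM regions described in the abstract above can be sketched as two amino acid composition vectors concatenated into one feature vector. This is an illustration only: it assumes TM segments are given as 0-based, half-open `(start, end)` index pairs from some topology annotation, it does not reproduce the paper's filtering step, and all names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(segment: str) -> list[float]:
    """Relative frequency of each standard amino acid in a sequence segment."""
    counts = Counter(segment)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def tm_split_composition(sequence: str, tm_regions: list[tuple[int, int]]) -> list[float]:
    """Concatenate the TM composition (20 values) and the non-TM composition (20 values)."""
    in_tm = [False] * len(sequence)
    for start, end in tm_regions:
        for i in range(max(start, 0), min(end, len(sequence))):
            in_tm[i] = True
    tm = "".join(aa for aa, flag in zip(sequence, in_tm) if flag)
    non_tm = "".join(aa for aa, flag in zip(sequence, in_tm) if not flag)
    return composition(tm) + composition(non_tm)
```

The resulting 40-dimensional vector lets a similarity measure or classifier weigh membrane-embedded and soluble parts of the protein separately, which is the distinction the abstract credits with the accuracy gain from 76% to 80%.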

12
Slewinski TL. Diverse functional roles of monosaccharide transporters and their homologs in vascular plants: a physiological perspective. Molecular Plant 2011; 4:641-62. [PMID: 21746702 DOI: 10.1093/mp/ssr051] [Citation(s) in RCA: 133] [Impact Index Per Article: 10.2] [Indexed: 05/18/2023]
Abstract
Vascular plants contain two gene families that encode monosaccharide transporter proteins. The classical monosaccharide transporter(-like) gene superfamily is large and functionally diverse, while the recently identified SWEET transporter family is smaller and, thus far, only found to transport glucose. These transporters play essential roles at many levels, ranging from organelles to the whole plant. Many family members are essential for cellular homeostasis and reproductive success. Although most transporters do not directly participate in long-distance transport, their indirect roles greatly impact carbon allocation and transport flux to the heterotrophic tissues of the plant. Functional characterization of some members from both gene families has revealed their diverse roles in carbohydrate partitioning, phloem function, resource allocation, plant defense, and sugar signaling. This review highlights the broad impacts and implications of monosaccharide transport by describing some of the functional roles of the monosaccharide transporter(-like) superfamily and the SWEET transporter family.
Affiliation(s)
- Thomas L Slewinski
- Department of Plant Biology, Cornell University, 262 Plant Science Building, Ithaca, NY 14853, USA.