1
Pradhan UK, Mahapatra A, Naha S, Gupta A, Parsad R, Gahlaut V, Rath SN, Meher PK. ASPTF: A computational tool to predict abiotic stress-responsive transcription factors in plants by employing machine learning algorithms. Biochim Biophys Acta Gen Subj 2024; 1868:130597. [PMID: 38490467 DOI: 10.1016/j.bbagen.2024.130597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 02/26/2024] [Accepted: 03/10/2024] [Indexed: 03/17/2024]
Abstract
BACKGROUND: Abiotic stresses pose a serious threat to the growth and yield of crop plants. Several studies suggest that in plants, transcription factors (TFs) are important regulators of gene expression, especially when it comes to coping with abiotic stresses. Therefore, it is crucial to identify TFs associated with abiotic stress response for breeding abiotic stress-tolerant crop cultivars. METHODS: Based on a machine learning framework, a computational model was envisaged to predict TFs associated with abiotic stress response in plants. To numerically encode TF sequences, four distinct sequence-derived features were generated. The prediction was performed using ten shallow learning and four deep learning algorithms. Feature selection techniques were also employed so that prediction could use the more pertinent and informative features. RESULTS: Using the features chosen by the light-gradient boosting machine-variable importance measure (LGBM-VIM), the LGBM achieved the highest cross-validation performance metrics (accuracy: 86.81%, auROC: 92.98%, and auPRC: 94.03%). The proposed model (LGBM prediction method + LGBM-VIM selected features) was further evaluated on an independent test dataset, where the accuracy, auROC and auPRC were 81.98%, 90.65% and 91.30%, respectively. CONCLUSIONS: To facilitate the adoption of the proposed strategy by users, the approach was implemented as a prediction server called ASPTF, accessible at https://iasri-sg.icar.gov.in/asptf/. The developed approach and the corresponding web application are anticipated to supplement experimental methods in the identification of TFs responsive to abiotic stress in plants.
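The abstract reports accuracy and auROC without defining them. As a minimal sketch of how those two metrics are commonly computed for a binary classifier, the following uses the rank-sum identity for auROC. The function names, the 0.5 threshold, and the toy labels and scores are assumptions made for illustration; they are not taken from the ASPTF paper or its server.

```python
def au_roc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    Equals the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy data (hypothetical): 1 = stress-responsive TF, 0 = other TF.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
print(au_roc(labels, scores), accuracy(labels, scores))
```

The pairwise formulation is quadratic in the number of examples, which is fine for illustration; library implementations typically sort the scores instead.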
Affiliation(s)
- Upendra Kumar Pradhan
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Anuradha Mahapatra
  - Department of Bioinformatics, Odisha University of Agriculture & Technology, Bhubaneswar 751003, Odisha, India
- Sanchita Naha
  - Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Ajit Gupta
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Rajender Parsad
  - ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Vijay Gahlaut
  - University Centre for Research & Development, Chandigarh University, Mohali, Punjab, India
- Surya Narayan Rath
  - Department of Bioinformatics, Odisha University of Agriculture & Technology, Bhubaneswar 751003, Odisha, India
- Prabina Kumar Meher
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
2
Qidwai U, Qidwai U, Sivapalan T, Ratnarajan G. iMIGS: An innovative AI based prediction system for selecting the best patient-specific glaucoma treatment. MethodsX 2023; 10:102209. [PMID: 37255575 PMCID: PMC10225931 DOI: 10.1016/j.mex.2023.102209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Accepted: 05/05/2023] [Indexed: 06/01/2023] Open
Abstract
The use of AI-based techniques in healthcare is becoming more and more common and more disease-specific. Glaucoma is a disorder of the eye that causes damage to the optic nerve, which can lead to permanent blindness. It is caused by elevated pressure inside the eye due to obstruction of the flow of the drainage fluid (aqueous humor). The most recent treatment options involve minimally invasive glaucoma surgery (MIGS), in which a stent is placed to improve drainage of aqueous humor from the eye. Each MIGS surgery has a different mechanism of action, and the relative efficacy and chance of success depend on multiple patient-specific factors. Hence ophthalmologists are faced with a critical question: which method would be better for a specific patient, in terms of both glaucoma control and patient quality of life? In this paper, an Adaptive Neuro-Fuzzy Inference System (ANFIS) has been developed in the form of a Treatment Advice prediction system that offers the clinician a suggested MIGS treatment from the baseline clinical parameters. ANFIS was used with a real-world MIGS data set, a retrospective case series of 372 patients who underwent one of four MIGS procedures from July 2016 to May 2020 at a single center in the UK.
- Inputs used: clinical measurements of Age, Visual Acuity, Intraocular Pressure (IOP), Visual Field, etc.
- Output classes: iStent, iStent and Endoscopic Cyclophotocoagulation (ICE2), PreserFlo MicroShunt (PMS), and XEN-45.
- Results: the proposed ANFIS system was found to be 91% accurate, with high Sensitivity (80%) and Specificity (90%).
Affiliation(s)
- Uvais Qidwai
  - Department of Computer Science & Engineering, Qatar University, Doha, Qatar
- Umair Qidwai
  - Consultant Ophthalmologist, James Paget University Hospital, Great Yarmouth, UK
- Thurka Sivapalan
  - Queen Victoria Hospital NHS Foundation Trust, London, United Kingdom
3
Lamilla E, Sacarelo C, Alvarez-Alvarado MS, Pazmino A, Iza P. Optical Encoding Model Based on Orbital Angular Momentum Powered by Machine Learning. Sensors (Basel) 2023; 23:2755. [PMID: 36904967 PMCID: PMC10007020 DOI: 10.3390/s23052755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 02/18/2023] [Accepted: 02/22/2023] [Indexed: 06/18/2023]
Abstract
Based on the orbital angular momentum (OAM) properties of Laguerre-Gaussian beams LG(p,ℓ), a robust optical encoding model for efficient data transmission applications is designed. This paper presents an optical encoding model based on an intensity profile generated by a coherent superposition of two OAM-carrying Laguerre-Gaussian modes and a machine learning detection method. In the encoding process, the intensity profile for data encoding is generated based on the selection of the p and ℓ indices, while the decoding process is performed using a support vector machine (SVM) algorithm. Two different decoding models based on an SVM algorithm are tested to verify the robustness of the optical encoding model; one of the SVM models achieves a BER of 10^-9 at a signal-to-noise ratio of 10.2 dB.
Affiliation(s)
- Erick Lamilla
  - Escuela Superior Politécnica del Litoral, ESPOL, Departamento de Física, Campus Gustavo Galindo, Km 30.5 Vía Perimetral, P.O. Box 09-01-5863, Guayaquil 090150, Ecuador
  - Facultad de Ciencias Matemáticas y Físicas, Universidad de Guayaquil, Guayaquil 090514, Ecuador
- Christian Sacarelo
  - Escuela Superior Politécnica del Litoral, ESPOL, Departamento de Física, Campus Gustavo Galindo, Km 30.5 Vía Perimetral, P.O. Box 09-01-5863, Guayaquil 090150, Ecuador
- Manuel S. Alvarez-Alvarado
  - Escuela Superior Politécnica del Litoral, ESPOL, Facultad de Ingeniería en Electricidad y Computación (FIEC), Campus Gustavo Galindo, Km 30.5 Vía Perimetral, P.O. Box 09-01-5863, Guayaquil 090150, Ecuador
- Arturo Pazmino
  - Escuela Superior Politécnica del Litoral, ESPOL, Departamento de Física, Campus Gustavo Galindo, Km 30.5 Vía Perimetral, P.O. Box 09-01-5863, Guayaquil 090150, Ecuador
- Peter Iza
  - Escuela Superior Politécnica del Litoral, ESPOL, Departamento de Física, Campus Gustavo Galindo, Km 30.5 Vía Perimetral, P.O. Box 09-01-5863, Guayaquil 090150, Ecuador
  - Center of Research and Development in Nanotechnology, CIDNA, Escuela Superior Politécnica del Litoral, ESPOL, Campus G. Galindo, Km 30.5 Vía Perimetral, Guayaquil 090150, Ecuador
4
Du Z, Huang T, Uversky VN, Li J. Predicting TF Proteins by Incorporating Evolution Information Through PSSM. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:1319-1326. [PMID: 35981062 DOI: 10.1109/tcbb.2022.3199758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Transcription factors (TFs) are DNA-binding proteins involved in the regulation of gene expression. They exist in all organisms and activate or repress transcription by binding to specific DNA sequences. Traditionally, TFs have been identified by experimental methods that are time-consuming and costly. In recent years, various computational methods have been developed to identify TFs and overcome these limitations. However, there is room for further improvement in the predictive accuracy of these tools. We report here a novel computational tool, TFnet, that provides accurate and comprehensive TF predictions from protein sequences. The accuracy of these predictions is substantially better than the results of the existing TF predictors and methods. In particular, it significantly outperforms comparable methods when sequence similarity to other known sequences in the database drops below 40%. Ablation tests reveal that the high predictive performance stems from the innovative ways TFnet derives the sequence Position-Specific Scoring Matrix (PSSM) and encodes inputs.
5
DeepTFactor: A deep learning-based tool for the prediction of transcription factors. Proc Natl Acad Sci U S A 2021; 118:2021171118. [PMID: 33372147 DOI: 10.1073/pnas.2021171118] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Indexed: 12/16/2022] Open
Abstract
A transcription factor (TF) is a sequence-specific DNA-binding protein that modulates the transcription of a set of particular genes, and thus regulates gene expression in the cell. TFs have commonly been predicted by analyzing sequence homology with the DNA-binding domains of TFs already characterized. Thus, TFs that do not show homologies with the reported ones are difficult to predict. Here we report the development of a deep learning-based tool, DeepTFactor, that predicts whether a protein in question is a TF. DeepTFactor uses a convolutional neural network to extract features of a protein. It showed high performance in predicting TFs of both eukaryotic and prokaryotic origins, resulting in F1 scores of 0.8154 and 0.8000, respectively. Analysis of the gradients of prediction score with respect to input suggested that DeepTFactor detects DNA-binding domains and other latent features for TF prediction. DeepTFactor predicted 332 candidate TFs in Escherichia coli K-12 MG1655. Among them, 84 candidate TFs belong to the y-ome, which is a collection of genes that lack experimental evidence of function. We experimentally validated the results of DeepTFactor prediction by further characterizing genome-wide binding sites of three predicted TFs, YqhC, YiaU, and YahB. Furthermore, we made available the list of 4,674,808 TFs predicted from 73,873,012 protein sequences in 48,346 genomes. DeepTFactor will serve as a useful tool for predicting TFs, which is necessary for understanding the regulatory systems of organisms of interest. We provide DeepTFactor as a stand-alone program, available at https://bitbucket.org/kaistsystemsbiology/deeptfactor.
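The F1 scores quoted above (0.8154 and 0.8000) are the harmonic mean of precision and recall. The following is a minimal sketch of that calculation from binary labels; the function name and the toy labels are illustrative assumptions and are unrelated to DeepTFactor's own code or data.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, where 1 = TF."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against empty denominators when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (hypothetical labels): 3 true positives, 1 false positive,
# 1 false negative, 3 true negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(precision_recall_f1(y_true, y_pred))
```

Unlike accuracy, F1 ignores true negatives, which is why it is often preferred when non-TF proteins greatly outnumber TFs.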
6
Shahid SM, Ko S, Kwon S. Real-Time Classification of Diesel Marine Engine Loads Using Machine Learning. Sensors (Basel) 2019; 19:3172. [PMID: 31323904 PMCID: PMC6679318 DOI: 10.3390/s19143172] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/13/2019] [Revised: 07/05/2019] [Accepted: 07/16/2019] [Indexed: 11/24/2022]
Abstract
An engine control system is responsible for controlling the combustion parameters of an internal combustion engine to increase the efficiency of the engine. An optimized parameter setting of an engine control system is highly influenced by the engine load. Therefore, with a change in engine load, the parameter settings need to be updated for higher engine efficiency. Hence, to optimize parameter settings during operation, engine load information is necessary. In this paper, we propose a real-time engine load classification from sensed signals. For the classification, an artificial neural network is used and trained using processed, real, measured data. To that end, a magnetic pickup sensor extracts the rotational speed of the prime mover of a four-stroke V12 marine diesel engine. The measured signal is then converted into a crank angle degree (CAD) signal that shows the behavior of the combustion strokes of firing cylinders at a particular engine load. The CAD signals are considered an input feature to the designed network for classification of engine loads. For verification, we considered five classes of engine load, and the trained network classifies these classes with an accuracy of 99.4%.
Affiliation(s)
- Syed Maaz Shahid
  - School of Electrical Engineering, University of Ulsan, Ulsan 44610, Korea
- Sunghoon Ko
  - Hyundai Heavy Industries, Ulsan 44032, Korea
- Sungoh Kwon
  - School of Electrical Engineering, University of Ulsan, Ulsan 44610, Korea
7
Eichner J, Topf F, Dräger A, Wrzodek C, Wanke D, Zell A. TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors. PLoS One 2013; 8:e82238. [PMID: 24349230 PMCID: PMC3861411 DOI: 10.1371/journal.pone.0082238] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 06/13/2013] [Accepted: 10/21/2013] [Indexed: 11/18/2022] Open
Abstract
Among the key mechanisms of transcriptional control are the specific connections between transcription factors (TFs) and cis-regulatory elements in gene promoters. The elucidation of these specific protein-DNA interactions is crucial to gain insights into the complex regulatory mechanisms and networks underlying the adaptation of organisms to dynamically changing environmental conditions. As experimental techniques for determining TF binding sites are expensive and mostly performed for selected TFs only, accurate computational approaches are needed to analyze transcriptional regulation in eukaryotes on a genome-wide level. We implemented a four-step classification workflow which, for a given protein sequence, (1) discriminates TFs from other proteins, (2) determines the structural superclass of TFs, (3) identifies the DNA-binding domains of TFs and (4) predicts their cis-acting DNA motif. While existing tools were extended and adapted for performing the latter two prediction steps, the first two steps are based on a novel numeric sequence representation which allows for combining existing knowledge from a BLAST scan with robust machine learning-based classification. By evaluation on a set of experimentally confirmed TFs and non-TFs, we demonstrate that our new protein sequence representation facilitates more reliable identification and structural classification of TFs than previously proposed sequence-derived features. The algorithms underlying our proposed methodology are implemented in the two complementary tools TFpredict and SABINE. The online and stand-alone versions of TFpredict and SABINE are freely available to academics at http://www.cogsys.cs.uni-tuebingen.de/software/TFpredict/ and http://www.cogsys.cs.uni-tuebingen.de/software/SABINE/.
Affiliation(s)
- Johannes Eichner
  - Center of Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany
- Florian Topf
  - Center of Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany
- Andreas Dräger
  - Center of Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany
  - University of California San Diego, La Jolla, California, United States of America
- Clemens Wrzodek
  - Center of Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany
- Dierk Wanke
  - Center for Plant Physiology Tuebingen (ZMBP), University of Tuebingen, Tübingen, Germany
- Andreas Zell
  - Center of Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany
8
Liu X, Geng X. A convolutional code-based sequence analysis model and its application. Int J Mol Sci 2013; 14:8393-405. [PMID: 23591850 PMCID: PMC3645750 DOI: 10.3390/ijms14048393] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/19/2013] [Revised: 03/28/2013] [Accepted: 04/10/2013] [Indexed: 11/16/2022] Open
Abstract
A new approach for encoding DNA sequences as input for DNA sequence analysis is proposed using the error correction coding theory of communication engineering. The encoder was designed as a convolutional code model whose generator matrix is designed based on the degeneracy of codons, with a codon treated in the model as an informational unit. The utility of the proposed model was demonstrated through the analysis of twelve prokaryote and nine eukaryote DNA sequences having different GC contents. Distinct differences in code distances were observed near the initiation and termination sites in the open reading frame, which provided a well-regulated characterization of the DNA sequences. Clearly distinguished period-3 features appeared in the coding regions, and the characteristic average code distances of the analyzed sequences were approximately proportional to their GC contents, particularly in the selected prokaryotic organisms, presenting the potential utility as an added taxonomic characteristic for use in studying the relationships of living organisms.
Affiliation(s)
- Xiao Liu
  - College of Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China
- Xiaoli Geng
  - College of Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China
9
Wang Y, Zhou Y, Li Y, Ling Z, Zhu Y, Guo X, Sun H. An improved dimensionality reduction method for meta-transcriptome indexing based diseases classification. BMC Syst Biol 2012; 6 Suppl 3:S12. [PMID: 23281712 PMCID: PMC3524076 DOI: 10.1186/1752-0509-6-s3-s12] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/11/2022]
Abstract
Background: Bacterial 16S ribosomal RNA profiling has been widely used in the classification of microbiota-associated diseases. Dimensionality reduction is key to mining high-dimensional 16S rRNA expression data. High levels of sparsity and redundancy are common in 16S rRNA gene microbial surveys. Traditional feature selection methods are generally restricted to measuring correlated abundances, and are limited in discrimination when so few microbes are actually shared across communities. Results: Here we present a Feature Merging and Selection algorithm (FMS) to deal with 16S rRNA expression data. By integrating the Linear Discriminant Analysis method, FMS can reduce the feature dimension with higher accuracy while preserving the relationship between different features. Two 16S rRNA expression datasets, from pneumonia and dental decay patients, were used to test the validity of the algorithm. Combined with SVM, FMS discriminated different classes of both pneumonia and dental caries better than other popular feature selection methods. Conclusions: FMS projects data into a lower dimension while preserving enough features, and thus improves the intelligibility of the result. The results showed that FMS is a more valid and reliable method for feature reduction.
Affiliation(s)
- Yin Wang
  - College of Life Science and Biotechnology, Shanghai Jiaotong University, 800 Dongchuan Road, Shanghai 200240, China
10
Zheng G, Liu Q, Ding G, Wei C, Li Y. Towards biological characters of interactions between transcription factors and their DNA targets in mammals. BMC Genomics 2012; 13:388. [PMID: 22888987 PMCID: PMC3472306 DOI: 10.1186/1471-2164-13-388] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 02/29/2012] [Accepted: 06/29/2012] [Indexed: 01/07/2023] Open
Abstract
Background: In the post-genomic era, the study of transcriptional regulation is pivotal to decoding genetic information. Transcription factors (TFs) are central proteins for transcriptional regulation, and interactions between TFs and their DNA targets (TFBSs) are important for the expression of downstream genes. However, the lack of knowledge about interactions between TFs and TFBSs still hampers investigation of the mechanism of transcription. Results: To expand the knowledge about interactions between TFs and TFBSs, three biological features (sequence feature, structure feature, and evolution feature) were utilized to build TFBS identification models for studying binding preference between TFs and their DNA targets in mammals. Results show that each feature performs fairly well in capturing TFBSs, and that the hybrid model combining all three features is more robust for TFBS identification. Subsequently, correspondence between TFs and their TFBSs was investigated to explore interactions among them in mammals. Results indicate that TFs and TFBSs are reciprocal at the sequence, structure, and evolution levels. Conclusions: Our work demonstrates that, to some extent, TFs and TFBSs have developed a coevolutionary relationship in order to keep their physical binding and maintain their regulatory functions. In summary, our work will help in understanding transcriptional regulation and interpreting the binding mechanism between proteins and DNA.
Affiliation(s)
- Guangyong Zheng
  - Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
11
Hsu JBK, Bretaña NA, Lee TY, Huang HD. Incorporating evolutionary information and functional domains for identifying RNA splicing factors in humans. PLoS One 2011; 6:e27567. [PMID: 22110674 PMCID: PMC3217973 DOI: 10.1371/journal.pone.0027567] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 05/24/2011] [Accepted: 10/19/2011] [Indexed: 11/19/2022] Open
Abstract
Regulation of pre-mRNA splicing is achieved through the interaction of RNA sequence elements and a variety of RNA-splicing related proteins (splicing factors). The splicing machinery in humans is not yet fully elucidated, partly because splicing factors in humans have not been exhaustively identified. Furthermore, experimental methods for splicing factor identification are time-consuming and lab-intensive. Although many computational methods have been proposed for the identification of RNA-binding proteins, there exists no development that focuses on the identification of RNA-splicing related proteins so far. Therefore, we are motivated to design a method that focuses on the identification of human splicing factors using experimentally verified splicing factors. The investigation of amino acid composition reveals that there are remarkable differences between splicing factors and non-splicing proteins. A support vector machine (SVM) is utilized to construct a predictive model, and the five-fold cross-validation evaluation indicates that the SVM model trained with amino acid composition could provide a promising accuracy (80.22%). Another basic feature, amino acid dipeptide composition, is also examined to yield a similar predictive performance to amino acid composition. In addition, this work presents that the incorporation of evolutionary information and domain information could improve the predictive performance. The constructed models have been demonstrated to effectively classify (73.65% accuracy) an independent data set of human splicing factors. The result of independent testing indicates that in silico identification could be a feasible means of conducting preliminary analyses of splicing factors and significantly reducing the number of potential targets that require further in vivo or in vitro confirmation.
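The two basic features named above, amino acid composition and dipeptide composition, are standard fixed-length encodings of a protein sequence (20 and 400 values, respectively). A minimal sketch follows, assuming the 20 standard residues and simple frequency normalization; the function names and the example sequence are hypothetical, and the paper's exact preprocessing may differ.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """20-dimensional vector of residue frequencies (non-standard letters skipped)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    total = len(residues)
    return [residues.count(a) / total if total else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400-dimensional vector of frequencies of adjacent residue pairs."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skips pairs containing non-standard letters
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


# Hypothetical toy sequence, not from the paper's data set.
aac = amino_acid_composition("MKKA")
dpc = dipeptide_composition("MKKA")
print(len(aac), len(dpc))
```

Vectors like these can be fed to any standard classifier, such as the SVM mentioned in the abstract, because every sequence maps to the same number of features regardless of its length.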
Affiliation(s)
- Justin Bo-Kai Hsu
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-Chu, Taiwan
- Neil Arvin Bretaña
  - Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan
- Tzong-Yi Lee
  - Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan
- Hsien-Da Huang
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-Chu, Taiwan
  - Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu, Taiwan
  - Core Facility for Structural Bioinformatics, National Chiao Tung University, Hsin-Chu, Taiwan
12
On parameters of the human genome. J Theor Biol 2011; 288:92-104. [DOI: 10.1016/j.jtbi.2011.07.021] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 04/13/2011] [Revised: 06/28/2011] [Accepted: 07/21/2011] [Indexed: 02/06/2023]
13
Wu YB, Dai J, Yang XL, Li SJ, Zhao SL, Sheng QH, Tang JS, Zheng GY, Li YX, Wu JR, Zeng R. Concurrent quantification of proteome and phosphoproteome to reveal system-wide association of protein phosphorylation and gene expression. Mol Cell Proteomics 2009; 8:2809-26. [PMID: 19674963 DOI: 10.1074/mcp.m900293-mcp200] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Indexed: 01/06/2023] Open
Abstract
Reversible phosphorylation of proteins is an important process modulating cellular activities from upstream, which mainly involves sequential phosphorylation of signaling molecules, to downstream where phosphorylation of transcription factors regulates gene expression. In this study, we combined quantitative labeling with multidimensional liquid chromatography-mass spectrometry to monitor the proteome and phosphoproteome changes in the initial period of adipocyte differentiation. The phosphorylation level of a specific protein may be regulated by a kinase or phosphatase without involvement of gene expression or as a phenomenon that accompanies the alteration of its gene expression. Concurrent quantification of phosphopeptides and non-phosphorylated peptides makes it possible to differentiate cellular phosphorylation changes at these two levels. Furthermore, on the system level, certain proteins were predicted as the targeted gene products regulated by identified transcription factors. Among them, several proteins showed significant expression changes along with the phosphorylation alteration of their transcription factors. This is to date the first work to concurrently quantify proteome and phosphoproteome changes during the initial period of adipocyte differentiation, providing an approach to reveal the system-wide association of protein phosphorylation and gene expression.
Affiliation(s)
- Yi-Bo Wu
  - Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
14
Identification of protein functions using a machine-learning approach based on sequence-derived properties. Proteome Sci 2009; 7:27. [PMID: 19664241 PMCID: PMC2731080 DOI: 10.1186/1477-5956-7-27] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 03/13/2009] [Accepted: 08/09/2009] [Indexed: 02/07/2023] Open
Abstract
Background: Predicting the function of an unknown protein is an essential goal in bioinformatics. Sequence similarity-based approaches are widely used for function prediction; however, they are often inadequate in the absence of similar sequences or when the sequence similarity among known protein sequences is statistically weak. This study aimed to develop an accurate prediction method for identifying protein function, irrespective of sequence and structural similarities. Results: A highly accurate prediction method capable of identifying protein function, based solely on protein sequence properties, is described. This method analyses and identifies specific features of the protein sequence that are highly correlated with certain protein functions and determines the combination of protein sequence features that best characterises protein function. Thirty-three features that represent subtle differences in local regions and full regions of the protein sequences were introduced. On the basis of 484 features extracted solely from the protein sequence, models were built to predict the functions of 11 different proteins from a broad range of cellular components, molecular functions, and biological processes. The accuracy of protein function prediction using random forests with feature selection ranged from 94.23% to 100%. The local sequence information was found to have a broad range of applicability in predicting protein function. Conclusion: We present an accurate prediction method using a machine-learning approach based solely on protein sequence properties. The primary contribution of this paper is to propose new PNPRD features representing global and/or local differences in sequences, based on positively and/or negatively charged residues, to assist in predicting protein function. In addition, we identified a compact and useful feature subset for predicting the function of various proteins. Our results indicate that sequence-based classifiers can provide good results among a broad range of proteins, that the proposed features are useful in predicting several functions, and that the combination of our and traditional features may support the creation of a discriminative feature set for specific protein functions.
15
Zheng G, Tu K, Yang Q, Xiong Y, Wei C, Xie L, Zhu Y, Li Y. ITFP: an integrated platform of mammalian transcription factors. Bioinformatics 2008; 24:2416-7. [PMID: 18713790 DOI: 10.1093/bioinformatics/btn439] [Citation(s) in RCA: 113] [Impact Index Per Article: 7.1] [Indexed: 11/12/2022] Open
Abstract
Investigation of transcription factors (TFs) and their downstream regulated genes (targets) is a significant issue in the post-genome era, which can provide a brand new vision for some vital biological processes. However, information on TFs and their targets in mammals is far from sufficient. Here, we developed an integrated TF platform (ITFP), which includes abundant TFs and their targets for mammals. In the current release, ITFP includes 4105 putative TFs and 69 496 potential TF-target pairs for human, 3134 putative TFs and 37 040 potential TF-target pairs for mouse, and 1114 putative TFs and 18 055 potential TF-target pairs for rat. In short, ITFP will serve as an important resource for the transcription research community and provide strong support for regulatory network study.
Affiliation(s)
- Guangyong Zheng
  - School of Life Sciences, Fudan University, Shanghai 200433, China