1
|
Multi-ethnic imputation system (MI-System): a genotype imputation server for high-dimensional data. J Biomed Inform 2023:104423. [PMID: 37308034 DOI: 10.1016/j.jbi.2023.104423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2022] [Revised: 05/11/2023] [Accepted: 06/09/2023] [Indexed: 06/14/2023]
Abstract
OBJECTIVE Genotype imputation is a commonly used technique that infers un-typed variants into a study's genotype data, allowing better identification of causal variants in disease studies. However, due to overrepresentation of Caucasian studies, there's a lack of understanding of genetic basis of health-outcomes in other ethnic populations. Therefore, facilitating imputation of missing key-predictor-variants that can potentially improve a risk health-outcome prediction model, specifically for Asian ancestry, is of utmost relevance. METHODS We aimed to construct an imputation and analysis web-platform, that primarily facilitates, but is not limited to, genotype imputation on East-Asians. The goal is to provide a collaborative imputation platform for researchers in the public domain, towards rapidly and efficiently conducting accurate genotype imputation. RESULTS We present an online genotype imputation platform, Multi-ethnic Imputation System (MI-System) (https://misystem.cgm.ntu.edu.tw/), that offers users 3 established pipelines, SHAPEIT2-IMPUTE2, SHAPEIT4-IMPUTE5, and Beagle5.1 for conducting imputation analyses. In addition to 1000 Genomes and Hapmap3, a new customized Taiwan Biobank (TWB) reference panel, specifically created for Taiwanese-Chinese ancestry is provided. MI-System further offers functions to create customized reference panels to be used for imputation, conduct quality control, split whole genome data into chromosomes, and convert genome builds. CONCLUSION Users can upload their genotype data and perform imputation with minimum effort and resources. The utility functions further can be utilized to preprocess user uploaded data with easy clicks. MI-System potentially contributes to Asian-population genetics research, while eliminating the requirement for high performing computational resources and bioinformatics expertise. It will enable an increased pace of research and provide a knowledge-base for genetic carriers of complex diseases, therefore greatly enhancing patient-driven research. STATEMENT OF SIGNIFICANCE Multi-ethnic Imputation System (MI-System), primarily facilitates, but is not limited to, imputation on East-Asians, through 3 established prephasing-imputation pipelines, SHAPEIT2-IMPUTE2, SHAPEIT4-IMPUTE5, and Beagle5.1, where users can upload their genotype data and perform imputation and other utility functions with minimum effort and resources. A new customized Taiwan Biobank (TWB) reference panel, specifically created for Taiwanese-Chinese ancestry is provided. Utility functions include (a) create customized reference panels, (b) conduct quality control, (c) split whole genome data into chromosomes, and (d) convert genome builds. Users can also combine 2 reference panels using the system and use combined panels as reference to conduct imputation using MI-System.
Collapse
|
2
|
DeepAProt: Deep learning based abiotic stress protein sequence classification and identification tool in cereals. FRONTIERS IN PLANT SCIENCE 2023; 13:1008756. [PMID: 36714750 PMCID: PMC9877618 DOI: 10.3389/fpls.2022.1008756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/01/2022] [Accepted: 11/14/2022] [Indexed: 06/18/2023]
Abstract
The impact of climate change has been alarming for the crop growth. The extreme weather conditions can stress the crops and reduce the yield of major crops belonging to Poaceae family too, that sustains 50% of the world's food calorie and 20% of protein intake. Computational approaches, such as artificial intelligence-based techniques have become the forefront of prediction-based data interpretation and plant stress responses. In this study, we proposed a novel activation function, namely, Gaussian Error Linear Unit with Sigmoid (SIELU) which was implemented in the development of a Deep Learning (DL) model along with other hyper parameters for classification of unknown abiotic stress protein sequences from crops of Poaceae family. To develop this models, data pertaining to four different abiotic stress (namely, cold, drought, heat and salinity) responsive proteins of the crops belonging to poaceae family were retrieved from public domain. It was observed that efficiency of the DL models with our proposed novel SIELU activation function outperformed the models as compared to GeLU activation function, SVM and RF with 95.11%, 80.78%, 94.97%, and 81.69% accuracy for cold, drought, heat and salinity, respectively. Also, a web-based tool, named DeepAProt (http://login1.cabgrid.res.in:5500/) was developed using flask API, along with its mobile app. This server/App will provide researchers a convenient tool, which is rapid and economical in identification of proteins for abiotic stress management in crops Poaceae family, in endeavour of higher production for food security and combating hunger, ensuring UN SDG goal 2.0.
Collapse
|
3
|
In Silico Prediction and Insights Into the Structural Basis of Drug Induced Nephrotoxicity. Front Pharmacol 2022; 12:793332. [PMID: 35082675 PMCID: PMC8785686 DOI: 10.3389/fphar.2021.793332] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2021] [Accepted: 11/23/2021] [Indexed: 12/11/2022] Open
Abstract
Drug induced nephrotoxicity is a major clinical challenge, and it is always associated with higher costs for the pharmaceutical industry and due to detection during the late stages of drug development. It is desirable for improving the health outcomes for patients to distinguish nephrotoxic structures at an early stage of drug development. In this study, we focused on in silico prediction and insights into the structural basis of drug induced nephrotoxicity, based on reliable data on human nephrotoxicity. We collected 565 diverse chemical structures, including 287 nephrotoxic drugs on humans in the real world, and 278 non-nephrotoxic approved drugs. Several different machine learning and deep learning algorithms were employed for in silico model building. Then, a consensus model was developed based on three best individual models (RFR_QNPR, XGBOOST_QNPR, and CNF). The consensus model performed much better than individual models on internal validation and it achieved prediction accuracy of 86.24% external validation. The results of analysis of molecular properties differences between nephrotoxic and non-nephrotoxic structures indicated that several key molecular properties differ significantly, including molecular weight (MW), molecular polar surface area (MPSA), AlogP, number of hydrogen bond acceptors (nHBA), molecular solubility (LogS), the number of rotatable bonds (nRotB), and the number of aromatic rings (nAR). These molecular properties may be able to play an important part in the identification of nephrotoxic chemicals. Finally, 87 structural alerts for chemical nephrotoxicity were mined with f-score and positive rate analysis of substructures from Klekota-Roth fingerprint (KRFP). These structural alerts can well identify nephrotoxic drug structures in the data set. The in silico models and the structural alerts could be freely accessed via https://ochem.eu/article/140251 and http://www.sapredictor.cn, respectively. We hope the results should provide useful tools for early nephrotoxicity estimation in drug development.
Collapse
|
4
|
iRice-MS: An integrated XGBoost model for detecting multitype post-translational modification sites in rice. Brief Bioinform 2021; 23:6447435. [PMID: 34864888 DOI: 10.1093/bib/bbab486] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2021] [Revised: 10/05/2021] [Accepted: 10/23/2021] [Indexed: 12/13/2022] Open
Abstract
Post-translational modification (PTM) refers to the covalent and enzymatic modification of proteins after protein biosynthesis, which orchestrates a variety of biological processes. Detecting PTM sites in proteome scale is one of the key steps to in-depth understanding their regulation mechanisms. In this study, we presented an integrated method based on eXtreme Gradient Boosting (XGBoost), called iRice-MS, to identify 2-hydroxyisobutyrylation, crotonylation, malonylation, ubiquitination, succinylation and acetylation in rice. For each PTM-specific model, we adopted eight feature encoding schemes, including sequence-based features, physicochemical property-based features and spatial mapping information-based features. The optimal feature set was identified from each encoding, and their respective models were established. Extensive experimental results show that iRice-MS always display excellent performance on 5-fold cross-validation and independent dataset test. In addition, our novel approach provides the superiority to other existing tools in terms of AUC value. Based on the proposed model, a web server named iRice-MS was established and is freely accessible at http://lin-group.cn/server/iRice-MS.
Collapse
|
5
|
SeamDock: An Interactive and Collaborative Online Docking Resource to Assist Small Compound Molecular Docking. Front Mol Biosci 2021; 8:716466. [PMID: 34604303 PMCID: PMC8484321 DOI: 10.3389/fmolb.2021.716466] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2021] [Accepted: 07/26/2021] [Indexed: 12/02/2022] Open
Abstract
In silico assessment of protein receptor interactions with small ligands is now part of the standard pipeline for drug discovery, and numerous tools and protocols have been developed for this purpose. With the SeamDock web server, we propose a new approach to facilitate access to small molecule docking for nonspecialists, including students. The SeamDock online service integrates different docking tools in a common framework that allows ligand global and/or local docking and a hierarchical approach combining the two for easy interaction site identification. This service does not require advanced computer knowledge, and it works without the installation of any programs with the exception of a common web browser. The use of the Seamless framework linking the RPBS calculation server to the user’s browser allows the user to navigate smoothly and interactively on the SeamDock web page. A major effort has been put into the 3D visualization of ligand, receptor, and docking poses and their interactions with the receptor. The advanced visualization features combined with the seamless library allow a user to share with an unlimited number of collaborators, a docking session, and its full visualization states. As a result, SeamDock can be seen as a free, simple, didactic, evolving online docking resource best suited for education and training.
Collapse
|
6
|
iDHS-Deep: an integrated tool for predicting DNase I hypersensitive sites by deep neural network. Brief Bioinform 2021; 22:6158360. [PMID: 33751027 DOI: 10.1093/bib/bbab047] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/25/2020] [Revised: 01/28/2021] [Accepted: 01/29/2021] [Indexed: 01/09/2023] Open
Abstract
DNase I hypersensitive site (DHS) refers to the hypersensitive region of chromatin for the DNase I enzyme. It is an important part of the noncoding region and contains a variety of regulatory elements, such as promoter, enhancer, and transcription factor-binding site, etc. Moreover, the related locus of disease (or trait) are usually enriched in the DHS regions. Therefore, the detection of DHS region is of great significance. In this study, we develop a deep learning-based algorithm to identify whether an unknown sequence region would be potential DHS. The proposed method showed high prediction performance on both training datasets and independent datasets in different cell types and developmental stages, demonstrating that the method has excellent superiority in the identification of DHSs. Furthermore, for the convenience of related wet-experimental researchers, the user-friendly web-server iDHS-Deep was established at http://lin-group.cn/server/iDHS-Deep/, by which users can easily distinguish DHS and non-DHS and obtain the corresponding developmental stage ofDHS.
Collapse
|
7
|
Corrigendum: Web-gLV: A Web Based Platform for Lotka-Volterra Based Modeling and Simulation of Microbial Populations. Front Microbiol 2021; 11:605308. [PMID: 33488546 PMCID: PMC7820940 DOI: 10.3389/fmicb.2020.605308] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2020] [Accepted: 12/04/2020] [Indexed: 11/13/2022] Open
Abstract
[This corrects the article DOI: 10.3389/fmicb.2019.00288.].
Collapse
|
8
|
iPromoter-5mC: A Novel Fusion Decision Predictor for the Identification of 5-Methylcytosine Sites in Genome-Wide DNA Promoters. Front Cell Dev Biol 2020; 8:614. [PMID: 32850787 PMCID: PMC7399635 DOI: 10.3389/fcell.2020.00614] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2020] [Accepted: 06/22/2020] [Indexed: 11/29/2022] Open
Abstract
The hypomethylation of the whole cancer genome and the hypermethylation of the promoter of specific tumor suppressor genes are the important reasons for the rapid proliferation of cancer cells. Therefore, obtaining the distribution of 5-methylcytosine (5mC) in promoters is a key step to further understand the relationship between promoter methylation and mRNA gene expression regulation. Large-scale detection of DNA 5mC through wet experiments is still time-consuming and laborious. Therefore, it is urgent to design a method for identifying the 5mC site of genome-wide DNA promoters. Based on promoter methylation data of the small cell lung cancer (SCLC) from the database named cancer cell line Encyclopedia (CCLE), we built a fusion decision predictor called iPromoter-5mC for identifying methylation modification sites in promoters using deep neural network (DNN). One-Hot Encoding (One-hot) was used to encode the promoter samples for the classification. The method achieves average AUC of 0.957 on the independent testing dataset, indicating that our predictor is robust and reliable. A user-friendly web-server called iPromoter-5mC could be freely accessible at http://www.jci-bioinfo.cn/iPromoter-5mC, which will provide simple and effective means for users to study promoter 5mC modification. The source code of the proposed methods is freely available for academic research at https://github.com/zlwuxi/iPromoter-5mC.
Collapse
|
9
|
PSI-MOUSE: Predicting Mouse Pseudouridine Sites From Sequence and Genome-Derived Features. Evol Bioinform Online 2020; 16:1176934320925752. [PMID: 32565674 PMCID: PMC7285933 DOI: 10.1177/1176934320925752] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2020] [Accepted: 03/30/2020] [Indexed: 12/04/2022] Open
Abstract
Pseudouridine (Ψ) is the first discovered and the most prevalent posttranscriptional modification, which has been widely studied during the past decades. Pseudouridine was observed in almost all kinds of RNAs and shown to have important biological functions. Currently, the time-consuming and high-cost procedures of experimental approaches limit its uses in real-life Ψ site detection. Alternatively, by taking advantage of the explosive growth of Ψ sequencing data, the computational methods may provide a more cost-effective avenue. To date, the existing mouse Ψ site predictors were all developed based on sequence-derived features, and their performance can be further improved by adding the domain knowledge derived feature. Therefore, it is highly desirable to propose a genomic feature-based computational method to increase the accuracy and efficiency of the identification of Ψ RNA modification in the mouse transcriptome. In our study, a predictive framework PSI-MOUSE was built. Besides the conventional sequence-based features, PSI-MOUSE first introduced 38 additional genomic features derived from the mouse genome, which achieved a satisfactory improvement in the prediction performance, compared with other existing models. Moreover, PSI-MOUSE also features in automatically annotating the putative Ψ sites with diverse types of posttranscriptional regulations (RNA-binding protein [RBP]-binding regions, miRNA-RNA interactions, and splicing sites), which can serve as a useful research tool for the study of Ψ RNA modification in the mouse genome. Finally, 3282 experimentally validated mouse Ψ sites were also collected in a database with customized query functions. For the convenience of academic users, a website was built to provide a user-friendly interface for the query and analysis on the database. The website is freely accessible at www.xjtlu.edu.cn/biologicalsciences/psimouse and http://psimouse.rnamd.com. We introduced the genome-derived features to mouse for the first time, and we achieved a good performance in mouse Ψ site prediction. Compared with the existing state-of-art methods, our newly developed approach PSI-MOUSE obtained a substantial improvement in prediction accuracy, marking the reliable contributions of genomic features for the prediction of RNA modifications in a species other than human.
Collapse
|
10
|
PedMiner: a tool for linkage analysis-based identification of disease-associated variants using family based whole-exome sequencing data. Brief Bioinform 2020; 22:5835558. [PMID: 32393981 PMCID: PMC8138824 DOI: 10.1093/bib/bbaa077] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/01/2020] [Revised: 04/13/2020] [Accepted: 04/17/2020] [Indexed: 12/22/2022] Open
Abstract
With the advances of next-generation sequencing technology, the field of disease research has been revolutionized. However, pinpointing the disease-causing variants from millions of revealed variants is still a tough task. Here, we have reviewed the existing linkage analysis tools and presented PedMiner, a web-based application designed to narrow down candidate variants from family based whole-exome sequencing (WES) data through linkage analysis. PedMiner integrates linkage analysis, variant annotation and prioritization in one automated pipeline. It provides graphical visualization of the linked regions along with comprehensive annotation of variants and genes within these linked regions. This efficient and comprehensive application will be helpful for the scientific community working on Mendelian inherited disorders using family based WES data.
Collapse
|
11
|
iRO-PsekGCC: Identify DNA Replication Origins Based on Pseudo k-Tuple GC Composition. Front Genet 2019; 10:842. [PMID: 31620165 PMCID: PMC6759546 DOI: 10.3389/fgene.2019.00842] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2019] [Accepted: 08/13/2019] [Indexed: 11/22/2022] Open
Abstract
Identification of replication origins is playing a key role in understanding the mechanism of DNA replication. This task is of great significance in DNA sequence analysis. Because of its importance, some computational approaches have been introduced. Among these predictors, the iRO-3wPseKNC predictor is the first discriminative method that is able to correctly identify the entire replication origins. For further improving its predictive performance, we proposed the Pseudo k-tuple GC Composition (PsekGCC) approach to capture the "GC asymmetry bias" of yeast species by considering both the GC skew and the sequence order effects of k-tuple GC Composition (k-GCC) in this study. Based on PseKGCC, we proposed a new predictor called iRO-PsekGCC to identify the DNA replication origins. Rigorous jackknife test on two yeast species benchmark datasets (Saccharomyces cerevisiae, Pichia pastoris) indicated that iRO-PsekGCC outperformed iRO-3wPseKNC. It can be anticipated that iRO-PsekGCC will be a useful tool for DNA replication origin identification. Availability and implementation: The web-server for the iRO-PsekGCC predictor was established, and it can be accessed at http://bliulab.net/iRO-PsekGCC/.
Collapse
|
12
|
iDNA6mA-Rice: A Computational Tool for Detecting N6-Methyladenine Sites in Rice. Front Genet 2019; 10:793. [PMID: 31552096 PMCID: PMC6746913 DOI: 10.3389/fgene.2019.00793] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2019] [Accepted: 07/26/2019] [Indexed: 01/08/2023] Open
Abstract
DNA N6-methyladenine (6mA) is a dominant DNA modification form and involved in many biological functions. The accurate genome-wide identification of 6mA sites may increase understanding of its biological functions. Experimental methods for 6mA detection in eukaryotes genome are laborious and expensive. Therefore, it is necessary to develop computational methods to identify 6mA sites on a genomic scale, especially for plant genomes. Based on this consideration, the study aims to develop a machine learning-based method of predicting 6mA sites in the rice genome. We initially used mono-nucleotide binary encoding to formulate positive and negative samples. Subsequently, the machine learning algorithm named Random Forest was utilized to perform the classification for identifying 6mA sites. Our proposed method could produce an area under the receiver operating characteristic curve of 0.964 with an overall accuracy of 0.917, as indicated by the fivefold cross-validation test. Furthermore, an independent dataset was established to assess the generalization ability of our method. Finally, an area under the receiver operating characteristic curve of 0.981 was obtained, suggesting that the proposed method had good performance of predicting 6mA sites in the rice genome. For the convenience of retrieving 6mA sites, on the basis of the computational method, we built a freely accessible web server named iDNA6mA-Rice at http://lin-group.cn/server/iDNA6mA-Rice.
Collapse
|
13
|
Web-gLV: A Web Based Platform for Lotka-Volterra Based Modeling and Simulation of Microbial Populations. Front Microbiol 2019; 10:288. [PMID: 30846976 PMCID: PMC6394339 DOI: 10.3389/fmicb.2019.00288] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2018] [Accepted: 02/04/2019] [Indexed: 12/12/2022] Open
Abstract
The affordability of high throughput DNA sequencing has allowed us to explore the dynamics of microbial populations in various ecosystems. Mathematical modeling and simulation of such microbiome time series data can help in getting better understanding of bacterial communities. In this paper, we present Web-gLV-a GUI based interactive platform for generalized Lotka-Volterra (gLV) based modeling and simulation of microbial populations. The tool can be used to generate the mathematical models with automatic estimation of parameters and use them to predict future trajectories using numerical simulations. We also demonstrate the utility of our tool on few publicly available datasets. The case studies demonstrate the ease with which the current tool can be used by biologists to model bacterial populations and simulate their dynamics to get biological insights. We expect Web-gLV to be a valuable contribution in the field of ecological modeling and metagenomic systems biology.
Collapse
|
14
|
iRNAm5C-PseDNC: identifying RNA 5-methylcytosine sites by incorporating physical-chemical properties into pseudo dinucleotide composition. Oncotarget 2018; 8:41178-41188. [PMID: 28476023 PMCID: PMC5522291 DOI: 10.18632/oncotarget.17104] [Citation(s) in RCA: 146] [Impact Index Per Article: 24.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2017] [Accepted: 03/15/2017] [Indexed: 01/24/2023] Open
Abstract
Occurring at cytosine (C) of RNA, 5-methylcytosine (m5C) is an important post-transcriptional modification (PTCM). The modification plays significant roles in biological processes by regulating RNA metabolism in both eukaryotes and prokaryotes. It may also, however, cause cancers and other major diseases. Given an uncharacterized RNA sequence that contains many C residues, can we identify which one of them can be of m5C modification, and which one cannot? It is no doubt a crucial problem, particularly with the explosive growth of RNA sequences in the postgenomic age. Unfortunately, so far no user-friendly web-server whatsoever has been developed to address such a problem. To meet the increasingly high demand from most experimental scientists working in the area of drug development, we have developed a new predictor called iRNAm5C-PseDNC by incorporating ten types of physical-chemical properties into pseudo dinucleotide composition via the auto/cross-covariance approach. Rigorous jackknife tests show that its anticipated accuracy is quite high. For most experimental scientists’ convenience, a user-friendly web-server for the predictor has been provided at http://www.jci-bioinfo.cn/iRNAm5C-PseDNC along with a step-by-step user guide, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. It has not escaped our notice that the approach presented here can also be used to deal with many other problems in genome analysis.
Collapse
|
15
|
iRNA-AI: identifying the adenosine to inosine editing sites in RNA sequences. Oncotarget 2018; 8:4208-4217. [PMID: 27926534 PMCID: PMC5354824 DOI: 10.18632/oncotarget.13758] [Citation(s) in RCA: 199] [Impact Index Per Article: 33.2] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2016] [Accepted: 11/23/2016] [Indexed: 01/14/2023] Open
Abstract
Catalyzed by adenosine deaminase (ADAR), the adenosine to inosine (A-to-I) editing in RNA is not only involved in various important biological processes, but also closely associated with a series of major diseases. Therefore, knowledge about the A-to-I editing sites in RNA is crucially important for both basic research and drug development. Given an uncharacterized RNA sequence that contains many adenosine (A) residues, can we identify which one of them can be of A-to-I editing, and which one cannot? Unfortunately, so far no computational method whatsoever has been developed to address such an important problem based on the RNA sequence information alone. To fill this empty area, we have proposed a predictor called iRNA-AI by incorporating the chemical properties of nucleotides and their sliding occurrence density distribution along a RNA sequence into the general form of pseudo nucleotide composition (PseKNC). It has been shown by the rigorous jackknife test and independent dataset test that the performance of the proposed predictor is quite promising. For the convenience of most experimental scientists, a user-friendly web-server for iRNA-AI has been established at http://lin.uestc.edu.cn/server/iRNA-AI/, by which users can easily get their desired results without the need to go through the mathematical details.
Collapse
|
16
|
The visualCMAT: A web-server to select and interpret correlated mutations/co-evolving residues in protein families. J Bioinform Comput Biol 2017; 16:1840005. [PMID: 29361894 DOI: 10.1142/s021972001840005x] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
The visualCMAT web-server was designed to assist experimental research in the fields of protein/enzyme biochemistry, protein engineering, and drug discovery by providing an intuitive and easy-to-use interface to the analysis of correlated mutations/co-evolving residues. Sequence and structural information describing homologous proteins are used to predict correlated substitutions by the Mutual information-based CMAT approach, classify them into spatially close co-evolving pairs, which either form a direct physical contact or interact with the same ligand (e.g. a substrate or a crystallographic water molecule), and long-range correlations, annotate and rank binding sites on the protein surface by the presence of statistically significant co-evolving positions. The results of the visualCMAT are organized for a convenient visual analysis and can be downloaded to a local computer as a content-rich all-in-one PyMol session file with multiple layers of annotation corresponding to bioinformatic, statistical and structural analyses of the predicted co-evolution, or further studied online using the built-in interactive analysis tools. The online interactivity is implemented in HTML5 and therefore neither plugins nor Java are required. The visualCMAT web-server is integrated with the Mustguseal web-server capable of constructing large structure-guided sequence alignments of protein families and superfamilies using all available information about their structures and sequences in public databases. The visualCMAT web-server can be used to understand the relationship between structure and function in proteins, implemented at selecting hotspots and compensatory mutations for rational design and directed evolution experiments to produce novel enzymes with improved properties, and employed at studying the mechanism of selective ligand's binding and allosteric communication between topologically independent sites in protein structures. The web-server is freely available at https://biokinet.belozersky.msu.ru/visualcmat and there are no login requirements.
Collapse
|
17
|
iUbiq-Lys: prediction of lysine ubiquitination sites in proteins by extracting sequence evolution information via a gray system model. J Biomol Struct Dyn 2014; 33:1731-42. [PMID: 25248923 DOI: 10.1080/07391102.2014.968875] [Citation(s) in RCA: 126] [Impact Index Per Article: 12.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
Abstract
As one of the most important posttranslational modifications (PTMs), ubiquitination plays an important role in regulating varieties of biological processes, such as signal transduction, cell division, apoptosis, and immune response. Ubiquitination is also named "lysine ubiquitination" because it occurs when an ubiquitin is covalently attached to lysine (K) residues of targeting proteins. Given an uncharacterized protein sequence that contains many lysine residues, which one of them is the ubiquitination site, and which one is of non-ubiquitination site? With the avalanche of protein sequences generated in the postgenomic age, it is highly desired for both basic research and drug development to develop an automated method for rapidly and accurately annotating the ubiquitination sites in proteins. In view of this, a new predictor called "iUbiq-Lys" was developed based on the evolutionary information, gray system model, as well as the general form of pseudo-amino acid composition. It was demonstrated via the rigorous cross-validations that the new predictor remarkably outperformed all its counterparts. As a web-server, iUbiq-Lys is accessible to the public at http://www.jci-bioinfo.cn/iUbiq-Lys . For the convenience of most experimental scientists, we have further provided a protocol of step-by-step guide, by which users can easily get their desired results without the need to follow the complicated mathematics that were presented in this paper just for the integrity of its development process.
Collapse
|