51
Abstract
Chemometrics plays a critical role in biosensor-based detection, analysis, and diagnosis. Machine learning (ML), a branch of artificial intelligence (AI), has achieved impressive advances. However, advanced ML methods, especially deep learning, which is well known for image analysis, facial recognition, and speech recognition, have remained relatively elusive to the biosensor community. Herein, how ML can benefit biosensors is systematically discussed. The advantages and drawbacks of the most popular ML algorithms are summarized on the basis of sensing data analysis. In particular, deep learning methods such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are emphasized. Diverse ML-assisted electrochemical biosensors, wearable electronics, SERS and other spectra-based biosensors, fluorescence biosensors, and colorimetric biosensors are comprehensively discussed. Furthermore, biosensor networks and multibiosensor data fusion are introduced. This review aims to bridge ML with biosensors and to expand chemometrics for detection, analysis, and diagnosis.
Affiliation(s)
- Feiyun Cui
- Department of Chemical Engineering, Worcester Polytechnic Institute, 100 Institute Road, Worcester, Massachusetts 01609, United States
- Yun Yue
- Department of Electrical & Computer Engineering, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, United States
- Yi Zhang
- Department of Biomedical Engineering, University of Connecticut, Storrs, Connecticut 06269, United States
- Ziming Zhang
- Department of Electrical & Computer Engineering, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, United States
- H. Susan Zhou
- Department of Chemical Engineering, Worcester Polytechnic Institute, 100 Institute Road, Worcester, Massachusetts 01609, United States
52
HACS1 signaling adaptor protein recognizes a motif in the paired immunoglobulin receptor B cytoplasmic domain. Commun Biol 2020; 3:672. [PMID: 33188360 PMCID: PMC7666139 DOI: 10.1038/s42003-020-01397-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/14/2019] [Accepted: 10/22/2020] [Indexed: 12/30/2022] Open
Abstract
Hematopoietic adaptor containing SH3 and SAM domains-1 (HACS1) is a signaling protein with two juxtaposed protein–protein interaction domains and an intrinsically unstructured region that spans half the sequence. Here, we describe the interaction between the HACS1 SH3 domain and a sequence near the third immunoreceptor tyrosine-based inhibition motif (ITIM3) of the paired immunoglobulin receptor B (PIRB). In surface plasmon resonance binding assays using mouse and human PIRB ITIM3 phosphopeptides as ligands, the HACS1 SH3 domain and the SHP2 N-terminal SH2 domain demonstrated comparable affinities in the micromolar range. Since the PIRB ITIM3 sequence represents an atypical ligand for an SH3 domain, we determined the NMR structure of the HACS1 SH3 domain and performed a chemical shift mapping study. This study showed that the binding site on the HACS1 SH3 domain for PIRB shares many of the same amino acids found in a canonical binding cleft normally associated with polyproline ligands. Molecular modeling suggests that the respective binding sites in PIRB ITIM3 for the HACS1 SH3 domain and the SHP2 SH2 domain are too close to permit simultaneous binding. As a result, the HACS1-PIRB partnership has the potential to amalgamate signaling pathways that influence both immune and neuronal cell fate. Kwan et al. show the interaction between the HACS1 SH3 domain and a sequence near the third immunoreceptor tyrosine-based inhibition motif of the paired immunoglobulin receptor B (PIRB). This study suggests that the HACS1-PIRB partnership has the potential to unite signaling pathways that regulate both immune and neuronal cell fate.
53
Chui AJ, Griswold AR, Taabazuing CY, Orth EL, Gai K, Rao SD, Ball DP, Hsiao JC, Bachovchin DA. Activation of the CARD8 Inflammasome Requires a Disordered Region. Cell Rep 2020; 33:108264. [PMID: 33053349 PMCID: PMC7594595 DOI: 10.1016/j.celrep.2020.108264] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 02/06/2020] [Revised: 06/25/2020] [Accepted: 09/22/2020] [Indexed: 12/23/2022] Open
Abstract
Several cytosolic pattern-recognition receptors (PRRs) form multiprotein complexes called canonical inflammasomes in response to intracellular danger signals. Canonical inflammasomes recruit and activate caspase-1 (CASP1), which in turn cleaves and activates inflammatory cytokines and gasdermin D (GSDMD), inducing pyroptotic cell death. Inhibitors of the dipeptidyl peptidases DPP8 and DPP9 (DPP8/9) activate both the human NLRP1 and CARD8 inflammasomes. NLRP1 and CARD8 have different N-terminal regions but have similar C-terminal regions that undergo autoproteolysis to generate two non-covalently associated fragments. Here, we show that DPP8/9 inhibition activates a proteasomal degradation pathway that targets disordered and misfolded proteins for destruction. CARD8’s N terminus contains a disordered region of ~160 amino acids that is recognized and destroyed by this degradation pathway, thereby freeing its C-terminal fragment to activate CASP1 and induce pyroptosis. Thus, CARD8 serves as an alarm to signal the activation of a degradation pathway for disordered and misfolded proteins. Inflammasomes are multiprotein complexes that detect intracellular danger signals and stimulate powerful immune responses. DPP8/9 inhibitors activate the CARD8 inflammasome through an unknown mechanism. Here, Chui et al. show that DPP8/9 inhibitors induce the degradation of many disordered and misfolded proteins. CARD8 has an N-terminal disordered region that is degraded upon DPP8/9 inhibition, triggering inflammasome formation.
Affiliation(s)
- Ashley J Chui
- Tri-Institutional PhD Program in Chemical Biology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Andrew R Griswold
- Weill Cornell/Rockefeller/Sloan Kettering Tri-Institutional MD-PhD Program, New York, NY 10065, USA
- Cornelius Y Taabazuing
- Chemical Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Elizabeth L Orth
- Tri-Institutional PhD Program in Chemical Biology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Kuo Gai
- Chemical Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Sahana D Rao
- Tri-Institutional PhD Program in Chemical Biology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Daniel P Ball
- Chemical Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Jeffrey C Hsiao
- Pharmacology Program of the Weill Cornell Graduate School of Medical Sciences, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Daniel A Bachovchin
- Tri-Institutional PhD Program in Chemical Biology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Chemical Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Pharmacology Program of the Weill Cornell Graduate School of Medical Sciences, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.
54
Pan Y, Zhou S, Guan J. Computationally identifying hot spots in protein-DNA binding interfaces using an ensemble approach. BMC Bioinformatics 2020; 21:384. [PMID: 32938375 PMCID: PMC7495898 DOI: 10.1186/s12859-020-03675-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: Protein-DNA interaction governs a large number of cellular processes, and it can be altered by a small fraction of interface residues, the so-called hot spots, which account for most of the interface binding free energy. Accurate prediction of hot spots is critical for understanding the principles of protein-DNA interactions. Some computational methods can already predict a large number of hot residues accurately and efficiently. However, the scarcity of experimentally validated hot-spot residues in protein-DNA complexes and the low diversity of the employed features limit the performance of existing methods. RESULTS: Here, we report a new computational method for effectively predicting hot spots in protein-DNA binding interfaces. This method, called PreHots (short for Predicting Hotspots), adopts an ensemble stacking classifier that integrates different machine learning classifiers to generate a robust model with 19 features selected by a sequential backward feature selection algorithm. To this end, we constructed two new and reliable datasets (one benchmark for model training and one independent dataset for validation), which together comprise 123 hot spots and 137 non-hot spots from 89 protein-DNA complexes. The data were manually collected from the literature and existing databases with a strict process of redundancy removal. Our method achieves a sensitivity of 0.813 and an AUC score of 0.868 in 10-fold cross-validation on the benchmark dataset, and a sensitivity of 0.818 and an AUC score of 0.820 on the independent test dataset. The results show that our approach outperforms the existing ones. CONCLUSIONS: PreHots, which is based on a stacked ensemble of boosting algorithms, can reliably predict hot spots at the protein-DNA binding interface on a large scale. Compared with the existing methods, PreHots achieves better prediction performance. Both the webserver of PreHots and the datasets are freely available at: http://dmb.tongji.edu.cn/tools/PreHots/ .
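For readers unfamiliar with the two metrics reported above, the following is a minimal Python sketch of how sensitivity and AUC can be computed from labels and classifier scores. It is not the PreHots implementation: the function names, the 0.5 decision threshold, and the toy data are all hypothetical and chosen only for illustration.

```python
def sensitivity(y_true, y_pred):
    """Fraction of true hot spots (label 1) that are predicted as hot spots."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = hot spot, 0 = non-hot spot.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
# Assumed decision threshold of 0.5 (not taken from the abstract).
preds = [1 if s >= 0.5 else 0 for s in scores]

print(sensitivity(labels, preds))  # 2 of 3 hot spots recovered
print(auc(labels, scores))         # 8 of 9 positive/negative pairs ranked correctly
```

Sensitivity depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is presumably why the abstract reports both.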
Affiliation(s)
- Yuliang Pan
- Department of Computer Science and Technology, Tongji University, No. 4800 Caoan Road, Shanghai, 201804, China
- Shuigeng Zhou
- Shanghai Key Laboratory of Intelligent Information Processing, and School of Computer Science, Fudan University, No. 220 Handan Road, Shanghai, 200433, China
- Jihong Guan
- Department of Computer Science and Technology, Tongji University, No. 4800 Caoan Road, Shanghai, 201804, China.
55
Hiraoka M, Ishikawa A, Matsuzawa F, Aikawa SI, Sakurai A. A variant in the RP1L1 gene in a family with occult macular dystrophy in a predicted intrinsically disordered region. Ophthalmic Genet 2020; 41:599-605. [PMID: 32940107 DOI: 10.1080/13816810.2020.1821383] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/23/2022]
Abstract
SIGNIFICANCE: The genetic variants responsible for occult macular dystrophy (OMD) were found in the predicted intrinsically disordered region (IDR) of the RP1L1 gene. PURPOSE: We examined the phenotypes and genotypes of members of a family with OMD. In addition, the genetic characteristics of the RP1L1 gene in OMD were investigated. METHODS: Whole-exome sequencing was applied to two affected family members, and Sanger sequencing was performed on three members. The structural properties of RP1L1 and its pathogenic variants were analyzed using the Predictor of Natural Disordered Regions (PONDR). RESULTS: Two affected members showed moderate visual impairment and relative central scotoma. The spectral domain optical coherence tomography (SD-OCT) images showed an absence of the interdigitation zone (IZ) and ellipsoid zone (EZ) in one case, and an obscure EZ line in the other case. An RP1L1 variant (c.3593C>T, p.Ser1198Phe) was identified in the two affected members but not in the unaffected member. The PONDR analysis showed that the region from p.1189 to p.1248 could be predicted to be an IDR in the RP1L1 molecule. The p.Ser1198Phe variant showed a significant reduction in PONDR score. CONCLUSIONS: Although the major pathogenic variant of OMD is p.Arg45Trp, multiple reports indicate that the region between p.1194 and p.1201 is another hot spot of OMD. The PONDR analysis predicted that the RP1L1 molecule is an intrinsically disordered protein. It is speculated that the region around p.1200 is essential for the normal function of the RP1L1 molecule and that missense variants in that area cause the development of OMD.
Affiliation(s)
- Miki Hiraoka
- Department of Ophthalmology, Health Sciences University of Hokkaido, Sapporo, Hokkaido, Japan
- Aki Ishikawa
- Department of Medical Genetics and Genomics, Sapporo Medical University, Sapporo, Hokkaido, Japan
- Akihiro Sakurai
- Department of Medical Genetics and Genomics, Sapporo Medical University, Sapporo, Hokkaido, Japan
56
ODiNPred: comprehensive prediction of protein order and disorder. Sci Rep 2020; 10:14780. [PMID: 32901090 PMCID: PMC7479119 DOI: 10.1038/s41598-020-71716-1] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 04/11/2020] [Accepted: 08/10/2020] [Indexed: 12/13/2022] Open
Abstract
Structural disorder is widespread in eukaryotic proteins and is vital for their function in diverse biological processes. It is therefore highly desirable to be able to predict the degree of order and disorder from amino acid sequence. It is, however, notoriously difficult to predict the degree of local flexibility within structured domains and the presence and nuances of localized rigidity within intrinsically disordered regions. To identify such instances, we used the CheZOD database, which encompasses accurate, balanced, and continuous-valued quantification of protein (dis)order at amino acid resolution based on NMR chemical shifts. To computationally forecast the spectrum of protein disorder in the most comprehensive manner possible, we constructed the sequence-based protein order/disorder predictor ODiNPred, trained on an expanded version of CheZOD. ODiNPred applies a deep neural network comprising 157 unique sequence features to 1325 protein sequences together with the experimental NMR chemical shift data. Cross-validation for 117 protein sequences shows that ODiNPred better predicts the continuous variation in order along the protein sequence, suggesting that contemporary predictors are limited by the quality of training data. The inclusion of evolutionary features reduces the performance gap between ODiNPred and its peers, but analysis shows that it retains greater accuracy for the more challenging prediction of intermediate disorder.
57
Draberova H, Janusova S, Knizkova D, Semberova T, Pribikova M, Ujevic A, Harant K, Knapkova S, Hrdinka M, Fanfani V, Stracquadanio G, Drobek A, Ruppova K, Stepanek O, Draber P. Systematic analysis of the IL-17 receptor signalosome reveals a robust regulatory feedback loop. EMBO J 2020; 39:e104202. [PMID: 32696476 PMCID: PMC7459424 DOI: 10.15252/embj.2019104202] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/05/2019] [Revised: 06/13/2020] [Accepted: 06/17/2020] [Indexed: 12/24/2022] Open
Abstract
IL-17 mediates immune protection from fungi and bacteria and also promotes autoimmune pathologies. However, the regulation of signal transduction from the IL-17 receptor (IL-17R) has remained elusive. We developed a novel mass spectrometry-based approach to identify components of the IL-17R complex, followed by analysis of their roles using reverse genetics. Besides identifying the linear ubiquitin chain assembly complex (LUBAC) as an important signal-transducing component of IL-17R, we established that IL-17 signaling is regulated by a robust negative feedback loop mediated by TBK1 and IKKε. These kinases terminate IL-17 signaling by phosphorylating the adaptor ACT1, leading to the release of the essential ubiquitin ligase TRAF6 from the complex. NEMO recruits both kinases to the IL-17R complex, documenting that NEMO has an unprecedented negative function in IL-17 signaling, distinct from its role in NF-κB activation. Our study provides a comprehensive view of the molecular events of IL-17 signal transduction and its regulation.
Affiliation(s)
- Helena Draberova
- Laboratory of Immunity & Cell Communication, BIOCEV, First Faculty of Medicine, Charles University, Vestec, Czech Republic
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
- Sarka Janusova
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
- Daniela Knizkova
- Laboratory of Immunity & Cell Communication, BIOCEV, First Faculty of Medicine, Charles University, Vestec, Czech Republic
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
- Tereza Semberova
- Laboratory of Immunity & Cell Communication, BIOCEV, First Faculty of Medicine, Charles University, Vestec, Czech Republic
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
- Michaela Pribikova
- Laboratory of Immunity & Cell Communication, BIOCEV, First Faculty of Medicine, Charles University, Vestec, Czech Republic
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
- Andrea Ujevic
- Laboratory of Immunity & Cell Communication, BIOCEV, First Faculty of Medicine, Charles University, Vestec, Czech Republic
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
- Karel Harant
- Laboratory of Mass Spectrometry, BIOCEV, Faculty of Science, Charles University, Prague, Czech Republic
- Sofija Knapkova
- Department of Haematooncology, University Hospital Ostrava, Ostrava, Czech Republic
- Faculty of Medicine, University of Ostrava, Ostrava, Czech Republic
- Matous Hrdinka
- Department of Haematooncology, University Hospital Ostrava, Ostrava, Czech Republic
- Faculty of Medicine, University of Ostrava, Ostrava, Czech Republic
- Viola Fanfani
- Institute of Quantitative Biology, Biochemistry, and Biotechnology, SynthSys, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Giovanni Stracquadanio
- Institute of Quantitative Biology, Biochemistry, and Biotechnology, SynthSys, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Ales Drobek
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
- Klara Ruppova
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
- Ondrej Stepanek
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
- Peter Draber
- Laboratory of Immunity & Cell Communication, BIOCEV, First Faculty of Medicine, Charles University, Vestec, Czech Republic
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
58
Hernández-Segura T, Pastor N. Identification of an α-MoRF in the Intrinsically Disordered Region of the Escargot Transcription Factor. ACS Omega 2020; 5:18331-18341. [PMID: 32743208 PMCID: PMC7392517 DOI: 10.1021/acsomega.0c02051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2020] [Accepted: 07/02/2020] [Indexed: 06/11/2023]
Abstract
Molecular recognition features (MoRFs) are common in intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs). MoRFs are in constant order-disorder structural transitions and adopt well-defined structures once they are bound to their targets. Here, we study Escargot (Esg), a transcription factor in Drosophila melanogaster that regulates multiple cellular functions, and consists of a disordered N-terminal domain and a group of zinc fingers at its C-terminal domain. We analyzed the N-terminal domain of Esg with disorder predictors and identified a region of 45 amino acids with high probability to form ordered structures, which we named S2. Through 54 μs of molecular dynamics (MD) simulations using CHARMM36 and implicit solvent (generalized Born/surface area (GBSA)), we characterized the conformational landscape of S2 and found an α-MoRF of ∼16 amino acids stabilized by key contacts within the helix. To test the importance of these contacts in the stability of the α-MoRF, we evaluated the effect of point mutations that would impair these interactions, running 24 μs of MD for each mutation. The mutations had mild effects on the MoRF, and in some cases, led to gain of residual structure through long-range contacts of the α-MoRF and the rest of the S2 region. As this could be an effect of the force field and solvent model we used, we benchmarked our simulation protocol by carrying out 32 μs of MD for the (AAQAA)3 peptide. The results of the benchmark indicate that the global amount of helix in shorter peptides like (AAQAA)3 is reasonably predicted. Careful analysis of the runs of S2 and its mutants suggests that the mutation to hydrophobic residues may have nucleated long-range hydrophobic and aromatic interactions that stabilize the MoRF. Finally, we have identified a set of residues that stabilize an α-MoRF in a region still without functional annotations in Esg.
Affiliation(s)
- Teresa Hernández-Segura
- Laboratorio de Dinámica de Proteínas, Centro de Investigación en Dinámica Celular-IICBA, Universidad Autónoma del Estado de Morelos, Av. Universidad 1001, Chamilpa, 62209 Cuernavaca, México
- Doctorado en Ciencias CIDC-IICBA, Universidad Autónoma del Estado de Morelos, Cuernavaca 62209, Morelos, México
- Nina Pastor
- Laboratorio de Dinámica de Proteínas, Centro de Investigación en Dinámica Celular-IICBA, Universidad Autónoma del Estado de Morelos, Av. Universidad 1001, Chamilpa, 62209 Cuernavaca, México
59
Pei J, Kinch LN, Otwinowski Z, Grishin NV. Mutation severity spectrum of rare alleles in the human genome is predictive of disease type. PLoS Comput Biol 2020; 16:e1007775. [PMID: 32413045 PMCID: PMC7255613 DOI: 10.1371/journal.pcbi.1007775] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/23/2019] [Revised: 05/28/2020] [Accepted: 03/06/2020] [Indexed: 12/19/2022] Open
Abstract
The human genome harbors a variety of genetic variations. Single-nucleotide changes that alter amino acids in protein-coding regions are one of the major causes of human phenotypic variation and disease. These single-amino acid variations (SAVs) are routinely found in whole-genome and exome sequencing. Evaluating the functional impact of such genomic alterations is crucial for the diagnosis of genetic disorders. We developed DeepSAV, a deep-learning convolutional neural network, to differentiate disease-causing and benign SAVs based on a variety of protein sequence, structural, and functional properties. Our method outperforms most stand-alone programs, and the version incorporating population and gene-level information (DeepSAV+PG) has predictive power similar to that of some of the best available programs. We transformed DeepSAV scores of rare SAVs in the human population into a quantity termed the "mutation severity measure" for each human protein-coding gene. It reflects a gene's tolerance to deleterious missense mutations and serves as a useful tool to study gene-disease associations. Genes implicated in cancer, autism, and viral interaction are identified by this measure as intolerant to mutations, while genes associated with a number of other diseases are scored as tolerant. Among known disease-associated genes, those that are mutation-intolerant are likely to function in development and signal transduction pathways, while those that are mutation-tolerant tend to encode metabolic and mitochondrial proteins.
Affiliation(s)
- Jimin Pei
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Lisa N. Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Zbyszek Otwinowski
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Nick V. Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
60
SLX4 interacts with RTEL1 to prevent transcription-mediated DNA replication perturbations. Nat Struct Mol Biol 2020; 27:438-449. [PMID: 32398829 DOI: 10.1038/s41594-020-0419-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 04/20/2019] [Accepted: 03/17/2020] [Indexed: 12/20/2022]
Abstract
The SLX4 tumor suppressor is a scaffold that plays a pivotal role in several aspects of genome protection, including homologous recombination, interstrand DNA crosslink repair and the maintenance of common fragile sites and telomeres. Here, we unravel an unexpected direct interaction between SLX4 and the DNA helicase RTEL1, which, until now, were viewed as having independent and antagonistic functions. We identify cancer and Hoyeraal-Hreidarsson syndrome-associated mutations in SLX4 and RTEL1, respectively, that abolish SLX4-RTEL1 complex formation. We show that both proteins are recruited to nascent DNA, tightly co-localize with active RNA pol II, and that SLX4, in complex with RTEL1, promotes FANCD2/RNA pol II co-localization. Importantly, disrupting the SLX4-RTEL1 interaction leads to DNA replication defects in unstressed cells, which are rescued by inhibiting transcription. Our data demonstrate that SLX4 and RTEL1 interact to prevent replication-transcription conflicts and provide evidence that this is independent of the nuclease scaffold function of SLX4.
61
Niemeyer M, Moreno Castillo E, Ihling CH, Iacobucci C, Wilde V, Hellmuth A, Hoehenwarter W, Samodelov SL, Zurbriggen MD, Kastritis PL, Sinz A, Calderón Villalobos LIA. Flexibility of intrinsically disordered degrons in AUX/IAA proteins reinforces auxin co-receptor assemblies. Nat Commun 2020; 11:2277. [PMID: 32385295 PMCID: PMC7210949 DOI: 10.1038/s41467-020-16147-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 10/02/2019] [Accepted: 04/17/2020] [Indexed: 12/31/2022] Open
Abstract
Cullin RING-type E3 ubiquitin ligases SCFTIR1/AFB1-5 and their AUX/IAA targets perceive the phytohormone auxin. The F-box protein TIR1 binds a surface-exposed degron in AUX/IAAs, promoting their ubiquitylation and rapid auxin-regulated proteasomal degradation. Here, by adopting biochemical, structural proteomics, and in vivo approaches, we unveil how flexibility in AUX/IAAs and regions in TIR1 affect their conformational ensemble, allowing surface accessibility of degrons. We resolve TIR1·auxin·IAA7 and TIR1·auxin·IAA12 complex topology, and show that flexible intrinsically disordered regions (IDRs) in the degron’s vicinity cooperatively position AUX/IAAs on TIR1. We identify essential residues at the TIR1 N- and C-termini, which provide non-native interaction interfaces with IDRs and the folded PB1 domain of AUX/IAAs. We thereby establish a role for IDRs in modulating auxin receptor assemblies. By securing AUX/IAAs on two opposite surfaces of TIR1, IDR diversity supports locally tailored positioning for targeted ubiquitylation, and might provide conformational flexibility for a multiplicity of functional states. Auxin-mediated recruitment of AUX/IAAs by the F-box protein TIR1 prompts rapid AUX/IAA ubiquitylation and degradation. By resolving auxin receptor topology, the authors show that intrinsically disordered regions near the degrons of two Aux/IAA proteins reinforce complex assembly and position Aux/IAAs for ubiquitylation.
Affiliation(s)
- Michael Niemeyer
- Molecular Signal Processing Department, Leibniz Institute of Plant Biochemistry (IPB), Weinberg 3, 06120, Halle (Saale), Germany
- Elena Moreno Castillo
- Molecular Signal Processing Department, Leibniz Institute of Plant Biochemistry (IPB), Weinberg 3, 06120, Halle (Saale), Germany
- Christian H Ihling
- Department of Pharmaceutical Chemistry & Bioanalytics, Institute of Pharmacy, Martin Luther University Halle-Wittenberg, Charles Tanford Protein Center, Kurt-Mothes-Straße 3a, 06120, Halle (Saale), Germany
- Claudio Iacobucci
- Department of Pharmaceutical Chemistry & Bioanalytics, Institute of Pharmacy, Martin Luther University Halle-Wittenberg, Charles Tanford Protein Center, Kurt-Mothes-Straße 3a, 06120, Halle (Saale), Germany
- Verona Wilde
- Molecular Signal Processing Department, Leibniz Institute of Plant Biochemistry (IPB), Weinberg 3, 06120, Halle (Saale), Germany
- Antje Hellmuth
- Molecular Signal Processing Department, Leibniz Institute of Plant Biochemistry (IPB), Weinberg 3, 06120, Halle (Saale), Germany
- Wolfgang Hoehenwarter
- Proteome Analytics, Leibniz Institute of Plant Biochemistry (IPB), Weinberg 3, 06120, Halle (Saale), Germany
- Sophia L Samodelov
- Institute of Synthetic Biology & Cluster of Excellence on Plant Science (CEPLAS), Heinrich-Heine University of Düsseldorf, Universitätsstrasse 1, 40225, Düsseldorf, Germany
- Matias D Zurbriggen
- Institute of Synthetic Biology & Cluster of Excellence on Plant Science (CEPLAS), Heinrich-Heine University of Düsseldorf, Universitätsstrasse 1, 40225, Düsseldorf, Germany
- Panagiotis L Kastritis
- ZIK HALOMEM & Institute of Biochemistry and Biotechnology, Martin Luther University Halle-Wittenberg, Biozentrum, Weinbergweg 22, 06120, Halle (Saale), Germany
- Andrea Sinz
- Department of Pharmaceutical Chemistry & Bioanalytics, Institute of Pharmacy, Martin Luther University Halle-Wittenberg, Charles Tanford Protein Center, Kurt-Mothes-Straße 3a, 06120, Halle (Saale), Germany
- Luz Irina A Calderón Villalobos
- Molecular Signal Processing Department, Leibniz Institute of Plant Biochemistry (IPB), Weinberg 3, 06120, Halle (Saale), Germany.
62
Julien M, Miron S, Carreira A, Theillet FX, Zinn-Justin S. 1H, 13C and 15N backbone resonance assignment of the human BRCA2 N-terminal region. Biomolecular NMR Assignments 2020; 14:79-85. [PMID: 31900740 DOI: 10.1007/s12104-019-09924-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/21/2019] [Accepted: 12/20/2019] [Indexed: 06/10/2023]
Abstract
The breast cancer susceptibility protein 2 (BRCA2) is involved in mechanisms that maintain genome stability, including DNA repair, replication, and cell division. These functions are ensured by the folded C-terminal DNA-binding domain of BRCA2 but also by its large regions predicted to be disordered. Several studies have shown that disordered regions of BRCA2 are subject to phosphorylation, which regulates BRCA2 interactions through the cell cycle. The N-terminal region of BRCA2 contains two highly conserved clusters of phosphorylation sites between amino acids 75 and 210. Upon phosphorylation by CDK, cluster 1 is known to become a docking site for the kinase PLK1. Cluster 2 is phosphorylated by PLK1 at two or more positions. Both of these phosphorylation clusters are important for progression through mitosis, in particular for chromosome segregation and cytokinesis. In order to identify the phosphorylated residues and to characterize the phosphorylation site preferences and their functional consequences within the BRCA2 N-terminus, we have produced and analyzed the BRCA2 fragment from amino acid 48 to amino acid 284 (BRCA2(48-284)). Here, we report the assignment of 1H, 15N, 13CO, 13Cα and 13Cβ NMR chemical shifts of this region. Analysis of these chemical shifts confirmed that BRCA2(48-284) shows no stable fold: it is intrinsically disordered, with only short, transient α-helices.
Affiliation(s)
- Manon Julien
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette Cedex, France
  - Paris Sud University, Paris-Saclay University CNRS, UMR3348, 91405, Orsay, France
- Simona Miron
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette Cedex, France
- Aura Carreira
  - Paris Sud University, Paris-Saclay University CNRS, UMR3348, 91405, Orsay, France
  - Institut Curie, PSL Research University, UMR3348, 91405, Orsay, France
  - CNRS, UMR3348, 91405, Orsay, France
- François-Xavier Theillet
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette Cedex, France
- Sophie Zinn-Justin
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette Cedex, France

63
Lv X, Chen J, Lu Y, Chen Z, Xiao N, Yang Y. Accurately Predicting Mutation-Caused Stability Changes from Protein Sequences Using Extreme Gradient Boosting. J Chem Inf Model 2020; 60:2388-2395. [PMID: 32203653 DOI: 10.1021/acs.jcim.0c00064] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/14/2022]
Abstract
Accurately predicting the impact of point mutations on protein stability plays a crucial role in protein design and engineering. In this study, we proposed a novel method (BoostDDG) to predict stability changes upon point mutations from protein sequences based on extreme gradient boosting. We extracted features comprehensively from evolutionary information and predicted structures and performed feature selection by a strategy of sequential forward selection. The features and parameters were optimized by homologue-based cross-validation to avoid overfitting. Finally, we found that 14 features from six groups led to the highest Pearson correlation coefficient (PCC) of 0.535, which is consistent with the 0.540 on an independent test. Our method consistently outperformed other sequence-based methods on three precompiled test sets and on 7363 variants of two proteins (PTEN and TPMT). These results highlighted that BoostDDG is a powerful tool for predicting stability changes upon point mutations from protein sequences.
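Illustrative note (not part of the cited abstract): the Pearson correlation coefficient (PCC) reported above measures how linearly predicted stability changes track measured ones. The following is a minimal sketch of that metric only, not the BoostDDG implementation; the function name `pearson_cc` and the ΔΔG values are hypothetical and invented for illustration.

```python
import math


def pearson_cc(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel).
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("PCC is undefined for constant input")
    return cov / math.sqrt(var_p * var_m)


# Hypothetical stability changes (kcal/mol), made up for this example.
predicted_ddg = [-1.2, 0.3, -0.5, -2.0, 0.8]
measured_ddg = [-1.0, 0.1, -0.9, -1.5, 1.1]
print(round(pearson_cc(predicted_ddg, measured_ddg), 3))
```

A value near 1 indicates strong linear agreement; the abstract's 0.535 to 0.540 range corresponds to moderate agreement.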
Affiliation(s)
- Xuan Lv
  - State Key Laboratory of High-Performance Computing, School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China
- Jianwen Chen
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
- Yutong Lu
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
- Zhiguang Chen
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
- Nong Xiao
  - State Key Laboratory of High-Performance Computing, School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
- Yuedong Yang
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
  - Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-sen University, Ministry of Education, Guangzhou, Guangdong 510275, China

64
Hanson J, Paliwal KK, Litfin T, Zhou Y. SPOT-Disorder2: Improved Protein Intrinsic Disorder Prediction by Ensembled Deep Learning. GENOMICS PROTEOMICS & BIOINFORMATICS 2020; 17:645-656. [PMID: 32173600 PMCID: PMC7212484 DOI: 10.1016/j.gpb.2019.01.004] [Citation(s) in RCA: 87] [Impact Index Per Article: 21.8] [Received: 10/26/2018] [Revised: 01/18/2019] [Accepted: 02/15/2019] [Indexed: 01/13/2023]
Abstract
Intrinsically disordered or unstructured proteins (or regions in proteins) have been found to be important in a wide range of biological functions and implicated in many diseases. Due to the high cost and low efficiency of experimental determination of intrinsic disorder and the exponential increase of unannotated protein sequences, developing complementary computational prediction methods has been an active area of research for several decades. Here, we employed an ensemble of deep Squeeze-and-Excitation residual inception and long short-term memory (LSTM) networks for predicting protein intrinsic disorder with input from evolutionary information and predicted one-dimensional structural properties. The method, called SPOT-Disorder2, offers substantial and consistent improvement not only over our previous technique based on LSTM networks alone, but also over other state-of-the-art techniques in three independent tests with different ratios of disordered to ordered amino acid residues, and for sequences with either rich or limited evolutionary information. More importantly, semi-disordered regions predicted in SPOT-Disorder2 are more accurate in identifying molecular recognition features (MoRFs) than methods directly designed for MoRFs prediction. SPOT-Disorder2 is available as a web server and as a standalone program at https://sparks-lab.org/server/spot-disorder2/.
Affiliation(s)
- Jack Hanson
  - Signal Processing Laboratory, Griffith University, Brisbane 4111, Australia
- Kuldip K Paliwal
  - Signal Processing Laboratory, Griffith University, Brisbane 4111, Australia
- Thomas Litfin
  - School of Information and Communication Technology, Griffith University, Gold Coast 4222, Australia
- Yaoqi Zhou
  - School of Information and Communication Technology, Griffith University, Gold Coast 4222, Australia
  - Institute for Glycomics, Griffith University, Gold Coast 4222, Australia

65
Liu Y, Wang X, Liu B. RFPR-IDP: reduce the false positive rates for intrinsically disordered protein and region prediction by incorporating both fully ordered proteins and disordered proteins. Brief Bioinform 2020; 22:2000-2011. [PMID: 32112084 PMCID: PMC7986600 DOI: 10.1093/bib/bbaa018] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 12/15/2022] Open
Abstract
As an important type of protein, intrinsically disordered proteins/regions (IDPs/IDRs) are related to many crucial biological functions. Accurate prediction of IDPs/IDRs is beneficial to the prediction of protein structures and functions. Most of the existing methods ignore the fully ordered proteins without IDRs during training and testing. As a result, the corresponding predictors tend to predict the fully ordered proteins as disordered proteins. Unfortunately, these methods were only evaluated on datasets consisting of disordered proteins without or with only a few fully ordered proteins, and therefore, this problem has escaped the attention of researchers. However, most of the newly sequenced proteins are fully ordered proteins in nature. These predictors fail to accurately predict the ordered and disordered proteins in real-world applications. In this regard, we propose a new method called RFPR-IDP trained with both fully ordered proteins and disordered proteins, which is constructed based on the combination of a convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM). The experimental results show that although the existing predictors perform well for predicting the disordered proteins, they tend to predict the fully ordered proteins as disordered proteins. In contrast, the RFPR-IDP predictor can correctly predict the fully ordered proteins and outperforms the other 10 state-of-the-art methods when evaluated on a test dataset with both fully ordered proteins and disordered proteins. The web server and datasets of RFPR-IDP are freely available at http://bliulab.net/RFPR-IDP/server.
Affiliation(s)
- Yumeng Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
- Xiaolong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
- Bin Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China

66
Zhu L, Zheng H. Biomedical event extraction with a novel combination strategy based on hybrid deep neural networks. BMC Bioinformatics 2020; 21:47. [PMID: 32028883 PMCID: PMC7006190 DOI: 10.1186/s12859-020-3376-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/13/2019] [Accepted: 01/20/2020] [Indexed: 11/10/2022] Open
Abstract
Background: Biomedical event extraction is a fundamental and in-demand technology that has attracted substantial interest from many researchers. Previous works have relied heavily on manually designed features and external NLP packages, for which the feature engineering is extensive and complex. Additionally, most of the existing works use a pipeline process that breaks down a task into simple sub-tasks but ignores the interaction between them. To overcome these limitations, we propose a novel event combination strategy based on hybrid deep neural networks to solve the task in a joint end-to-end manner. Results: We adapted our method to several annotated corpora of biomedical event extraction tasks. Our method achieved state-of-the-art performance with noticeable overall F1 score improvement compared to that of existing methods for all of these corpora. Conclusions: The experimental results demonstrated that our method is effective for biomedical event extraction. The combination strategy can reconstruct complex events from the output of deep neural networks, while the deep neural networks effectively capture the feature representation from the raw text. The biomedical event extraction implementation is available online at http://www.predictor.xin/event_extraction.
Affiliation(s)
- Lvxing Zhu
  - School of Computer Science and Technology, University of Science and Technology of China, Huangshan Road, Hefei, 230026, People's Republic of China
- Haoran Zheng
  - School of Computer Science and Technology, University of Science and Technology of China, Huangshan Road, Hefei, 230026, People's Republic of China
  - Anhui Key Laboratory of Software Engineering in Computing and Communication, University of Science and Technology of China, Huangshan Road, Hefei, 230026, People's Republic of China
  - Anhui Province Key Lab. of Big Data Analysis and Application, University of Science and Technology of China, Huangshan Road, Hefei, 230026, People's Republic of China

67
Torrisi M, Pollastri G, Le Q. Deep learning methods in protein structure prediction. Comput Struct Biotechnol J 2020; 18:1301-1310. [PMID: 32612753 PMCID: PMC7305407 DOI: 10.1016/j.csbj.2019.12.011] [Citation(s) in RCA: 110] [Impact Index Per Article: 27.5] [Received: 10/15/2019] [Revised: 12/19/2019] [Accepted: 12/20/2019] [Indexed: 01/01/2023] Open
Abstract
Protein Structure Prediction is a central topic in Structural Bioinformatics. Since the '60s statistical methods, followed by increasingly complex Machine Learning and recently Deep Learning methods, have been employed to predict protein structural information at various levels of detail. In this review, we briefly introduce the problem of protein structure prediction and essential elements of Deep Learning (such as Convolutional Neural Networks, Recurrent Neural Networks and basic feed-forward Neural Networks they are founded on), after which we discuss the evolution of predictive methods for one-dimensional and two-dimensional Protein Structure Annotations, from the simple statistical methods of the early days, to the computationally intensive highly-sophisticated Deep Learning algorithms of the last decade. In the process, we review the growth of the databases these algorithms are based on, and how this has impacted our ability to leverage knowledge about evolution and co-evolution to achieve improved predictions. We conclude this review outlining the current role of Deep Learning techniques within the wider pipelines to predict protein structures and trying to anticipate what challenges and opportunities may arise next.
Affiliation(s)
- Mirko Torrisi
  - School of Computer Science, University College Dublin, Ireland
- Quan Le
  - Centre for Applied Data Analytics Research, University College Dublin, Ireland

68
Katuwawala A, Oldfield CJ, Kurgan L. DISOselect: Disorder predictor selection at the protein level. Protein Sci 2020; 29:184-200. [PMID: 31642118 PMCID: PMC6933862 DOI: 10.1002/pro.3756] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/31/2019] [Revised: 10/16/2019] [Accepted: 10/17/2019] [Indexed: 12/27/2022]
Abstract
The intense interest in the intrinsically disordered proteins in the life science community, together with the remarkable advancements in predictive technologies, have given rise to the development of a large number of computational predictors of intrinsic disorder from protein sequence. While the growing number of predictors is a positive trend, we have observed a considerable difference in predictive quality among predictors for individual proteins. Furthermore, variable predictor performance is often inconsistent between predictors for different proteins, and the predictor that shows the best predictive performance depends on the unique properties of each protein sequence. We propose a computational approach, DISOselect, to estimate the predictive performance of 12 selected predictors for individual proteins based on their unique sequence-derived properties. This estimation informs the users about the expected predictive quality for a selected disorder predictor and can be used to recommend methods that are likely to provide the best quality predictions. Our solution does not depend on the results of any disorder predictor; the estimations are made based solely on the protein sequence. Our solution significantly improves predictive performance, as judged with a test set of 1,000 proteins, when compared to other alternatives. We have empirically shown that by using the recommended methods the overall predictive performance for a given set of proteins can be improved by a statistically significant margin. DISOselect is freely available for non-commercial users through the webserver at http://biomine.cs.vcu.edu/servers/DISOselect/.
Affiliation(s)
- Akila Katuwawala
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia

69
Khanh Le NQ, Nguyen QH, Chen X, Rahardja S, Nguyen BP. Classification of adaptor proteins using recurrent neural networks and PSSM profiles. BMC Genomics 2019; 20:966. [PMID: 31874633 PMCID: PMC6929330 DOI: 10.1186/s12864-019-6335-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 10/24/2019] [Accepted: 11/25/2019] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Adaptor proteins are carrier proteins that play a crucial role in signal transduction. They commonly consist of several modular domains, each having its own binding activity and operating by forming complexes with other intracellular-signaling molecules. Many studies determined that the adaptor proteins had been implicated in a variety of human diseases. Therefore, creating a precise model to predict the function of adaptor proteins is one of the vital tasks in bioinformatics and computational biology. Few computational biology studies have been conducted to predict the protein functions, and in most of those studies, position specific scoring matrix (PSSM) profiles had been used as the features to be fed into the neural networks. However, the neural networks could not reach the optimal result because the sequential information in PSSMs has been lost. This study proposes an innovative approach by incorporating recurrent neural networks (RNNs) and PSSM profiles to resolve this problem. RESULTS Compared to other state-of-the-art methods which had been applied successfully in other problems, our method achieves enhancement in all of the common measurement metrics. The area under the receiver operating characteristic curve (AUC) metric in prediction of adaptor proteins in the cross-validation and independent datasets are 0.893 and 0.853, respectively. CONCLUSIONS This study opens a research path that can promote the use of RNNs and PSSM profiles in bioinformatics and computational biology. Our approach is reproducible by scientists that aim to improve the performance results of different protein function prediction problems. Our source code and datasets are available at https://github.com/ngphubinh/adaptors.
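Illustrative note (not part of the cited abstract): the AUC values quoted above can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes that quantity by direct pairwise comparison; it is not the authors' code, and the function name `roc_auc`, the labels and the scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 class labels (1 = positive, e.g. adaptor protein).
    scores: iterable of classifier scores, higher meaning "more positive".
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels and scores for illustration only.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(example_labels, example_scores))
```

The pairwise form is quadratic in the number of examples; it is chosen here for clarity, and rank-based formulations give the same value more efficiently on large datasets.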
Affiliation(s)
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Keelung Road, Da'an District, Taipei City 106, Taiwan (R.O.C.)
- Quang H Nguyen
  - School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet, Hanoi 100000, Vietnam
- Xuan Chen
  - Beijing Genomics Institute, 21 Hongan 3rd Street, Shenzhen 518083, China
- Susanto Rahardja
  - School of Marine Science and Technology, Northwestern Polytechnical University, 127 West Youyi Road, Xi'an 710072, China
- Binh P Nguyen
  - School of Mathematics and Statistics, Victoria University of Wellington, Gate 7, Kelburn Parade, Wellington 6140, New Zealand

70
Chen S, Sun Z, Lin L, Liu Z, Liu X, Chong Y, Lu Y, Zhao H, Yang Y. To Improve Protein Sequence Profile Prediction through Image Captioning on Pairwise Residue Distance Map. J Chem Inf Model 2019; 60:391-399. [PMID: 31800243 DOI: 10.1021/acs.jcim.9b00438] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 11/30/2022]
Abstract
Protein sequence profile prediction aims to generate multiple sequences from structural information to advance protein design. Protein sequence profiles can be computationally predicted by energy-based or fragment-based methods. By integrating these methods with neural networks, our previous method, SPIN2, has achieved a sequence recovery rate of 34%. However, SPIN2 employed only one-dimensional (1D) structural properties that are not sufficient to represent three-dimensional (3D) structures. In this study, we represented 3D structures by 2D maps of pairwise residue distances and developed a new method (SPROF) to predict protein sequence profiles based on an image captioning learning framework. To the best of our knowledge, this is the first method to employ a 2D distance map for predicting protein properties. SPROF achieved 39.8% in sequence recovery of residues on the independent test set, representing a 5.2% improvement over SPIN2. We also found that sequence recovery increased with the number of neighboring residues in 3D structural space, indicating that our method can effectively learn long-range information from the 2D distance map. Thus, such a network architecture using a 2D distance map is expected to be useful for other 3D structure-based applications, such as binding site prediction, protein function prediction, and protein interaction prediction. The online server and the source code are available at http://biomed.nscc-gz.cn and https://github.com/biomed-AI/SPROF, respectively.
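Illustrative note (not part of the cited abstract): two quantities mentioned above are simple to state in code. A pairwise residue distance map is the matrix of Euclidean distances between residue coordinates, and sequence recovery is the fraction of positions where the predicted residue matches the native one. The sketch below is an assumption-laden illustration, not the SPROF code; the function names, the choice of one 3D point per residue, and the toy coordinates and sequences are all hypothetical.

```python
import math


def distance_map(coords):
    """Symmetric matrix of Euclidean distances between residue coordinates.

    coords: list of (x, y, z) tuples, one representative atom per residue
    (which atom to use is an assumption left to the caller).
    """
    return [[math.dist(a, b) for b in coords] for a in coords]


def sequence_recovery(predicted, native):
    """Fraction of positions where the predicted residue equals the native one."""
    if len(predicted) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == n for p, n in zip(predicted, native))
    return matches / len(native)


# Toy three-residue "structure" and sequences, invented for illustration.
toy_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
toy_map = distance_map(toy_coords)
print(toy_map[0][1], toy_map[1][2])
print(sequence_recovery("ACDE", "ACDF"))
```

A recovery of 0.398 on this scale corresponds to the 39.8% figure quoted in the abstract.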
Affiliation(s)
- Sheng Chen
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China
- Zhe Sun
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China
- Lihua Lin
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China
- Zifeng Liu
  - Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510000, China
- Xun Liu
  - Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510000, China
- Yutian Chong
  - Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510000, China
- Yutong Lu
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China
- Huiying Zhao
  - Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510000, China
- Yuedong Yang
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China
  - Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University) of the Ministry of Education, Guangzhou 510000, China

71
Rodriguez G, Orris B, Majumdar A, Bhat S, Stivers JT. Macromolecular crowding induces compaction and DNA binding in the disordered N-terminal domain of hUNG2. DNA Repair (Amst) 2019; 86:102764. [PMID: 31855846 DOI: 10.1016/j.dnarep.2019.102764] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/21/2019] [Revised: 11/25/2019] [Accepted: 12/04/2019] [Indexed: 11/15/2022]
Abstract
Many human DNA repair proteins have disordered domains at their N- or C-termini with poorly defined biological functions. We recently reported that the partially structured N-terminal domain (NTD) of human uracil DNA glycosylase 2 (hUNG2) functions to enhance DNA translocation in crowded environments and also targets the enzyme to single-stranded/double-stranded DNA junctions. To understand the structural basis for these effects, we now report high-resolution heteronuclear NMR studies of the isolated NTD in the presence and absence of an inert macromolecular crowding agent (PEG8K). Compared to dilute buffer, we find that crowding reduces the degrees of freedom for the structural ensemble, increases the order of a PCNA binding motif and dramatically promotes binding of the NTD to DNA through a conformational selection mechanism. These findings shed new light on the function of this disordered domain in the context of the crowded nuclear environment.
Affiliation(s)
- Gaddiel Rodriguez
  - Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, United States
- Benjamin Orris
  - Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, United States
- Ananya Majumdar
  - Biomolecular NMR Center, Johns Hopkins University, Baltimore, MD 21218, United States
- Shridhar Bhat
  - Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, United States
- James T Stivers
  - Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, United States

72
Chen Z, Liu X, Li F, Li C, Marquez-Lago T, Leier A, Akutsu T, Webb GI, Xu D, Smith AI, Li L, Chou KC, Song J. Large-scale comparative assessment of computational predictors for lysine post-translational modification sites. Brief Bioinform 2019; 20:2267-2290. [PMID: 30285084 PMCID: PMC6954452 DOI: 10.1093/bib/bby089] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 06/14/2018] [Revised: 08/17/2018] [Accepted: 08/18/2018] [Indexed: 12/22/2022] Open
Abstract
Lysine post-translational modifications (PTMs) play a crucial role in regulating diverse functions and biological processes of proteins. However, because of the large volumes of sequencing data generated from genome-sequencing projects, systematic identification of different types of lysine PTM substrates and PTM sites in the entire proteome remains a major challenge. In recent years, a number of computational methods for lysine PTM identification have been developed. These methods show high diversity in their core algorithms, features extracted and feature selection techniques and evaluation strategies. There is therefore an urgent need to revisit these methods and summarize their methodologies, to improve and further develop computational techniques to identify and characterize lysine PTMs from the large amounts of sequence data. With this goal in mind, we first provide a comprehensive survey on a large collection of 49 state-of-the-art approaches for lysine PTM prediction. We cover a variety of important aspects that are crucial for the development of successful predictors, including operating algorithms, sequence and structural features, feature selection, model performance evaluation and software utility. We further provide our thoughts on potential strategies to improve the model performance. Second, in order to examine the feasibility of using deep learning for lysine PTM prediction, we propose a novel computational framework, termed MUscADEL (Multiple Scalable Accurate Deep Learner for lysine PTMs), using deep, bidirectional, long short-term memory recurrent neural networks for accurate and systematic mapping of eight major types of lysine PTMs in the human and mouse proteomes. Extensive benchmarking tests show that MUscADEL outperforms current methods for lysine PTM characterization, demonstrating the potential and power of deep learning techniques in protein PTM prediction. 
The web server of MUscADEL, together with all the data sets assembled in this study, is freely available at http://muscadel.erc.monash.edu/. We anticipate that this comprehensive review and the application of deep learning will provide a practical guide and useful insights into PTM prediction and inspire future bioinformatics studies in related fields.
Affiliation(s)
- Zhen Chen
  - School of Basic Medical Science, Qingdao University, Dengzhou Road, Qingdao, Shandong, China
- Xuhan Liu
  - Medicinal Chemistry, Leiden Academic Centre for Drug Research, Einsteinweg, Leiden, The Netherlands
- Fuyi Li
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
- Chen Li
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
  - Institute of Molecular Systems Biology, ETH Zürich, Auguste-Piccard-Hof, Zürich, Switzerland
- Tatiana Marquez-Lago
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
  - Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- André Leier
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
  - Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
- Geoffrey I Webb
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
- Dakang Xu
  - Faculty of Medical Laboratory Science, Ruijin Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
  - Department of Molecular and Translational Science, Faculty of Medicine, Hudson Institute of Medical Research, Monash University, Melbourne, VIC, Australia
- Alexander Ian Smith
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
- Lei Li
  - School of Basic Medical Science, Qingdao University, Dengzhou Road, Qingdao, Shandong, China
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, MA, USA
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia

73
Kandathil SM, Greener JG, Jones DT. Recent developments in deep learning applied to protein structure prediction. Proteins 2019; 87:1179-1189. [PMID: 31589782 PMCID: PMC6899861 DOI: 10.1002/prot.25824] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 05/15/2019] [Revised: 09/26/2019] [Accepted: 09/27/2019] [Indexed: 12/29/2022]
Abstract
Although many structural bioinformatics tools have been using neural network models for a long time, deep neural network (DNN) models have attracted considerable interest in recent years. Methods employing DNNs have had a significant impact in recent CASP experiments, notably in CASP12 and especially CASP13. In this article, we offer a brief introduction to some of the key principles and properties of DNN models and discuss why they are naturally suited to certain problems in structural bioinformatics. We also briefly discuss methodological improvements that have enabled these successes. Using the contact prediction task as an example, we also speculate why DNN models are able to produce reasonably accurate predictions even in the absence of many homologues for a given target sequence, a result that can at first glance appear surprising given the lack of input information. We end on some thoughts about how and why these types of models can be so effective, as well as a discussion on potential pitfalls.
Affiliation(s)
- Shaun M Kandathil
  - Department of Computer Science, University College London, London, UK
  - Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
- Joe G Greener
  - Department of Computer Science, University College London, London, UK
  - Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
- David T Jones
  - Department of Computer Science, University College London, London, UK
  - Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK

74
Liu JH, Yang JY, Hsu DW, Lai YH, Li YP, Tsai YR, Hou MH. Crystal Structure-Based Exploration of Arginine-Containing Peptide Binding in the ADP-Ribosyltransferase Domain of the Type III Effector XopAI Protein. Int J Mol Sci 2019; 20:ijms20205085. [PMID: 31615004 PMCID: PMC6829252 DOI: 10.3390/ijms20205085] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/01/2019] [Revised: 10/11/2019] [Accepted: 10/12/2019] [Indexed: 02/07/2023] Open
Abstract
Plant pathogens secrete proteins called effectors into the cells of their host to modulate the host immune response against colonization. Effectors can either modify or arrest host target proteins to sabotage the signaling pathway, and therefore are considered potential drug targets for crop disease control. In earlier research, the Xanthomonas type III effector XopAI was predicted to be a member of the arginine-specific mono-ADP-ribosyltransferase family. However, the crystal structure of XopAI revealed an altered active site that is unsuitable to bind the cofactor NAD+, but with the capability to capture an arginine-containing peptide from XopAI itself. The arginine peptide consists of residues 60 through 69 of XopAI, and residue 62 (R62) is key to determining the protein–peptide interaction. The crystal structure and the molecular dynamics simulation results indicate that specific arginine recognition is mediated by hydrogen bonds provided by the backbone oxygen atoms from residues W154, T155, and T156, and a salt bridge provided by the E265 sidechain. In addition, a protruding loop of XopAI adopts dynamic conformations in response to arginine peptide binding and is probably involved in target protein recognition. These data suggest that XopAI binds to its target protein by the peptide-binding ability, and therefore, it promotes disease progression. Our findings reveal an unexpected and intriguing function of XopAI and pave the way for further investigation on the role of XopAI in pathogen invasion.
Affiliation(s)
- Jyung-Hurng Liu
  - Institute of Genomics and Bioinformatics, National Chung Hsing University (NCHU), Taichung 40227, Taiwan
  - Department of Life Science, NCHU, Taichung 40227, Taiwan
  - Graduate Institute of Biotechnology, NCHU, Taichung 40227, Taiwan
  - PhD Program in Medical Biotechnology, NCHU, Taichung 40227, Taiwan
- Jun-Yi Yang
  - Graduate Institute of Biotechnology, NCHU, Taichung 40227, Taiwan
  - Graduate Institute of Biochemistry, NCHU, Taichung 40227, Taiwan
- Duen-Wei Hsu
  - Department of Biotechnology, National Kaohsiung Normal University, Kaohsiung 80201, Taiwan
- Yi-Hua Lai
  - Department of Life Science, NCHU, Taichung 40227, Taiwan
- Yun-Pei Li
  - Institute of Genomics and Bioinformatics, National Chung Hsing University (NCHU), Taichung 40227, Taiwan
- Yi-Rung Tsai
  - Institute of Genomics and Bioinformatics, National Chung Hsing University (NCHU), Taichung 40227, Taiwan
- Ming-Hon Hou
  - Institute of Genomics and Bioinformatics, National Chung Hsing University (NCHU), Taichung 40227, Taiwan
  - Department of Life Science, NCHU, Taichung 40227, Taiwan
  - Graduate Institute of Biotechnology, NCHU, Taichung 40227, Taiwan
  - PhD Program in Medical Biotechnology, NCHU, Taichung 40227, Taiwan

75
Tobias-Santos V, Guerra-Almeida D, Mury F, Ribeiro L, Berni M, Araujo H, Logullo C, Feitosa NM, de Souza-Menezes J, Pessoa Costa E, Nunes-da-Fonseca R. Multiple Roles of the Polycistronic Gene Tarsal-less/Mille-Pattes/Polished-Rice During Embryogenesis of the Kissing Bug Rhodnius prolixus. Front Ecol Evol 2019. [DOI: 10.3389/fevo.2019.00379] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022] Open

76
A Bi-LSTM Based Ensemble Algorithm for Prediction of Protein Secondary Structure. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9173538] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/17/2022]
Abstract
The prediction of protein secondary structure continues to be an active area of research in bioinformatics. In this paper, a Bi-LSTM based ensemble model is developed for the prediction of protein secondary structure. The ensemble model with dual loss function consists of five sub-models, which are finally joined by a Bi-LSTM layer. In contrast to existing ensemble methods, which generally train each sub-model and then join them as a whole, this ensemble model and its sub-models can be trained simultaneously, and the performance of each model can be observed and compared during the training process. Three independent test sets, namely data1199, the 513-protein Cuff & Barton set (CB513) and 203 proteins from the Critical Assessment of protein Structure Prediction (CASP203), are employed to test the method. On average, the ensemble model achieved 84.3% in Q3 accuracy and 81.9% in segment overlap measure (SOV) score by using 10-fold cross validation. There is an improvement of up to 1% over some state-of-the-art prediction methods of protein secondary structure.
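For readers unfamiliar with the Q3 metric quoted in the abstract above, the following is a minimal sketch of how a three-state per-residue accuracy is commonly computed. It assumes secondary structure is encoded as strings over the states H (helix), E (strand) and C (coil); the function name `q3_accuracy` and the example strings are hypothetical and are not taken from the paper. The SOV score is more involved and is not covered here.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose predicted 3-state label
    (H, E or C) matches the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    # Count positions where the predicted state equals the observed state.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Illustrative (made-up) 12-residue example with a single mismatch.
observed_states = "CCHHHHHEEECC"
predicted_states = "CCHHHHCEEECC"
example_q3 = q3_accuracy(predicted_states, observed_states)
print(f"Q3 = {example_q3:.1f}%")
```

In practice the score would be averaged over all proteins in a test set such as CB513; the single pair of strings here only illustrates the per-residue comparison.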

77
Hadley B, Litfin T, Day CJ, Haselhorst T, Zhou Y, Tiralongo J. Nucleotide Sugar Transporter SLC35 Family Structure and Function. Comput Struct Biotechnol J 2019; 17:1123-1134. [PMID: 31462968 PMCID: PMC6709370 DOI: 10.1016/j.csbj.2019.08.002] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 04/12/2019] [Revised: 08/05/2019] [Accepted: 08/05/2019] [Indexed: 12/22/2022] Open
Abstract
The covalent attachment of sugars to growing glycan chains is heavily reliant on a specific family of solute transporters (SLC35), the nucleotide sugar transporters (NSTs) that connect the synthesis of activated sugars in the nucleus or cytosol, to glycosyltransferases that reside in the lumen of the endoplasmic reticulum (ER) and/or Golgi apparatus. This review provides a timely update on recent progress in the NST field, specifically we explore several NSTs of the SLC35 family whose substrate specificity and function have been poorly understood, but where recent significant progress has been made. This includes SLC35 A4, A5 and D3, as well as progress made towards understanding the association of SLC35A2 with SLC35A3 and how this relates to their potential regulation, and how the disruption to the dilysine motif in SLC35B4 causes mislocalisation, calling into question multisubstrate NSTs and their subcellular localisation and function. We also report on the recently described first crystal structure of an NST, the SLC35D2 homolog Vrg-4 from yeast. Using this crystal structure, we have generated a new model of SLC35A1, (CMP-sialic acid transporter, CST), with structural and mechanistic predictions based on all known CST-related data, and includes an overview of reported mutations that alter transport and/or substrate recognition (both de novo and site-directed). We also present a model of the CST-del177 isoform that potentially explains why the human CST isoform remains active while the hamster CST isoform is inactive, and we provide a possible alternate access mechanism that accounts for the CST being functional as either a monomer or a homodimer. Finally we provide an update on two NST crystal structures that were published subsequent to the submission and during review of this report.
Affiliation(s)
- Barbara Hadley
  - Institute for Glycomics, Griffith University, Gold Coast Campus, Queensland 4222, Australia
- Thomas Litfin
  - School of Information and Communication Technology, Griffith University, Gold Coast Campus, Queensland 4212, Australia
- Chris J. Day
  - Institute for Glycomics, Griffith University, Gold Coast Campus, Queensland 4222, Australia
- Thomas Haselhorst
  - Institute for Glycomics, Griffith University, Gold Coast Campus, Queensland 4222, Australia
- Yaoqi Zhou
  - Institute for Glycomics, Griffith University, Gold Coast Campus, Queensland 4222, Australia
  - School of Information and Communication Technology, Griffith University, Gold Coast Campus, Queensland 4212, Australia
- Joe Tiralongo
  - Institute for Glycomics, Griffith University, Gold Coast Campus, Queensland 4222, Australia

78
Identification of Intrinsically Disordered Proteins and Regions by Length-Dependent Predictors Based on Conditional Random Fields. MOLECULAR THERAPY-NUCLEIC ACIDS 2019; 17:396-404. [PMID: 31307006 PMCID: PMC6626971 DOI: 10.1016/j.omtn.2019.06.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/25/2019] [Revised: 06/06/2019] [Accepted: 06/07/2019] [Indexed: 01/24/2023]
Abstract
Accurate identification of intrinsically disordered proteins/regions (IDPs/IDRs) is critical for predicting protein structure and function. Previous studies have shown that IDRs of different lengths have different characteristics, and several classification-based predictors have been proposed for predicting different types of IDRs. Compared with these classification-based predictors, the previously proposed predictor IDP-CRF exhibits state-of-the-art performance for predicting IDPs/IDRs, which is a sequence labeling model based on conditional random fields (CRFs). Motivated by these methods, we propose a predictor called IDP-FSP, which is an ensemble of three CRF-based predictors called IDP-FSP-L, IDP-FSP-S, and IDP-FSP-G. These three predictors are specially designed to predict long, short, and generic disordered regions, respectively, and they are constructed based on different features. To the best of our knowledge, IDP-FSP is the first predictor that combines a sequence labeling algorithm with IDRs of different lengths. Experimental results using two independent test datasets show that IDP-FSP achieves better or at least comparable predictive performance with 26 existing state-of-the-art methods in this field, proving the effectiveness of IDP-FSP.

79
Bai F, Hong D, Lu Y, Liu H, Xu C, Yao X. Prediction of the Antioxidant Response Elements' Response of Compound by Deep Learning. Front Chem 2019; 7:385. [PMID: 31214568 PMCID: PMC6554289 DOI: 10.3389/fchem.2019.00385] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/14/2018] [Accepted: 05/14/2019] [Indexed: 11/13/2022] Open
Abstract
The antioxidant response elements (AREs) play a significant role in the occurrence of oxidative stress and may cause a multitude of toxic effects in the pathogenesis of a variety of diseases. Determining whether a compound can activate AREs is crucial for assessing its potential risk. Here, a series of predictive models applying multiple deep learning algorithms, including deep neural networks (DNN), convolutional neural networks (CNN), recurrent neural networks (RNN), and highway networks (HN), were constructed and validated on the Tox21 challenge dataset and applied to predict whether compounds are activators or inactivators of AREs. The built models were evaluated by various statistical parameters, such as sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and the receiver operating characteristic (ROC) curve. The DNN prediction model based on fingerprint features has the best prediction ability, with accuracies of 0.992, 0.914, and 0.917 for the training set, test set, and validation set, respectively. Consequently, these robust models can be adopted to predict the ARE response of molecules quickly and accurately, which is of great significance for evaluating the safety of compounds in the process of drug discovery and development.
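The statistical parameters named in the abstract above (sensitivity, specificity, accuracy and MCC) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the example counts are illustrative assumptions and are not results from the paper.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute sensitivity, specificity, accuracy and MCC from
    confusion-matrix counts (true/false positives and negatives)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Sensitivity: fraction of actual positives that are recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of actual negatives that are recovered.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total
    # MCC is conventionally reported as 0 when any marginal sum is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Made-up counts for illustration only.
example_metrics = binary_metrics(tp=40, tn=50, fp=5, fn=5)
for name, value in example_metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells symmetrically, which is one reason it is often reported alongside accuracy for imbalanced activity datasets.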
Affiliation(s)
- Fang Bai
  - School of Pharmacy, Lanzhou University, Lanzhou, China
- Ding Hong
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, China
- Yingying Lu
  - State Key Laboratory of Applied Organic Chemistry, Department of Chemistry, Lanzhou University, Lanzhou, China
- Huanxiang Liu
  - School of Pharmacy, Lanzhou University, Lanzhou, China
- Cunlu Xu
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, China
- Xiaojun Yao
  - State Key Laboratory of Applied Organic Chemistry, Department of Chemistry, Lanzhou University, Lanzhou, China

80
Lee Y, Pei J, Baumhardt JM, Chook YM, Grishin NV. Structural prerequisites for CRM1-dependent nuclear export signaling peptides: accessibility, adapting conformation, and the stability at the binding site. Sci Rep 2019; 9:6627. [PMID: 31036839 PMCID: PMC6488578 DOI: 10.1038/s41598-019-43004-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/13/2019] [Accepted: 04/11/2019] [Indexed: 01/08/2023] Open
Abstract
Nuclear export signal (NES) motifs function as essential regulators of the subcellular location of proteins by interacting with the major nuclear exporter protein, CRM1. Prediction of NES is of great interest in many areas of research, including cancer, but currently available methods, which are mostly sequence-based, have suffered from high false positive rates because the NES consensus patterns are quite commonly observed in protein sequences. Therefore, finding a feature that can distinguish real NES motifs from false positives is desirable to improve the prediction power, but this is quite challenging when using only the sequence. Here, we provide a comprehensive table for the validated cargo proteins, containing the location of the NES consensus patterns with the disorder propensity plots, known protein domain information, and the predicted secondary structures. It could be useful for determining the most plausible NES region in the context of the whole protein sequence, and it suggests possible explanations for some non-binders among the annotated regions. In addition, using the currently available crystal structures of CRM1 bound to various classes of NES peptides, we adopted, for the first time, structure-based prediction of NES motifs bound to the CRM1 binding groove. Combining sequence-based and structure-based predictions, we suggest a novel and more straightforward approach to identify CRM1-binding NES sequences by analysis of their structural prerequisites and energetic evaluation of their stability at the CRM1 binding site.
Affiliation(s)
- Yoonji Lee
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Jimin Pei
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Jordan M Baumhardt
  - Department of Pharmacology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Yuh Min Chook
  - Department of Pharmacology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Nick V Grishin
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA

81
Katuwawala A, Peng Z, Yang J, Kurgan L. Computational Prediction of MoRFs, Short Disorder-to-order Transitioning Protein Binding Regions. Comput Struct Biotechnol J 2019; 17:454-462. [PMID: 31007871 PMCID: PMC6453775 DOI: 10.1016/j.csbj.2019.03.013] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 02/06/2019] [Revised: 03/22/2019] [Accepted: 03/23/2019] [Indexed: 12/28/2022] Open
Abstract
Molecular recognition features (MoRFs) are short protein-binding regions that undergo disorder-to-order transitions (induced folding) upon binding protein partners. These regions are abundant in nature and can be predicted from protein sequences based on their distinctive sequence signatures. This first-of-its-kind survey covers 14 MoRF predictors and six related methods for the prediction of short protein-binding linear motifs, disordered protein-binding regions and semi-disordered regions. We show that the development of MoRF predictors has accelerated in the recent years. These predictors depend on machine learning-derived models that were generated using training datasets where MoRFs are annotated using putative disorder. Our analysis reveals that they generate accurate predictions. We identified eight methods that offer area under the ROC curve (AUC) ≥ 0.7 on experimentally-validated test datasets. We show that modern MoRF predictors accurately find experimentally annotated MoRFs even though they were trained using the putative disorder annotations. They are relatively highly-cited, particularly the methods available as webservers that on average secure three times more citations than methods without this option. MoRF predictions contribute to the experimental discovery of protein-protein interactions, annotation of protein functions and computational analysis of a variety of proteomes, protein families, and pathways. We outline future development and application directions for these tools, stressing the importance to develop novel tools that would target interactions of disordered regions with other types of partners.
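The AUC threshold quoted in the abstract above can be made concrete with a small sketch. It assumes per-residue prediction scores and binary labels (1 for an annotated MoRF residue, 0 otherwise) and uses the pairwise-comparison definition of the area under the ROC curve; the function name `roc_auc` and the example values are hypothetical, not data from the survey.

```python
def roc_auc(scores: list, labels: list) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive scores higher than a randomly chosen negative
    (ties count as one half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up scores and labels for illustration only.
example_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
example_labels = [1, 0, 1, 0, 0]
print(f"AUC = {roc_auc(example_scores, example_labels):.3f}")
```

The quadratic pairwise loop is adequate for a short illustration; for proteome-scale evaluations a rank-based implementation from an established library would be the more practical choice.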
Affiliation(s)
- Akila Katuwawala
  - Department of Computer Science, Virginia Commonwealth University, USA
- Zhenling Peng
  - Center for Applied Mathematics, Tianjin University, Tianjin, China
- Jianyi Yang
  - School of Mathematical Sciences, Nankai University, Tianjin, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, USA

82
Nielsen JT, Mulder FAA. Quality and bias of protein disorder predictors. Sci Rep 2019; 9:5137. [PMID: 30914747 PMCID: PMC6435736 DOI: 10.1038/s41598-019-41644-w] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 12/04/2018] [Accepted: 03/13/2019] [Indexed: 02/03/2023] Open
Abstract
Disorder in proteins is vital for biological function, yet it is challenging to characterize. Therefore, methods for predicting protein disorder from sequence are fundamental. Currently, predictors are trained and evaluated using data from X-ray structures or from various biochemical or spectroscopic data. However, the prediction accuracy of disordered predictors is not calibrated, nor is it established whether predictors are intrinsically biased towards one of the extremes of the order-disorder axis. We therefore generated and validated a comprehensive experimental benchmarking set of site-specific and continuous disorder, using deposited NMR chemical shift data. This novel experimental data collection is fully appropriate and represents the full spectrum of disorder. We subsequently analyzed the performance of 26 widely-used disorder prediction methods and found that these vary noticeably. At the same time, a distinct bias for over-predicting order was identified for some algorithms. Our analysis has important implications for the validity and the interpretation of protein disorder, as utilized, for example, in assessing the content of disorder in proteomes.
Affiliation(s)
- Jakob T Nielsen
  - Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Gustav Wieds Vej 14, 8000, Aarhus C, Denmark
  - Department of Chemistry, Aarhus University, Langelandsgade 140, 8000, Aarhus C, Denmark
- Frans A A Mulder
  - Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Gustav Wieds Vej 14, 8000, Aarhus C, Denmark
  - Department of Chemistry, Aarhus University, Langelandsgade 140, 8000, Aarhus C, Denmark

83
Dean S, Moreira-Leite F, Gull K. Basalin is an evolutionarily unconstrained protein revealed via a conserved role in flagellum basal plate function. eLife 2019; 8:42282. [PMID: 30810527 PMCID: PMC6392502 DOI: 10.7554/elife.42282] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/24/2018] [Accepted: 02/11/2019] [Indexed: 01/15/2023] Open
Abstract
Most motile flagella have an axoneme that contains nine outer microtubule doublets and a central pair (CP) of microtubules. The CP coordinates the flagellar beat and defects in CP projections are associated with motility defects and human disease. The CP nucleate near a ‘basal plate’ at the distal end of the transition zone (TZ). Here, we show that the trypanosome TZ protein ‘basalin’ is essential for building the basal plate, and its loss is associated with CP nucleation defects, inefficient recruitment of CP assembly factors to the TZ, and flagellum paralysis. Guided by synteny, we identified a highly divergent basalin ortholog in the related Leishmania species. Basalins are predicted to be highly unstructured, suggesting they may act as ‘hubs’ facilitating many protein-protein interactions. This raises the general concept that proteins involved in cytoskeletal functions and appearing organism-specific, may have highly divergent and cryptic orthologs in other species.
Affiliation(s)
- Samuel Dean
  - Sir William Dunn School of Pathology, University of Oxford, Oxford, United Kingdom
- Flavia Moreira-Leite
  - Sir William Dunn School of Pathology, University of Oxford, Oxford, United Kingdom
- Keith Gull
  - Sir William Dunn School of Pathology, University of Oxford, Oxford, United Kingdom

84
Features of a novel protein, rusticalin, from the ascidian Styela rustica reveal ancestral horizontal gene transfer event. Mob DNA 2019; 10:4. [PMID: 30675192 PMCID: PMC6339383 DOI: 10.1186/s13100-019-0146-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/04/2018] [Accepted: 01/02/2019] [Indexed: 12/18/2022] Open
Abstract
Background: The transfer of genetic material from non-parent organisms is called horizontal gene transfer (HGT). One of the most conclusive cases of HGT in metazoans was previously described for the cellulose synthase gene in ascidians.
Results: In this study we identified a new protein, rusticalin, from the ascidian Styela rustica and presented evidence for its likely origin by HGT. Discernible homologues of rusticalin were found in placozoans, coral, and basal chordates. Rusticalin was predicted to consist of two distinct regions, an N-terminal domain and a C-terminal domain. The N-terminal domain comprises two cysteine-rich repeats and shows remote similarity to the tick carboxypeptidase inhibitor. The C-terminal domain shares significant sequence similarity with bacterial MD peptidases and bacteriophage A500 L-alanyl-D-glutamate peptidase. A possible transfer of the C-terminal domain by bacteriophage was confirmed by an analysis of noncoding sequences of the C. intestinalis rusticalin-like gene, which was found to contain a sequence similar to the bacteriophage A500 recombination site. Moreover, a sequence similar to the bacteriophage recombination site was found to be adjacent to the cellulose synthase catalytic subunit gene in the genome of Streptomyces sp., the donor of ascidian cellulose synthase.
Conclusions: The C-terminal domain of rusticalin and rusticalin-like proteins is likely to have been horizontally transferred by the bacteriophage A500. A common mechanism involving bacteriophage-mediated gene transfer can be proposed for at least two HGT events in ascidians.

85
Chen Z, He N, Huang Y, Qin WT, Liu X, Li L. Integration of A Deep Learning Classifier with A Random Forest Approach for Predicting Malonylation Sites. GENOMICS PROTEOMICS & BIOINFORMATICS 2019; 16:451-459. [PMID: 30639696 PMCID: PMC6411950 DOI: 10.1016/j.gpb.2018.08.004] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 01/26/2018] [Revised: 06/20/2018] [Accepted: 08/08/2018] [Indexed: 12/27/2022]
Abstract
As a newly-identified protein post-translational modification, malonylation is involved in a variety of biological functions. Recognizing malonylation sites in substrates represents an initial but crucial step in elucidating the molecular mechanisms underlying protein malonylation. In this study, we constructed a deep learning (DL) network classifier based on long short-term memory (LSTM) with word embedding (LSTMWE) for the prediction of mammalian malonylation sites. LSTMWE performs better than traditional classifiers developed with common pre-defined feature encodings or a DL classifier based on LSTM with a one-hot vector. The performance of LSTMWE is sensitive to the size of the training set, but this limitation can be overcome by integration with a traditional machine learning (ML) classifier. Accordingly, an integrated approach called LEMP was developed, which includes LSTMWE and the random forest classifier with a novel encoding of enhanced amino acid content. LEMP performs not only better than the individual classifiers but also superior to the currently-available malonylation predictors. Additionally, it demonstrates a promising performance with a low false positive rate, which is highly useful in the prediction application. Overall, LEMP is a useful tool for easily identifying malonylation sites with high confidence. LEMP is available at http://www.bioinfogo.org/lemp.
Affiliation(s)
- Zhen Chen
  - School of Basic Medicine, Qingdao University, Qingdao 266021, China
- Ningning He
  - School of Basic Medicine, Qingdao University, Qingdao 266021, China
- Yu Huang
  - School of Data Science and Software Engineering, Qingdao University, Qingdao 266021, China
- Wen Tao Qin
  - Department of Biochemistry, Schulich School of Medicine and Dentistry, University of Western Ontario, London, Ontario N6A 5C1, Canada
- Xuhan Liu
  - Department of Information Technology, Beijing Oriental Yamei Gene Technology Institute Co. Ltd., Beijing 100078, China
- Lei Li
  - School of Basic Medicine, Qingdao University, Qingdao 266021, China
  - School of Data Science and Software Engineering, Qingdao University, Qingdao 266021, China
  - Qingdao Cancer Institute, Qingdao University, Qingdao 266021, China

86
Jung Y, El-Manzalawy Y, Dobbs D, Honavar VG. Partner-specific prediction of RNA-binding residues in proteins: A critical assessment. Proteins 2018; 87:198-211. [PMID: 30536635 PMCID: PMC6389706 DOI: 10.1002/prot.25639] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/01/2018] [Revised: 10/10/2018] [Accepted: 11/29/2018] [Indexed: 01/06/2023]
Abstract
RNA-protein interactions play essential roles in regulating gene expression. While some RNA-protein interactions are "specific", that is, the RNA-binding proteins preferentially bind to particular RNA sequence or structural motifs, others are "non-RNA specific." Deciphering the protein-RNA recognition code is essential for comprehending the functional implications of these interactions and for developing new therapies for many diseases. Because of the high cost of experimental determination of protein-RNA interfaces, there is a need for computational methods to identify RNA-binding residues in proteins. While most of the existing computational methods for predicting RNA-binding residues in RNA-binding proteins are oblivious to the characteristics of the partner RNA, there is growing interest in methods for partner-specific prediction of RNA binding sites in proteins. In this work, we assess the performance of two recently published partner-specific protein-RNA interface prediction tools, PS-PRIP, and PRIdictor, along with our own new tools. Specifically, we introduce a novel metric, RNA-specificity metric (RSM), for quantifying the RNA-specificity of the RNA binding residues predicted by such tools. Our results show that the RNA-binding residues predicted by previously published methods are oblivious to the characteristics of the putative RNA binding partner. Moreover, when evaluated using partner-agnostic metrics, RNA partner-specific methods are outperformed by the state-of-the-art partner-agnostic methods. We conjecture that either (a) the protein-RNA complexes in PDB are not representative of the protein-RNA interactions in nature, or (b) the current methods for partner-specific prediction of RNA-binding residues in proteins fail to account for the differences in RNA partner-specific versus partner-agnostic protein-RNA interactions, or both.
Affiliation(s)
- Yong Jung
  - Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania
  - Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, Pennsylvania
  - The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania
- Yasser El-Manzalawy
  - Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, Pennsylvania
  - Clinical and Translational Sciences Institute, Pennsylvania State University, University Park, Pennsylvania
  - College of Information Sciences and Technology, Pennsylvania State University, Pennsylvania
- Drena Dobbs
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa
  - Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, Iowa
- Vasant G Honavar
  - Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania
  - Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, Pennsylvania
  - Institute for Cyberscience, Pennsylvania State University, University Park, Pennsylvania
  - Clinical and Translational Sciences Institute, Pennsylvania State University, University Park, Pennsylvania
  - The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania
  - College of Information Sciences and Technology, Pennsylvania State University, Pennsylvania

87
Mediated nuclear import and export of TAZ and the underlying molecular requirements. Nat Commun 2018; 9:4966. [PMID: 30470756 PMCID: PMC6251892 DOI: 10.1038/s41467-018-07450-0] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Received: 02/22/2018] [Accepted: 10/26/2018] [Indexed: 12/14/2022] Open
Abstract
Nucleocytoplasmic distribution of Yap/TAZ is regulated by the Hippo pathway and the cytoskeleton. While interactions with cytosolic and nuclear “retention factors” (14–3–3 and TEAD) are known to control their localization, fundamental aspects of Yap/TAZ shuttling remain undefined. It is unclear if translocation occurs only by passive diffusion or via mediated transport, and neither the potential nuclear localization and efflux signals (NLS, NES) nor their putative regulation have been identified. Here we show that TAZ cycling is a mediated process and identify the underlying NLS and NES. The C-terminal NLS, representing a new class of import motifs, is necessary and sufficient for efficient nuclear uptake via a RAN-independent mechanism. RhoA activity directly stimulates this import. The NES lies within the TEAD-binding domain and can be masked by TEAD, thereby preventing efflux. Thus, we describe a RhoA-regulated NLS, a TEAD-regulated NES and propose an improved model of nucleocytoplasmic TAZ shuttling beyond "retention". The transcriptional co-factors Yap and TAZ are regulated by Hippo signalling and mechanical forces via their nucleocytoplasmic shuttling. Here the authors identify a RhoA-regulated C-terminal nuclear localization signal and a TEAD-regulated N-terminal nuclear export signal of TAZ in an epithelial cell line.
88
Abstract
Existing object detection algorithms based on deep learning and big data struggle to balance accuracy and speed for multi-object detection in complex, wide traffic scenes. To address this, we improve the object detection framework SSD (Single Shot Multi-box Detector) and propose a new detection framework, AP-SSD (Adaptive Perceive). We design a library of feature-extraction convolution kernels composed of multi-shape Gabor and color Gabor kernels, then train and screen for the optimal kernels, which replace the low-level convolution kernels of the original network to improve detection accuracy. We then combine the single-image detection framework with convolutional long short-term memory networks: a Bottleneck-LSTM memory layer refines and propagates feature maps between frames, which establishes the temporal association of frame-level information, reduces the computational cost, and allows targets affected by strong interference in video to be tracked and identified. An added adaptive threshold strategy reduces the missed-alarm and false-alarm rates. Moreover, we design a dynamic region amplification network framework to improve the detection and recognition accuracy for low-resolution small objects. Experiments show that the improved AP-SSD achieves better detection results when small objects, multiple objects, cluttered backgrounds and large-area occlusion are involved, which gives the algorithm good prospects for engineering applications.
89
Hanson J, Paliwal K, Zhou Y. Accurate Single-Sequence Prediction of Protein Intrinsic Disorder by an Ensemble of Deep Recurrent and Convolutional Architectures. J Chem Inf Model 2018; 58:2369-2376. [DOI: 10.1021/acs.jcim.8b00636] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Indexed: 01/31/2023]
Affiliation(s)
- Jack Hanson
- Signal Processing Laboratory, Griffith University, Brisbane, Queensland 4122, Australia
- Kuldip Paliwal
- Signal Processing Laboratory, Griffith University, Brisbane, Queensland 4122, Australia
- Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Queensland 4222, Australia
90
Chakraborty C, Clayton C. Stress susceptibility in Trypanosoma brucei lacking the RNA-binding protein ZC3H30. PLoS Negl Trop Dis 2018; 12:e0006835. [PMID: 30273340 PMCID: PMC6181440 DOI: 10.1371/journal.pntd.0006835] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 11/10/2017] [Revised: 10/11/2018] [Accepted: 09/11/2018] [Indexed: 01/17/2023] Open
Abstract
Trypanosomes rely on post-transcriptional mechanisms and mRNA-binding proteins for control of gene expression. Trypanosoma brucei ZC3H30 is an mRNA-binding protein that is expressed in both the bloodstream form (which grows in mammals) and the procyclic form (which grows in the tsetse fly midgut). Attachment of ZC3H30 to an mRNA causes degradation of that mRNA. Cells lacking ZC3H30 showed no growth defect under normal culture conditions; but they were more susceptible than wild-type cells to heat shock, starvation, and treatment with DTT, arsenite or ethanol. Transcriptomes of procyclic-form trypanosomes lacking ZC3H30 were indistinguishable from those of cells in which ZC3H30 had been re-expressed, but un-stressed bloodstream forms lacking ZC3H30 had about 2-fold more HSP70 mRNA. Results from pull-downs suggested that ZC3H30 mRNA binding may not be very specific. ZC3H30 was found in stress-induced granules and co-purified with another stress granule protein, Tb927.8.3820; but RNAi targeting Tb927.8.3820 did not affect either ZC3H30 granule association or stress resistance. The conservation of the ZC3H30 gene in both monogenetic and digenetic kinetoplastids, combined with the increased stress susceptibility of cells lacking it, suggests that ZC3H30 confers a selective advantage in the wild, where the parasites are subject to temperature fluctuations and immune attack in both the insect and mammalian hosts.
Affiliation(s)
- Christine Clayton
- Zentrum für Molekular Biologie, Universität Heidelberg, Heidelberg, Germany
91
Luo TJ, Zhou CL, Chao F. Exploring spatial-frequency-sequential relationships for motor imagery classification with recurrent neural network. BMC Bioinformatics 2018; 19:344. [PMID: 30268089 PMCID: PMC6162908 DOI: 10.1186/s12859-018-2365-1] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 04/09/2018] [Accepted: 09/10/2018] [Indexed: 11/27/2022] Open
Abstract
BACKGROUND Conventional methods for motor imagery brain-computer interfaces (MI-BCIs) suffer from a limited number of samples and simplified features, and therefore perform poorly with spatial-frequency features and shallow classifiers. METHODS Alternatively, this paper applies a deep recurrent neural network (RNN) with a sliding window cropping strategy (SWCS) to signal classification of MI-BCIs. The spatial-frequency features are first extracted by the filter bank common spatial pattern (FB-CSP) algorithm and then cropped by the SWCS into time slices. To capture spatial-frequency-sequential relationships, the cropped time slices are fed into the RNN for classification. To overcome memory distractions, the commonly used gated recurrent unit (GRU) and long short-term memory (LSTM) unit are applied to the RNN architecture, and experimental results are used to determine which unit is more suitable for processing EEG signals. RESULTS Experimental results on common BCI benchmark datasets show that the spatial-frequency-sequential relationships outperform all other competing spatial-frequency methods. In particular, the proposed GRU-RNN architecture achieves the lowest misclassification rates on all BCI benchmark datasets. CONCLUSION By introducing spatial-frequency-sequential relationships with cropped time-slice samples, the proposed method offers a novel way to construct and model highly accurate and robust MI-BCIs from limited trials of EEG signals.
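The cropping step described in this abstract can be illustrated with a minimal sketch. It assumes the extracted features form a time-ordered sequence of feature vectors; the function name `sliding_window_crop` and the `window` and `step` parameters are hypothetical, since the abstract does not state the window length or stride the authors used.

```python
from typing import List, Sequence


def sliding_window_crop(
    features: Sequence[Sequence[float]], window: int, step: int
) -> List[List[List[float]]]:
    """Crop a (time x feature) sequence into overlapping time slices.

    `window` and `step` are illustrative parameters, not values from the paper.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    slices = []
    # Each slice covers `window` consecutive time steps; slices start `step` apart.
    for start in range(0, len(features) - window + 1, step):
        slices.append([list(row) for row in features[start:start + window]])
    return slices


# Toy trial: 6 time steps, 2 features per step.
trial = [[float(t), float(t * 10)] for t in range(6)]
crops = sliding_window_crop(trial, window=4, step=1)
print(len(crops))  # number of time slices produced from one trial
```

Each slice could then be passed to a recurrent classifier as one training sample, which is one way a single trial can yield several samples when trials are scarce.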
Affiliation(s)
- Tian-jian Luo
- Department of Cognitive Science, School of Information Science and Engineering, Xiamen University, 422 Siming South Road, Siming District, Xiamen, 361005 China
- Chang-le Zhou
- Department of Cognitive Science, School of Information Science and Engineering, Xiamen University, 422 Siming South Road, Siming District, Xiamen, 361005 China
- Fei Chao
- Department of Cognitive Science, School of Information Science and Engineering, Xiamen University, 422 Siming South Road, Siming District, Xiamen, 361005 China
- Department of Computer Science, Institute of Mathematics, Physics and Computer Science, Aberystwyth University, Aberystwyth, Wales, SY23 3DB UK
92
Singh J, Hanson J, Heffernan R, Paliwal K, Yang Y, Zhou Y. Detecting Proline and Non-Proline Cis Isomers in Protein Structures from Sequences Using Deep Residual Ensemble Learning. J Chem Inf Model 2018; 58:2033-2042. [PMID: 30118602 DOI: 10.1021/acs.jcim.8b00442] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 12/12/2022]
Abstract
It has been long established that cis conformations of amino acid residues play many biologically important roles despite their rare occurrence in protein structure. Because of this rarity, few methods have been developed for predicting cis isomers from protein sequences, most of which are based on outdated datasets and lack the means for independent testing. In this work, using a database of >10000 high-resolution protein structures, we update the statistics of cis isomers and develop a sequence-based prediction technique using an ensemble of residual convolutional and long short-term memory bidirectional recurrent neural networks that allow learning from the whole protein sequence. We show that ensembling eight neural network models yields maximum Matthews correlation coefficient values of approximately 0.35 for cis-Pro isomers and 0.1 for cis-nonPro residues. The method should be useful for prioritizing functionally important residues in cis isomers for experimental validations and improving the sampling of rare protein conformations for ab initio protein structure prediction.
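The Matthews correlation coefficient reported in this abstract is a standard metric for heavily imbalanced classes such as rare cis residues. The sketch below computes it from a binary confusion matrix; the helper name `mcc` and the toy counts are illustrative assumptions and are not data from the paper.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return numerator / denominator if denominator else 0.0


# Toy imbalanced case (made-up counts): 10 positives among 1000 residues.
tp, tn, fp, fn = 1, 989, 1, 9
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"accuracy={accuracy:.3f}, mcc={mcc(tp, tn, fp, fn):.3f}")
```

In this toy case the accuracy is very high even though most positives are missed, while the coefficient stays low, which is why the metric suits rare-class prediction.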
Affiliation(s)
- Jaswinder Singh
- Signal Processing Laboratory, Griffith University, Brisbane, QLD 4122, Australia
- Jack Hanson
- Signal Processing Laboratory, Griffith University, Brisbane, QLD 4122, Australia
- Rhys Heffernan
- Signal Processing Laboratory, Griffith University, Brisbane, QLD 4122, Australia
- Kuldip Paliwal
- Signal Processing Laboratory, Griffith University, Brisbane, QLD 4122, Australia
- Yuedong Yang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, QLD 4222, Australia; School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, Guangdong 510006, China
- Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, QLD 4222, Australia
93
Liu Y, Wang X, Liu B. IDP-CRF: Intrinsically Disordered Protein/Region Identification Based on Conditional Random Fields. Int J Mol Sci 2018; 19:E2483. [PMID: 30135358 PMCID: PMC6164615 DOI: 10.3390/ijms19092483] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 07/21/2018] [Revised: 08/14/2018] [Accepted: 08/18/2018] [Indexed: 12/16/2022] Open
Abstract
Accurate prediction of intrinsically disordered proteins/regions is one of the most important tasks in bioinformatics, and some computational predictors have been proposed to solve this problem. How to efficiently incorporate the sequence-order effect is critical for constructing an accurate predictor because disordered region distributions show global sequence patterns. In order to capture these sequence patterns, several sequence labelling models have been applied to this field, such as conditional random fields (CRFs). However, these methods suffer from certain disadvantages. In this study, we propose a new computational predictor called IDP-CRF, which is trained on an updated benchmark dataset based on the MobiDB and DisProt databases and incorporates more comprehensive sequence-based features, including PSSMs (position-specific scoring matrices), k-mers, predicted secondary structures, and relative solvent accessibilities. Experimental results on the benchmark dataset and two independent datasets show that IDP-CRF outperforms 25 existing state-of-the-art methods in this field, demonstrating that IDP-CRF is a very useful tool for identifying IDPs/IDRs (intrinsically disordered proteins/regions). We anticipate that IDP-CRF will facilitate the development of protein sequence analysis.
Affiliation(s)
- Yumeng Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, Guangdong, China
- Xiaolong Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, Guangdong, China
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, Guangdong, China
94
Yamada KD, Kinoshita K. De novo profile generation based on sequence context specificity with the long short-term memory network. BMC Bioinformatics 2018; 19:272. [PMID: 30021530 PMCID: PMC6052547 DOI: 10.1186/s12859-018-2284-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 05/17/2018] [Accepted: 07/11/2018] [Indexed: 11/24/2022] Open
Abstract
Background Long short-term memory (LSTM) is one of the most attractive deep learning methods to learn time series or contexts of input data. Increasing studies, including biological sequence analyses in bioinformatics, utilize this architecture. Amino acid sequence profiles are widely used for bioinformatics studies, such as sequence similarity searches, multiple alignments, and evolutionary analyses. Currently, many biological sequences are becoming available, and the rapidly increasing amount of sequence data emphasizes the importance of scalable generators of amino acid sequence profiles. Results We employed the LSTM network and developed a novel profile generator to construct profiles without any assumptions, except for input sequence context. Our method could generate better profiles than existing de novo profile generators, including CSBuild and RPS-BLAST, on the basis of profile-sequence similarity search performance with linear calculation costs against input sequence size. In addition, we analyzed the effects of the memory power of LSTM and found that LSTM had high potential power to detect long-range interactions between amino acids, as in the case of beta-strand formation, which has been a difficult problem in protein bioinformatics using sequence information. Conclusion We demonstrated the importance of sequence context and the feasibility of LSTM on biological sequence analyses. Our results demonstrated the effectiveness of memories in LSTM and showed that our de novo profile generator, SPBuild, achieved higher performance than that of existing methods for profile prediction of beta-strands, where long-range interactions of amino acids are important and are known to be difficult for the existing window-based prediction methods. Our findings will be useful for the development of other prediction methods related to biological sequences by machine learning methods. 
Electronic supplementary material The online version of this article (10.1186/s12859-018-2284-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Kazunori D Yamada
- Graduate School of Information Sciences, Tohoku University, Sendai, Japan; Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Kengo Kinoshita
- Graduate School of Information Sciences, Tohoku University, Sendai, Japan; Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Institute of Development, Aging, and Cancer, Tohoku University, Sendai, Japan
95
Zhao Z, Peng Z, Yang J. Improving Sequence-Based Prediction of Protein–Peptide Binding Residues by Introducing Intrinsic Disorder and a Consensus Method. J Chem Inf Model 2018; 58:1459-1468. [DOI: 10.1021/acs.jcim.8b00019] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 01/15/2023]
Affiliation(s)
- Zijuan Zhao
- Center for Applied Mathematics, Tianjin University, Tianjin 300072, China
- Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin 300072, China
- Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
96
Fa R, Cozzetto D, Wan C, Jones DT. Predicting human protein function with multi-task deep neural networks. PLoS One 2018; 13:e0198216. [PMID: 29889900 PMCID: PMC5995439 DOI: 10.1371/journal.pone.0198216] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 02/01/2018] [Accepted: 05/15/2018] [Indexed: 11/19/2022] Open
Abstract
Machine learning methods for protein function prediction are urgently needed, especially now that a substantial fraction of known sequences remains unannotated despite the extensive use of functional assignments based on sequence similarity. One major bottleneck supervised learning faces in protein function prediction is the structured, multi-label nature of the problem, because biological roles are represented by lists of terms from hierarchically organised controlled vocabularies such as the Gene Ontology. In this work, we build on recent developments in the area of deep learning and investigate the usefulness of multi-task deep neural networks (MTDNN), which consist of upstream shared layers on which as many independent modules (additional hidden layers with their own output units) are stacked in parallel as there are output GO terms (the tasks). MTDNN learns individual tasks partially using shared representations and partially from task-specific characteristics. When no close homologues with experimentally validated functions can be identified, MTDNN gives more accurate predictions than baseline methods based on annotation frequencies in public databases or homology transfers. More importantly, the results show that MTDNN binary classification accuracy is higher than that of alternative machine learning-based methods that do not exploit commonalities and differences among prediction tasks. Interestingly, compared with a single-task predictor, the performance improvement is not linearly correlated with the number of tasks in MTDNN; in our case, medium-sized models provide more improvement. One of the advantages of MTDNN is that, given a set of features, it does not require a bootstrap feature selection procedure, as traditional machine learning algorithms do. Overall, the results indicate that the proposed MTDNN algorithm improves the performance of protein function prediction. On the other hand, there is still considerable room for deep learning techniques to further enhance prediction ability.
Affiliation(s)
- Rui Fa
- The Francis Crick Institute, London, United Kingdom
- Computer Science Department, University College London, London, United Kingdom
- Domenico Cozzetto
- The Francis Crick Institute, London, United Kingdom
- Computer Science Department, University College London, London, United Kingdom
- Cen Wan
- The Francis Crick Institute, London, United Kingdom
- Computer Science Department, University College London, London, United Kingdom
- David T. Jones
- The Francis Crick Institute, London, United Kingdom
- Computer Science Department, University College London, London, United Kingdom
97
Bidirectional Long Short-Term Memory Network for Vehicle Behavior Recognition. Remote Sensing 2018. [DOI: 10.3390/rs10060887] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/16/2022]
98
Yang Y, Gao J, Wang J, Heffernan R, Hanson J, Paliwal K, Zhou Y. Sixty-five years of the long march in protein secondary structure prediction: the final stretch? Brief Bioinform 2018; 19:482-494. [PMID: 28040746 PMCID: PMC5952956 DOI: 10.1093/bib/bbw129] [Citation(s) in RCA: 84] [Impact Index Per Article: 14.0] [Received: 10/10/2016] [Revised: 11/15/2016] [Indexed: 11/13/2022] Open
Abstract
Protein secondary structure prediction began in 1951 when Pauling and Corey predicted helical and sheet conformations for the protein polypeptide backbone even before the first protein structure was determined. Sixty-five years later, powerful new methods breathe new life into this field. The highest three-state accuracy without relying on structure templates is now at 82-84%, a number unthinkable just a few years ago. These improvements came from increasingly larger databases of protein sequences and structures for training, the use of template secondary structure information and more powerful deep learning techniques. As we approach the theoretical limit of three-state prediction (88-90%), alternatives to secondary structure prediction (prediction of backbone torsion angles and Cα-atom-based angles and torsion angles) not only have more room for further improvement but also allow direct prediction of three-dimensional fragment structures with constantly improving accuracy. About 20% of all 40-residue fragments in a database of 1199 non-redundant proteins are predicted by SPIDER2 to within <6 Å root-mean-squared distance of the native conformations. More powerful deep learning methods with improved capability of capturing long-range interactions are beginning to emerge as the next generation of techniques for secondary structure prediction. The time has come to finish off the final stretch of the long march towards protein secondary structure prediction.
Affiliation(s)
- Yuedong Yang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD, Australia
- Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Jihua Wang
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- Rhys Heffernan
- Signal Processing Laboratory, Griffith University, Brisbane, Australia
- Jack Hanson
- Signal Processing Laboratory, Griffith University, Brisbane, Australia
- Kuldip Paliwal
- Signal Processing Laboratory, Griffith University, Brisbane, Australia
- Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD, Australia
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
99
Gao J, Yang Y, Zhou Y. Grid-based prediction of torsion angle probabilities of protein backbone and its application to discrimination of protein intrinsic disorder regions and selection of model structures. BMC Bioinformatics 2018; 19:29. [PMID: 29390958 PMCID: PMC5796405 DOI: 10.1186/s12859-018-2031-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 09/14/2017] [Accepted: 01/17/2018] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Protein structure can be described by backbone torsion angles: rotational angles about the N-Cα bond (φ) and the Cα-C bond (ψ), or the angle between Cα(i-1)-Cα(i)-Cα(i+1) (θ) and the rotational angle about the Cα(i)-Cα(i+1) bond (τ). Thus, their accurate prediction is useful for structure prediction and model refinement. Early methods predicted torsion angles in a few discrete bins, whereas most recent methods have focused on predicting angles as real, continuous values. Real-value prediction, however, cannot provide information on the probabilities of the predicted angles. RESULTS Here, we propose to predict angles in fine grids of 5° by using deep learning neural networks. We found that this grid-based technique can yield 2-6% higher accuracy in predicting angles in the same 5° bin than the existing prediction techniques compared. We further demonstrate the usefulness of predicted probabilities at given angle bins in the discrimination of intrinsically disordered regions and in the selection of protein models. CONCLUSIONS The proposed method may be useful for characterizing protein structure and disorder. The method is available at http://sparks-lab.org/server/SPIDER2/ as a part of the SPIDER2 package.
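As a rough illustration of the 5° grid idea in this abstract, the sketch below maps a torsion angle to one of 72 bins and reads the most probable angle off a per-bin probability vector. The bin convention (angles wrapped to [-180°, 180°), bin 0 starting at -180°) and the helper names `angle_to_bin`, `bin_center` and `most_probable_angle` are assumptions made for illustration; the abstract does not specify them, and this is not the SPIDER2 implementation.

```python
from typing import Sequence, Tuple

BIN_WIDTH = 5.0                  # grid size in degrees, as stated in the abstract
N_BINS = int(360 / BIN_WIDTH)    # 72 bins over a full turn


def angle_to_bin(angle_deg: float) -> int:
    """Map an angle in degrees to a bin index (assumed range [-180, 180))."""
    wrapped = (angle_deg + 180.0) % 360.0  # shift into [0, 360)
    return int(wrapped // BIN_WIDTH)


def bin_center(index: int) -> float:
    """Return the central angle (degrees) of a bin."""
    return -180.0 + (index + 0.5) * BIN_WIDTH


def most_probable_angle(probs: Sequence[float]) -> Tuple[float, float]:
    """Return (bin-center angle, probability) of the highest-probability bin."""
    if len(probs) != N_BINS:
        raise ValueError(f"expected {N_BINS} probabilities")
    best = max(range(N_BINS), key=lambda i: probs[i])
    return bin_center(best), probs[best]


# Toy probability vector concentrated on the bin that contains -63 degrees.
probs = [0.0] * N_BINS
probs[angle_to_bin(-63.0)] = 1.0
print(most_probable_angle(probs))
```

Unlike a single real-valued prediction, a per-bin output of this kind carries a probability for every candidate angle, which is the property the abstract uses for disorder discrimination and model selection.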
Affiliation(s)
- Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071 People’s Republic of China
- Yuedong Yang
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, 510000 People’s Republic of China
- Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr, Southport, QLD 4222 Australia
100
Bernardi A, Kirschner KN, Faller R. Structural analysis of human glycoprotein butyrylcholinesterase using atomistic molecular dynamics: The importance of glycosylation site ASN241. PLoS One 2017; 12:e0187994. [PMID: 29190644 PMCID: PMC5708630 DOI: 10.1371/journal.pone.0187994] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 07/19/2017] [Accepted: 10/30/2017] [Indexed: 11/18/2022] Open
Abstract
Human butyrylcholinesterase (BChE) is a glycoprotein capable of bioscavenging toxic compounds such as organophosphorus (OP) nerve agents. For commercial production of BChE, it is practical to synthesize BChE in non–human expression systems, such as plants or animals. However, the glycosylation profile in these systems is significantly different from the human glycosylation profile, which could result in changes in BChE’s structure and function. From our investigation, we found that the glycan attached to ASN241 is both structurally and functionally important due to its close proximity to the BChE tetramerization domain and the active site gorge. To investigate the effects of populating glycosylation site ASN241, monomeric human BChE glycoforms were simulated with and without site ASN241 glycosylated. Our simulations indicate that the structure and function of human BChE are significantly affected by the absence of glycan 241.
Affiliation(s)
- Austen Bernardi
- Department of Chemical Engineering, University of California–Davis, Davis, California, United States of America
- Karl N. Kirschner
- Bonn–Rhein–Sieg University of Applied Sciences, Sankt Augustin, Germany
- Roland Faller
- Department of Chemical Engineering, University of California–Davis, Davis, California, United States of America