Reference Citation Analysis

For: Katzman S, Barrett C, Thiltgen G, Karchin R, Karplus K. PREDICT-2ND: a tool for generalized protein local structure prediction. Bioinformatics 2008;24:2453-9. [PMID: 18757875 DOI: 10.1093/bioinformatics/btn438] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]

Cited by Other Article(s)

Wang L, Sun H, Yue Z, Xia J, Li X. CDMPred: a tool for predicting cancer driver missense mutations with high-quality passenger mutations. PeerJ 2024;12:e17991. [PMID: 39253604 PMCID: PMC11382650 DOI: 10.7717/peerj.17991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Accepted: 08/07/2024] [Indexed: 09/11/2024] Open

Villalobos-Alva J, Ochoa-Toledo L, Villalobos-Alva MJ, Aliseda A, Pérez-Escamirosa F, Altamirano-Bustamante NF, Ochoa-Fernández F, Zamora-Solís R, Villalobos-Alva S, Revilla-Monsalve C, Kemper-Valverde N, Altamirano-Bustamante MM. Protein Science Meets Artificial Intelligence: A Systematic Review and a Biochemical Meta-Analysis of an Inter-Field. Front Bioeng Biotechnol 2022;10:788300. [PMID: 35875501 PMCID: PMC9301016 DOI: 10.3389/fbioe.2022.788300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2021] [Accepted: 05/25/2022] [Indexed: 11/23/2022] Open

Abstract

Proteins are some of the most fascinating and challenging molecules in the universe, and they pose a big challenge for artificial intelligence. The implementation of machine learning/AI in protein science gives rise to a world of knowledge adventures in the workhorse of the cell and proteome homeostasis, which are essential for making life possible. This opens up epistemic horizons thanks to a coupling of human tacit-explicit knowledge with machine learning power, the benefits of which are already tangible, such as important advances in protein structure prediction. Moreover, the driving force behind the protein processes of self-organization, adjustment, and fitness requires a space corresponding to gigabytes of life data in its order of magnitude. There are many tasks such as novel protein design, protein folding pathways, and synthetic metabolic routes, as well as protein-aggregation mechanisms, pathogenesis of protein misfolding and disease, and proteostasis networks that are currently unexplored or unrevealed. In this systematic review and biochemical meta-analysis, we aim to contribute to bridging the gap between what we call binomial artificial intelligence (AI) and protein science (PS), a growing research enterprise with exciting and promising biotechnological and biomedical applications. We undertake our task by exploring "the state of the art" in AI and machine learning (ML) applications to protein science in the scientific literature to address some critical research questions in this domain, including What kind of tasks are already explored by ML approaches to protein sciences? What are the most common ML algorithms and databases used? What is the situational diagnostic of the AI-PS inter-field? What do ML processing steps have in common? We also formulate novel questions such as Is it possible to discover what the rules of protein evolution are with the binomial AI-PS? How do protein folding pathways evolve? What are the rules that dictate the folds? 
What are the minimal nuclear protein structures? How do protein aggregates form and why do they exhibit different toxicities? What are the structural properties of amyloid proteins? How can we design an effective proteostasis network to deal with misfolded proteins? We are a cross-functional group of scientists from several academic disciplines, and we have conducted the systematic review using a variant of the PICO and PRISMA approaches. The search was carried out in four databases (PubMed, Bireme, OVID, and EBSCO Web of Science), resulting in 144 research articles. After three rounds of quality screening, 93 articles were finally selected for further analysis. A summary of our findings is as follows: regarding AI applications, there are mainly four types: 1) genomics, 2) protein structure and function, 3) protein design and evolution, and 4) drug design. In terms of the ML algorithms and databases used, supervised learning was the most common approach (85%). As for the databases used for the ML models, PDB and UniprotKB/Swissprot were the most common ones (21 and 8%, respectively). Moreover, we identified that approximately 63% of the articles organized their results into three steps, which we labeled pre-process, process, and post-process. A few studies combined data from several databases or created their own databases after the pre-process. Our main finding is that, as of today, there are no research road maps serving as guides to address gaps in our knowledge of the AI-PS binomial. All research efforts to collect, integrate multidimensional data features, and then analyze and validate them are, so far, uncoordinated and scattered throughout the scientific literature without a clear epistemic goal or connection between the studies. 
Therefore, our main contribution to the scientific literature is to offer a road map to help solve problems in drug design, protein structures, design, and function prediction while also presenting the "state of the art" on research in the AI-PS binomial until February 2021. Thus, we pave the way toward future advances in the synthetic redesign of novel proteins and protein networks and artificial metabolic pathways, learning lessons from nature for the welfare of humankind. Many of the novel proteins and metabolic pathways are currently non-existent in nature, nor are they used in the chemical industry or biomedical field.

Affiliation(s)

Jalil Villalobos-Alva, Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Luis Ochoa-Toledo, Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
Mario Javier Villalobos-Alva, Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Atocha Aliseda, Instituto de Investigaciones Filosóficas, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
Fernando Pérez-Escamirosa, Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
Nelly F. Altamirano-Bustamante, Instituto Nacional de Pediatría, Mexico City, Mexico
Francine Ochoa-Fernández, Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Ricardo Zamora-Solís, Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Sebastián Villalobos-Alva, Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Cristina Revilla-Monsalve, Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Nicolás Kemper-Valverde, Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
Myriam M. Altamirano-Bustamante, Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico

Sidi T, Keasar C. Redundancy-weighting the PDB for detailed secondary structure prediction using deep-learning models. Bioinformatics 2020;36:3733-3738. [PMID: 32186698 DOI: 10.1093/bioinformatics/btaa196] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/10/2019] [Revised: 03/12/2020] [Accepted: 03/16/2020] [Indexed: 11/15/2022] Open

Thomas JMH, Simkovic F, Keegan R, Mayans O, Zhang C, Zhang Y, Rigden DJ. Approaches to ab initio molecular replacement of α-helical transmembrane proteins. Acta Crystallogr D Struct Biol 2017;73:985-996. [PMID: 29199978 PMCID: PMC5713875 DOI: 10.1107/s2059798317016436] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/14/2017] [Accepted: 11/15/2017] [Indexed: 02/06/2023] Open

Protein secondary structure prediction: A survey of the state of the art. J Mol Graph Model 2017;76:379-402. [DOI: 10.1016/j.jmgm.2017.07.015] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 06/05/2017] [Revised: 07/14/2017] [Accepted: 07/17/2017] [Indexed: 11/21/2022]

Beltrandi M, Blocquel D, Erales J, Barbier P, Cavalli A, Longhi S. Insights into the coiled-coil organization of the Hendra virus phosphoprotein from combined biochemical and SAXS studies. Virology 2015;477:42-55. [PMID: 25637789 DOI: 10.1016/j.virol.2014.12.029] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 09/20/2014] [Revised: 10/21/2014] [Accepted: 12/19/2014] [Indexed: 10/24/2022]

Meier A, Söding J. Context similarity scoring improves protein sequence alignments in the midnight zone. Bioinformatics 2014;31:674-81. [PMID: 25338715 DOI: 10.1093/bioinformatics/btu697] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022] Open

Hussain RH, Zawawi M, Bayfield MA. Conservation of RNA chaperone activity of the human La-related proteins 4, 6 and 7. Nucleic Acids Res 2013;41:8715-25. [PMID: 23887937 PMCID: PMC3794603 DOI: 10.1093/nar/gkt649] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 04/03/2013] [Revised: 06/28/2013] [Accepted: 07/03/2013] [Indexed: 12/22/2022] Open

Blocquel D, Beltrandi M, Erales J, Barbier P, Longhi S. Biochemical and structural studies of the oligomerization domain of the Nipah virus phosphoprotein: evidence for an elongated coiled-coil homotrimer. Virology 2013;446:162-72. [PMID: 24074578 DOI: 10.1016/j.virol.2013.07.031] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 06/15/2013] [Revised: 07/08/2013] [Accepted: 07/24/2013] [Indexed: 12/19/2022]

Trotta AP, Need EF, Butler LM, Selth LA, O'Loughlin MA, Coetzee GA, Tilley WD, Buchanan G. Subdomain structure of the co-chaperone SGTA and activity of its androgen receptor client. J Mol Endocrinol 2012;49:57-68. [PMID: 22693264 DOI: 10.1530/jme-11-0152] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 01/31/2023]

Al Rayyan N, Wankhade UD, Bush K, Good DJ. Two single nucleotide polymorphisms in the human nescient helix-loop-helix 2 (NHLH2) gene reduce mRNA stability and DNA binding. Gene 2012;512:134-42. [PMID: 23026212 DOI: 10.1016/j.gene.2012.09.068] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 06/20/2012] [Revised: 08/07/2012] [Accepted: 09/12/2012] [Indexed: 01/17/2023]

Gront D, Blaszczyk M, Wojciechowski P, Kolinski A. BioShell Threader: protein homology detection based on sequence profiles and secondary structure profiles. Nucleic Acids Res 2012;40:W257-62. [PMID: 22693216 PMCID: PMC3394251 DOI: 10.1093/nar/gks555] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022] Open

Qi Y, Oja M, Weston J, Noble WS. A unified multitask architecture for predicting local protein properties. PLoS One 2012;7:e32235. [PMID: 22461885 PMCID: PMC3312883 DOI: 10.1371/journal.pone.0032235] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Received: 04/27/2011] [Accepted: 01/25/2012] [Indexed: 01/27/2023] Open

Naeeni AR, Conte MR, Bayfield MA. RNA chaperone activity of human La protein is mediated by variant RNA recognition motif. J Biol Chem 2012;287:5472-82. [PMID: 22203678 PMCID: PMC3285324 DOI: 10.1074/jbc.m111.276071] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 06/24/2011] [Revised: 12/23/2011] [Indexed: 02/05/2023] Open

Vornam B, Gailing O, Derory J, Plomion C, Kremer A, Finkeldey R. Characterisation and natural variation of a dehydrin gene in Quercus petraea (Matt.) Liebl. Plant Biol (Stuttg) 2011;13:881-887. [PMID: 21973280 DOI: 10.1111/j.1438-8677.2011.00446.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 05/26/2023]

Identification of a chemoreceptor zinc-binding domain common to cytoplasmic bacterial chemoreceptors. J Bacteriol 2011;193:4338-45. [PMID: 21725005 DOI: 10.1128/jb.05140-11] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Indexed: 01/08/2023] Open

Strunk T, Hamacher K, Hoffgaard F, Engelhardt H, Zillig MD, Faist K, Wenzel W, Pfeifer F. Structural model of the gas vesicle protein GvpA and analysis of GvpA mutants in vivo. Mol Microbiol 2011;81:56-68. [PMID: 21542854 DOI: 10.1111/j.1365-2958.2011.07669.x] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 10/18/2022]

Samayoa J, Yildiz FH, Karplus K. Identification of prokaryotic small proteins using a comparative genomic approach. Bioinformatics 2011;27:1765-71. [PMID: 21551138 DOI: 10.1093/bioinformatics/btr275] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022]

Abstract

MOTIVATION

Accurate prediction of genes encoding small proteins (on the order of 50 amino acids or less) remains an elusive open problem in bioinformatics. Some of the best methods for gene prediction use either sequence composition analysis or sequence similarity to a known protein coding sequence. These methods often fail for small proteins, however, either due to a lack of experimentally verified small protein coding genes or due to the limited statistical significance of statistics on small sequences. Our approach is based upon the hypothesis that true small proteins will be under selective pressure for encoding the particular amino acid sequence, for ease of translation by the ribosome and for structural stability. This stability can be achieved either independently or as part of a larger protein complex. Given this assumption, it follows that small proteins should display conserved local protein structure properties much like larger proteins. Our method incorporates neural-net predictions for three local structure alphabets within a comparative genomic approach using a genomic alignment of 22 closely related bacteria genomes to generate predictions for whether or not a given open reading frame (ORF) encodes for a small protein.

RESULTS

We have applied this method to the complete genome for Escherichia coli strain K12 and looked at how well our method performed on a set of 60 experimentally verified small proteins from this organism. Out of a total of 11 407 possible ORFs, we found that 6 of the top 10 and 27 of the top 100 predictions belonged to the set of 60 experimentally verified small proteins. We found 35 of all the true small proteins within the top 200 predictions. We compared our method to Glimmer, using a default Glimmer protocol and a modified small ORF Glimmer protocol with a lower minimum size cutoff. The default Glimmer protocol identified 16 of the true small proteins (all in the top 200 predictions), but failed to predict on 34 due to size cutoffs. The small ORF Glimmer protocol made predictions for all the experimentally verified small proteins but only contained 9 of the 60 true small proteins within the top 200 predictions.

CONTACT

jsamayoa@jhu.edu

Identification of missense mutation (I12T) in the BSND gene and bioinformatics analysis. J Biomed Biotechnol 2011;2011:304612. [PMID: 21541222 PMCID: PMC3085335 DOI: 10.1155/2011/304612] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/20/2010] [Revised: 12/21/2010] [Accepted: 02/04/2011] [Indexed: 11/18/2022] Open

Sagemark J, Kraulis P, Weigelt J. A software tool to accelerate design of protein constructs for recombinant expression. Protein Expr Purif 2010;72:175-8. [PMID: 20359538 DOI: 10.1016/j.pep.2010.03.020] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 12/03/2009] [Revised: 03/19/2010] [Accepted: 03/25/2010] [Indexed: 11/26/2022]

Madera M, Calmus R, Thiltgen G, Karplus K, Gough J. Improving protein secondary structure prediction using a simple k-mer model. Bioinformatics 2010;26:596-602. [PMID: 20130034 PMCID: PMC2828123 DOI: 10.1093/bioinformatics/btq020] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Indexed: 12/04/2022] Open

Archie J, Karplus K. Applying undertaker cost functions to model quality assessment. Proteins 2009;75:550-5. [PMID: 19004017 DOI: 10.1002/prot.22288] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/08/2022]

Paluszewski M, Karplus K. Model quality assessment using distance constraints from alignments. Proteins 2009;75:540-9. [PMID: 19003987 DOI: 10.1002/prot.22262] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/09/2022]

Helles G, Fonseca R. Predicting dihedral angle probability distributions for protein coil residues from primary sequence using neural networks. BMC Bioinformatics 2009;10:338. [PMID: 19835576 PMCID: PMC2771020 DOI: 10.1186/1471-2105-10-338] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 04/06/2009] [Accepted: 10/16/2009] [Indexed: 11/10/2022] Open

Abstract

Background

Predicting the three-dimensional structure of a protein from its amino acid sequence is currently one of the most challenging problems in bioinformatics. The internal structure of helices and sheets is highly recurrent and helps reduce the search space significantly. However, random coil segments make up nearly 40% of proteins and they do not have any apparent recurrent patterns, which complicates overall prediction accuracy of protein structure prediction methods. Luckily, previous work has indicated that coil segments are in fact not completely random in structure and flanking residues do seem to have a significant influence on the dihedral angles adopted by the individual amino acids in coil segments. In this work we attempt to predict a probability distribution of these dihedral angles based on the flanking residues. While attempts to predict dihedral angles of coil segments have been made previously, none have, to our knowledge, presented comparable results for the probability distribution of dihedral angles.

Results

In this paper we develop an artificial neural network that uses an input-window of amino acids to predict a dihedral angle probability distribution for the middle residue in the input-window. The trained neural network shows a significant improvement (4-68%) in predicting the most probable bin (covering a 30° × 30° area of the dihedral angle space) for all amino acids in the data set compared to baseline statistics. An accuracy comparable to that of secondary structure prediction (≈ 80%) is achieved by observing the 20 bins with highest output values.
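As an illustration of the evaluation described above, the following is a minimal sketch, not the authors' code, of how dihedral angles could be discretized into 30° × 30° bins and how a "true bin among the k highest outputs" accuracy could be computed. The function names, the assumption that angles are given in degrees in [-180, 180), and the flat row-major bin indexing are all hypothetical choices made for this example.

```python
BIN_WIDTH = 30                      # degrees per bin edge, as stated in the abstract
BINS_PER_AXIS = 360 // BIN_WIDTH    # 12 bins along phi and along psi (144 in total)


def bin_index(phi, psi):
    """Map (phi, psi) in degrees to a flat bin index in 0..143 (assumed layout)."""
    i = int((phi + 180) % 360 // BIN_WIDTH)
    j = int((psi + 180) % 360 // BIN_WIDTH)
    return i * BINS_PER_AXIS + j


def top_k_hit(probs, true_bin, k):
    """Return True if true_bin is among the k bins with the highest predicted values."""
    ranked = sorted(range(len(probs)), key=lambda b: probs[b], reverse=True)
    return true_bin in ranked[:k]


def top_k_accuracy(predictions, true_bins, k):
    """Fraction of residues whose observed bin falls within the top-k predicted bins."""
    hits = sum(top_k_hit(p, t, k) for p, t in zip(predictions, true_bins))
    return hits / len(true_bins)
```

Under these assumptions, the "most probable bin" figure corresponds to `k=1` and the "20 bins with highest output values" figure corresponds to `k=20`, each evaluated over 144 bins.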

Conclusion

Many different protein structure prediction methods exist and each uses different tools and auxiliary predictions to help determine the native structure. In this work the sequence is used to predict local context dependent dihedral angle propensities in coil-regions. This predicted distribution can potentially improve tertiary structure prediction methods that are based on sampling the backbone dihedral angles of individual amino acids. The predicted distribution may also help predict local structure fragments used in fragment assembly methods.

Lippi M, Frasconi P. Prediction of protein beta-residue contacts by Markov logic networks with grounding-specific weights. Bioinformatics 2009;25:2326-33. [PMID: 19592394 DOI: 10.1093/bioinformatics/btp421] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Indexed: 11/13/2022]