1
Lyon KF, Cai X, Young RJ, Mamun AA, Rajasekaran S, Schiller MR. Minimotif Miner 4: a million peptide minimotifs and counting. Nucleic Acids Res 2018; 46:D465-D470. [PMID: 29140456 PMCID: PMC5753208 DOI: 10.1093/nar/gkx1085] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/18/2017] [Accepted: 11/09/2017] [Indexed: 12/27/2022]
Abstract
Minimotif Miner (MnM) is a database and web system for analyzing short functional peptide motifs, termed minimotifs. We present an update to MnM that grows the database from ∼300 000 to >1 000 000 minimotif consensus sequences and instances. This growth comes largely from updating data from existing databases and from annotating articles that use high-throughput approaches to analyze different types of post-translational modifications. Another update maps human proteins and their minimotifs to known human variants from dbSNP build 150. MnM 4 can now be used to generate mechanistic hypotheses about how human genetic variation affects minimotifs and outcomes. One example of the utility of the combined minimotif/SNP tool is the identification of a loss-of-function missense SNP in a ubiquitylation minimotif encoded in the excision repair cross-complementing 2 (ERCC2) nucleotide excision repair gene. This SNP reaches genome-wide significance for many types of cancer, and the variant identified with MnM 4 reveals a more detailed mechanistic hypothesis concerning the role of ERCC2 in cancer. Other updates to the web system include a new architecture, with migration of the web system and database to Docker containers for better performance and management. Weblinks: minimotifminer.org and mnm.engr.uconn.edu
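As a rough illustration of the minimotif/SNP mapping described in this abstract, the sketch below checks whether a missense substitution falls inside a minimotif instance and whether the variant sequence still matches the consensus at that site. The consensus is written as a regular expression; the pattern, protein sequence, and function names are hypothetical placeholders, not MnM 4 data or its interface.

```python
import re
from dataclasses import dataclass


@dataclass
class MotifHit:
    start: int  # 1-based position of the first motif residue
    end: int    # 1-based position of the last motif residue
    lost: bool  # True if the variant sequence no longer matches at this site


def motif_hits_affected(protein: str, consensus: str, pos: int, alt: str) -> list[MotifHit]:
    """Report motif instances that overlap a missense variant at 1-based `pos`."""
    if not 1 <= pos <= len(protein):
        raise ValueError("variant position outside the protein")
    # Build the variant sequence by substituting a single residue.
    variant = protein[:pos - 1] + alt + protein[pos:]
    pattern = re.compile(consensus)
    hits = []
    # A lookahead is used so that overlapping instances are all reported.
    for m in re.finditer(f"(?=({consensus}))", protein):
        start, end = m.start(1) + 1, m.end(1)
        if start <= pos <= end:
            still_matches = pattern.match(variant, m.start(1)) is not None
            hits.append(MotifHit(start, end, lost=not still_matches))
    return hits
```

For example, with the made-up consensus `K[ST]PE` and the made-up sequence `MAAKSPEEL`, a P6L substitution is reported as a lost instance at residues 4 to 7, whereas S5T leaves the instance intact.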
Affiliation(s)
- Kenneth F Lyon
  - Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada, Las Vegas, NV 89154-4004, USA
- Xingyu Cai
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-2155, USA
- Richard J Young
  - Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada, Las Vegas, NV 89154-4004, USA
- Abdullah-Al Mamun
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-2155, USA
- Sanguthevar Rajasekaran
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-2155, USA
- Martin R Schiller
  - Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada, Las Vegas, NV 89154-4004, USA
2
van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, Fuxreiter M, Gough J, Gsponer J, Jones D, Kim PM, Kriwacki R, Oldfield CJ, Pappu RV, Tompa P, Uversky VN, Wright P, Babu MM. Classification of intrinsically disordered regions and proteins. Chem Rev 2014; 114:6589-631. [PMID: 24773235 PMCID: PMC4095912 DOI: 10.1021/cr400525m] [Citation(s) in RCA: 1410] [Impact Index Per Article: 141.0] [Received: 09/23/2013] [Indexed: 12/11/2022]
Affiliation(s)
- Robin van der Lee
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
  - Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, 6500 HB Nijmegen, The Netherlands
- Marija Buljan
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Benjamin Lang
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Robert J. Weatheritt
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Gary W. Daughdrill
  - Department of Cell Biology, Microbiology, and Molecular Biology, University of South Florida, 3720 Spectrum Boulevard, Suite 321, Tampa, Florida 33612, United States
- A. Keith Dunker
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Monika Fuxreiter
  - MTA-DE Momentum Laboratory of Protein Dynamics, Department of Biochemistry and Molecular Biology, University of Debrecen, H-4032 Debrecen, Nagyerdei krt 98, Hungary
- Julian Gough
  - Department of Computer Science, University of Bristol, The Merchant Venturers Building, Bristol BS8 1UB, United Kingdom
- Joerg Gsponer
  - Department of Biochemistry and Molecular Biology, Centre for High-Throughput Biology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- David T. Jones
  - Bioinformatics Group, Department of Computer Science, University College London, London, WC1E 6BT, United Kingdom
- Philip M. Kim
  - Terrence Donnelly Centre for Cellular and Biomolecular Research, Department of Molecular Genetics, and Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Richard W. Kriwacki
  - Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, Tennessee 38105, United States
- Christopher J. Oldfield
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Rohit V. Pappu
  - Department of Biomedical Engineering and Center for Biological Systems Engineering, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Peter Tompa
  - VIB Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
  - Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Vladimir N. Uversky
  - Department of Molecular Medicine and USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, Florida 33612, United States
  - Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
- Peter E. Wright
  - Department of Integrative Structural and Computational Biology and Skaggs Institute of Chemical Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
- M. Madan Babu
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
3
Coelho ED, Arrais JP, Matos S, Pereira C, Rosa N, Correia MJ, Barros M, Oliveira JL. Computational prediction of the human-microbial oral interactome. BMC Syst Biol 2014; 8:24. [PMID: 24576332 PMCID: PMC3975954 DOI: 10.1186/1752-0509-8-24] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 08/27/2013] [Accepted: 02/17/2014] [Indexed: 11/12/2022]
Abstract
BACKGROUND: The oral cavity is a complex ecosystem where human chemical compounds coexist with a particular microbiota. However, shifts in the normal composition of this microbiota may result in the onset of oral ailments, such as periodontitis and dental caries. In addition, it is known that the microbial colonization of the oral cavity is mediated by protein-protein interactions (PPIs) between the host and microorganisms. Nevertheless, these PPIs remain largely unknown. To elucidate these interactions, we have created a computational prediction method that allows us to obtain a first model of the human-microbial oral interactome. RESULTS: We collected high-quality experimental PPIs from five major human databases. The obtained PPIs were used to create our positive dataset and, indirectly, our negative dataset. The positive and negative datasets were merged and used for training and validation of a naïve Bayes classifier. For the final prediction model, we used an ensemble methodology combining five distinct PPI prediction techniques, namely: literature mining, primary protein sequences, orthologous profiles, biological process similarity, and domain interactions. Performance evaluation of our method revealed an area under the ROC curve (AUC) value greater than 0.926, supporting our primary hypothesis, as no single set of features reached an AUC greater than 0.877. After subjecting our dataset to the prediction model, the classified result was filtered for very high confidence PPIs (probability ≥ 1 − 10⁻⁷), leading to a set of 46,579 PPIs to be further explored. CONCLUSIONS: We believe this dataset holds not only important pathways involved in the onset of infectious oral diseases, but also potential drug targets and biomarkers. The dataset used for training and validation, the predictions obtained, and the final network are available at http://bioinformatics.ua.pt/software/oralint.
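To make the classification and filtering step more concrete, here is a minimal sketch of a naïve Bayes combination of independent evidence sources followed by the very-high-confidence cutoff quoted in the abstract. The likelihood ratios, the prior, and the protein identifiers are invented for illustration; the authors' actual features and training procedure are not reproduced here.

```python
import math

THRESHOLD = 1 - 1e-7  # very-high-confidence cutoff quoted in the abstract


def posterior(prior: float, likelihood_ratios: list[float]) -> float:
    """Naive Bayes posterior P(interaction | evidence).

    Each ratio is P(feature | interacting) / P(feature | non-interacting).
    Features are assumed conditionally independent, so their log-odds add.
    """
    log_odds = math.log(prior / (1 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1 / (1 + math.exp(-log_odds))


def high_confidence(pairs: dict, prior: float) -> list:
    """Return the protein pairs whose posterior reaches THRESHOLD."""
    return [pair for pair, lrs in pairs.items() if posterior(prior, lrs) >= THRESHOLD]


# Hypothetical host/microbe pairs with made-up likelihood ratios.
candidates = {
    ("human_A", "microbe_B"): [1000, 1000, 1000, 100],
    ("human_C", "microbe_D"): [10, 5],
}
selected = high_confidence(candidates, prior=0.01)
```

With these toy numbers only the first pair passes the cutoff, which shows how strict a threshold of 1 − 10⁻⁷ is: several strong, agreeing lines of evidence are needed.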
Affiliation(s)
- Edgar D Coelho
  - Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
- Joel P Arrais
  - Department of Informatics Engineering (DEI), University of Coimbra, Coimbra, Portugal
  - Centre for Informatics and Systems of the University at Coimbra (CISUC), University of Coimbra, Coimbra, Portugal
- Sérgio Matos
  - Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
- Carlos Pereira
  - Centre for Informatics and Systems of the University at Coimbra (CISUC), University of Coimbra, Coimbra, Portugal
  - Department of Informatics Engineering and Systems, Polytechnic Institute of Coimbra, Engineering Institute of Coimbra (IPC-ISEC), Coimbra, Portugal
- Nuno Rosa
  - Department of Health Sciences, Institute of Health Sciences, The Catholic University of Portugal, Viseu, Portugal
- Maria José Correia
  - Department of Health Sciences, Institute of Health Sciences, The Catholic University of Portugal, Viseu, Portugal
- Marlene Barros
  - Department of Health Sciences, Institute of Health Sciences, The Catholic University of Portugal, Viseu, Portugal
  - Centre for Neurosciences and Cell Biology, University of Coimbra, Coimbra, Portugal
- José Luís Oliveira
  - Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
4
Secondary structure, a missing component of sequence-based minimotif definitions. PLoS One 2012; 7:e49957. [PMID: 23236358 PMCID: PMC3517595 DOI: 10.1371/journal.pone.0049957] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/23/2012] [Accepted: 10/15/2012] [Indexed: 12/27/2022]
Abstract
Minimotifs are short contiguous segments of proteins that have a known biological function. The hundreds of thousands of minimotifs discovered thus far are an important part of the theoretical understanding of the specificity of protein-protein interactions, posttranslational modifications, and signal transduction that occur in cells. However, a longstanding problem is that the different abstractions of the sequence definitions do not accurately capture the specificity, despite decades of effort by many labs. We present evidence that structure is an essential component of minimotif specificity, yet is not used in minimotif definitions. Our analysis of several known minimotifs as case studies, analysis of occurrences of minimotifs in structured and disordered regions of proteins, and review of the literature support a new model for minimotif definitions that includes sequence, structure, and function.
5
Mi T, Rajasekaran S, Merlin JC, Gryk M, Schiller MR. Achieving high accuracy prediction of minimotifs. PLoS One 2012; 7:e45589. [PMID: 23029121 PMCID: PMC3459956 DOI: 10.1371/journal.pone.0045589] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/04/2012] [Accepted: 08/23/2012] [Indexed: 12/04/2022]
Abstract
The low complexity of minimotif patterns results in a high false-positive prediction rate, hampering protein function prediction. A multi-filter algorithm, trained and tested on a linear regression model, support vector machine model, and neural network model, using a large dataset of verified minimotifs, vastly improves minimotif prediction accuracy while generating few false positives. An optimal threshold for the best accuracy reaches an overall accuracy above 90%, while a stringent threshold for the best specificity generates less than 1% false positives or even no false positives and still produces more than 90% true positives for the linear regression and neural network models. The minimotif multi-filter with its excellent accuracy represents the state-of-the-art in minimotif prediction and is expected to be very useful to biologists investigating protein function and how missense mutations cause disease.
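The trade-off between an "optimal" and a "stringent" threshold mentioned in this abstract can be illustrated with a small sketch that evaluates a score cutoff on labeled predictions. The scores and labels below are toy values, and the function name is hypothetical; this is not the authors' multi-filter scoring.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (sensitivity, false_positive_rate, accuracy) at a score cutoff.

    A candidate is predicted to be a true minimotif when score >= threshold.
    """
    tp = fp = tn = fn = 0
    for score, is_true in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_true:
            tp += 1
        elif predicted:
            fp += 1
        elif is_true:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    accuracy = (tp + tn) / len(labels) if labels else 0.0
    return sensitivity, false_positive_rate, accuracy


# Toy data: 1 marks a verified minimotif, 0 a false candidate.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
lenient = rates_at_threshold(toy_scores, toy_labels, 0.5)
stringent = rates_at_threshold(toy_scores, toy_labels, 0.75)
```

On this toy data, raising the cutoff from 0.5 to 0.75 removes the single false positive without losing any true positive, which is the kind of behavior a stringent threshold aims for.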
Affiliation(s)
- Tian Mi
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Sanguthevar Rajasekaran
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, United States of America
  - * Corresponding author
- Jerlin Camilus Merlin
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Michael Gryk
  - Department of Molecular, Microbial, and Structural Biology, University of Connecticut Health Center, Farmington, Connecticut, United States of America
- Martin R. Schiller
  - School of Life Sciences, University of Nevada Las Vegas, Las Vegas, Nevada, United States of America
  - * Corresponding author
6
Merlin JC, Rajasekaran S, Mi T, Schiller MR. Reducing false-positive prediction of minimotifs with a genetic interaction filter. PLoS One 2012; 7:e32630. [PMID: 22403687 PMCID: PMC3293834 DOI: 10.1371/journal.pone.0032630] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/17/2011] [Accepted: 01/29/2012] [Indexed: 11/29/2022]
Abstract
Background: Minimotifs are short contiguous peptide sequences in proteins that have known functions. At its simplest level, the minimotif sequence is present in a source protein and has an activity relationship with a target, most of which are proteins. While many scientists routinely investigate new minimotif functions in proteins, the major web-based discovery tools have a high rate of false-positive prediction. Any new approach that reduces false positives will be of great help to biologists. Methods and Findings: We have built three filters that use genetic interactions to reduce false-positive minimotif predictions. The basic filter identifies those minimotifs where the source/target protein pairs have a known genetic interaction. The HomoloGene genetic interaction filter extends these predictions to predicted genetic interactions of orthologous proteins, and the node-based filter identifies those minimotifs where proteins that have a genetic interaction with the source or target have a genetic interaction. Each filter was evaluated with a test data set containing thousands of true and false positives. Based on sensitivity and selectivity performance metrics, the basic filter had the best discrimination for true positives, whereas the node-based filter had the highest sensitivity. We have implemented these genetic interaction filters on the Minimotif Miner 2.3 website. The genetic interaction filter is particularly useful for improving predictions of posttranslational modifications such as phosphorylation and proteolytic cleavage sites. Conclusions: Genetic interaction data sets can be used to reduce false-positive minimotif predictions. Minimotif prediction in known genetic interactions can help to refine the mechanisms behind the functional connection between genes revealed by genetic experimentation and screens.
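The basic and node-based filters described in this abstract can be sketched as simple set operations over a list of genetic interactions. The code below is one possible reading, not the authors' implementation: in particular, the node-based rule is interpreted here as "some interaction partner of the source interacts with some interaction partner of the target", and all protein names are placeholders.

```python
from collections import defaultdict


def build_neighbors(interactions):
    """Map each protein to the set of its genetic interaction partners."""
    neighbors = defaultdict(set)
    for a, b in interactions:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def basic_filter(predictions, interactions):
    """Keep predictions whose source/target pair has a known genetic interaction."""
    known = {frozenset(pair) for pair in interactions}
    return [p for p in predictions if frozenset((p[0], p[1])) in known]


def node_based_filter(predictions, interactions):
    """Keep predictions where a partner of the source interacts with a partner
    of the target (an assumed interpretation of the node-based filter)."""
    neighbors = build_neighbors(interactions)
    kept = []
    for source, target, motif in predictions:
        partners_of_source = neighbors.get(source, set())
        partners_of_target = neighbors.get(target, set())
        if any(neighbors[p] & partners_of_target for p in partners_of_source):
            kept.append((source, target, motif))
    return kept


# Hypothetical interaction list and (source, target, minimotif) predictions.
toy_interactions = [("A", "B"), ("E", "G"), ("F", "H"), ("G", "H")]
toy_predictions = [("A", "B", "m1"), ("E", "F", "m2"), ("X", "Y", "m3")]
```

Under this reading the node-based filter keeps everything the basic filter keeps plus indirectly connected pairs, which is consistent with the abstract's observation that it has the highest sensitivity.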
Affiliation(s)
- Jerlin C. Merlin
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Sanguthevar Rajasekaran
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, United States of America
  - * Corresponding author
- Tian Mi
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Martin R. Schiller
  - School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
  - * Corresponding author
7
Mi T, Merlin JC, Deverasetty S, Gryk MR, Bill TJ, Brooks AW, Lee LY, Rathnayake V, Ross CA, Sargeant DP, Strong CL, Watts P, Rajasekaran S, Schiller MR. Minimotif Miner 3.0: database expansion and significantly improved reduction of false-positive predictions from consensus sequences. Nucleic Acids Res 2012; 40:D252-60. [PMID: 22146221 PMCID: PMC3245078 DOI: 10.1093/nar/gkr1189] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Received: 09/14/2011] [Revised: 11/14/2011] [Accepted: 11/15/2011] [Indexed: 12/21/2022]
Abstract
Minimotif Miner (MnM available at http://minimotifminer.org or http://mnm.engr.uconn.edu) is an online database for identifying new minimotifs in protein queries. Minimotifs are short contiguous peptide sequences that have a known function in at least one protein. Here we report the third release of the MnM database which has now grown 60-fold to approximately 300,000 minimotifs. Since short minimotifs are by their nature not very complex we also summarize a new set of false-positive filters and linear regression scoring that vastly enhance minimotif prediction accuracy on a test data set. This online database can be used to predict new functions in proteins and causes of disease.
Affiliation(s)
- Tian Mi, Jerlin Camilus Merlin, Sandeep Deverasetty, Michael R. Gryk, Travis J. Bill, Andrew W. Brooks, Logan Y. Lee, Viraj Rathnayake, Christian A. Ross, David P. Sargeant, Christy L. Strong, Paula Watts, Sanguthevar Rajasekaran, Martin R. Schiller
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-2155, USA
  - School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Pkwy., Las Vegas, NV 89154-4004, USA
  - Department of Molecular, Microbial, and Structural Biology, University of Connecticut Health Center, 263 Farmington Ave., Farmington, CT 06030-3305, USA
8
Lobanov MY, Galzitskaya OV. Disordered patterns in clustered Protein Data Bank and in eukaryotic and bacterial proteomes. PLoS One 2011; 6:e27142. [PMID: 22073276 PMCID: PMC3208572 DOI: 10.1371/journal.pone.0027142] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 06/28/2011] [Accepted: 10/11/2011] [Indexed: 11/18/2022]
Abstract
We have constructed the clustered Protein Data Bank and obtained clusters of chains with different levels of identity inside each cluster (http://bioinfo.protres.ru/st_pdb/). Using simple selection rules, we have compiled the largest database of disordered patterns (141) from the clustered PDB in which identity between chains inside a cluster is greater than or equal to 75% (version of 28 June 2010). The results of these analyses should help to further our understanding of the physicochemical and structural determinants of intrinsically disordered regions that serve as molecular recognition elements. We have analyzed the occurrence of the selected patterns in 97 eukaryotic and 26 bacterial proteomes. The disordered patterns appear more often in eukaryotic than in bacterial proteomes. We have calculated the matrix of correlation coefficients between the numbers of proteins in which a disordered pattern from the library of 141 patterns appears at least once, for 9 kingdoms of eukaryota and 5 phyla of bacteria. As a rule, the correlation coefficients are higher within a kingdom than between kingdoms. The patterns that occur frequently in proteomes have low complexity (PPPPP, GGGGG, EEEED, HHHH, KKKKK, SSTSS, QQQQQP), and the types of patterns vary across proteomes (http://bioinfo.protres.ru/fp/search_new_pattern.html).
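The occurrence analysis described in this abstract (counting the proteins in which each pattern appears at least once, then correlating those counts between proteomes) can be sketched as follows. The sequences are toy examples, the function names are hypothetical, and patterns are treated as plain substrings; the authors' pattern library and proteome sets are not reproduced.

```python
import math


def proteins_with_pattern(proteome, pattern):
    """Count proteins in which `pattern` occurs at least once."""
    return sum(1 for sequence in proteome if pattern in sequence)


def pattern_profile(proteome, patterns):
    """Per-pattern counts of proteins containing that pattern."""
    return [proteins_with_pattern(proteome, p) for p in patterns]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Toy proteomes and three of the low-complexity patterns named in the abstract.
patterns = ["PPPPP", "GGGGG", "KKKKK"]
proteome_a = ["MPPPPPK", "GGGGGA", "MKT"]
proteome_b = ["PPPPPPPPPP", "AAPPPPPAA", "GGGGG"]
profile_a = pattern_profile(proteome_a, patterns)
profile_b = pattern_profile(proteome_b, patterns)
r = pearson(profile_a, profile_b)
```

Repeating the last step for every pair of proteomes (or of kingdoms and phyla, after pooling) would give a correlation matrix of the kind the abstract refers to.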
Affiliation(s)
- Michail Yu. Lobanov
  - Group of Bioinformatics, Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
- Oxana V. Galzitskaya
  - Group of Bioinformatics, Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
  - * Corresponding author