1
Tripathi S, Shirnekhi HK, Gorman SD, Chandra B, Baggett DW, Park CG, Somjee R, Lang B, Hosseini SMH, Pioso BJ, Li Y, Iacobucci I, Gao Q, Edmonson MN, Rice SV, Zhou X, Bollinger J, Mitrea DM, White MR, McGrail DJ, Jarosz DF, Yi SS, Babu MM, Mullighan CG, Zhang J, Sahni N, Kriwacki RW. Defining the condensate landscape of fusion oncoproteins. Nat Commun 2023; 14:6008. [PMID: 37770423 PMCID: PMC10539325 DOI: 10.1038/s41467-023-41655-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Accepted: 09/13/2023] [Indexed: 09/30/2023] Open
Abstract
Fusion oncoproteins (FOs) arise from chromosomal translocations in ~17% of cancers and are often oncogenic drivers. Although some FOs can promote oncogenesis by undergoing liquid-liquid phase separation (LLPS) to form aberrant biomolecular condensates, the generality of this phenomenon is unknown. We explored this question by testing 166 FOs in HeLa cells and found that 58% formed condensates. The condensate-forming FOs displayed physicochemical features distinct from those of condensate-negative FOs and segregated into distinct feature-based groups that aligned with their sub-cellular localization and biological function. Using machine learning, we developed a predictor of FO condensation behavior and discovered that 67% of ~3000 additional FOs likely form condensates, with 35% of those predicted to function by altering gene expression. In contrast, 47% of the predicted condensate-negative FOs were associated with cell signaling functions, suggesting a functional dichotomy between condensate-positive and -negative FOs. Our datasets and reagents are rich resources to interrogate FO condensation in the future.
Affiliation(s)
- Swarnendu Tripathi
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Hazheen K Shirnekhi
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Scott D Gorman
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
  - Arrakis Therapeutics, 830 Winter St, Waltham, MA, 02451, USA
- Bappaditya Chandra
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- David W Baggett
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Cheon-Gil Park
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Ramiz Somjee
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
  - Rhodes College, Memphis, TN, USA
  - Washington University School of Medicine, 660 South Euclid Avenue, St. Louis, MO, 63110, USA
- Benjamin Lang
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
  - Center of Excellence for Data-Driven Discovery, Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Seyed Mohammad Hadi Hosseini
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
  - Center of Excellence for Data-Driven Discovery, Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Brittany J Pioso
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Yongsheng Li
  - Livestrong Cancer Institutes, Department of Oncology, Dell Medical School, The University of Texas at Austin, Austin, TX, 78712, USA
- Ilaria Iacobucci
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Qingsong Gao
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Michael N Edmonson
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Stephen V Rice
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Xin Zhou
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- John Bollinger
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Diana M Mitrea
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
  - Dewpoint Therapeutics, 451 D Street, Suite 104, Boston, MA, 02210, USA
- Michael R White
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
  - IDEXX Laboratories, Inc., One IDEXX Drive, Westbrook, ME, 04092, USA
- Daniel J McGrail
  - Center for Immunotherapy and Precision Immuno-Oncology, Cleveland Clinic, Cleveland, OH, USA
  - Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Daniel F Jarosz
  - Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA
- S Stephen Yi
  - Livestrong Cancer Institutes, Department of Oncology, Dell Medical School, The University of Texas at Austin, Austin, TX, 78712, USA
  - Department of Biomedical Engineering, and Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, TX, USA
- M Madan Babu
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
  - Center of Excellence for Data-Driven Discovery, Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Charles G Mullighan
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Jinghui Zhang
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Nidhi Sahni
  - Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
  - Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX, USA
- Richard W Kriwacki
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
  - Department of Microbiology, Immunology and Biochemistry, University of Tennessee Health Sciences Center, Memphis, TN, USA
2
Whole-Genome Sequence Analysis of an Endophytic Fungus Alternaria sp. SPS-2 and Its Biosynthetic Potential of Bioactive Secondary Metabolites. Microorganisms 2022; 10:1789. [PMID: 36144391 PMCID: PMC9503250 DOI: 10.3390/microorganisms10091789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Revised: 08/26/2022] [Accepted: 08/31/2022] [Indexed: 11/25/2022] Open
Abstract
As one of the most commonly isolated endophytic fungi, Alternaria is known for producing numerous secondary metabolites (SMs). However, its detailed genomic features and SM biosynthetic potential have not been extensively studied thus far. The present work focuses on the whole-genome sequencing and assembly of an endophytic strain, Alternaria sp. SPS-2, derived from Echrysantha chrysantha Lindl., and on gene annotation using various bioinformatic tools. The results suggested that the genome of strain SPS-2 was 33.4 Mb in size, with a GC content of 51% and a scaffold N50 of 2.6 Mb. A total of 9789 protein-coding genes, including 644 CAZyme-encoding genes, were discovered in strain SPS-2 through KEGG enrichment analysis. The antiSMASH results indicated that strain SPS-2 harbored 22 SM biosynthetic gene clusters (BGCs), 14 of which are cryptic and unknown. LC–MS/MS and GNPS-based analyses suggested that this endophytic fungus is a potential producer of bioactive SMs and merits further exploration and development.
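The assembly statistics quoted in this abstract (GC content and scaffold N50) follow standard definitions, which can be sketched as below. This is a minimal illustration, not the authors' pipeline: the function names `n50` and `gc_content` and the toy inputs are hypothetical, and real assemblies would be read from FASTA files rather than hard-coded.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L
    together cover at least half of the total assembly size.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no scaffold lengths given")


def gc_content(seq):
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    # Toy scaffold lengths (hypothetical, not from the SPS-2 assembly).
    print(n50([2, 3, 4, 5, 6]))
    print(gc_content("GGCCAATT"))
```

Ambiguous bases such as N are excluded from the GC denominator here; some tools count them instead, so reported values can differ slightly between programs.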
3
Porphyromonas gingivalis resistance and virulence: An integrated functional network analysis. Gene 2022; 839:146734. [PMID: 35835406 DOI: 10.1016/j.gene.2022.146734] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/31/2022] [Revised: 06/23/2022] [Accepted: 07/08/2022] [Indexed: 11/23/2022]
Abstract
BACKGROUND The gram-negative bacterium Porphyromonas gingivalis (PG) is the most prevalent cause of periodontal diseases and multidrug-resistant (MDR) infections. Periodontitis and MDR infections are severe due to PG's ability to efflux antimicrobials and virulence factors. This gives rise to colonisation, biofilm development, evasion, and modulation of the host defence system. Despite extensive studies on MDR efflux pumps in other pathogens, little is known about the efflux pump and its association with virulence factors in PG. Prolonged PG infection leads to complete loss of teeth and other systemic diseases. This necessitates the development of new therapeutic interventions to prevent and control MDR. OBJECTIVE The study aims to identify the most indispensable proteins that regulate both resistance and virulence in PG, which could therefore be used as targets against the MDR threat. METHODS We adopted a hierarchical network-based approach to construct a protein interaction network. First, individual networks of four major efflux pump proteins and two virulence regulatory proteins were constructed and then integrated into one. The relationships between proteins were investigated using a combination of centrality scores, k-core network decomposition, and functional annotation to computationally identify the indispensable proteins. RESULTS Our study identified four topologically significant genes, PG_0538, PG_0539, PG_0285, and PG_1797, as potential pharmacological targets. PG_0539 and PG_1797 were found to have significant associations between the efflux pump and virulence genes. This type of underpinning research may help in narrowing the drug spectrum used for treating periodontal diseases, and may also be exploited to look into antibiotic resistance and pathogenicity in bacteria other than PG.
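The k-core network decomposition named in the METHODS can be illustrated with a small sketch. It assumes an undirected graph stored as a dictionary of neighbour sets and uses the simple peeling procedure (repeatedly remove the lowest-degree node); the function name `core_numbers` and the toy graph are hypothetical and are not taken from the study.

```python
def core_numbers(adj):
    """Return the core number of every node by iterative peeling.

    adj maps each node to the set of its neighbours
    (undirected graph, no self-loops). A node has core number k
    if it belongs to the k-core but not to the (k+1)-core.
    """
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


if __name__ == "__main__":
    # Toy network: a triangle (a, b, c) with one pendant node d.
    toy = {
        "a": {"b", "c", "d"},
        "b": {"a", "c"},
        "c": {"a", "b"},
        "d": {"a"},
    }
    print(core_numbers(toy))
```

This quadratic-time version favours readability; for large interaction networks a bucket-based linear-time implementation or an established graph library would be the more practical choice.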
4
Structure-Aware Mycobacterium tuberculosis Functional Annotation Uncloaks Resistance, Metabolic, and Virulence Genes. mSystems 2021; 6:e0067321. [PMID: 34726489 PMCID: PMC8562490 DOI: 10.1128/msystems.00673-21] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/30/2022] Open
Abstract
Accurate and timely functional genome annotation is essential for translating basic pathogen research into clinically impactful advances. Here, through literature curation and structure-function inference, we systematically update the functional genome annotation of Mycobacterium tuberculosis virulent type strain H37Rv. First, we systematically curated annotations for 589 genes from 662 publications, including 282 gene products absent from leading databases. Second, we modeled 1,711 underannotated proteins and developed a semiautomated pipeline that captured shared function between 400 protein models and structural matches of known function on Protein Data Bank, including drug efflux proteins, metabolic enzymes, and virulence factors. In aggregate, these structure- and literature-derived annotations update 940/1,725 underannotated H37Rv genes and generate hundreds of functional hypotheses. Retrospectively applying the annotation to a recent whole-genome transposon mutant screen provided missing function for 48% (13/27) of underannotated genes altering antibiotic efficacy and 33% (23/69) required for persistence during mouse tuberculosis (TB) infection. Prospective application of the protein models enabled us to functionally interpret novel laboratory generated pyrazinamide (PZA)-resistant mutants of unknown function, which implicated the emerging coenzyme A depletion model of PZA action in the mutants’ PZA resistance. Our findings demonstrate the functional insight gained by integrating structural modeling and systematic literature curation, even for widely studied microorganisms. Functional annotations and protein structure models are available at https://tuberculosis.sdsu.edu/H37Rv in human- and machine-readable formats.
IMPORTANCE Mycobacterium tuberculosis, the primary causative agent of tuberculosis, kills more humans than any other infectious bacterium. Yet 40% of its genome is functionally uncharacterized, leaving much about the genetic basis of its resistance to antibiotics, capacity to withstand host immunity, and basic metabolism yet undiscovered. Irregular literature curation for functional annotation contributes to this gap. We systematically curated functions from literature and structural similarity for over half of poorly characterized genes, expanding the functionally annotated Mycobacterium tuberculosis proteome. Applying this updated annotation to recent in vivo functional screens added functional information to dozens of clinically pertinent proteins described as having unknown function. Integrating the annotations with a prospective functional screen identified new mutants resistant to a first-line TB drug, supporting an emerging hypothesis for its mode of action. These improvements in functional interpretation of clinically informative studies underscore the translational value of this functional knowledge. Structure-derived annotations identify hundreds of high-confidence candidates for mechanisms of antibiotic resistance, virulence factors, and basic metabolism and other functions key in clinical and basic tuberculosis research. More broadly, they provide a systematic framework for improving prokaryotic reference annotations.
5
Bayer K, Busch K, Kenchington E, Beazley L, Franzenburg S, Michels J, Hentschel U, Slaby BM. Microbial Strategies for Survival in the Glass Sponge Vazella pourtalesii. mSystems 2020; 5:e00473-20. [PMID: 32788407 PMCID: PMC7426153 DOI: 10.1128/msystems.00473-20] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/27/2020] [Accepted: 07/23/2020] [Indexed: 01/04/2023] Open
Abstract
Few studies have explored the microbiomes of glass sponges (Hexactinellida). The present study seeks to elucidate the composition of the microbiota associated with the glass sponge Vazella pourtalesii and the functional strategies of the main symbionts. We combined microscopic approaches with metagenome-guided microbial genome reconstruction and amplicon community profiling toward this goal. Microscopic imaging revealed that the host and microbial cells appeared within dense biomass patches that are presumably syncytial tissue aggregates. Based on abundances in amplicon libraries and metagenomic data, SAR324 bacteria, Crenarchaeota, Patescibacteria, and Nanoarchaeota were identified as abundant members of the V. pourtalesii microbiome; thus, their genomic potentials were analyzed in detail. A general pattern emerged in that the V. pourtalesii symbionts had very small genome sizes, in the range of 0.5 to 2.2 Mb, and low GC contents, even below those of seawater relatives. Based on functional analyses of metagenome-assembled genomes (MAGs), we propose two major microbial strategies: the "givers," namely, Crenarchaeota and SAR324, heterotrophs and facultative anaerobes, produce and partly secrete all required amino acids and vitamins. The "takers," Nanoarchaeota and Patescibacteria, are anaerobes with reduced genomes that tap into the microbial community for resources, e.g., lipids and DNA, likely using pilus-like structures. We posit that the existence of microbial cells in sponge syncytia together with the low-oxygen conditions in the seawater environment are factors that shape the unique compositional and functional properties of the microbial community associated with V. pourtalesii.
IMPORTANCE We investigated the microbial community of V. pourtalesii that forms globally unique, monospecific sponge grounds under low-oxygen conditions on the Scotian Shelf, where it plays a key role in its vulnerable ecosystem. The microbial community was found to be concentrated within biomass patches and is dominated by small cells (<1 μm). MAG analyses showed consistently small genome sizes and low GC contents, which is unusual compared to known sponge symbionts. These properties, as well as the (facultatively) anaerobic metabolism and a high degree of interdependence between the dominant symbionts regarding amino acid and vitamin synthesis, are likely adaptations to the unique conditions within the syncytial tissue of their hexactinellid host and the low-oxygen environment.
Affiliation(s)
- Kristina Bayer
  - GEOMAR Helmholtz Centre for Ocean Research Kiel, RD3 Marine Symbioses, Kiel, Germany
- Kathrin Busch
  - GEOMAR Helmholtz Centre for Ocean Research Kiel, RD3 Marine Symbioses, Kiel, Germany
- Ellen Kenchington
  - Department of Fisheries and Oceans, Bedford Institute of Oceanography, Dartmouth, Nova Scotia, Canada
- Lindsay Beazley
  - Department of Fisheries and Oceans, Bedford Institute of Oceanography, Dartmouth, Nova Scotia, Canada
- Sören Franzenburg
  - Institute for Clinical Molecular Biology, Kiel University, Kiel, Germany
- Jan Michels
  - Functional Morphology and Biomechanics, Institute of Zoology, Kiel University, Kiel, Germany
- Ute Hentschel
  - GEOMAR Helmholtz Centre for Ocean Research Kiel, RD3 Marine Symbioses, Kiel, Germany
  - Kiel University, Kiel, Germany
- Beate M Slaby
  - GEOMAR Helmholtz Centre for Ocean Research Kiel, RD3 Marine Symbioses, Kiel, Germany
6
Koo DCE, Bonneau R. Towards region-specific propagation of protein functions. Bioinformatics 2020; 35:1737-1744. [PMID: 30304483 PMCID: PMC6513163 DOI: 10.1093/bioinformatics/bty834] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/23/2018] [Revised: 08/23/2018] [Accepted: 10/08/2018] [Indexed: 01/06/2023] Open
Abstract
MOTIVATION Due to the nature of experimental annotation, most protein function prediction methods operate at the protein-level, where functions are assigned to full-length proteins based on overall similarities. However, most proteins function by interacting with other proteins or molecules, and many functional associations should be limited to specific regions rather than the entire protein length. Most domain-centric function prediction methods depend on accurate domain family assignments to infer relationships between domains and functions, with regions that are unassigned to a known domain-family left out of functional evaluation. Given the abundance of residue-level annotations currently available, we present a function prediction methodology that automatically infers function labels of specific protein regions using protein-level annotations and multiple types of region-specific features. RESULTS We apply this method to local features obtained from InterPro, UniProtKB and amino acid sequences and show that this method improves both the accuracy and region-specificity of protein function transfer and prediction. We compare region-level predictive performance of our method against that of a whole-protein baseline method using proteins with structurally verified binding sites and also compare protein-level temporal holdout predictive performances to expand the variety and specificity of GO terms we could evaluate. Our results can also serve as a starting point to categorize GO terms into region-specific and whole-protein terms and select prediction methods for different classes of GO terms. AVAILABILITY AND IMPLEMENTATION The code and features are freely available at: https://github.com/ek1203/rsfp. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Da Chen Emily Koo
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Richard Bonneau
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA
  - Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
  - Center for Data Science, New York University, New York, NY, USA
7
Halima NB. Analysis of glycoside hydrolases from oat (Avena sativa) seedling extract. Bioinformation 2019; 15:678-688. [PMID: 31787817 PMCID: PMC6859709 DOI: 10.6026/97320630015678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2019] [Revised: 10/07/2019] [Accepted: 10/12/2019] [Indexed: 11/23/2022] Open
Abstract
The abundance and diversity of oligo- and polysaccharides provide a wide range of biological roles attributed either to these carbohydrates or to their relevant enzymes, i.e., the glycoside hydrolases (GHs). Biocatalysis by these families of enzymes is highly attractive for generating products with potential applications, e.g., in the pharmaceutical and food industries. It is thus very important to extract and characterize such enzymes, particularly from plant tissues. In this study, we characterized novel sequences of class I chitinases from a seedling extract of the common oat (Avena sativa L.) using proteomics and sequence-structure-function analysis. These enzymes, which belong to the GH19 protein family, were extracted from oat and identified using SDS-PAGE, trypsin digestion, LC-MS-MS, and sequence-structure-function analysis. The amino acid sequences of the oat tryptic peptides were used to identify cDNAs from the Avena sativa databases of expressed sequence tags (ESTs) and transcriptome shotgun assembly (TSA). Based upon the Avena sativa EST and TSA sequences, at least 4 predicted genes that encode oat class I chitinases were identified and reported. The structural characterization of the oat chitinase sequences provided valuable insights in this context.
8
Mitchell AL, Attwood TK, Babbitt PC, Blum M, Bork P, Bridge A, Brown SD, Chang HY, El-Gebali S, Fraser MI, Gough J, Haft DR, Huang H, Letunic I, Lopez R, Luciani A, Madeira F, Marchler-Bauer A, Mi H, Natale DA, Necci M, Nuka G, Orengo C, Pandurangan AP, Paysan-Lafosse T, Pesseat S, Potter SC, Qureshi MA, Rawlings ND, Redaschi N, Richardson LJ, Rivoire C, Salazar GA, Sangrador-Vegas A, Sigrist CJ, Sillitoe I, Sutton GG, Thanki N, Thomas PD, Tosatto SC, Yong SY, Finn RD. InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Res 2019; 47:D351-D360. [PMID: 30398656 PMCID: PMC6323941 DOI: 10.1093/nar/gky1100] [Citation(s) in RCA: 980] [Impact Index Per Article: 196.0] [Received: 09/27/2018] [Revised: 10/19/2018] [Accepted: 10/22/2018] [Indexed: 12/15/2022] Open
Abstract
The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.
Affiliation(s)
- Alex L Mitchell
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Teresa K Attwood
  - School of Computer Science, The University of Manchester, Manchester M13 9PL, UK
- Patricia C Babbitt
  - Department of Bioengineering & Therapeutic Sciences, University of California, San Francisco, CA 94158, USA
- Matthias Blum
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Peer Bork
  - European Molecular Biology Laboratory, Structural and Computational Biology Unit, Meyerhofstr.1, 69117 Heidelberg, Germany
- Alan Bridge
  - Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Shoshana D Brown
  - Department of Bioengineering & Therapeutic Sciences, University of California, San Francisco, CA 94158, USA
- Hsin-Yu Chang
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sara El-Gebali
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matthew I Fraser
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Julian Gough
  - Medical Research Council Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK
- David R Haft
  - J. Craig Venter Institute (JCVI), 9605 Medical Center Drive, Suite 150, Rockville, MD 20850, USA
- Hongzhan Huang
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
- Ivica Letunic
  - Biobyte Solutions GmbH, Bothestr 142, 69126 Heidelberg, Germany
- Rodrigo Lopez
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Aurélien Luciani
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Fabio Madeira
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Aron Marchler-Bauer
  - National Center for Biotechnology Information, National Library of Medicine, NIH Bldg, 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Huaiyu Mi
  - Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Darren A Natale
  - Protein Information Resource, Georgetown University Medical Center, Washington, DC, USA
- Marco Necci
  - Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
  - Department of Agricultural Sciences, University of Udine, via Palladio 8, 33100 Udine, Italy
  - Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all’Adige, Italy
- Gift Nuka
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Christine Orengo
  - Structural and Molecular Biology, University College London, Darwin Building, London WC1E 6BT, UK
- Arun P Pandurangan
  - Medical Research Council Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK
- Typhaine Paysan-Lafosse
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sebastien Pesseat
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Simon C Potter
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matloob A Qureshi
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Neil D Rawlings
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Nicole Redaschi
  - Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Lorna J Richardson
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Catherine Rivoire
  - Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Gustavo A Salazar
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Amaia Sangrador-Vegas
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Christian J A Sigrist
  - Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Ian Sillitoe
  - Structural and Molecular Biology, University College London, Darwin Building, London WC1E 6BT, UK
- Granger G Sutton
  - J. Craig Venter Institute (JCVI), 9605 Medical Center Drive, Suite 150, Rockville, MD 20850, USA
- Narmada Thanki
  - National Center for Biotechnology Information, National Library of Medicine, NIH Bldg, 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Paul D Thomas
  - Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Silvio C E Tosatto
  - Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
- Siew-Yit Yong
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Robert D Finn
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
9
Alborzi SZ, Ritchie DW, Devignes MD. Computational discovery of direct associations between GO terms and protein domains. BMC Bioinformatics 2018; 19:413. [PMID: 30453875 PMCID: PMC6245584 DOI: 10.1186/s12859-018-2380-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Families of related proteins and their different functions may be described systematically using common classifications and ontologies such as Pfam and GO (Gene Ontology). However, many proteins consist of multiple domains, and each domain, or some combination of domains, can be responsible for a particular molecular function. Therefore, identifying which domains should be associated with a specific function is a non-trivial task. RESULTS We describe a general approach for the computational discovery of associations between different sets of annotations by formalising the problem as a bipartite graph enrichment problem in the setting of a tripartite graph. We call this approach "CODAC" (for COmputational Discovery of Direct Associations using Common Neighbours). As one application of this approach, we describe "GODomainMiner" for associating GO terms with protein domains. We used GODomainMiner to predict GO-domain associations between each of the 3 GO ontology namespaces (MF, BP, and CC) and the Pfam, CATH, and SCOP domain classifications. Overall, GODomainMiner yields average enrichments of 15-, 41- and 25-fold GO-domain associations compared to the existing GO annotations in these 3 domain classifications, respectively. CONCLUSIONS These associations could potentially be used to annotate many of the protein chains in the Protein Data Bank and protein sequences in UniProt whose domain composition is known but which currently lack GO annotation.
Affiliation(s)
- David W Ritchie
- Université de Lorraine, CNRS, Inria, LORIA, Nancy, F-54500, France
10
Zavadil Kokáš F, Bergougnoux V, Majeská Čudejková M. SATrans: New Free Available Software for Annotation of Transcriptome and Functional Analysis of Differentially Expressed Genes. J Comput Biol 2018; 26:117-123. [PMID: 30328709 DOI: 10.1089/cmb.2018.0149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Recent technological advances have made next-generation sequencing (NGS) a popular and financially accessible technique allowing a broad range of analyses to be done simultaneously. The large volume of newly generated NGS data, however, requires advanced software support both for analyzing the data and for interpreting the results biologically. In this article, we describe SATrans (Software for Annotation of Transcriptome), a software package providing fast and robust functional annotation of novel sequences obtained from transcriptome sequencing. Moreover, it performs advanced gene ontology analysis of differentially expressed genes, thereby helping to interpret the quantitative changes in gene expression biologically and in a user-friendly form. The software is freely available and can process thousands of sequences on a standard personal computer or notebook running the Linux operating system.
Affiliation(s)
- Filip Zavadil Kokáš
- Department of Molecular Biology, Centre of Region Haná for Biotechnological and Agricultural Research, Palacký University in Olomouc, Olomouc, Czech Republic
- Véronique Bergougnoux
- Department of Molecular Biology, Centre of Region Haná for Biotechnological and Agricultural Research, Palacký University in Olomouc, Olomouc, Czech Republic
- Mária Majeská Čudejková
- Department of Molecular Biology, Centre of Region Haná for Biotechnological and Agricultural Research, Palacký University in Olomouc, Olomouc, Czech Republic
11
Sheng CW, Jia ZQ, Ozoe Y, Huang QT, Han ZJ, Zhao CQ. Molecular cloning, spatiotemporal and functional expression of GABA receptor subunits RDL1 and RDL2 of the rice stem borer Chilo suppressalis. Insect Biochem Mol Biol 2018; 94:18-27. [PMID: 29408355 DOI: 10.1016/j.ibmb.2018.01.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/07/2017] [Revised: 01/19/2018] [Accepted: 01/22/2018] [Indexed: 06/07/2023]
Abstract
Insect γ-aminobutyric acid (GABA) receptor (GABAR) is one of the major targets of insecticides. In the present study, cDNAs (CsRDL1A and CsRDL2S) encoding the two isoforms of RDL subunits were cloned from the rice stem borer Chilo suppressalis. Transcripts of both genes demonstrated similar expression patterns in different tissues and developmental stages, although CsRDL2S was ∼2-fold more abundant than CsRDL1A throughout all developmental stages. To investigate the function of channels formed by CsRDL subunits, both genes were expressed in Xenopus laevis oocytes singly or in combination in different ratios. Electrophysiological results using a two-electrode voltage clamp demonstrated that GABA activated currents in oocytes injected with both cRNAs. The EC50 value of GABA in activating currents was smaller in oocytes co-injected with CsRDL1A and CsRDL2S than in oocytes injected singly. The IC50 value of the insecticide fluralaner in inhibiting GABA responses was smaller in oocytes co-injected with different cRNAs than in oocytes injected singly. Co-injection also changed the potency of the insecticide dieldrin relative to that in oocytes injected singly. These findings suggested that heteromeric GABARs were formed by the co-injections of CsRDL1A and CsRDL2S in oocytes. Although the presence of Ser at the 2'-position in the second transmembrane segment was responsible for the insensitivity of GABARs to dieldrin, this amino acid did not affect the potencies of the insecticides fipronil and fluralaner. These results lead us to hypothesize that C. suppressalis may adapt to insecticide pressure by regulating the expression levels of CsRDL1A and CsRDL2S and the composition of both subunits in GABARs.
Affiliation(s)
- Cheng-Wang Sheng
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Zhong-Qiang Jia
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Yoshihisa Ozoe
- Faculty of Life and Environmental Science, Shimane University, Matsue, Shimane, 690-8504, Japan
- Qiu-Tang Huang
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Zhao-Jun Han
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Chun-Qing Zhao
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
12
Yang CJ, Chiang JH. Gene ontology concept recognition using named concept: understanding the various presentations of the gene functions in biomedical literature. Database (Oxford) 2018; 2018:5146423. [PMID: 30376045 PMCID: PMC6204799 DOI: 10.1093/database/bay115] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/21/2018] [Accepted: 09/29/2018] [Indexed: 11/14/2022]
Abstract
OBJECTIVE: A major challenge in precision medicine is the development of patient-specific genetic biomarkers or drug targets. Firsthand information on the genes associated with the pathologic pathways of interest is buried in the ocean of biomedical literature. Gene ontology concept recognition (GOCR) is a biomedical natural language processing task used to extract and normalize the mentions of gene ontology (GO), the controlled vocabulary for gene functions across many species, from biomedical text. Previous GOCR systems, using either rule-based or machine-learning methods, treated GO concepts as separate terms and did not have an efficient way of sharing the common synonyms among the concepts. MATERIALS AND METHODS: We used the CRAFT corpus in this study. Targeting the compositional structure of the GO, we introduced the named concept, the basic conceptual unit which has a conserved name and is used in other complex concepts. Using the named concepts, we separated the GOCR task into dictionary-matching and machine-learning steps. By harvesting the surface names used in the training data, we greatly expanded the synonyms of GO concepts via the connection of the named concepts and thereby enhanced the capability to recognize more GO concepts in the text. The source code is available at https://github.com/jeroyang/ncgocr. RESULTS: Named concept gene ontology concept recognizer (NCGOCR) achieved 0.804 precision and 0.715 recall by correct recognition of the non-standard mentions of the GO concepts. DISCUSSION: The lack of consensus on GO naming causes diversity in the GO mentions in biomedical manuscripts. The high performance is owed to the stability of the composing GO concepts and the lack of variance in the spelling of named concepts. CONCLUSION: NCGOCR reduced the arduous work of GO annotation and improved the process of searching for biomarkers or drug targets, leading to improved biomarker development and greater success in precision medicine.
Affiliation(s)
- Chia-Jung Yang
- Department of Computer Science and Information Engineering, National Cheng Kung University, 1, University Road, Tainan City, Taiwan; Department of Radiology, Taitung Mackay Memorial Hospital, 1, Lane 303, Changsha Street, Taitung City, Taiwan
- Jung-Hsien Chiang
- Department of Computer Science and Information Engineering, National Cheng Kung University, 1, University Road, Tainan City, Taiwan
13
Moskalev AA, Kudryavtseva AV, Graphodatsky AS, Beklemisheva VR, Serdyukova NA, Krutovsky KV, Sharov VV, Kulakovskiy IV, Lando AS, Kasianov AS, Kuzmin DA, Putintseva YA, Feranchuk SI, Shaposhnikov MV, Fraifeld VE, Toren D, Snezhkina AV, Sitnik VV. De novo assembling and primary analysis of genome and transcriptome of gray whale Eschrichtius robustus. BMC Evol Biol 2017; 17:258. [PMID: 29297306 PMCID: PMC5751776 DOI: 10.1186/s12862-017-1103-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/18/2022]
Abstract
Background: The gray whale, Eschrichtius robustus (E. robustus), is the sole member of the family Eschrichtiidae, which is considered to be the most primitive in the class Cetacea. The gray whale is often described as a “living fossil”. It is adapted to extreme marine conditions and has a high life expectancy (77 years). The assembly of a gray whale genome and transcriptome will enable further studies of whale evolution, longevity, and resistance to extreme environments. Results: In this work, we report the first de novo assembly and primary analysis of the E. robustus genome and transcriptome based on kidney and liver samples. The presented draft genome assembly is 55% complete in terms of total genome length, but only 24% complete in terms of the BUSCO complete gene groups, although 10,895 genes were identified. Transcriptome annotation and comparison with other whale species revealed robust expression of DNA repair and hypoxia-response genes, which is expected for whales. Conclusions: This preliminary study of the gray whale genome and transcriptome provides new data to better understand whale evolution and the mechanisms of their adaptation to hypoxic conditions.
Affiliation(s)
- Alexey A Moskalev
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991, Russian Federation; Institute of Biology of Komi Science Center of Ural Branch of RAS, Syktyvkar, 167982, Russian Federation
- Anna V Kudryavtseva
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991, Russian Federation
- Alexander S Graphodatsky
- Institute of Molecular and Cellular Biology SB RAS, Novosibirsk, 630090, Russian Federation; Novosibirsk State University, Novosibirsk, 630090, Russian Federation
- Natalya A Serdyukova
- Institute of Molecular and Cellular Biology SB RAS, Novosibirsk, 630090, Russian Federation
- Konstantin V Krutovsky
- Department of Forest Genetics and Forest Tree Breeding, Georg-August University of Göttingen, Göttingen, 37077, Germany; Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russian Federation; Genome Research and Education Center, Siberian Federal University, Krasnoyarsk, 660036, Russian Federation; Department of Ecosystem Science and Management, Texas A&M University, College Station, 77843-2138, TX, USA
- Vadim V Sharov
- Genome Research and Education Center, Siberian Federal University, Krasnoyarsk, 660036, Russian Federation; Department of High Performance Computing, Institute of Space and Information Technologies, Siberian Federal University, Krasnoyarsk, 660074, Russian Federation
- Ivan V Kulakovskiy
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991, Russian Federation; Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russian Federation; Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow, 143026, Russia
- Andrey S Lando
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russian Federation
- Artem S Kasianov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russian Federation; Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow, 143026, Russia
- Dmitry A Kuzmin
- Genome Research and Education Center, Siberian Federal University, Krasnoyarsk, 660036, Russian Federation; Department of High Performance Computing, Institute of Space and Information Technologies, Siberian Federal University, Krasnoyarsk, 660074, Russian Federation
- Yuliya A Putintseva
- Genome Research and Education Center, Siberian Federal University, Krasnoyarsk, 660036, Russian Federation
- Sergey I Feranchuk
- Genome Research and Education Center, Siberian Federal University, Krasnoyarsk, 660036, Russian Federation; Irkutsk National Research Technical University, Irkutsk, 664074, Russian Federation; Limnological Institute, Siberian Branch of Russian Academy of Sciences, Irkutsk, 664033, Russian Federation
- Mikhail V Shaposhnikov
- Institute of Biology of Komi Science Center of Ural Branch of RAS, Syktyvkar, 167982, Russian Federation
- Vadim E Fraifeld
- The Shraga Segal Department of Microbiology, Immunology and Genetics, Faculty of Health Sciences, Center for Multidisciplinary Research on Aging, Ben-Gurion University of the Negev, Beer-Sheva, 8410501, Israel
- Dmitri Toren
- The Shraga Segal Department of Microbiology, Immunology and Genetics, Faculty of Health Sciences, Center for Multidisciplinary Research on Aging, Ben-Gurion University of the Negev, Beer-Sheva, 8410501, Israel
- Anastasia V Snezhkina
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991, Russian Federation
- Vasily V Sitnik
- Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow, 143026, Russia
14
Huang J, Vendramin S, Shi L, McGinnis KM. Construction and Optimization of a Large Gene Coexpression Network in Maize Using RNA-Seq Data. Plant Physiol 2017; 175:568-583. [PMID: 28768814 PMCID: PMC5580776 DOI: 10.1104/pp.17.00825] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 06/19/2017] [Accepted: 07/31/2017] [Indexed: 05/22/2023]
Abstract
With the emergence of massively parallel sequencing, genome-wide expression data production has reached an unprecedented level. This abundance of data has greatly facilitated maize research, but may not be amenable to traditional analysis techniques that were optimized for other data types. Using publicly available data, a gene coexpression network (GCN) can be constructed and used for gene function prediction, candidate gene selection, and improving understanding of regulatory pathways. Several GCN studies have been done in maize (Zea mays), mostly using microarray datasets. To build an optimal GCN from plant RNA-Seq data, parameters for expression data normalization and network inference were evaluated. A comprehensive evaluation of the effects of these two parameters and a ranked aggregation strategy on network performance was conducted using libraries from 1266 maize samples. Three normalization methods and 10 inference methods, including six correlation and four mutual information methods, were tested. The three normalization methods had very similar performance. For network inference, correlation methods performed better than mutual information methods for some genes. Increasing sample size also had a positive effect on GCN performance. Aggregating single networks resulted in improved performance compared to single networks.
Affiliation(s)
- Ji Huang
- Department of Biological Science, Florida State University, Tallahassee, Florida 32306
- Stefania Vendramin
- Department of Biological Science, Florida State University, Tallahassee, Florida 32306
- Lizhen Shi
- Department of Computer Science, Florida State University, Tallahassee, Florida 32306
- Karen M McGinnis
- Department of Biological Science, Florida State University, Tallahassee, Florida 32306
15
Finn RD, Attwood TK, Babbitt PC, Bateman A, Bork P, Bridge AJ, Chang HY, Dosztányi Z, El-Gebali S, Fraser M, Gough J, Haft D, Holliday GL, Huang H, Huang X, Letunic I, Lopez R, Lu S, Marchler-Bauer A, Mi H, Mistry J, Natale DA, Necci M, Nuka G, Orengo CA, Park Y, Pesseat S, Piovesan D, Potter SC, Rawlings ND, Redaschi N, Richardson L, Rivoire C, Sangrador-Vegas A, Sigrist C, Sillitoe I, Smithers B, Squizzato S, Sutton G, Thanki N, Thomas PD, Tosatto SCE, Wu CH, Xenarios I, Yeh LS, Young SY, Mitchell AL. InterPro in 2017-beyond protein family and domain annotations. Nucleic Acids Res 2016; 45:D190-D199. [PMID: 27899635 PMCID: PMC5210578 DOI: 10.1093/nar/gkw1107] [Citation(s) in RCA: 1016] [Impact Index Per Article: 127.0] [Received: 10/24/2016] [Accepted: 10/27/2016] [Indexed: 02/07/2023]
Abstract
InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.
Affiliation(s)
- Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Patricia C Babbitt
- Department of Bioengineering & Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Peer Bork
- European Molecular Biology Laboratory, Biocomputing, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Alan J Bridge
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Hsin-Yu Chang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Zsuzsanna Dosztányi
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Pázmány Péter sétány 1/c, Budapest, Hungary
- Sara El-Gebali
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matthew Fraser
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Julian Gough
- Computer Science department, University of Bristol, Woodland Road, Bristol BS8 1UB, UK
- David Haft
- Bioinformatics Department, J. Craig Venter Institute, 9714 Medical Center Drive, Rockville, MD 20850, USA
- Gemma L Holliday
- Department of Bioengineering & Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
- Hongzhan Huang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Xiaosong Huang
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Ivica Letunic
- Biobyte Solutions GmbH, Bothestr. 142, 69126 Heidelberg, Germany
- Rodrigo Lopez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Shennan Lu
- National Center for Biotechnology Information, National Library of Medicine, NIH Bldg, 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Aron Marchler-Bauer
- National Center for Biotechnology Information, National Library of Medicine, NIH Bldg, 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Huaiyu Mi
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Jaina Mistry
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Darren A Natale
- Georgetown University Medical Center, 3300 Whitehaven St, NW, Washington, DC 20007, USA
- Marco Necci
- Department of Biomedical Sciences and CRIBI Biotech Center, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
- Gift Nuka
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Christine A Orengo
- Structural and Molecular Biology, University College London, Darwin Building, London WC1E 6BT, UK
- Youngmi Park
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sebastien Pesseat
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Damiano Piovesan
- Department of Biomedical Sciences and CRIBI Biotech Center, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
- Simon C Potter
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Neil D Rawlings
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Nicole Redaschi
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Lorna Richardson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Catherine Rivoire
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Amaia Sangrador-Vegas
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Christian Sigrist
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Ian Sillitoe
- Structural and Molecular Biology, University College London, Darwin Building, London WC1E 6BT, UK
- Ben Smithers
- Computer Science department, University of Bristol, Woodland Road, Bristol BS8 1UB, UK
- Silvano Squizzato
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Granger Sutton
- Bioinformatics Department, J. Craig Venter Institute, 9714 Medical Center Drive, Rockville, MD 20850, USA
- Narmada Thanki
- National Center for Biotechnology Information, National Library of Medicine, NIH Bldg, 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Silvio C E Tosatto
- Department of Biomedical Sciences and CRIBI Biotech Center, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy; CNR Institute of Neuroscience, via U. Bassi 58/b, 35131 Padua, Italy
- Cathy H Wu
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Ioannis Xenarios
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Lai-Su Yeh
- Georgetown University Medical Center, 3300 Whitehaven St, NW, Washington, DC 20007, USA
- Siew-Yit Young
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alex L Mitchell
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK