1
Alfatah M, Lim JJJ, Zhang Y, Naaz A, Cheng TYN, Yogasundaram S, Faidzinn NA, Lin JJ, Eisenhaber B, Eisenhaber F. Uncharacterized yeast gene YBR238C, an effector of TORC1 signaling in a mitochondrial feedback loop, accelerates cellular aging via HAP4- and RMD9-dependent mechanisms. eLife 2024; 12:RP92178. [PMID: 38713053 PMCID: PMC11076046 DOI: 10.7554/elife.92178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Uncovering the regulators of cellular aging will unravel the complexity of aging biology and identify potential therapeutic interventions to delay the onset and progress of chronic, aging-related diseases. In this work, we systematically compared genesets involved in regulating the lifespan of Saccharomyces cerevisiae (a powerful model organism to study the cellular aging of humans) and those with expression changes under rapamycin treatment. Among the functionally uncharacterized genes in the overlap set, YBR238C stood out as the only one downregulated by rapamycin and with an increased chronological and replicative lifespan upon deletion. We show that YBR238C and its paralog RMD9 oppositely affect mitochondria and aging. YBR238C deletion increases the cellular lifespan by enhancing mitochondrial function. Its overexpression accelerates cellular aging via mitochondrial dysfunction. We find that the phenotypic effect of YBR238C is largely explained by HAP4- and RMD9-dependent mechanisms. Furthermore, we find that genetic- or chemical-based induction of mitochondrial dysfunction increases TORC1 (Target of Rapamycin Complex 1) activity that, subsequently, accelerates cellular aging. Notably, TORC1 inhibition by rapamycin (or deletion of YBR238C) improves the shortened lifespan under these mitochondrial dysfunction conditions in yeast and human cells. The growth of mutant cells (a proxy of TORC1 activity) with enhanced mitochondrial function is sensitive to rapamycin whereas the growth of defective mitochondrial mutants is largely resistant to rapamycin compared to wild type. Our findings demonstrate a feedback loop between TORC1 and mitochondria (the TORC1-MItochondria-TORC1 (TOMITO) signaling process) that regulates cellular aging processes. Hereby, YBR238C is an effector of TORC1 modulating mitochondrial function.
Affiliation(s)
- Mohammad Alfatah
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Jolyn Jia Jia Lim
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Yizhong Zhang
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Arshia Naaz
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Trishia Yi Ning Cheng
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Sonia Yogasundaram
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Nashrul Afiq Faidzinn
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Jovian Jing Lin
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Birgit Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - LASA – Lausitz Advanced Scientific Applications gGmbH, Weißwasser, Germany
- Frank Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - LASA – Lausitz Advanced Scientific Applications gGmbH, Weißwasser, Germany
  - School of Biological Sciences (SBS), Nanyang Technological University (NTU), Singapore, Singapore
2
Tantoso E, Eisenhaber B, Sinha S, Jensen LJ, Eisenhaber F. Did the early full genome sequencing of yeast boost gene function discovery? Biol Direct 2023; 18:46. [PMID: 37574542 PMCID: PMC10424406 DOI: 10.1186/s13062-023-00403-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2023] [Accepted: 08/01/2023] [Indexed: 08/15/2023]
Abstract
BACKGROUND: Although the genome of Saccharomyces cerevisiae (S. cerevisiae) was the first eukaryotic genome to be fully sequenced (in 1996), a complete understanding of the potential of encoded biomolecular mechanisms has not yet been achieved. Here, we wish to quantify how far away the goal of a full list of S. cerevisiae gene functions still is.
RESULTS: The scientific literature about S. cerevisiae protein-coding genes has been mapped onto the yeast genome via the mentioning of names for genomic regions in scientific publications. The match was quantified with the ratio of a given gene name's occurrences to those of any gene names in the article. We find that ~230 elite genes with ≥ 75 full publication equivalents (FPEs; FPE = 1 is an idealized publication referring to just a single gene) command ~45% of all literature. At the same time, about two thirds of the genes (each with fewer than 10 FPEs) are described in just 12% of the literature (on average, each such gene has just ~1.5% of the literature of an elite gene). About 600 genes have not been mentioned in any dedicated article. Compared with other groups of genes, the literature growth rates were highest for uncharacterized or understudied genes until the late 1990s. Yet, these growth rates deteriorated and became negative thereafter. Thus, yeast function discovery for previously uncharacterized genes has returned to the level of ~1980. At the same time, literature for already well-studied genes (with a threshold T10 (≥ 10 FPEs) and higher) keeps growing steadily.
CONCLUSIONS: Did the early full genome sequencing of yeast boost gene function discovery? The data show that the publication of the full genome in fact coincides with the onset of the decline of gene function discovery for previously uncharacterized genes. If the current status of literature about yeast molecular mechanisms can be extrapolated into the future, it will take about another 50 years to complete the yeast gene function list. We found that a small group of scientific journals contributed extraordinarily to publishing early reports relevant to yeast gene function discoveries.
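The fractional counting behind the FPE score can be illustrated with a minimal Python sketch. It assumes each article has already been reduced to the list of gene names it mentions (one entry per occurrence); the function name `fpe_scores` and the toy corpus are hypothetical and are not taken from the paper's pipeline.

```python
from collections import Counter, defaultdict


def fpe_scores(articles):
    """Sum fractional gene-name counts per gene over a corpus.

    `articles` is an iterable of lists; each list holds the gene names
    mentioned in one article, repeated once per occurrence.
    """
    totals = defaultdict(float)
    for mentions in articles:
        if not mentions:
            continue  # an article naming no gene contributes nothing
        counts = Counter(mentions)
        n = len(mentions)
        for gene, k in counts.items():
            # share of this article attributed to `gene`
            totals[gene] += k / n
    return dict(totals)


# Toy corpus (hypothetical data, for illustration only).
corpus = [
    ["YBR238C"],                       # dedicated article: 1 FPE
    ["HAP4", "HAP4", "RMD9", "TOR1"],  # HAP4 gets 2/4, the others 1/4 each
    ["TOR1", "RMD9"],                  # 1/2 each
]
scores = fpe_scores(corpus)
```

Because every article distributes exactly one unit of literature across the genes it names, the scores over all genes sum to the number of gene-mentioning articles.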
Affiliation(s)
- Erwin Tantoso
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
- Birgit Eisenhaber
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
  - LASA - Lausitz Advanced Scientific Applications gGmbH, Straße Der Einheit 2-24, 02943, Weißwasser, Federal Republic of Germany
- Swati Sinha
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Lars Juhl Jensen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Frank Eisenhaber
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
  - LASA - Lausitz Advanced Scientific Applications gGmbH, Straße Der Einheit 2-24, 02943, Weißwasser, Federal Republic of Germany
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Republic of Singapore
3
Tantoso E, Eisenhaber B, Sinha S, Jensen LJ, Eisenhaber F. About the dark corners in the gene function space of Escherichia coli remaining without illumination by scientific literature. Biol Direct 2023; 18:7. [PMID: 36855185 PMCID: PMC9976479 DOI: 10.1186/s13062-023-00362-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Accepted: 02/21/2023] [Indexed: 03/02/2023]
Abstract
BACKGROUND: Although Escherichia coli (E. coli) is the most studied prokaryote organism in the history of life sciences, many molecular mechanisms and gene functions encoded in its genome remain to be discovered. This work aims at quantifying the illumination of the E. coli gene function space by the scientific literature and how close we are to the goal of a complete list of E. coli gene functions.
RESULTS: The scientific literature about E. coli protein-coding genes has been mapped onto the genome via the mentioning of names for genomic regions in scientific articles, both for the strain K-12 MG1655 and for the 95%-threshold softcore genome of 1324 E. coli strains with known complete genomes. The article match was quantified with the ratio of a given gene name's occurrences to the mentions of any gene names in the paper. The various genome regions have an extremely uneven literature coverage. A group of elite genes with ≥ 100 full publication equivalents (FPEs; FPE = 1 is an idealized publication devoted to just a single gene) attracts the lion's share of the papers. For K-12, ~65% of the literature covers just 342 elite genes; for the softcore genome, ~68% of the FPEs are about only 342 elite gene families (GFs). We also find that most genes/GFs have at least one mention in a dedicated scientific article (with the exception of at least 137 protein-coding transcripts for K-12 and 26 GFs from the softcore genome). Whereas the literature growth rates were highest for uncharacterized or understudied genes until 2005-2010 compared with other groups of genes, they became negative thereafter. At the same time, literature for already well-studied genes (threshold T10, ≥ 10 FPEs) started to grow explosively. Typically, a body of ~20 actual articles generated over ~15 years of research effort was necessary to reach T10. Lineage-specific co-occurrence analysis of genes belonging to the accessory genome of E. coli, together with genomic co-localization and sequence-analytic exploration, hints that the previously completely uncharacterized genes yahV and yddL are associated with osmotic stress response/motility mechanisms.
CONCLUSION: If the numbers of scientific articles about uncharacterized and understudied genes remain at least at present levels, full gene function lists for the strain K-12 MG1655 and the E. coli softcore genome can be reached within the next 25-30 years. Once the literature body for a gene crosses 10 FPEs, most of the critical fundamental research risk appears overcome and steady incremental research becomes possible.
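The statement that a small elite group attracts most of the literature amounts to a simple share computation over per-gene FPE totals. The sketch below assumes those totals are already available as a dictionary; the function name `elite_share`, the gene labels, and the numbers are hypothetical illustrations, not data from the paper.

```python
def elite_share(fpe_by_gene, elite_threshold=100.0):
    """Return the elite genes (FPE >= threshold) and their share of all FPEs."""
    total = sum(fpe_by_gene.values())
    elite = {g: v for g, v in fpe_by_gene.items() if v >= elite_threshold}
    share = sum(elite.values()) / total if total else 0.0
    return sorted(elite), share


# Hypothetical per-gene FPE totals.
fpe_by_gene = {"geneA": 300.0, "geneB": 100.0, "geneC": 50.0, "geneD": 50.0}
elite_genes, share = elite_share(fpe_by_gene)
```

With these toy values, two of the four genes qualify as elite and together hold 400 of the 500 FPEs.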
Affiliation(s)
- Erwin Tantoso
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
- Birgit Eisenhaber
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
- Swati Sinha
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Lars Juhl Jensen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Frank Eisenhaber
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Republic of Singapore
4
Tang YJ, Pang YH, Liu B. DeepIDP-2L: protein intrinsically disordered region prediction by combining convolutional attention network and hierarchical attention network. Bioinformatics 2022; 38:1252-1260. [PMID: 34864847 DOI: 10.1093/bioinformatics/btab810] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/17/2021] [Revised: 11/02/2021] [Accepted: 11/26/2021] [Indexed: 01/05/2023]
Abstract
MOTIVATION: Intrinsically disordered regions (IDRs) are widely distributed in proteins. Accurate prediction of IDRs is critical for protein structure and function analysis. The IDRs are divided into long disordered regions (LDRs) and short disordered regions (SDRs) according to their lengths. Previous studies have shown that LDRs and SDRs have different properties. However, the existing computational methods fail to extract different features for LDRs and SDRs separately. As a result, they achieve unstable performance on datasets with different ratios of LDRs and SDRs.
RESULTS: In this study, a two-layer predictor called DeepIDP-2L was proposed. In the first layer, two kinds of attention-based models are used to extract different features for LDRs and SDRs, respectively. The hierarchical attention network is used to capture the distribution pattern features of LDRs, and the convolutional attention network is used to capture the local correlation features of SDRs. The second layer of DeepIDP-2L maps the features extracted in the first layer into a new feature space. A convolutional network and bidirectional long short-term memory are used to capture the local and long-range information for predicting both SDRs and LDRs. Experimental results show that DeepIDP-2L can achieve more stable performance than other existing predictors on independent test sets with different ratios of SDRs and LDRs.
AVAILABILITY AND IMPLEMENTATION: For the convenience of most experimental scientists, a user-friendly and publicly accessible web-server for the new predictor has been established at http://bliulab.net/DeepIDP-2L/. It is anticipated that DeepIDP-2L will become a very useful tool for identification of intrinsically disordered regions.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
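The division of IDRs into long and short regions by length can be made concrete with a short Python sketch. It assumes a per-residue label string in which '1' marks a disordered residue and '0' an ordered one. The cutoff of 30 residues is an assumption chosen for illustration (the abstract does not state the threshold), and `disordered_regions` is a hypothetical helper, not part of DeepIDP-2L.

```python
from itertools import groupby


def disordered_regions(labels, long_cutoff=30):
    """Split a per-residue label string into disordered regions.

    Returns (start, end, kind) tuples with half-open residue indices;
    kind is "LDR" when the run length reaches `long_cutoff`, else "SDR".
    The default cutoff is an assumption, not a value from the paper.
    """
    regions = []
    pos = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label == "1":
            kind = "LDR" if length >= long_cutoff else "SDR"
            regions.append((pos, pos + length, kind))
        pos += length
    return regions


# Toy annotation: one 8-residue and one 35-residue disordered run.
labels = "0" * 5 + "1" * 8 + "0" * 10 + "1" * 35 + "0" * 2
regions = disordered_regions(labels)
```

Counting the two kinds over a test set gives the LDR/SDR ratio whose variation the abstract identifies as a source of unstable predictor performance.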
Affiliation(s)
- Yi-Jun Tang
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Yi-He Pang
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
5
Tantoso E, Eisenhaber B, Eisenhaber F. Optimizing the Parametrization of Homologue Classification in the Pan-Genome Computation for a Bacterial Species: Case Study Streptococcus pyogenes. Methods Mol Biol 2022; 2449:299-324. [PMID: 35507269 DOI: 10.1007/978-1-0716-2095-3_13] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
The paradigm shift associated with the introduction of the pan-genome concept has drawn the attention from singular reference genomes toward the actual sequence diversity within organism populations, strain collections, clades, etc. A single genome is no longer sufficient to describe bacteria of interest, but instead, the genomic repertoire of all existing strains is the key to the metabolic, evolutionary, or pathogenic potential of a species. The classification of orthologous genes derived from a collection of taxonomically related genome sequences is central to bacterial pan-genome computational analysis. In this work, we present a review of methods for computing pan-genome gene clusters including their comparative analysis for the case of Streptococcus pyogenes strain genomes. We exhaustively scanned the parametrization space of the homologue searching procedures and find optimal parameters (sequence identity (60%) and coverage (50-60%) in the pairwise alignment) for the orthologous clustering of gene sequences. We find that the sequence identity threshold influences the number of gene families ~3 times stronger than the sequence coverage threshold.
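How identity and coverage thresholds shape gene families can be sketched in Python. The example assumes pairwise alignment hits are given as (gene_a, gene_b, percent identity, percent coverage) tuples and groups genes by single-linkage clustering with a union-find structure. Single linkage is a simplification chosen for illustration and is not claimed to be one of the methods reviewed in the chapter; the default thresholds follow the 60% identity and 50% coverage quoted in the abstract, and all gene names and hit values are hypothetical.

```python
def cluster_homologues(genes, hits, min_identity=60.0, min_coverage=50.0):
    """Group genes into families from thresholded pairwise alignment hits."""
    parent = {g: g for g in genes}

    def find(g):
        # Locate the family representative, shortening the path on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b, identity, coverage in hits:
        # Only hits passing both thresholds link two genes.
        if identity >= min_identity and coverage >= min_coverage:
            parent[find(a)] = find(b)

    families = {}
    for g in genes:
        families.setdefault(find(g), set()).add(g)
    return sorted(sorted(f) for f in families.values())


# Hypothetical genes from three strains and their pairwise hits.
genes = ["s1_g1", "s2_g1", "s3_g1", "s1_g2", "s2_g2"]
hits = [
    ("s1_g1", "s2_g1", 92.0, 98.0),
    ("s2_g1", "s3_g1", 61.0, 55.0),
    ("s1_g2", "s2_g2", 75.0, 40.0),  # fails the coverage threshold
    ("s1_g1", "s1_g2", 35.0, 90.0),  # fails the identity threshold
]
families = cluster_homologues(genes, hits)
```

Rerunning the function over a grid of `min_identity` and `min_coverage` values and counting the resulting families is one simple way to observe how sensitive the family count is to each threshold.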
Affiliation(s)
- Erwin Tantoso
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Birgit Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
  - Genome Institute Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Frank Eisenhaber
  - Genome Institute and Bioinformatics Institute, Singapore, Singapore
6
Affiliation(s)
- Frank Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore
  - School of Biological Sciences, Nanyang Technological University (NTU), Singapore
- Chandra Verma
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore
  - School of Biological Sciences, Nanyang Technological University (NTU), Singapore
  - Department of Biological Sciences, National University of Singapore, Singapore
- Tom Blundell
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
7
Tang YJ, Pang YH, Liu B. IDP-Seq2Seq: identification of intrinsically disordered regions based on sequence to sequence learning. Bioinformatics 2021; 36:5177-5186. [PMID: 32702119 DOI: 10.1093/bioinformatics/btaa667] [Citation(s) in RCA: 85] [Impact Index Per Article: 28.3] [Received: 05/01/2020] [Revised: 06/21/2020] [Accepted: 07/17/2020] [Indexed: 12/29/2022]
Abstract
MOTIVATION: Related to many important biological functions, intrinsically disordered regions (IDRs) are widely distributed in proteins. Accurate prediction of IDRs is critical for protein structure and function analysis. However, the existing computational methods construct the predictive models solely in the sequence space, failing to convert the sequence space into the 'semantic space' to reflect the structure characteristics of proteins. Furthermore, although the length-dependent predictors showed promising results, new fusion strategies should be explored to improve their predictive performance and generalization.
RESULTS: In this study, we applied Sequence to Sequence Learning (Seq2Seq), derived from natural language processing (NLP), to map protein sequences to 'semantic space' to reflect the structure patterns with the help of predicted residue-residue contacts (CCMs) and other sequence-based features. Furthermore, the Attention mechanism was used to capture the global associations between all residue pairs in the proteins. Three length-dependent predictors were constructed: IDP-Seq2Seq-L for long disordered region prediction, IDP-Seq2Seq-S for short disordered region prediction and IDP-Seq2Seq-G for both long and short disordered region predictions. Finally, these three predictors were fused into one predictor called IDP-Seq2Seq to improve the discriminative power and generalization. Experimental results on four independent test datasets and the CASP test dataset showed that IDP-Seq2Seq is insensitive to the ratios of long and short disordered regions and outperforms other competing methods.
AVAILABILITY AND IMPLEMENTATION: For the convenience of most experimental scientists, a user-friendly and publicly accessible web-server for the powerful new predictor has been established at http://bliulab.net/IDP-Seq2Seq/. It is anticipated that IDP-Seq2Seq will become a very useful tool for identification of IDRs.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yi-Jun Tang
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Yi-He Pang
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
8
Niska-Blakie J, Gopinathan L, Low KN, Kien YL, Goh CMF, Caldez MJ, Pfeiffenberger E, Jones OS, Ong CB, Kurochkin IV, Coppola V, Tessarollo L, Choi H, Kanagasundaram Y, Eisenhaber F, Maurer-Stroh S, Kaldis P. Knockout of the non-essential gene SUGCT creates diet-linked, age-related microbiome disbalance with a diabetes-like metabolic syndrome phenotype. Cell Mol Life Sci 2020; 77:3423-3439. [PMID: 31722069 PMCID: PMC7426296 DOI: 10.1007/s00018-019-03359-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 06/03/2019] [Revised: 10/23/2019] [Accepted: 10/29/2019] [Indexed: 02/07/2023]
Abstract
SUGCT (C7orf10) is a mitochondrial enzyme that synthesizes glutaryl-CoA from glutarate in tryptophan and lysine catabolism, but it has not been studied in vivo. Although mutations in Sugct lead to Glutaric Aciduria Type 3 disease in humans, patients remain largely asymptomatic despite high levels of glutarate in the urine. To study the disease mechanism, we generated SugctKO mice and uncovered imbalanced lipid and acylcarnitine metabolism in kidney in addition to changes in the gut microbiome. After SugctKO mice were treated with antibiotics, metabolites were comparable to WT, indicating that the microbiome affects metabolism in SugctKO mice. SUGCT loss of function contributes to gut microbiota dysbiosis, leading to age-dependent pathological changes in kidney, liver, and adipose tissue. This is associated with an obesity-related phenotype that is accompanied by lipid accumulation in kidney and liver, as well as "crown-like" structures in adipocytes. Furthermore, we show that the SugctKO kidney pathology is accelerated and exacerbated by a high-lysine diet. Our study highlights the importance of non-essential genes with no readily detectable early phenotype, but with substantial contributions to the development of age-related pathologies, which result from an interplay between genetic background, microbiome, and diet in the health of mammals.
Affiliation(s)
- Joanna Niska-Blakie
  - Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
  - Bioinformatics Institute (BII), A*STAR, Singapore, 138671, Republic of Singapore
- Lakshmi Gopinathan
  - Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
- Kia Ngee Low
  - Bioinformatics Institute (BII), A*STAR, Singapore, 138671, Republic of Singapore
- Yang Lay Kien
  - Bioinformatics Institute (BII), A*STAR, Singapore, 138671, Republic of Singapore
- Christine M F Goh
  - Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
- Matias J Caldez
  - Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
  - Department of Biochemistry, National University of Singapore (NUS), Singapore, 117597, Republic of Singapore
- Elisabeth Pfeiffenberger
  - Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
- Oliver S Jones
  - Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
- Chee Bing Ong
  - Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
- Igor V Kurochkin
  - Bioinformatics Institute (BII), A*STAR, Singapore, 138671, Republic of Singapore
- Vincenzo Coppola
  - Department of Cancer Biology and Genetics, The Ohio State University, 988 Biomedical Research Tower, 460 West 12th Ave, Columbus, OH, 43210, USA
- Lino Tessarollo
  - Mouse Cancer Genetics Program, National Cancer Institute, NCI-Frederick, Bldg. 560, 1050 Boyles Street, Frederick, MD, 21702-1201, USA
- Hyungwon Choi
  - Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
  - Department of Medicine, National University of Singapore (NUS), Singapore, 117597, Republic of Singapore
- Frank Eisenhaber
  - Bioinformatics Institute (BII), A*STAR, Singapore, 138671, Republic of Singapore
  - School of Computer Science and Engineering (SCSE), Nanyang Technological University (NTU), Singapore, 637553, Republic of Singapore
- Sebastian Maurer-Stroh
  - Bioinformatics Institute (BII), A*STAR, Singapore, 138671, Republic of Singapore
  - Department of Biological Sciences (DBS), National University of Singapore (NUS), 14 Science Drive 4, Singapore, 117597, Republic of Singapore
- Philipp Kaldis
  - Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
  - Department of Biochemistry, National University of Singapore (NUS), Singapore, 117597, Republic of Singapore
  - Department of Clinical Sciences, Lund University, Clinical Research Centre (CRC), Box 50332, 202 13, Malmö, Sweden
9
Tantoso E, Wong WC, Tay WH, Lee J, Sinha S, Eisenhaber B, Eisenhaber F. Hypocrisy Around Medical Patient Data: Issues of Access for Biomedical Research, Data Quality, Usefulness for the Purpose and Omics Data as Game Changer. Asian Bioeth Rev 2019; 11:189-207. [PMID: 33717311 PMCID: PMC7747340 DOI: 10.1007/s41649-019-00085-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/04/2018] [Revised: 04/23/2019] [Accepted: 04/30/2019] [Indexed: 11/14/2022]
Abstract
Whether due to simplicity or hypocrisy, the question of access to patient data for biomedical research is widely seen in the public discourse only from the angle of patient privacy. At the same time, the desire to live and to live without disability is of much higher value to the patients. This goal can only be achieved by extracting research insight from patient data in addition to working on model organisms, something that is well understood by many patients. Yet, most biomedical researchers working outside of clinics and hospitals are denied access to patient records while, at the same time, clinicians who guard the patient data are not optimally prepared for the data’s analysis. Medical data collection is a time- and cost-intensive process that is most of all tedious, with few elements of intellectual and emotional satisfaction on its own. In this process, clinicians and bioinformaticians, each group with their own interests, have to join forces with the goal of generating medical data sets both from clinical trials and from routinely collected electronic health records that are, as much as possible, free from errors and obvious inconsistencies. The data cleansing effort, as we have learned during curation of Singaporean clinical trial data, is not a trivial task. The introduction into clinical practice of omics and sophisticated imaging modalities, which are only partially interpreted in terms of diagnosis and therapy with today’s level of knowledge, warrants the creation of clinical databases with full patient history. This opens up opportunities for re-analyses and cross-trial studies at future time points with more sophisticated analyses of the same data, the collection of which is very expensive.
Affiliation(s)
- Erwin Tantoso
  - Bioinformatics Institute (BII), Agency for Science and Technology (ASTAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671 Singapore
- Wing-Cheong Wong
  - Bioinformatics Institute (BII), Agency for Science and Technology (ASTAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671 Singapore
- Wei Hong Tay
  - Bioinformatics Institute (BII), Agency for Science and Technology (ASTAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671 Singapore
- Joanne Lee
  - Bioinformatics Institute (BII), Agency for Science and Technology (ASTAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671 Singapore
- Swati Sinha
  - Bioinformatics Institute (BII), Agency for Science and Technology (ASTAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671 Singapore
- Birgit Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science and Technology (ASTAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671 Singapore
- Frank Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science and Technology (ASTAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671 Singapore
  - School of Computer Science and Engineering (SCSE), Nanyang Technological University (NTU), 50 Nanyang Drive, Singapore, 637553 Singapore
10
Ng SB, Kanagasundaram Y, Fan H, Arumugam P, Eisenhaber B, Eisenhaber F. The 160K Natural Organism Library, a unique resource for natural products research. Nat Biotechnol 2018; 36:570-573. [PMID: 29979661 DOI: 10.1038/nbt.4187] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 12/21/2022]
Affiliation(s)
- Siew Bee Ng
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
| | - Yoganathan Kanagasundaram
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
| | - Hao Fan
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
| | - Prakash Arumugam
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
| | - Birgit Eisenhaber
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
| | - Frank Eisenhaber
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore.,School of Computer Engineering, Nanyang Technological University (NTU), Singapore, Republic of Singapore
| |
|
11
|
Sinha S, Eisenhaber B, Jensen LJ, Kalbuaji B, Eisenhaber F. Darkness in the Human Gene and Protein Function Space: Widely Modest or Absent Illumination by the Life Science Literature and the Trend for Fewer Protein Function Discoveries Since 2000. Proteomics 2018; 18:e1800093. [PMID: 30265449 PMCID: PMC6282819 DOI: 10.1002/pmic.201800093] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 07/30/2018] [Revised: 09/07/2018] [Indexed: 12/15/2022]
Abstract
The mentioning of gene names in the body of the scientific literature 1901-2017 and their fractional counting is used as a proxy to assess the level of biological function discovery. A literature score of one has been defined as full publication equivalent (FPE), the amount of literature necessary to achieve one publication solely dedicated to a gene. It has been found that less than 5000 human genes have each at least 100 FPEs in the available literature corpus. This group of elite genes (4817 protein-coding genes, 119 non-coding RNAs) attracts the overwhelming majority of the scientific literature about genes. Yet, thousands of proteins have never been mentioned at all, ≈2000 further proteins have not even one FPE of literature and, for ≈4600 additional proteins, the FPE count is below 10. The protein function discovery rate measured as numbers of proteins first mentioned or crossing a threshold of accumulated FPEs in a given year has grown until 2000 but is in decline thereafter. This drop is partially offset by function discoveries for non-coding RNAs. The full human genome sequencing does not boost the function discovery rate. Since 2000, the fastest growing group in the literature is that with at least 500 FPEs per gene.
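The fractional counting behind the FPE score can be illustrated with a minimal sketch. It assumes, since the abstract does not spell out the weighting, that a publication mentioning n distinct genes contributes 1/n to each of them; the function name `fpe_scores` and the toy gene lists are hypothetical and not taken from the paper.

```python
from collections import defaultdict

def fpe_scores(papers):
    """Return gene -> FPE-like score.

    Assumption (not from the abstract): each paper carries one unit of
    credit, split equally among the distinct genes it mentions.
    """
    scores = defaultdict(float)
    for genes in papers:
        distinct = set(genes)
        if not distinct:
            continue  # a paper without gene mentions contributes nothing
        share = 1.0 / len(distinct)
        for gene in distinct:
            scores[gene] += share
    return dict(scores)

# Toy corpus: each inner list holds the genes mentioned in one paper.
papers = [["TP53"], ["TP53", "MDM2"], ["MDM2", "TP53", "CDKN1A", "BAX"]]
scores = fpe_scores(papers)
```

Under this assumption, a gene reaches one FPE either through a single dedicated paper or through several papers that it shares with other genes.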
Affiliation(s)
- Swati Sinha
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, 138671, Singapore
| | - Birgit Eisenhaber
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, 138671, Singapore
| | - Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark
| | - Bharata Kalbuaji
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, 138671, Singapore
| | - Frank Eisenhaber
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, 138671, Singapore
- School of Computer Science and Engineering (SCSE), Nanyang Technological University (NTU), 637553, Singapore
| |
|
12
|
Eisenhaber B, Sinha S, Wong WC, Eisenhaber F. Function of a membrane-embedded domain evolutionarily multiplied in the GPI lipid anchor pathway proteins PIG-B, PIG-M, PIG-U, PIG-W, PIG-V, and PIG-Z. Cell Cycle 2018; 17:874-880. [PMID: 29764287 PMCID: PMC6056205 DOI: 10.1080/15384101.2018.1456294] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/17/2022] Open
Abstract
Distant homology relationships among proteins with many transmembrane regions (TMs) are difficult to detect as they are clouded by the TMs’ hydrophobic compositional bias and mutational divergence in connecting loops. In the case of several GPI lipid anchor biosynthesis pathway components, the hidden evolutionary signal can be revealed with dissectHMMER, a sequence similarity search tool focusing on fold-critical, high complexity sequence segments. We find that a sequence module with 10 TMs in PIG-W, described as acyl transferase, is homologous to PIG-U, a transamidase subunit without characterized molecular function, and to mannosyltransferases PIG-B, PIG-M, PIG-V and PIG-Z. We conclude that this new, membrane-embedded domain named BindGPILA functions as the unit for recognizing, binding and stabilizing the GPI lipid anchor in a modification-competent form as this appears the only functional aspect shared among all proteins. Thus, PIG-U's likely molecular function is shuttling/presenting the anchor in a productive conformation to the transamidase complex.
Affiliation(s)
- Birgit Eisenhaber
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore 138671, Republic of Singapore
| | - Swati Sinha
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore 138671, Republic of Singapore
| | - Wing-Cheong Wong
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore 138671, Republic of Singapore
| | - Frank Eisenhaber
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore 138671, Republic of Singapore.,School of Computer Engineering, Nanyang Technological University (NTU), 50 Nanyang Drive, Singapore 637553, Republic of Singapore
| |
|
13
|
Limviphuvadh V, Tan CS, Konishi F, Jenjaroenpun P, Xiang JS, Kremenska Y, Mu YS, Syn N, Lee SC, Soo RA, Eisenhaber F, Maurer-Stroh S, Yong WP. Discovering novel SNPs that are correlated with patient outcome in a Singaporean cancer patient cohort treated with gemcitabine-based chemotherapy. BMC Cancer 2018; 18:555. [PMID: 29751792 PMCID: PMC5948914 DOI: 10.1186/s12885-018-4471-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/25/2017] [Accepted: 05/01/2018] [Indexed: 12/20/2022] Open
Abstract
Background Single Nucleotide Polymorphisms (SNPs) can influence patient outcome such as drug response and toxicity after drug intervention. The purpose of this study is to develop a systematic pathway approach to accurately and efficiently predict novel non-synonymous SNPs (nsSNPs) that could be causative to gemcitabine-based chemotherapy treatment outcome in Singaporean non-small cell lung cancer (NSCLC) patients. Methods Using a pathway approach that incorporates comprehensive protein-protein interaction data to systematically extend the gemcitabine pharmacologic pathway, we identified 77 related nsSNPs, common in the Singaporean population. After that, we used five computational criteria to prioritize the SNPs based on their importance for protein function. We specifically selected and screened six candidate SNPs in a patient cohort with NSCLC treated with gemcitabine-based chemotherapy. Result We performed survival analysis followed by hematologic toxicity analyses and found that three of six candidate SNPs are significantly correlated with the patient outcome (P < 0.05) i.e. ABCG2 Q141K (rs2231142), SLC29A3 S158F (rs780668) and POLR2A N764K (rs2228130). Conclusions Our computational SNP candidate enrichment workflow approach was able to identify several high confidence biomarkers predictive for personalized drug treatment outcome while providing a rationale for a molecular mechanism of the SNP effect. Trial registration NCT00695994. Registered 10 June, 2008 ‘retrospectively registered’. Electronic supplementary material The online version of this article (10.1186/s12885-018-4471-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Vachiranee Limviphuvadh
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Singapore
| | - Chee Seng Tan
- Department of Haematology-Oncology, National University Health System, 5 Lower Kent Ridge Road, Singapore, 119074, Singapore
| | - Fumikazu Konishi
- Education Academy of Computational Life Sciences, Tokyo Institute of Technology, Tokyo, Japan
| | - Piroon Jenjaroenpun
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Singapore
| | - Joy Shengnan Xiang
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Singapore
| | - Yuliya Kremenska
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Singapore
| | - Yar Soe Mu
- Department of Haematology-Oncology, National University Health System, 5 Lower Kent Ridge Road, Singapore, 119074, Singapore
| | - Nicholas Syn
- Department of Haematology-Oncology, National University Health System, 5 Lower Kent Ridge Road, Singapore, 119074, Singapore.,Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Soo Chin Lee
- Department of Haematology-Oncology, National University Health System, 5 Lower Kent Ridge Road, Singapore, 119074, Singapore
| | - Ross A Soo
- Department of Haematology-Oncology, National University Health System, 5 Lower Kent Ridge Road, Singapore, 119074, Singapore.,Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
| | - Frank Eisenhaber
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Singapore.,Department of Biological Sciences, National University of Singapore (NUS), 14 Science Drive 4, Singapore, 117543, Singapore.,School of Computer Engineering (SCE), Nanyang Technological University (NTU), 50 Nanyang Drive, Singapore, 637553, Singapore
| | - Sebastian Maurer-Stroh
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Singapore.,Department of Biological Sciences, National University of Singapore (NUS), 14 Science Drive 4, Singapore, 117543, Singapore
| | - Wei Peng Yong
- Department of Haematology-Oncology, National University Health System, 5 Lower Kent Ridge Road, Singapore, 119074, Singapore.
| |
|
14
|
Kumar G, Mudgal R, Srinivasan N, Sandhya S. Use of designed sequences in protein structure recognition. Biol Direct 2018; 13:8. [PMID: 29776380 PMCID: PMC5960202 DOI: 10.1186/s13062-018-0209-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/31/2017] [Accepted: 04/18/2018] [Indexed: 12/13/2022] Open
Abstract
Background Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use computationally designed sequences that are intermediately related to two protein domain families, which are already known to share the same fold. These strategically designed sequences enable detection of distant relationships and, here, we have employed them for the purpose of structure recognition of protein families of yet unknown structure. Results We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part/full length of the proteins. Fold associations for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. Conclusion The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as ‘linkers’, where natural linkers between distant proteins are unavailable. Reviewers This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian. Electronic supplementary material The online version of this article (10.1186/s13062-018-0209-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Gayatri Kumar
- Lab 103, Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India
| | - Richa Mudgal
- Lab 103, Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India.,Present address: Institute for Research in Biomedicine (IRB), Parc Cientific de Barcelona, C/ Baldiri Reixac 10, 08028, Barcelona, Spain
| | - Narayanaswamy Srinivasan
- Lab 103, Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India.
| | - Sankaran Sandhya
- Lab 103, Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India.
| |
|
15
|
Marakasova ES, Eisenhaber B, Maurer-Stroh S, Eisenhaber F, Baranova A. Prenylation of viral proteins by enzymes of the host: Virus-driven rationale for therapy with statins and FT/GGT1 inhibitors. Bioessays 2017; 39. [DOI: 10.1002/bies.201700014] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 12/18/2022]
Affiliation(s)
| | - Birgit Eisenhaber
- Bioinformatics Institute, Agency for Science, Technology and Research, Singapore
| | - Sebastian Maurer-Stroh
- Bioinformatics Institute, Agency for Science, Technology and Research, Singapore
- Department of Biological Sciences, National University Singapore, Singapore
| | - Frank Eisenhaber
- Bioinformatics Institute, Agency for Science, Technology and Research, Singapore
- Department of Biological Sciences, National University Singapore, Singapore
- School of Computer Engineering, Nanyang Technological University, Singapore
| | - Ancha Baranova
- School of Systems Biology, George Mason University, Fairfax, VA, USA
- Research Centre for Medical Genetics, Russian Academy of Medical Sciences, Moscow, Russia
| |
|
16
|
Baker JA, Wong WC, Eisenhaber B, Warwicker J, Eisenhaber F. Charged residues next to transmembrane regions revisited: "Positive-inside rule" is complemented by the "negative inside depletion/outside enrichment rule". BMC Biol 2017; 15:66. [PMID: 28738801 PMCID: PMC5525207 DOI: 10.1186/s12915-017-0404-4] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 07/03/2017] [Accepted: 07/07/2017] [Indexed: 11/25/2022] Open
Abstract
Background Transmembrane helices (TMHs) frequently occur amongst protein architectures as means for proteins to attach to or embed into biological membranes. Physical constraints such as the membrane’s hydrophobicity and electrostatic potential apply uniform requirements to TMHs and their flanking regions; consequently, they are mirrored in their sequence patterns (in addition to TMHs being a span of generally hydrophobic residues) on top of variations enforced by the specific protein’s biological functions. Results With statistics derived from a large body of protein sequences, we demonstrate that, in addition to the positive charge preference at the cytoplasmic inside (positive-inside rule), negatively charged residues preferentially occur or are even enriched at the non-cytoplasmic flank or, at least, they are suppressed at the cytoplasmic flank (negative-not-inside/negative-outside (NNI/NO) rule). As negative residues are generally rare within or near TMHs, the statistical significance is sensitive with regard to details of TMH alignment and residue frequency normalisation and also to dataset size; therefore, this trend was obscured in previous work. We observe variations amongst taxa as well as for organelles along the secretory pathway. The effect is most pronounced for TMHs from single-pass transmembrane (bitopic) proteins compared to those with multiple TMHs (polytopic proteins) and especially for the class of simple TMHs that evolved for the sole role as membrane anchors. Conclusions The charged-residue flank bias is only one of the TMH sequence features with a role in the anchorage mechanisms, others apparently being the leucine intra-helix propensity skew towards the cytoplasmic side, tryptophan flanking as well as the cysteine and tyrosine inside preference. These observations will stimulate new prediction methods for TMHs and protein topology from a sequence as well as new engineering designs for artificial membrane proteins. 
Electronic supplementary material The online version of this article (doi:10.1186/s12915-017-0404-4) contains supplementary material, which is available to authorized users.
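The flank statistics described in this abstract come down to counting charged residues on either side of a transmembrane helix. The sketch below illustrates that bookkeeping for a single helix only. The flank width of 5, the function name `flank_charges`, the treatment of K/R as positive and D/E as negative (histidine ignored) and the toy sequence are all assumptions made for illustration; the alignment and normalisation details that the paper identifies as critical are not reproduced.

```python
POSITIVE = set("KR")   # assumption: histidine is not counted as positive
NEGATIVE = set("DE")

def flank_charges(seq, start, end, n_term_inside, width=5):
    """Count charged residues in the two flanks of one helix.

    seq[start:end] is the helix; n_term_inside says whether the
    N-terminal flank faces the cytoplasm. The width is a free choice.
    """
    n_flank = seq[max(0, start - width):start]
    c_flank = seq[end:end + width]
    inside, outside = (n_flank, c_flank) if n_term_inside else (c_flank, n_flank)

    def count(segment, charged):
        return sum(1 for residue in segment if residue in charged)

    return {
        "inside_pos": count(inside, POSITIVE),
        "inside_neg": count(inside, NEGATIVE),
        "outside_pos": count(outside, POSITIVE),
        "outside_neg": count(outside, NEGATIVE),
    }

# Toy single-pass protein: positive N-flank, 20-residue helix, acidic C-flank.
toy = "MKRRK" + "L" * 20 + "DEQSD"
counts = flank_charges(toy, start=5, end=25, n_term_inside=True)
```

Summing such counts over many helices, separately for inside and outside flanks, gives the raw frequencies on which a rule of this kind would be tested.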
Affiliation(s)
- James Alexander Baker
- Bioinformatics Institute, Agency for Science Technology and Research (A*STAR), 30 Biopolis Street #07-01, Matrix, Singapore, 138671, Singapore.,School of Chemistry, Manchester Institute of Biotechnology, 131 Princess Street, Manchester, M1 7DN, UK
| | - Wing-Cheong Wong
- Bioinformatics Institute, Agency for Science Technology and Research (A*STAR), 30 Biopolis Street #07-01, Matrix, Singapore, 138671, Singapore
| | - Birgit Eisenhaber
- Bioinformatics Institute, Agency for Science Technology and Research (A*STAR), 30 Biopolis Street #07-01, Matrix, Singapore, 138671, Singapore
| | - Jim Warwicker
- School of Chemistry, Manchester Institute of Biotechnology, 131 Princess Street, Manchester, M1 7DN, UK.
| | - Frank Eisenhaber
- Bioinformatics Institute, Agency for Science Technology and Research (A*STAR), 30 Biopolis Street #07-01, Matrix, Singapore, 138671, Singapore.,School of Computer Engineering (SCE), Nanyang Technological University (NTU), 50 Nanyang Drive, Singapore, 637553, Singapore.
| |
|
17
|
Yap CK, Eisenhaber B, Eisenhaber F, Wong WC. xHMMER3x2: Utilizing HMMER3's speed and HMMER2's sensitivity and specificity in the glocal alignment mode for improved large-scale protein domain annotation. Biol Direct 2016; 11:63. [PMID: 27894340 PMCID: PMC5126834 DOI: 10.1186/s13062-016-0163-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/16/2016] [Accepted: 10/24/2016] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND While the local-mode HMMER3 is notable for its massive speed improvement, the slower glocal-mode HMMER2 is more exact for domain annotation by enforcing full domain-to-sequence alignments. Since a unit of domain necessarily implies a unit of function, local-mode HMMER3 alone remains insufficient for precise function annotation tasks. In addition, the incomparable E-values for the same domain model by different HMMER builds create difficulty when checking for domain annotation consistency on a large-scale basis. RESULTS In this work, both the speed of HMMER3 and glocal-mode alignment of HMMER2 are combined within the xHMMER3x2 framework for tackling the large-scale domain annotation task. Briefly, HMMER3 is utilized for initial domain detection so that HMMER2 can subsequently perform the glocal-mode, sequence-to-full-domain alignments for the detected HMMER3 hits. An E-value calibration procedure is required to ensure that the search space by HMMER2 is sufficiently replicated by HMMER3. We find that the latter is straightforwardly possible for ~80% of the models in the Pfam domain library (release 29). However in the case of the remaining ~20% of HMMER3 domain models, the respective HMMER2 counterparts are more sensitive. Thus, HMMER3 searches alone are insufficient to ensure sensitivity and a HMMER2-based search needs to be initiated. When tested on the set of UniProt human sequences, xHMMER3x2 can be configured to be between 7× and 201× faster than HMMER2, but with descending domain detection sensitivity from 99.8 to 95.7% with respect to HMMER2 alone; HMMER3's sensitivity was 95.7%. At extremes, xHMMER3x2 is either the slow glocal-mode HMMER2 or the fast HMMER3 with glocal-mode. Finally, the E-values to false-positive rates (FPR) mapping by xHMMER3x2 allows E-values of different model builds to be compared, so that any annotation discrepancies in a large-scale annotation exercise can be flagged for further examination by dissectHMMER. 
CONCLUSION The xHMMER3x2 workflow allows large-scale domain annotation speed to be drastically improved over HMMER2 without compromising for domain-detection with regard to sensitivity and sequence-to-domain alignment incompleteness. The xHMMER3x2 code and its webserver (for Pfam release 27, 28 and 29) are freely available at http://xhmmer3x2.bii.a-star.edu.sg/ . REVIEWERS Reviewed by Thomas Dandekar, L. Aravind, Oliviero Carugo and Shamil Sunyaev. For the full reviews, please go to the Reviewers' comments section.
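The two-stage idea in this abstract (a fast filter followed by a strict full-domain alignment, with a direct slow search for models where the fast filter is not sensitive enough) can be sketched as control flow. This is not the xHMMER3x2 code and does not call HMMER: the fast stage is a toy seed match, the strict stage is a toy exact match of the whole motif, and every name (`annotate`, `fast_seeds`, `needs_slow_search`, the motifs) is hypothetical.

```python
def fast_hit(sequence, seed):
    """Toy stand-in for the fast local search: a short seed must occur."""
    return seed in sequence

def glocal_align(sequence, motif):
    """Toy stand-in for the strict stage: the whole motif must be covered.
    Returns the start position, or None if no full-length match exists."""
    index = sequence.find(motif)
    return index if index >= 0 else None

def annotate(sequence, motifs, fast_seeds, needs_slow_search):
    """Run the strict stage only where the fast stage fires, except for
    models flagged as needing the slow search regardless."""
    hits = []
    for name, motif in motifs.items():
        if name in needs_slow_search or fast_hit(sequence, fast_seeds[name]):
            position = glocal_align(sequence, motif)
            if position is not None:
                hits.append((name, position))
    return hits

motifs = {"domA": "ACDEFG", "domB": "KLMNPQ", "domC": "WYWY"}
# domC's seed is deliberately too strict, mimicking a model for which
# the fast stage is less sensitive than the slow one.
fast_seeds = {"domA": "ACD", "domB": "KLM", "domC": "WYWYW"}
sequence = "xxACDEFGxxKLMxxWYWY"

hits = annotate(sequence, motifs, fast_seeds, needs_slow_search={"domC"})
fast_only = annotate(sequence, motifs, fast_seeds, needs_slow_search=set())
```

In the toy run, domB passes the fast filter but is rejected by the strict stage because only a fragment of its motif is present, and domC is recovered only when it is routed to the slow search.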
Affiliation(s)
- Choon-Kong Yap
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671, Singapore
| | - Birgit Eisenhaber
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671, Singapore
| | - Frank Eisenhaber
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671, Singapore.,School of Computer Engineering (SCE), Nanyang Technological University (NTU), 50 Nanyang Drive, Singapore, 637553, Singapore.
| | - Wing-Cheong Wong
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671, Singapore.
| |
|
18
|
The Recipe for Protein Sequence-Based Function Prediction and Its Implementation in the ANNOTATOR Software Environment. Methods Mol Biol 2016; 1415:477-506. [PMID: 27115649 DOI: 10.1007/978-1-4939-3572-7_25] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 03/02/2023]
|
19
|
Sirota FL, Maurer-Stroh S, Eisenhaber B, Eisenhaber F. Single-residue posttranslational modification sites at the N-terminus, C-terminus or in-between: To be or not to be exposed for enzyme access. Proteomics 2016; 15:2525-46. [PMID: 26038108 PMCID: PMC4745020 DOI: 10.1002/pmic.201400633] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 12/31/2014] [Revised: 04/17/2015] [Accepted: 05/29/2015] [Indexed: 11/30/2022]
Abstract
Many protein posttranslational modifications (PTMs) are the result of an enzymatic reaction. The modifying enzyme has to recognize the substrate protein's sequence motif containing the residue(s) to be modified; thus, the enzyme's catalytic cleft engulfs these residue(s) and the respective sequence environment. This residue accessibility condition principally limits the range where enzymatic PTMs can occur in the protein sequence. Non‐globular, flexible, intrinsically disordered segments or large loops/accessible long side chains should be preferred whereas residues buried in the core of structures should be void of what we call canonical, enzyme‐generated PTMs. We investigate whether PTM sites annotated in UniProtKB (with MOD_RES/LIPID keys) are situated within sequence ranges that can be mapped to known 3D structures. We find that N‐ or C‐termini harbor essentially exclusively canonical PTMs. We also find that the overwhelming majority of all other PTMs are also canonical though, later in the protein's life cycle, the PTM sites can become buried due to complex formation. Among the remaining cases, some can be explained (i) with autocatalysis, (ii) with modification before folding or after temporary unfolding, or (iii) as products of interaction with small, diffusible reactants. Others require further research how these PTMs are mechanistically generated in vivo.
Affiliation(s)
- Fernanda L Sirota
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, Singapore
| | - Sebastian Maurer-Stroh
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, Singapore.,School of Biological Sciences (SBS), Nanyang Technological University (NTU), Singapore
| | - Birgit Eisenhaber
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, Singapore
| | - Frank Eisenhaber
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, Singapore.,Department of Biological Sciences (DBS), National University of Singapore (NUS), Singapore.,School of Computer Engineering (SCE), Nanyang Technological University (NTU), Singapore
| |
|
20
|
Developing of the Computer Method for Annotation of Bacterial Genes. Adv Bioinformatics 2016; 2015:635437. [PMID: 26770195 PMCID: PMC4684837 DOI: 10.1155/2015/635437] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/15/2015] [Revised: 11/16/2015] [Accepted: 11/18/2015] [Indexed: 02/07/2023] Open
Abstract
Over the last years a great number of bacterial genomes were sequenced. Now one of the most important challenges of computational genomics is the functional annotation of nucleic acid sequences. In this study we presented the computational method and the annotation system for predicting biological functions using phylogenetic profiles. The phylogenetic profile of a gene was created by way of searching for similarities between the nucleotide sequence of the gene and 1204 reference genomes, with further estimation of the statistical significance of found similarities. The profiles of the genes with known functions were used for prediction of possible functions and functional groups for the new genes. We conducted the functional annotation for genes from 104 bacterial genomes and compared the functions predicted by our system with the already known functions. For the genes that have already been annotated, the known function matched the function we predicted in 63% of the time, and in 86% of the time the known function was found within the top five predicted functions. Besides, our system increased the share of annotated genes by 19%. The developed system may be used as an alternative or complementary system to the current annotation systems.
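Function prediction from phylogenetic profiles, as outlined in this abstract, can be sketched as a nearest-profile lookup. The sketch assumes binary presence/absence profiles and Jaccard similarity; the abstract does not state the similarity measure or the statistical significance test that the authors used, so both choices, along with the names `jaccard` and `predict_functions` and the toy data, are illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two equal-length presence/absence profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

def predict_functions(query, annotated, top_k=5):
    """Rank the functions of annotated genes by profile similarity to the
    query and return up to top_k distinct functions, best first."""
    ranked = sorted(annotated, key=lambda item: jaccard(query, item[0]), reverse=True)
    predicted = []
    for _, function in ranked:
        if function not in predicted:
            predicted.append(function)
        if len(predicted) == top_k:
            break
    return predicted

# Toy data: each profile marks presence (1) or absence (0) of a similar
# sequence in five reference genomes (the study used 1204).
annotated = [
    ((1, 1, 0, 0, 1), "transport"),
    ((1, 1, 0, 1, 1), "transport"),
    ((0, 0, 1, 1, 0), "DNA repair"),
    ((1, 0, 0, 0, 1), "chemotaxis"),
]
query = (1, 1, 0, 0, 1)
predictions = predict_functions(query, annotated)
```

Returning a ranked list rather than a single label mirrors the way the abstract reports accuracy both for the top prediction and for the top five predicted functions.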
|
21
|
Scaiewicz A, Levitt M. The language of the protein universe. Curr Opin Genet Dev 2015; 35:50-6. [PMID: 26451980 DOI: 10.1016/j.gde.2015.08.010] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 06/30/2015] [Revised: 08/20/2015] [Accepted: 08/25/2015] [Indexed: 11/17/2022]
Abstract
Proteins, the main cell machinery which play a major role in nearly every cellular process, have always been a central focus in biology. We live in the post-genomic era, and inferring information from massive data sets is a steadily growing universal challenge. The increasing availability of fully sequenced genomes can be regarded as the 'Rosetta Stone' of the protein universe, allowing the understanding of genomes and their evolution, just as the original Rosetta Stone allowed Champollion to decipher the ancient Egyptian hieroglyphics. In this review, we consider aspects of the protein domain architectures repertoire that are closely related to those of human languages and aim to provide some insights about the language of proteins.
Affiliation(s)
- Andrea Scaiewicz
- Department of Structural Biology, Stanford University, Stanford, CA 94305-5126, United States
| | - Michael Levitt
- Department of Structural Biology, Stanford University, Stanford, CA 94305-5126, United States.
| |
|
22
|
Sherman WA, Kuchibhatla DB, Limviphuvadh V, Maurer-Stroh S, Eisenhaber B, Eisenhaber F. HPMV: human protein mutation viewer - relating sequence mutations to protein sequence architecture and function changes. J Bioinform Comput Biol 2015; 13:1550028. [PMID: 26503432 DOI: 10.1142/s0219720015500286] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
Next-generation sequencing advances are rapidly expanding the number of human mutations to be analyzed for causative roles in genetic disorders. Our Human Protein Mutation Viewer (HPMV) is intended to explore the biomolecular mechanistic significance of non-synonymous human mutations in protein-coding genomic regions. The tool helps to assess whether protein mutations affect the occurrence of sequence-architectural features (globular domains, targeting signals, post-translational modification sites, etc.). As input, HPMV accepts protein mutations - as UniProt accessions with mutations (e.g. HGVS nomenclature), genome coordinates, or FASTA sequences. As output, HPMV provides an interactive cartoon showing the mutations in relation to elements of the sequence architecture. A large variety of protein sequence architectural features were selected for their particular relevance to mutation interpretation. Clicking a sequence feature in the cartoon expands a tree view of additional information including multiple sequence alignments of conserved domains and a simple 3D viewer mapping the mutation to known PDB structures, if available. The cartoon is also correlated with a multiple sequence alignment of similar sequences from other organisms. In cases where a mutation is likely to have a straightforward interpretation (e.g. a point mutation disrupting a well-understood targeting signal), this interpretation is suggested. The interactive cartoon can be downloaded as standalone viewer in Java jar format to be saved and viewed later with only a standard Java runtime environment. The HPMV website is: http://hpmv.bii.a-star.edu.sg/ .
Affiliation(s)
- Westley Arthur Sherman
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street #07-01, Matrix, Singapore 138671, Singapore
- Durga Bhavani Kuchibhatla
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street #07-01, Matrix, Singapore 138671, Singapore
- Vachiranee Limviphuvadh
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street #07-01, Matrix, Singapore 138671, Singapore
- Sebastian Maurer-Stroh
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street #07-01, Matrix, Singapore 138671, Singapore
  - School of Biological Sciences (SBS), Nanyang Technological University (NTU), 60 Nanyang Drive, Singapore 637551, Singapore
- Birgit Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street #07-01, Matrix, Singapore 138671, Singapore
- Frank Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street #07-01, Matrix, Singapore 138671, Singapore
  - Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive 4, Singapore 117597, Singapore
  - School of Computer Engineering (SCE), Nanyang Technological University (NTU), 50 Nanyang Drive, Singapore 637553, Singapore
|
23
|
Wong WC, Yap CK, Eisenhaber B, Eisenhaber F. dissectHMMER: a HMMER-based score dissection framework that statistically evaluates fold-critical sequence segments for domain fold similarity. Biol Direct 2015; 10:39. [PMID: 26228544 PMCID: PMC4521371 DOI: 10.1186/s13062-015-0068-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 03/26/2015] [Accepted: 07/20/2015] [Indexed: 11/10/2022] Open
Abstract
Background: Annotation transfer for function and structure within the sequence homology concept essentially requires protein sequence similarity for the secondary structural blocks forming the fold of a protein. A simplistic similarity approach in the case of non-globular segments (coiled coils, low complexity regions, transmembrane regions, long loops, etc.) is not justified and is a pertinent source of mistaken homologies. The latter are due either to positional sequence conservation as a result of a very simple, physically induced pattern or to integral sequence properties that are critical for function. Furthermore, because the number of well-studied proteins continues to grow only slowly, a search methodology is needed that dives deeper into the sequence similarity space to connect unknown sequences to well-studied, albeit more distant, ones for biological function postulation. Results: Based on our previous work on dissecting the hidden Markov model (HMMER) based similarity score into fold-critical and non-globular contributions to improve homology inference, we propose a framework, dissectHMMER, that identifies more fold-related domain hits from standard HMMER searches. Subsequent statistical stratification of the fold-related hits into cohorts of functionally related domains allows for the function postulation of the query sequence. Briefly, this work addresses the technical problems of how to recognize non-globular parts in the domain model, resolve contradictory HMMER2/HMMER3 results and evaluate fold-related domain hits for homology. The framework is benchmarked against a set of SCOP-to-Pfam domain models. Despite being a sequence-to-profile method, dissectHMMER performs favorably against a profile-to-profile based method, HHsuite/HHsearch. Examples of function annotation using dissectHMMER, including the function discovery of an uncharacterized membrane protein Q9K8K1_BACHD (WP_010899149.1) as a lactose/H+ symporter, are presented. Finally, the dissectHMMER webserver is made publicly available at http://dissecthmmer.bii.a-star.edu.sg. Conclusions: The proposed framework, dissectHMMER, is faithful to the original inception of the sequence homology concept while improving upon the existing HMMER search tool through the rescue of statistically evaluated false-negative yet fold-related domain hits to the query sequence. Overall, this translates into an opportunity for any novel protein sequence to be functionally characterized. Reviewers: This article was reviewed by Masanori Arita, Shamil Sunyaev and L. Aravind. Electronic supplementary material: The online version of this article (doi:10.1186/s13062-015-0068-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Wing-Cheong Wong
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671, Singapore.
- Choon-Kong Yap
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671, Singapore.
- Birgit Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671, Singapore.
- Frank Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671, Singapore.
  - Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, Singapore, 117597, Singapore.
  - School of Computer Engineering (SCE), Nanyang Technological University (NTU), 50 Nanyang Drive, Singapore, 637553, Singapore.
|
24
|
Mudgal R, Sandhya S, Chandra N, Srinivasan N. De-DUFing the DUFs: Deciphering distant evolutionary relationships of Domains of Unknown Function using sensitive homology detection methods. Biol Direct 2015; 10:38. [PMID: 26228684 PMCID: PMC4520260 DOI: 10.1186/s13062-015-0069-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 03/10/2015] [Accepted: 07/20/2015] [Indexed: 12/23/2022] Open
Abstract
Background: In the post-genomic era, where sequences are being determined at a rapid rate, we are highly reliant on computational methods for their tentative biochemical characterization. The Pfam database currently contains 3,786 families corresponding to “Domains of Unknown Function” (DUF) or “Uncharacterized Protein Family” (UPF), of which 3,087 families have no reported three-dimensional structure, constituting almost one-fourth of the known protein families in search of both structure and function. Results: We applied a ‘computational structural genomics’ approach using five state-of-the-art remote similarity detection methods to detect the relationship between uncharacterized DUFs and domain families of known structures. The association with a structural domain family could serve as a starting point in elucidating the function of a DUF. Amongst these five methods, searches in the SCOP-NrichD database have been applied for the first time. Predictions were classified into high, medium and low confidence based on the consensus of results from various approaches and also annotated with enzyme and Gene Ontology terms. 614 uncharacterized DUFs could be associated with a known structural domain, of which high-confidence predictions, involving at least four methods, were made for 54 families. These structure-function relationships for the 614 DUF families can be accessed online at http://proline.biochem.iisc.ernet.in/RHD_DUFS/. For potential enzymes in this set, we assessed their compatibility with the associated fold and performed detailed structural and functional annotation by examining alignments and the extent of conservation of functional residues. Detailed discussion is provided for interesting assignments for DUF3050, DUF1636, DUF1572, DUF2092 and DUF659. Conclusions: This study provides insights into the structure and potential function for nearly 20% of the DUFs. Use of different computational approaches enables us to reliably recognize distant relationships, especially when they converge to a common assignment, because the methods are often complementary. We observe that while pointers to the structural domain can offer the right clues to the function of a protein, recognition of its precise functional role is still ‘non-trivial’, with many DUF domains conserving only some of the critical residues. It is not clear whether these are functional vestiges or instances involving alternate substrates and interacting partners. Reviewers: This article was reviewed by Drs Eugene Koonin, Frank Eisenhaber and Srikrishna Subramanian. Electronic supplementary material: The online version of this article (doi:10.1186/s13062-015-0069-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Richa Mudgal
  - IISc Mathematics Initiative, Indian Institute of Science, Bangalore, 560 012, India.
- Sankaran Sandhya
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560 012, India.
- Nagasuma Chandra
  - Department of Biochemistry, Indian Institute of Science, Bangalore, 560 012, India.
|
25
|
Eisenhaber F, Sherman WA. 10 years for the Journal of Bioinformatics and Computational Biology (2003-2013) -- a retrospective. J Bioinform Comput Biol 2014; 12:1471001. [PMID: 24969752 DOI: 10.1142/s0219720014710012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
The Journal of Bioinformatics and Computational Biology (JBCB) started publishing scientific articles in 2003. It has established itself as a home for solid research articles in the field (~60 per year) that are surprisingly well cited. JBCB has an important function as an alternative publishing channel in addition to other, bigger journals.
Affiliation(s)
- Frank Eisenhaber
  - Bioinformatics Institute, Agency for Science, Technology and Research, 30 Biopolis Street #07-01, Matrix, Singapore 138671, Singapore
  - Department of Biological Sciences, National University of Singapore, 8 Medical Drive, Singapore 117597, Singapore
  - School of Computer Engineering, Nanyang Technological University, 50 Nanyang Drive, Singapore 637553, Singapore
|
26
|
Eisenhaber F. Unix interfaces, Kleisli, bucandin structure, etc. -- the heroic beginning of bioinformatics in Singapore. J Bioinform Comput Biol 2014; 12:1471002. [PMID: 24969753 DOI: 10.1142/s0219720014710024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Remarkably, Singapore, one of today's hotspots for bioinformatics and computational biology research, appeared de novo out of the pioneering efforts of engaged local individuals in the early 1990s that, supported with increasing public funds from 1996 on, morphed into the present vibrant research community. This article brings to mind the pioneers, their first successes and early institutional developments.
Affiliation(s)
- Frank Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore 138671, Singapore
  - Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, Singapore 117597, Singapore
  - School of Computer Engineering (SCE), Nanyang Technological University (NTU), 50 Nanyang Drive, Singapore 637553, Singapore
|
27
|
Eisenhaber B, Eisenhaber S, Kwang TY, Grüber G, Eisenhaber F. Transamidase subunit GAA1/GPAA1 is a M28 family metallo-peptide-synthetase that catalyzes the peptide bond formation between the substrate protein's omega-site and the GPI lipid anchor's phosphoethanolamine. Cell Cycle 2014; 13:1912-7. [PMID: 24743167 PMCID: PMC4111754 DOI: 10.4161/cc.28761] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Indexed: 11/19/2022] Open
Abstract
The transamidase subunit GAA1/GPAA1 is predicted to be the enzyme that catalyzes the attachment of the glycosylphosphatidylinositol (GPI) lipid anchor to the carbonyl intermediate of the substrate protein at the ω-site. Its ~300-amino acid residue lumenal domain is an M28 family metallo-peptide-synthetase with an α/β hydrolase fold, including a central 8-strand β-sheet and a single metal (most likely zinc) ion coordinated by 3 conserved polar residues. Phosphoethanolamine is used as an adaptor to make the non-peptide GPI lipid anchor look chemically similar to the N terminus of a peptide.
Affiliation(s)
- Birgit Eisenhaber
  - Bioinformatics Institute (BII); A*STAR; Singapore, Republic of Singapore
- Stephan Eisenhaber
  - Department of Physical Chemistry; University of Vienna; Wien/Vienna, Republic of Austria
- Toh Yew Kwang
  - Bioinformatics Institute (BII); A*STAR; Singapore, Republic of Singapore
- Gerhard Grüber
  - Bioinformatics Institute (BII); A*STAR; Singapore, Republic of Singapore
  - Nanyang Technological University; School of Biological Sciences; Singapore, Republic of Singapore
- Frank Eisenhaber
  - Bioinformatics Institute (BII); A*STAR; Singapore, Republic of Singapore
  - Department of Biological Sciences (DBS); National University of Singapore (NUS); Singapore, Republic of Singapore
  - School of Computer Engineering (SCE); Nanyang Technological University (NTU); Singapore, Republic of Singapore
|
28
|
Eisenhaber F, Sung WK, Wong L. Guest Editorial for the International Conference on Genome Informatics (GIW 2013). IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:5-6. [PMID: 26605388 DOI: 10.1109/tcbb.2014.2299751] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
|
29
|
Eisenhaber F, Sung WK, Wong L. The 24th International Conference on Genome Informatics, GIW2013, in Singapore. J Bioinform Comput Biol 2013. [DOI: 10.1142/s0219720013020034] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Affiliation(s)
- Frank Eisenhaber
  - Bioinformatics Institute, Agency for Science, Technology and Research, 30 Biopolis Street #07-01, Matrix, Singapore 138671, Singapore
  - Department of Biological Sciences, National University of Singapore, 8 Medical Drive, Singapore 117597, Singapore
  - School of Computer Engineering, Nanyang Technological University, 50 Nanyang Drive, Singapore 637553, Singapore
- Wing-Kin Sung
  - School of Computing, National University of Singapore, 13 Computing Drive, Singapore 117417, Singapore
  - Genome Institute of Singapore, 60 Biopolis Street #02-01, Genome, Singapore 138672, Singapore
- Limsoon Wong
  - School of Computing, National University of Singapore, 13 Computing Drive, Singapore 117417, Singapore
|
30
|
Maurer-Stroh S, Gao H, Han H, Baeten L, Schymkowitz J, Rousseau F, Zhang L, Eisenhaber F. Motif discovery with data mining in 3D protein structure databases: discovery, validation and prediction of the U-shape zinc binding ("Huf-Zinc") motif. J Bioinform Comput Biol 2013; 11:1340008. [DOI: 10.1142/s0219720013400088] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Data mining in protein databases, derivatives of more fundamental protein 3D structure and sequence databases, has considerable untapped potential for the discovery of sequence motif–structural motif–function relationships, as the finding of the U-shape (Huf-Zinc) motif, originally a small student project, exemplifies. The metal ion zinc is critically involved in universal biological processes, ranging from protein-DNA complexes and transcription regulation to enzymatic catalysis and metabolic pathways. Proteins have evolved a series of motifs to specifically recognize and bind zinc ions. Many of these, so-called zinc fingers, are structurally independent globular domains with discontinuous binding motifs made up of residues mostly far apart in sequence. Through a systematic approach starting from the BRIX structure fragment database, we discovered that there exists another predictable subset of zinc-binding motifs that not only have a conserved continuous sequence pattern but also share a characteristic local conformation, despite being included in totally different overall folds. While this does not allow general prediction of all Zn binding motifs, an HMM-based web server, Huf-Zinc, is available for prediction of these novel, as well as conventional, zinc finger motifs in protein sequences. The Huf-Zinc webserver can be freely accessed at http://mendel.bii.a-star.edu.sg/METHODS/hufzinc/.
Affiliation(s)
- Sebastian Maurer-Stroh
  - Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
  - School of Biological Sciences (SBS), Nanyang Technological University (NTU), 60 Nanyang Drive, 637551, Singapore
- He Gao
  - Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
  - NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Centre for Life Sciences, #05-01, 28 Medical Drive, Singapore 117456, Singapore
- Hao Han
  - Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
- Lies Baeten
  - VIB Switch Laboratory, Katholieke Universiteit Leuven, Herestraat 49, Box 802, 3000 Leuven, Belgium
- Joost Schymkowitz
  - VIB Switch Laboratory, Katholieke Universiteit Leuven, Herestraat 49, Box 802, 3000 Leuven, Belgium
- Frederic Rousseau
  - VIB Switch Laboratory, Katholieke Universiteit Leuven, Herestraat 49, Box 802, 3000 Leuven, Belgium
- Louxin Zhang
  - Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076, Singapore
- Frank Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
  - Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive 4, 117597, Singapore
  - School of Computer Engineering (SCE), Nanyang Technological University (NTU), 50 Nanyang Drive, 637553, Singapore
|
31
|
Kuznetsov V, Lee HK, Maurer-Stroh S, Molnár MJ, Pongor S, Eisenhaber B, Eisenhaber F. How bioinformatics influences health informatics: usage of biomolecular sequences, expression profiles and automated microscopic image analyses for clinical needs and public health. Health Inf Sci Syst 2013; 1:2. [PMID: 25825654 PMCID: PMC4336111 DOI: 10.1186/2047-2501-1-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 08/16/2012] [Accepted: 10/05/2012] [Indexed: 01/25/2023] Open
Abstract
The currently hyped expectation of personalized medicine is often associated with merely achieving the information-technology-led integration of biomolecular sequencing, expression and histopathological bioimaging data with clinical records at the individual patient's level, as if significant biomedical conclusions would be its more or less mandatory result. It remains a sad fact that many, if not most, biomolecular mechanisms that translate the human genomic information into phenotypes are not known and, thus, most of the molecular and cellular data cannot be interpreted in terms of biomedically relevant conclusions. Whereas the historical trend will certainly be in the general direction of personalized diagnostics and cures, the temperate view suggests that biomedical applications that rely either on the comparison of biomolecular sequences and/or on already known biomolecular mechanisms have much greater chances of entering clinical practice soon. In addition to considering the general trends, we review, by way of example, advances in the area of cancer biomarker discovery, in the clinically relevant characterization of patient-specific viral and bacterial pathogens (with emphasis on drug selection for influenza and enterohemorrhagic E. coli) as well as progress in the automated assessment of histopathological images. As molecular and cellular data analysis will become instrumental for achieving desirable clinical outcomes, the role of bioinformatics and computational biology approaches will grow dramatically. Author summary: With DNA sequencing and computers becoming increasingly cheap and accessible to the layman, the idea of integrating biomolecular and clinical patient data seems to become a realistic, short-term option that will lead to patient-specific diagnostics and treatment design for many diseases such as cancer, metabolic disorders, inherited conditions, etc. These hyped expectations will fail since many, if not most, biomolecular mechanisms that translate the human genomic information into phenotypes are not known yet and, thus, most of the molecular and cellular data collected will not lead to biomedically relevant conclusions. At the same time, less spectacular biomedical applications based on biomolecular sequence comparison and/or known biomolecular mechanisms could unlock enormous potential for healthcare and public health. Since the analysis of heterogeneous biomolecular data in context with clinical data will be increasingly critical, the role of bioinformatics and computational biology will grow correspondingly in this process.
Affiliation(s)
- Vladimir Kuznetsov
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671 Singapore
  - School of Computer Engineering (SCE), Nanyang Technological University (NTU), 50 Nanyang Drive, Singapore, 637553 Singapore
- Hwee Kuan Lee
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671 Singapore
- Sebastian Maurer-Stroh
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671 Singapore
  - School of Biological Sciences (SBS), Nanyang Technological University (NTU), 60 Nanyang Drive, Singapore, 637551 Singapore
- Maria Judit Molnár
  - Institute of Genomic Medicine and Rare Disorders, Tömö Street 25-29, 1083 Budapest, Hungary
- Sandor Pongor
  - Faculty of Information Technology, Pázmány Péter Catholic University, Budapest, Hungary (PPKE), Práter u. 50/a, 1083, Budapest, Hungary
- Birgit Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671 Singapore
- Frank Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671 Singapore
  - School of Computer Engineering (SCE), Nanyang Technological University (NTU), 50 Nanyang Drive, Singapore, 637553 Singapore
  - Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, Singapore, 117597 Singapore
|
32
|
Sirota FL, Batagov A, Schneider G, Eisenhaber B, Eisenhaber F, Maurer-Stroh S. Beware of moving targets: reference proteome content fluctuates substantially over the years. J Bioinform Comput Biol 2012; 10:1250020. [PMID: 22867629 DOI: 10.1142/s0219720012500205] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
Reference proteomes are generated by increasingly sophisticated annotation pipelines as part of regular genome build releases; yet, the corresponding changes in reference proteomes' content are dramatic. In the history of the NCBI-curated human proteome, the total number of entries has remained roughly constant but approximately half of the proteins from the 2003 build 33 are no longer represented by entries in current releases, while about the same number of new proteins have been added (for sequence identity thresholds 50-90%). Although mostly hypothetical proteins are affected, there are also spectacular cases of entry removal/addition of well-studied proteins. The changes between the 2003 and recent human proteomes are of a similar order of magnitude as the differences between recent human and chimpanzee proteome releases. As an application example, we show that the proteome fluctuations affect the interpretation (about 74% of hits) of organelle-specific mass-spectrometry data. Although proteome quality tends to improve with more recent releases (for example, the fraction of proteins with functional annotation has increased over time), existing evidence implies that the proteome content still remains incomplete, not just with regard to isoforms/sequence variants but also to proteins and their families that are clearly distinct.
Affiliation(s)
- Fernanda L Sirota
  - Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore.
|