Reference Citation Analysis

Cited by Other Article(s)

Alfatah M, Lim JJJ, Zhang Y, Naaz A, Cheng TYN, Yogasundaram S, Faidzinn NA, Lin JJ, Eisenhaber B, Eisenhaber F. Uncharacterized yeast gene YBR238C, an effector of TORC1 signaling in a mitochondrial feedback loop, accelerates cellular aging via HAP4- and RMD9-dependent mechanisms. eLife 2024;12:RP92178. [PMID: 38713053 PMCID: PMC11076046 DOI: 10.7554/elife.92178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]

Abstract

Uncovering the regulators of cellular aging will unravel the complexity of aging biology and identify potential therapeutic interventions to delay the onset and progression of chronic, aging-related diseases. In this work, we systematically compared gene sets involved in regulating the lifespan of Saccharomyces cerevisiae (a powerful model organism to study the cellular aging of humans) and those with expression changes under rapamycin treatment. Among the functionally uncharacterized genes in the overlap set, YBR238C stood out as the only one downregulated by rapamycin and with an increased chronological and replicative lifespan upon deletion. We show that YBR238C and its paralog RMD9 oppositely affect mitochondria and aging. YBR238C deletion increases the cellular lifespan by enhancing mitochondrial function. Its overexpression accelerates cellular aging via mitochondrial dysfunction. We find that the phenotypic effect of YBR238C is largely explained by HAP4- and RMD9-dependent mechanisms. Furthermore, we find that genetic- or chemical-based induction of mitochondrial dysfunction increases TORC1 (Target of Rapamycin Complex 1) activity that, subsequently, accelerates cellular aging. Notably, TORC1 inhibition by rapamycin (or deletion of YBR238C) improves the shortened lifespan under these mitochondrial dysfunction conditions in yeast and human cells. The growth of mutant cells (a proxy for TORC1 activity) with enhanced mitochondrial function is sensitive to rapamycin, whereas the growth of defective mitochondrial mutants is largely resistant to rapamycin compared to wild type. Our findings demonstrate a feedback loop between TORC1 and mitochondria (the TORC1-MItochondria-TORC1 (TOMITO) signaling process) that regulates cellular aging processes. In this loop, YBR238C is an effector of TORC1 that modulates mitochondrial function.

Tantoso E, Eisenhaber B, Sinha S, Jensen LJ, Eisenhaber F. Did the early full genome sequencing of yeast boost gene function discovery? Biol Direct 2023;18:46. [PMID: 37574542 PMCID: PMC10424406 DOI: 10.1186/s13062-023-00403-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2023] [Accepted: 08/01/2023] [Indexed: 08/15/2023]

Abstract

BACKGROUND

Although the genome of Saccharomyces cerevisiae (S. cerevisiae) was the first eukaryotic genome to be fully sequenced (in 1996), the biomolecular mechanisms it encodes are not yet completely understood. Here, we wish to quantify how far we still are from the goal of a full list of S. cerevisiae gene functions.

RESULTS

The scientific literature about S. cerevisiae protein-coding genes has been mapped onto the yeast genome via the mention of names for genomic regions in scientific publications. The match was quantified as the ratio of a given gene name's occurrences to those of any gene names in the article. We find that ~ 230 elite genes with ≥ 75 full publication equivalents (FPEs, FPE = 1 is an idealized publication referring to just a single gene) command ~ 45% of all literature. At the same time, about two thirds of the genes (each with fewer than 10 FPEs) are described in just 12% of the literature (on average, each such gene has just ~ 1.5% of the literature of an elite gene). About 600 genes have not been mentioned in any dedicated article. Compared with other groups of genes, the literature growth rates were highest for uncharacterized or understudied genes until the late 1990s. Yet these growth rates declined and became negative thereafter. Thus, yeast function discovery for previously uncharacterized genes has returned to the level of ~ 1980. At the same time, the literature on already well-studied genes (at the threshold T10, ≥ 10 FPEs, and higher) continues to grow steadily.
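The FPE measure described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline, and it makes simplifying assumptions: gene-name mentions are already extracted, and synonyms are already resolved for each article. The toy mention counts are hypothetical. Each article spreads a total weight of 1 over the genes it names, in proportion to their mention counts. A gene's FPE is the sum of its weights across all articles.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def full_publication_equivalents(articles: Iterable[Counter]) -> Dict[str, float]:
    """Sum, over articles, each gene's share of all gene-name mentions.

    Assumption: every article is a Counter mapping a resolved gene name to the
    number of times it is mentioned; an article naming a single gene adds 1 FPE.
    """
    fpe: Dict[str, float] = defaultdict(float)
    for mentions in articles:
        total = sum(mentions.values())
        if total == 0:
            continue  # articles without any gene mention contribute nothing
        for gene, count in mentions.items():
            fpe[gene] += count / total
    return dict(fpe)


def literature_share(fpe: Dict[str, float], threshold: float) -> float:
    """Fraction of all FPEs held by genes with FPE >= threshold (e.g. 10 for T10)."""
    total = sum(fpe.values())
    top = sum(value for value in fpe.values() if value >= threshold)
    return top / total if total else 0.0


# Hypothetical toy corpus of three articles.
corpus = [
    Counter({"TOR1": 4}),             # dedicated article: TOR1 gets 1.0
    Counter({"TOR1": 1, "RMD9": 1}),  # split evenly: 0.5 each
    Counter({"HAP4": 3, "TOR1": 1}),  # HAP4 gets 0.75, TOR1 gets 0.25
]
fpe = full_publication_equivalents(corpus)
print(fpe)  # {'TOR1': 1.75, 'RMD9': 0.5, 'HAP4': 0.75}
print(literature_share(fpe, 1.0))
```

Each article contributes exactly 1 FPE in total. The FPEs therefore sum to the number of articles with at least one gene mention, which is what makes a statement such as "~ 45% of all literature" well defined under this sketch.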

CONCLUSIONS

Did the early full genome sequencing of yeast boost gene function discovery? The data show that the publication of the full genome in fact coincides with the onset of a decline in gene function discovery for previously uncharacterized genes. If the current status of literature about yeast molecular mechanisms can be extrapolated into the future, it will take roughly another 50 years to complete the yeast gene function list. We found that a small group of scientific journals contributed extraordinarily to publishing early reports relevant to yeast gene function discoveries.

Tantoso E, Eisenhaber B, Sinha S, Jensen LJ, Eisenhaber F. About the dark corners in the gene function space of Escherichia coli remaining without illumination by scientific literature. Biol Direct 2023;18:7. [PMID: 36855185 PMCID: PMC9976479 DOI: 10.1186/s13062-023-00362-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Accepted: 02/21/2023] [Indexed: 03/02/2023]

Abstract

BACKGROUND

Although Escherichia coli (E. coli) is the most studied prokaryotic organism in the history of life sciences, many molecular mechanisms and gene functions encoded in its genome remain to be discovered. This work aims to quantify how well the scientific literature illuminates the E. coli gene function space and how close we are to the goal of a complete list of E. coli gene functions.

RESULTS

The scientific literature about E. coli protein-coding genes has been mapped onto the genome via the mention of names for genomic regions in scientific articles. This was done both for the strain K-12 MG1655 and for the 95%-threshold softcore genome of 1324 E. coli strains with a known complete genome. The article match was quantified as the ratio of a given gene name's occurrences to the occurrences of any gene names in the paper. The various genome regions have extremely uneven literature coverage. A group of elite genes with ≥ 100 full publication equivalents (FPEs, FPE = 1 is an idealized publication devoted to just a single gene) attracts the lion's share of the papers. For K-12, ~ 65% of the literature covers just 342 elite genes; for the softcore genome, ~ 68% of the FPEs concern only 342 elite gene families (GFs). We also find that most genes/GFs are mentioned at least once in a dedicated scientific article (with the exception of at least 137 protein-coding transcripts for K-12 and 26 GFs from the softcore genome). Compared with other groups of genes, the literature growth rates were highest for uncharacterized or understudied genes until 2005-2010, but they became negative thereafter. At the same time, the literature on already well-studied genes, those above the threshold T10 (≥ 10 FPEs), started to grow explosively. Typically, a body of ~ 20 actual articles generated over ~ 15 years of research effort was necessary to reach T10. Lineage-specific co-occurrence analysis of genes belonging to the accessory genome of E. coli, together with genomic co-localization and sequence-analytic exploration, suggests that the previously completely uncharacterized genes yahV and yddL are associated with osmotic stress response/motility mechanisms.

CONCLUSION

If the numbers of scientific articles about uncharacterized and understudied genes remain at least at present levels, full gene function lists for the strain K-12 MG1655 and the E. coli softcore genome are within reach in the next 25-30 years. Once the literature body for a gene crosses 10 FPEs, most of the critical fundamental research risk appears to be overcome and steady incremental research becomes possible.

Tang YJ, Pang YH, Liu B. DeepIDP-2L: protein intrinsically disordered region prediction by combining convolutional attention network and hierarchical attention network. Bioinformatics 2022;38:1252-1260. [PMID: 34864847 DOI: 10.1093/bioinformatics/btab810] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/17/2021] [Revised: 11/02/2021] [Accepted: 11/26/2021] [Indexed: 01/05/2023]

Tantoso E, Eisenhaber B, Eisenhaber F. Optimizing the Parametrization of Homologue Classification in the Pan-Genome Computation for a Bacterial Species: Case Study Streptococcus pyogenes. Methods Mol Biol 2022;2449:299-324. [PMID: 35507269 DOI: 10.1007/978-1-0716-2095-3_13] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]

Eisenhaber F, Verma C, Blundell T. In memoriam of Narayanaswamy Srinivasan (1962-2021). Proteins 2021. [PMID: 34825411 DOI: 10.1002/prot.26287] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]

Tang YJ, Pang YH, Liu B. IDP-Seq2Seq: identification of intrinsically disordered regions based on sequence to sequence learning. Bioinformatics 2021;36:5177-5186. [PMID: 32702119 DOI: 10.1093/bioinformatics/btaa667] [Citation(s) in RCA: 85] [Impact Index Per Article: 28.3] [Received: 05/01/2020] [Revised: 06/21/2020] [Accepted: 07/17/2020] [Indexed: 12/29/2022]

Abstract

MOTIVATION

Related to many important biological functions, intrinsically disordered regions (IDRs) are widely distributed in proteins. Accurate prediction of IDRs is critical for protein structure and function analysis. However, the existing computational methods construct the predictive models solely in the sequence space, failing to convert the sequence space into the 'semantic space' to reflect the structural characteristics of proteins. Furthermore, although the length-dependent predictors showed promising results, new fusion strategies should be explored to improve their predictive performance and generalization.

RESULTS

In this study, we applied Sequence to Sequence Learning (Seq2Seq), derived from natural language processing (NLP), to map protein sequences to the 'semantic space' and so reflect structural patterns, with the help of predicted residue-residue contacts (CCMs) and other sequence-based features. Furthermore, the Attention mechanism was used to capture the global associations between all residue pairs in the proteins. Three length-dependent predictors were constructed: IDP-Seq2Seq-L for long disordered region prediction, IDP-Seq2Seq-S for short disordered region prediction and IDP-Seq2Seq-G for both long and short disordered region predictions. Finally, these three predictors were fused into one predictor called IDP-Seq2Seq to improve the discriminative power and generalization. Experimental results on four independent test datasets and the CASP test dataset showed that IDP-Seq2Seq is insensitive to the ratios of long and short disordered regions and outperforms other competing methods.
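The abstract does not state how the three length-dependent predictors are fused. To show only the general idea of score-level fusion, here is a minimal sketch under an explicit assumption: each predictor outputs a per-residue disorder probability, and these are combined by a weighted average and a fixed 0.5 cutoff. This stand-in is not the scheme used in IDP-Seq2Seq, and the score vectors are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def fuse_disorder_scores(
    per_model_scores: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    threshold: float = 0.5,
) -> Tuple[List[float], List[bool]]:
    """Combine per-residue disorder probabilities from several predictors.

    Assumption: a weighted average followed by a fixed threshold; this is a
    stand-in, not the fusion scheme used by IDP-Seq2Seq.
    """
    if not per_model_scores:
        raise ValueError("need at least one predictor")
    length = len(per_model_scores[0])
    if any(len(scores) != length for scores in per_model_scores):
        raise ValueError("all predictors must score the same residues")
    if weights is None:
        weights = [1.0] * len(per_model_scores)
    if len(weights) != len(per_model_scores) or sum(weights) <= 0:
        raise ValueError("need one weight per predictor and a positive weight sum")
    total_weight = sum(weights)
    fused = [
        sum(w * scores[i] for w, scores in zip(weights, per_model_scores)) / total_weight
        for i in range(length)
    ]
    labels = [p >= threshold for p in fused]  # True = predicted disordered
    return fused, labels


long_model = [0.9, 0.2, 0.6]     # hypothetical IDP-Seq2Seq-L style outputs
short_model = [0.7, 0.1, 0.3]    # hypothetical IDP-Seq2Seq-S style outputs
general_model = [0.8, 0.3, 0.3]  # hypothetical IDP-Seq2Seq-G style outputs
fused, labels = fuse_disorder_scores([long_model, short_model, general_model])
print(labels)  # [True, False, False]
```

In this sketch, the third residue would count as disordered under the long-region model alone (0.6) but not after fusion (0.4). That is the kind of disagreement a fusion step has to resolve.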

AVAILABILITY AND IMPLEMENTATION

For the convenience of most experimental scientists, a user-friendly and publicly accessible web-server for the powerful new predictor has been established at http://bliulab.net/IDP-Seq2Seq/. It is anticipated that IDP-Seq2Seq will become a very useful tool for identification of IDRs.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Niska-Blakie J, Gopinathan L, Low KN, Kien YL, Goh CMF, Caldez MJ, Pfeiffenberger E, Jones OS, Ong CB, Kurochkin IV, Coppola V, Tessarollo L, Choi H, Kanagasundaram Y, Eisenhaber F, Maurer-Stroh S, Kaldis P. Knockout of the non-essential gene SUGCT creates diet-linked, age-related microbiome disbalance with a diabetes-like metabolic syndrome phenotype. Cell Mol Life Sci 2020;77:3423-3439. [PMID: 31722069 PMCID: PMC7426296 DOI: 10.1007/s00018-019-03359-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 06/03/2019] [Revised: 10/23/2019] [Accepted: 10/29/2019] [Indexed: 02/07/2023]

Affiliation(s)

Joanna Niska-Blakie Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore; Bioinformatics Institute (BII), A*STAR, Singapore, 138671, Republic of Singapore
Lakshmi Gopinathan Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
Kia Ngee Low Bioinformatics Institute (BII), A*STAR, Singapore, 138671, Republic of Singapore
Yang Lay Kien Bioinformatics Institute (BII), A*STAR, Singapore, 138671, Republic of Singapore
Christine M F Goh Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
Matias J Caldez Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore; Department of Biochemistry, National University of Singapore (NUS), Singapore, 117597, Republic of Singapore
Elisabeth Pfeiffenberger Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
Oliver S Jones Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
Chee Bing Ong Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
Igor V Kurochkin Bioinformatics Institute (BII), A*STAR, Singapore, 138671, Republic of Singapore
Vincenzo Coppola Department of Cancer Biology and Genetics, The Ohio State University, 988 Biomedical Research Tower, 460 West 12th Ave, Columbus, OH, 43210, USA
Lino Tessarollo Mouse Cancer Genetics Program, National Cancer Institute, NCI-Frederick, Bldg. 560, 1050 Boyles Street, Frederick, MD, 21702-1201, USA
Hyungwon Choi Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore; Department of Medicine, National University of Singapore (NUS), Singapore, 117597, Republic of Singapore
Yoganathan Kanagasundaram Bioinformatics Institute (BII), A*STAR, Singapore, 138671, Republic of Singapore
Frank Eisenhaber Bioinformatics Institute (BII), A*STAR, Singapore, 138671, Republic of Singapore; School of Computer Science and Engineering (SCSE), Nanyang Technological University (NTU), Singapore, 637553, Republic of Singapore
Sebastian Maurer-Stroh Bioinformatics Institute (BII), A*STAR, Singapore, 138671, Republic of Singapore; Department of Biological Sciences (DBS), National University of Singapore (NUS), 14 Science Drive 4, Singapore, 117597, Republic of Singapore
Philipp Kaldis Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore; Department of Biochemistry, National University of Singapore (NUS), Singapore, 117597, Republic of Singapore; Department of Clinical Sciences, Lund University, Clinical Research Centre (CRC), Box 50332, 202 13, Malmö, Sweden

Tantoso E, Wong WC, Tay WH, Lee J, Sinha S, Eisenhaber B, Eisenhaber F. Hypocrisy Around Medical Patient Data: Issues of Access for Biomedical Research, Data Quality, Usefulness for the Purpose and Omics Data as Game Changer. Asian Bioeth Rev 2019;11:189-207. [PMID: 33717311 PMCID: PMC7747340 DOI: 10.1007/s41649-019-00085-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/04/2018] [Revised: 04/23/2019] [Accepted: 04/30/2019] [Indexed: 11/14/2022]

Ng SB, Kanagasundaram Y, Fan H, Arumugam P, Eisenhaber B, Eisenhaber F. The 160K Natural Organism Library, a unique resource for natural products research. Nat Biotechnol 2018;36:570-573. [PMID: 29979661 DOI: 10.1038/nbt.4187] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 12/21/2022]

Sinha S, Eisenhaber B, Jensen LJ, Kalbuaji B, Eisenhaber F. Darkness in the Human Gene and Protein Function Space: Widely Modest or Absent Illumination by the Life Science Literature and the Trend for Fewer Protein Function Discoveries Since 2000. Proteomics 2018;18:e1800093. [PMID: 30265449 PMCID: PMC6282819 DOI: 10.1002/pmic.201800093] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 07/30/2018] [Revised: 09/07/2018] [Indexed: 12/15/2022]

Eisenhaber B, Sinha S, Wong WC, Eisenhaber F. Function of a membrane-embedded domain evolutionarily multiplied in the GPI lipid anchor pathway proteins PIG-B, PIG-M, PIG-U, PIG-W, PIG-V, and PIG-Z. Cell Cycle 2018;17:874-880. [PMID: 29764287 PMCID: PMC6056205 DOI: 10.1080/15384101.2018.1456294] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/17/2022]

Limviphuvadh V, Tan CS, Konishi F, Jenjaroenpun P, Xiang JS, Kremenska Y, Mu YS, Syn N, Lee SC, Soo RA, Eisenhaber F, Maurer-Stroh S, Yong WP. Discovering novel SNPs that are correlated with patient outcome in a Singaporean cancer patient cohort treated with gemcitabine-based chemotherapy. BMC Cancer 2018;18:555. [PMID: 29751792 PMCID: PMC5948914 DOI: 10.1186/s12885-018-4471-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/25/2017] [Accepted: 05/01/2018] [Indexed: 12/20/2022]

Affiliation(s)

Vachiranee Limviphuvadh Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Singapore
Chee Seng Tan Department of Haematology-Oncology, National University Health System, 5 Lower Kent Ridge Road, Singapore, 119074, Singapore
Fumikazu Konishi Education Academy of Computational Life Sciences, Tokyo Institute of Technology, Tokyo, Japan
Piroon Jenjaroenpun Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Singapore
Joy Shengnan Xiang Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Singapore
Yuliya Kremenska Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Singapore
Yar Soe Mu Department of Haematology-Oncology, National University Health System, 5 Lower Kent Ridge Road, Singapore, 119074, Singapore
Nicholas Syn Department of Haematology-Oncology, National University Health System, 5 Lower Kent Ridge Road, Singapore, 119074, Singapore; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
Soo Chin Lee Department of Haematology-Oncology, National University Health System, 5 Lower Kent Ridge Road, Singapore, 119074, Singapore
Ross A Soo Department of Haematology-Oncology, National University Health System, 5 Lower Kent Ridge Road, Singapore, 119074, Singapore; Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
Frank Eisenhaber Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Singapore; Department of Biological Sciences, National University of Singapore (NUS), 14 Science Drive 4, Singapore, 117543, Singapore; School of Computer Engineering (SCE), Nanyang Technological University (NTU), 50 Nanyang Drive, Singapore, 637553, Singapore
Sebastian Maurer-Stroh Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Singapore; Department of Biological Sciences, National University of Singapore (NUS), 14 Science Drive 4, Singapore, 117543, Singapore
Wei Peng Yong Department of Haematology-Oncology, National University Health System, 5 Lower Kent Ridge Road, Singapore, 119074, Singapore

Kumar G, Mudgal R, Srinivasan N, Sandhya S. Use of designed sequences in protein structure recognition. Biol Direct 2018;13:8. [PMID: 29776380 PMCID: PMC5960202 DOI: 10.1186/s13062-018-0209-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/31/2017] [Accepted: 04/18/2018] [Indexed: 12/13/2022]

Marakasova ES, Eisenhaber B, Maurer-Stroh S, Eisenhaber F, Baranova A. Prenylation of viral proteins by enzymes of the host: Virus-driven rationale for therapy with statins and FT/GGT1 inhibitors. Bioessays 2017;39. [DOI: 10.1002/bies.201700014] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 12/18/2022]

Baker JA, Wong WC, Eisenhaber B, Warwicker J, Eisenhaber F. Charged residues next to transmembrane regions revisited: "Positive-inside rule" is complemented by the "negative inside depletion/outside enrichment rule". BMC Biol 2017;15:66. [PMID: 28738801 PMCID: PMC5525207 DOI: 10.1186/s12915-017-0404-4] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 07/03/2017] [Accepted: 07/07/2017] [Indexed: 11/25/2022]

Abstract

Background

Transmembrane helices (TMHs) frequently occur amongst protein architectures as a means for proteins to attach to or embed into biological membranes. Physical constraints such as the membrane’s hydrophobicity and electrostatic potential impose uniform requirements on TMHs and their flanking regions. These constraints are mirrored in the sequence patterns (beyond the TMH itself being a span of generally hydrophobic residues), on top of variations enforced by the specific protein’s biological functions.

Results

With statistics derived from a large body of protein sequences, we demonstrate an additional rule beyond the positive charge preference at the cytoplasmic inside (positive-inside rule). Negatively charged residues preferentially occur, or are even enriched, at the non-cytoplasmic flank, or at least they are suppressed at the cytoplasmic flank (negative-not-inside/negative-outside (NNI/NO) rule). Negative residues are generally rare within or near TMHs, so the statistical significance is sensitive to details of TMH alignment, residue frequency normalisation and dataset size; this is why the trend was obscured in previous work. We observe variations amongst taxa as well as for organelles along the secretory pathway. The effect is more pronounced for TMHs from single-pass transmembrane (bitopic) proteins than for those from proteins with multiple TMHs (polytopic proteins). It is strongest for the class of simple TMHs that evolved for the sole role of membrane anchors.
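The quantity behind the positive-inside and NNI/NO rules is a count of charged residues on either side of a TMH. Below is a minimal sketch of that count for a single helix, assuming the helix bounds and topology are already known. The 10-residue flank width is an illustrative choice, not a value from the study, and histidine is simply ignored. The toy sequence is hypothetical.

```python
from dataclasses import dataclass

POSITIVE = frozenset("KR")  # His is ignored here for simplicity
NEGATIVE = frozenset("DE")


@dataclass(frozen=True)
class FlankCharges:
    inside_positive: int
    inside_negative: int
    outside_positive: int
    outside_negative: int


def count_in(segment: str, residues: frozenset) -> int:
    return sum(1 for residue in segment if residue in residues)


def flank_charges(seq: str, tmh_start: int, tmh_end: int,
                  n_term_inside: bool, flank: int = 10) -> FlankCharges:
    """Count charged residues in the two flanks of one TMH.

    tmh_start/tmh_end are 0-based, half-open helix bounds; n_term_inside says
    whether the N-terminal flank faces the cytoplasm. The flank width is an
    illustrative assumption.
    """
    if not 0 <= tmh_start < tmh_end <= len(seq):
        raise ValueError("TMH bounds outside the sequence")
    n_flank = seq[max(0, tmh_start - flank):tmh_start]
    c_flank = seq[tmh_end:tmh_end + flank]
    inside, outside = (n_flank, c_flank) if n_term_inside else (c_flank, n_flank)
    return FlankCharges(
        inside_positive=count_in(inside, POSITIVE),
        inside_negative=count_in(inside, NEGATIVE),
        outside_positive=count_in(outside, POSITIVE),
        outside_negative=count_in(outside, NEGATIVE),
    )


# Hypothetical single-pass protein with its N-terminus in the cytoplasm.
toy = "MSKRKG" + "L" * 20 + "GDEDS"
print(flank_charges(toy, tmh_start=6, tmh_end=26, n_term_inside=True))
# FlankCharges(inside_positive=3, inside_negative=0, outside_positive=0, outside_negative=3)
```

Per-helix counts like these only become evidence for a flank bias after aggregation over many TMHs and normalisation against background residue frequencies. As the abstract notes, the significance of the negative-charge signal is sensitive to exactly those choices.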

Conclusions

The charged-residue flank bias is only one of the TMH sequence features with a role in the anchorage mechanisms, others apparently being the leucine intra-helix propensity skew towards the cytoplasmic side, tryptophan flanking as well as the cysteine and tyrosine inside preference. These observations will stimulate new prediction methods for TMHs and protein topology from a sequence as well as new engineering designs for artificial membrane proteins.

Electronic supplementary material

The online version of this article (doi:10.1186/s12915-017-0404-4) contains supplementary material, which is available to authorized users.

Yap CK, Eisenhaber B, Eisenhaber F, Wong WC. xHMMER3x2: Utilizing HMMER3's speed and HMMER2's sensitivity and specificity in the glocal alignment mode for improved large-scale protein domain annotation. Biol Direct 2016;11:63. [PMID: 27894340 PMCID: PMC5126834 DOI: 10.1186/s13062-016-0163-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/16/2016] [Accepted: 10/24/2016] [Indexed: 01/27/2023]

Abstract

BACKGROUND

While the local-mode HMMER3 is notable for its massive speed improvement, the slower glocal-mode HMMER2 is more exact for domain annotation because it enforces full domain-to-sequence alignments. Since a unit of domain necessarily implies a unit of function, local-mode HMMER3 alone remains insufficient for precise function annotation tasks. In addition, E-values for the same domain model are not comparable across different HMMER builds, which makes it difficult to check domain annotation consistency on a large scale.

RESULTS

In this work, the speed of HMMER3 and the glocal-mode alignment of HMMER2 are combined within the xHMMER3x2 framework to tackle the large-scale domain annotation task. Briefly, HMMER3 is used for initial domain detection so that HMMER2 can subsequently perform glocal-mode, sequence-to-full-domain alignments for the detected HMMER3 hits. An E-value calibration procedure is required to ensure that the search space of HMMER2 is sufficiently replicated by HMMER3. We find that this is straightforwardly possible for ~80% of the models in the Pfam domain library (release 29). However, for the remaining ~20% of HMMER3 domain models, the respective HMMER2 counterparts are more sensitive. Thus, HMMER3 searches alone are insufficient to ensure sensitivity, and an HMMER2-based search needs to be initiated. When tested on the set of UniProt human sequences, xHMMER3x2 can be configured to be between 7× and 201× faster than HMMER2. Over that range, its domain detection sensitivity decreases from 99.8 to 95.7% relative to HMMER2 alone; HMMER3's sensitivity was 95.7%. At the extremes, xHMMER3x2 is either the slow glocal-mode HMMER2 or the fast HMMER3 with glocal-mode. Finally, xHMMER3x2 maps E-values to false-positive rates (FPR), which allows E-values of different model builds to be compared. Any annotation discrepancies in a large-scale annotation exercise can then be flagged for further examination by dissectHMMER.
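The control flow of the two-stage idea can be sketched as follows. The two search functions are hypothetical stand-ins, not HMMER's command-line interface or API, and the thresholds are illustrative. A cheap local-mode prefilter decides which sequence-model pairs receive the expensive glocal alignment. Models flagged as poorly replicated by the fast search bypass the prefilter.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Key = Tuple[str, str]                 # (sequence id, domain model id)
Scorer = Callable[[str, str], float]  # returns an E-value


def two_stage_annotate(
    sequences: Iterable[str],
    models: Iterable[str],
    fast_search: Scorer,
    glocal_search: Scorer,
    fallback_models: Set[str],
    prefilter_e: float,
    final_e: float,
) -> Dict[Key, float]:
    """Annotate with a fast prefilter, then confirm hits with a glocal search.

    fast_search and glocal_search are hypothetical stand-ins for a local-mode
    HMMER3 search and a glocal-mode HMMER2 search. fallback_models lists models
    whose fast-mode version was found less sensitive; they are searched with
    glocal_search directly. prefilter_e is assumed to be calibrated looser than
    final_e so that the prefilter does not drop true hits.
    """
    seqs: List[str] = list(sequences)
    hits: Dict[Key, float] = {}
    for model in models:
        for seq_id in seqs:
            if model not in fallback_models and fast_search(seq_id, model) > prefilter_e:
                continue  # rejected by the cheap prefilter; glocal search skipped
            e_value = glocal_search(seq_id, model)
            if e_value <= final_e:
                hits[(seq_id, model)] = e_value
    return hits


# Hypothetical toy E-values.
fast = {("s1", "PF1"): 1e-5, ("s2", "PF1"): 50.0, ("s1", "PF2"): 5.0, ("s2", "PF2"): 5.0}
slow = {("s1", "PF1"): 1e-6, ("s2", "PF1"): 1e-4, ("s1", "PF2"): 1e-8, ("s2", "PF2"): 2.0}
glocal_calls: List[Key] = []


def toy_glocal(seq_id: str, model: str) -> float:
    glocal_calls.append((seq_id, model))
    return slow[(seq_id, model)]


result = two_stage_annotate(["s1", "s2"], ["PF1", "PF2"],
                            lambda s, m: fast[(s, m)], toy_glocal,
                            fallback_models={"PF2"}, prefilter_e=1.0, final_e=1e-3)
print(result)             # {('s1', 'PF1'): 1e-06, ('s1', 'PF2'): 1e-08}
print(len(glocal_calls))  # 3
```

In the toy run, the pair (s2, PF1) would pass the glocal search (E = 1e-4) but is dropped by the prefilter. This illustrates why the abstract stresses calibrating the fast search so that it replicates the slow search space.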

CONCLUSION

The xHMMER3x2 workflow drastically improves large-scale domain annotation speed over HMMER2 without compromising domain-detection sensitivity or the completeness of sequence-to-domain alignments. The xHMMER3x2 code and its webserver (for Pfam releases 27, 28 and 29) are freely available at http://xhmmer3x2.bii.a-star.edu.sg/ .

REVIEWERS

Reviewed by Thomas Dandekar, L. Aravind, Oliviero Carugo and Shamil Sunyaev. For the full reviews, please go to the Reviewers' comments section.

The Recipe for Protein Sequence-Based Function Prediction and Its Implementation in the ANNOTATOR Software Environment. Methods Mol Biol 2016;1415:477-506. [PMID: 27115649 DOI: 10.1007/978-1-4939-3572-7_25] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 03/02/2023]

Sirota FL, Maurer-Stroh S, Eisenhaber B, Eisenhaber F. Single-residue posttranslational modification sites at the N-terminus, C-terminus or in-between: To be or not to be exposed for enzyme access. Proteomics 2016;15:2525-46. [PMID: 26038108 PMCID: PMC4745020 DOI: 10.1002/pmic.201400633] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 12/31/2014] [Revised: 04/17/2015] [Accepted: 05/29/2015] [Indexed: 11/30/2022]

Developing of the Computer Method for Annotation of Bacterial Genes. Adv Bioinformatics 2016;2015:635437. [PMID: 26770195 PMCID: PMC4684837 DOI: 10.1155/2015/635437] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/15/2015] [Revised: 11/16/2015] [Accepted: 11/18/2015] [Indexed: 02/07/2023]

Scaiewicz A, Levitt M. The language of the protein universe. Curr Opin Genet Dev 2015;35:50-6. [PMID: 26451980 DOI: 10.1016/j.gde.2015.08.010] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 06/30/2015] [Revised: 08/20/2015] [Accepted: 08/25/2015] [Indexed: 11/17/2022]

Sherman WA, Kuchibhatla DB, Limviphuvadh V, Maurer-Stroh S, Eisenhaber B, Eisenhaber F. HPMV: human protein mutation viewer - relating sequence mutations to protein sequence architecture and function changes. J Bioinform Comput Biol 2015;13:1550028. [PMID: 26503432 DOI: 10.1142/s0219720015500286] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]

Wong WC, Yap CK, Eisenhaber B, Eisenhaber F. dissectHMMER: a HMMER-based score dissection framework that statistically evaluates fold-critical sequence segments for domain fold similarity. Biol Direct 2015;10:39. [PMID: 26228544 PMCID: PMC4521371 DOI: 10.1186/s13062-015-0068-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 03/26/2015] [Accepted: 07/20/2015] [Indexed: 11/10/2022]

Abstract

Background

Annotation transfer for function and structure within the sequence homology concept essentially requires protein sequence similarity for the secondary structural blocks forming the fold of a protein. A simplistic similarity approach in the case of non-globular segments (coiled coils, low complexity regions, transmembrane regions, long loops, etc.) is not justified and is a major source of mistaken homologies. Such mistaken homologies arise either from positional sequence conservation resulting from a very simple, physically induced pattern or from integral sequence properties that are critical for function. Furthermore, the number of well-studied proteins continues to grow only slowly. A search methodology is therefore needed that dives deeper into sequence similarity space to connect unknown sequences to well-studied ones, albeit more distant, so that biological functions can be postulated.

Results

Building on our previous work that dissected the hidden Markov model (HMMER)-based similarity score into fold-critical and non-globular contributions to improve homology inference, we propose a framework, dissectHMMER, that identifies more fold-related domain hits from standard HMMER searches. Subsequent statistical stratification of the fold-related hits into cohorts of functionally related domains allows the function of the query sequence to be postulated. In brief, this work addresses the technical problems of how to recognize non-globular parts in the domain model, how to resolve contradictory HMMER2/HMMER3 results and how to evaluate fold-related domain hits for homology. The framework is benchmarked against a set of SCOP-to-Pfam domain models. Despite being a sequence-to-profile method, dissectHMMER performs favorably against a profile-to-profile method, HHsuite/HHsearch. Examples of function annotation using dissectHMMER are presented, including the function discovery of an uncharacterized membrane protein, Q9K8K1_BACHD (WP_010899149.1), as a lactose/H+ symporter. Finally, the dissectHMMER webserver is publicly available at http://dissecthmmer.bii.a-star.edu.sg.

Conclusions

The proposed framework, dissectHMMER, is faithful to the original inception of the sequence homology concept while improving upon the existing HMMER search tool by rescuing statistically evaluated false-negative, yet fold-related, domain hits to the query sequence. Overall, this translates into an opportunity for any novel protein sequence to be functionally characterized.

Reviewers

This article was reviewed by Masanori Arita, Shamil Sunyaev and L. Aravind.

Electronic supplementary material

The online version of this article (doi:10.1186/s13062-015-0068-3) contains supplementary material, which is available to authorized users.

Mudgal R, Sandhya S, Chandra N, Srinivasan N. De-DUFing the DUFs: Deciphering distant evolutionary relationships of Domains of Unknown Function using sensitive homology detection methods. Biol Direct 2015;10:38. [PMID: 26228684 PMCID: PMC4520260 DOI: 10.1186/s13062-015-0069-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 03/10/2015] [Accepted: 07/20/2015] [Indexed: 12/23/2022]

Abstract

Background

In the post-genomic era, where sequences are being determined at a rapid rate, we rely heavily on computational methods for their tentative biochemical characterization. The Pfam database currently contains 3,786 families corresponding to “Domains of Unknown Function” (DUF) or “Uncharacterized Protein Family” (UPF), of which 3,087 families have no reported three-dimensional structure, constituting almost one-fourth of the known protein families still awaiting characterization of both structure and function.

Results

We applied a ‘computational structural genomics’ approach using five state-of-the-art remote similarity detection methods to detect relationships between uncharacterized DUFs and domain families of known structure. An association with a structural domain family could serve as a starting point for elucidating the function of a DUF. Among these five methods, searches in the SCOP-NrichD database were applied here for the first time. Predictions were classified as high, medium or low confidence based on the consensus of results from the various approaches and were also annotated with enzyme and Gene Ontology terms. A total of 614 uncharacterized DUFs could be associated with a known structural domain; high-confidence predictions, involving at least four methods, were made for 54 of these families. These structure-function relationships for the 614 DUF families can be accessed online at http://proline.biochem.iisc.ernet.in/RHD_DUFS/. For potential enzymes in this set, we assessed their compatibility with the associated fold and performed detailed structural and functional annotation by examining alignments and the extent of conservation of functional residues. Interesting assignments for DUF3050, DUF1636, DUF1572, DUF2092 and DUF659 are discussed in detail.

Conclusions

This study provides insights into the structure and potential function of nearly 20% of the DUFs. Because the methods are often complementary, using different computational approaches enables us to recognize distant relationships reliably, especially when they converge on a common assignment. We observe that while pointers to the structural domain can offer the right clues to the function of a protein, recognizing its precise functional role is still ‘non-trivial’, as many DUF domains conserve only some of the critical residues. It is not clear whether these are functional vestiges or instances involving alternate substrates and interacting partners.

Reviewers

This article was reviewed by Drs Eugene Koonin, Frank Eisenhaber and Srikrishna Subramanian.

Electronic supplementary material

The online version of this article (doi:10.1186/s13062-015-0069-2) contains supplementary material, which is available to authorized users.

Eisenhaber F, Sherman WA. 10 years for the Journal of Bioinformatics and Computational Biology (2003-2013) -- a retrospective. J Bioinform Comput Biol 2014;12:1471001. [PMID: 24969752 DOI: 10.1142/s0219720014710012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]

Eisenhaber F. Unix interfaces, Kleisli, bucandin structure, etc. -- the heroic beginning of bioinformatics in Singapore. J Bioinform Comput Biol 2014;12:1471002. [PMID: 24969753 DOI: 10.1142/s0219720014710024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]

Eisenhaber B, Eisenhaber S, Kwang TY, Grüber G, Eisenhaber F. Transamidase subunit GAA1/GPAA1 is a M28 family metallo-peptide-synthetase that catalyzes the peptide bond formation between the substrate protein's omega-site and the GPI lipid anchor's phosphoethanolamine. Cell Cycle 2014;13:1912-7. [PMID: 24743167 PMCID: PMC4111754 DOI: 10.4161/cc.28761] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Indexed: 11/19/2022]

Eisenhaber F, Sung WK, Wong L. Guest Editorial for the International Conference on Genome Informatics (GIW 2013). IEEE/ACM Trans Comput Biol Bioinform 2014;11:5-6. [PMID: 26605388 DOI: 10.1109/tcbb.2014.2299751] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]

Eisenhaber F, Sung WK, Wong L. The 24th International Conference on Genome Informatics, GIW2013, in Singapore. J Bioinform Comput Biol 2013. [DOI: 10.1142/s0219720013020034] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]

Maurer-Stroh S, Gao H, Han H, Baeten L, Schymkowitz J, Rousseau F, Zhang L, Eisenhaber F. Motif discovery with data mining in 3D protein structure databases: discovery, validation and prediction of the U-shape zinc binding ("Huf-Zinc") motif. J Bioinform Comput Biol 2013;11:1340008. [DOI: 10.1142/s0219720013400088] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]

Kuznetsov V, Lee HK, Maurer-Stroh S, Molnár MJ, Pongor S, Eisenhaber B, Eisenhaber F. How bioinformatics influences health informatics: usage of biomolecular sequences, expression profiles and automated microscopic image analyses for clinical needs and public health. Health Inf Sci Syst 2013;1:2. [PMID: 25825654 PMCID: PMC4336111 DOI: 10.1186/2047-2501-1-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 08/16/2012] [Accepted: 10/05/2012] [Indexed: 01/25/2023]

Abstract

The currently hyped expectation of personalized medicine is often associated with merely achieving the information-technology-led integration of biomolecular sequencing, expression and histopathological bioimaging data with clinical records at the level of individual patients, as if significant biomedical conclusions were its more or less inevitable result. It remains a sad fact that many, if not most, biomolecular mechanisms that translate human genomic information into phenotypes are not known and, thus, most molecular and cellular data cannot be interpreted in terms of biomedically relevant conclusions. Whereas the historical trend will certainly be in the general direction of personalized diagnostics and cures, the temperate view suggests that biomedical applications that rely on the comparison of biomolecular sequences and/or on already known biomolecular mechanisms have much greater chances of entering clinical practice soon. In addition to considering the general trends, we review, as examples, advances in cancer biomarker discovery, in the clinically relevant characterization of patient-specific viral and bacterial pathogens (with emphasis on drug selection for influenza and enterohemorrhagic E. coli) and in the automated assessment of histopathological images. As molecular and cellular data analysis becomes instrumental for achieving desirable clinical outcomes, the role of bioinformatics and computational biology approaches will grow dramatically.

AUTHOR SUMMARY

With DNA sequencing and computers becoming increasingly cheap and accessible to the layman, integrating biomolecular and clinical patient data seems to be becoming a realistic, short-term option that will lead to patient-specific diagnostics and treatment design for many diseases such as cancer, metabolic disorders and inherited conditions. These hyped expectations will fail, since many, if not most, biomolecular mechanisms that translate human genomic information into phenotypes are not yet known and, thus, most of the molecular and cellular data collected will not lead to biomedically relevant conclusions. At the same time, less spectacular biomedical applications based on biomolecular sequence comparison and/or known biomolecular mechanisms have enormous potential for healthcare and public health. Since the analysis of heterogeneous biomolecular data in the context of clinical data will be increasingly critical, the role of bioinformatics and computational biology will grow correspondingly.

Sirota FL, Batagov A, Schneider G, Eisenhaber B, Eisenhaber F, Maurer-Stroh S. Beware of moving targets: reference proteome content fluctuates substantially over the years. J Bioinform Comput Biol 2012;10:1250020. [PMID: 22867629 DOI: 10.1142/s0219720012500205] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]