1. Rutherford KM, Lera-Ramírez M, Wood V. PomBase: a Global Core Biodata Resource-growth, collaboration, and sustainability. Genetics 2024; 227:iyae007. [PMID: 38376816 PMCID: PMC11075564 DOI: 10.1093/genetics/iyae007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 01/13/2024] [Indexed: 02/21/2024] Open
Abstract
PomBase (https://www.pombase.org), the model organism database (MOD) for fission yeast, was recently awarded Global Core Biodata Resource (GCBR) status by the Global Biodata Coalition (GBC; https://globalbiodata.org/) after a rigorous selection process. In this MOD review, we present PomBase's continuing growth and improvement over the last 2 years. We describe these improvements in the context of the qualitative GCBR indicators related to scientific quality, comprehensivity, accelerating science, user stories, and collaborations with other biodata resources. This review also showcases the depth of existing connections both within the biocuration ecosystem and between PomBase and its user community.
Affiliation(s)
- Kim M Rutherford
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Manuel Lera-Ramírez
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Valerie Wood
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
2. Alfatah M, Lim JJJ, Zhang Y, Naaz A, Cheng TYN, Yogasundaram S, Faidzinn NA, Lin JJ, Eisenhaber B, Eisenhaber F. Uncharacterized yeast gene YBR238C, an effector of TORC1 signaling in a mitochondrial feedback loop, accelerates cellular aging via HAP4- and RMD9-dependent mechanisms. eLife 2024; 12:RP92178. [PMID: 38713053 PMCID: PMC11076046 DOI: 10.7554/elife.92178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024] Open
Abstract
Uncovering the regulators of cellular aging will unravel the complexity of aging biology and identify potential therapeutic interventions to delay the onset and progress of chronic, aging-related diseases. In this work, we systematically compared genesets involved in regulating the lifespan of Saccharomyces cerevisiae (a powerful model organism to study the cellular aging of humans) and those with expression changes under rapamycin treatment. Among the functionally uncharacterized genes in the overlap set, YBR238C stood out as the only one downregulated by rapamycin and with an increased chronological and replicative lifespan upon deletion. We show that YBR238C and its paralog RMD9 oppositely affect mitochondria and aging. YBR238C deletion increases the cellular lifespan by enhancing mitochondrial function. Its overexpression accelerates cellular aging via mitochondrial dysfunction. We find that the phenotypic effect of YBR238C is largely explained by HAP4- and RMD9-dependent mechanisms. Furthermore, we find that genetic- or chemical-based induction of mitochondrial dysfunction increases TORC1 (Target of Rapamycin Complex 1) activity that, subsequently, accelerates cellular aging. Notably, TORC1 inhibition by rapamycin (or deletion of YBR238C) improves the shortened lifespan under these mitochondrial dysfunction conditions in yeast and human cells. The growth of mutant cells (a proxy of TORC1 activity) with enhanced mitochondrial function is sensitive to rapamycin whereas the growth of defective mitochondrial mutants is largely resistant to rapamycin compared to wild type. Our findings demonstrate a feedback loop between TORC1 and mitochondria (the TORC1-MItochondria-TORC1 (TOMITO) signaling process) that regulates cellular aging processes. Hereby, YBR238C is an effector of TORC1 modulating mitochondrial function.
Affiliation(s)
- Mohammad Alfatah
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Jolyn Jia Jia Lim
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Yizhong Zhang
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Arshia Naaz
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Trishia Yi Ning Cheng
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Sonia Yogasundaram
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Nashrul Afiq Faidzinn
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Jovian Jing Lin
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Birgit Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - LASA – Lausitz Advanced Scientific Applications gGmbH, Weißwasser, Germany
- Frank Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - LASA – Lausitz Advanced Scientific Applications gGmbH, Weißwasser, Germany
  - School of Biological Sciences (SBS), Nanyang Technological University (NTU), Singapore, Singapore
3. Richardson R, Tejedor Navarro H, Amaral LAN, Stoeger T. Meta-Research: Understudied genes are lost in a leaky pipeline between genome-wide assays and reporting of results. eLife 2024; 12:RP93429. [PMID: 38546716 PMCID: PMC10977968 DOI: 10.7554/elife.93429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/01/2024] Open
Abstract
Present-day publications on human genes primarily feature genes that already appeared in many publications prior to completion of the Human Genome Project in 2003. These patterns persist despite the subsequent adoption of high-throughput technologies, which routinely identify novel genes associated with biological processes and disease. Although several hypotheses for bias in the selection of genes as research targets have been proposed, their explanatory powers have not yet been compared. Our analysis suggests that understudied genes are systematically abandoned in favor of better-studied genes between the completion of -omics experiments and the reporting of results. Understudied genes remain abandoned by studies that cite these -omics experiments. Conversely, we find that publications on understudied genes may even accrue a greater number of citations. Among 45 biological and experimental factors previously proposed to affect which genes are being studied, we find that 33 are significantly associated with the choice of hit genes presented in titles and abstracts of -omics studies. To promote the investigation of understudied genes, we condense our insights into a tool, find my understudied genes (FMUG), that allows scientists to engage with potential bias during the selection of hits. We demonstrate the utility of FMUG through the identification of genes that remain understudied in vertebrate aging. FMUG is developed in Flutter and is available for download at fmug.amaral.northwestern.edu as a MacOS/Windows app.
Affiliation(s)
- Reese Richardson
  - Interdisciplinary Biological Sciences, Northwestern University, Evanston, United States
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
- Heliodoro Tejedor Navarro
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
  - Northwestern Institute on Complex Systems, Northwestern University, Evanston, United States
- Luis A Nunes Amaral
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
  - Northwestern Institute on Complex Systems, Northwestern University, Evanston, United States
  - Department of Molecular Biosciences, Northwestern University, Evanston, United States
  - Department of Physics and Astronomy, Northwestern University, Evanston, United States
- Thomas Stoeger
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
  - The Potocsnak Longevity Institute, Northwestern University, Chicago, United States
  - Simpson Querrey Lung Institute for Translational Science, Northwestern University, Chicago, United States
4. Schäfer PSL, Dimitrov D, Villablanca EJ, Saez-Rodriguez J. Integrating single-cell multi-omics and prior biological knowledge for a functional characterization of the immune system. Nat Immunol 2024; 25:405-417. [PMID: 38413722 DOI: 10.1038/s41590-024-01768-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 01/16/2024] [Indexed: 02/29/2024]
Abstract
The immune system comprises diverse specialized cell types that cooperate to defend the host against a wide range of pathogenic threats. Recent advancements in single-cell and spatial multi-omics technologies provide rich information about the molecular state of immune cells. Here, we review how the integration of single-cell and spatial multi-omics data with prior knowledge, gathered from decades of detailed biochemical studies, allows us to obtain functional insights, focusing on gene regulatory processes and cell-cell interactions. We present diverse applications in immunology and critically assess underlying assumptions and limitations. Finally, we offer a perspective on the ongoing technological and algorithmic developments that promise to get us closer to a systemic mechanistic understanding of the immune system.
Affiliation(s)
- Philipp Sven Lars Schäfer
  - Institute for Computational Bioscience, Faculty of Medicine and Heidelberg University Hospital, Heidelberg University, Heidelberg, Germany
- Daniel Dimitrov
  - Institute for Computational Bioscience, Faculty of Medicine and Heidelberg University Hospital, Heidelberg University, Heidelberg, Germany
- Eduardo J Villablanca
  - Division of Immunology and Allergy, Department of Medicine Solna, Karolinska Institute and Karolinska University Hospital, Stockholm, Sweden
  - Center of Molecular Medicine, Stockholm, Sweden
- Julio Saez-Rodriguez
  - Institute for Computational Bioscience, Faculty of Medicine and Heidelberg University Hospital, Heidelberg University, Heidelberg, Germany
5. Richardson RAK, Tejedor Navarro H, Amaral LAN, Stoeger T. Meta-Research: understudied genes are lost in a leaky pipeline between genome-wide assays and reporting of results. bioRxiv: the preprint server for biology 2024:2023.02.28.530483. [PMID: 36909550 PMCID: PMC10002660 DOI: 10.1101/2023.02.28.530483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Abstract
Present-day publications on human genes primarily feature genes that already appeared in many publications prior to completion of the Human Genome Project in 2003. These patterns persist despite the subsequent adoption of high-throughput technologies, which routinely identify novel genes associated with biological processes and disease. Although several hypotheses for bias in the selection of genes as research targets have been proposed, their explanatory powers have not yet been compared. Our analysis suggests that understudied genes are systematically abandoned in favor of better-studied genes between the completion of -omics experiments and the reporting of results. Understudied genes remain abandoned by studies that cite these -omics experiments. Conversely, we find that publications on understudied genes may even accrue a greater number of citations. Among 45 biological and experimental factors previously proposed to affect which genes are being studied, we find that 33 are significantly associated with the choice of hit genes presented in titles and abstracts of -omics studies. To promote the investigation of understudied genes, we condense our insights into a tool, find my understudied genes (FMUG), that allows scientists to engage with potential bias during the selection of hits. We demonstrate the utility of FMUG through the identification of genes that remain understudied in vertebrate aging. FMUG is developed in Flutter and is available for download at fmug.amaral.northwestern.edu as a MacOS/Windows app.
Affiliation(s)
- Reese AK Richardson
  - Interdisciplinary Biological Sciences, Northwestern University
  - Department of Chemical and Biological Engineering, Northwestern University
- Heliodoro Tejedor Navarro
  - Department of Chemical and Biological Engineering, Northwestern University
  - Northwestern Institute on Complex Systems, Northwestern University
- Luis A Nunes Amaral
  - Department of Chemical and Biological Engineering, Northwestern University
  - Northwestern Institute on Complex Systems, Northwestern University
  - Department of Physics and Astronomy, Northwestern University
  - Department of Molecular Biosciences, Northwestern University
- Thomas Stoeger
  - Department of Chemical and Biological Engineering, Northwestern University
  - The Potocsnak Longevity Institute, Northwestern University
  - Simpson Querrey Lung Institute for Translational Science, Northwestern University
6. Macedo-da-Silva J, Mule SN, Rosa-Fernandes L, Palmisano G. A computational pipeline elucidating functions of conserved hypothetical Trypanosoma cruzi proteins based on public proteomic data. Advances in Protein Chemistry and Structural Biology 2024; 138:401-428. [PMID: 38220431 DOI: 10.1016/bs.apcsb.2023.07.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
Abstract
The proteome is complex, dynamic, and functionally diverse. Functional proteomics aims to characterize the functions of proteins in biological systems. However, there is a delay in annotating the function of proteins, even in model organisms. This gap is even greater in other organisms, including Trypanosoma cruzi, the causative agent of the parasitic, systemic, and sometimes fatal disease called Chagas disease. About 99.8% of the Trypanosoma cruzi proteome is not manually annotated (unreviewed), among which >25% are conserved hypothetical proteins (CHPs), calling attention to the knowledge gap on the protein content of this organism. CHPs are proteins conserved among different species of various evolutionary lineages; however, they lack functional validation. This study describes a bioinformatics pipeline applied to public proteomic data to infer possible biological functions of conserved hypothetical Trypanosoma cruzi proteins. Here, the adopted strategy consisted of collecting differentially expressed proteins between the epimastigote and metacyclic trypomastigote stages of Trypanosoma cruzi, followed by the functional characterization of these CHPs by applying a manifold learning technique for dimension reduction and 3D structure homology analysis (Spalog). We found a panel of 25 and 26 upregulated proteins in the epimastigote and metacyclic trypomastigote stages, respectively; among these, 18 CHPs (8 in the epimastigote stage and 10 in the metacyclic stage) were characterized. The data generated corroborate the literature and complement the functional analyses of differentially regulated proteins at each stage, as they attribute potential functions to CHPs, which are frequently identified in Trypanosoma cruzi proteomics studies. However, it is important to point out that experimental validation is required to deepen our understanding of the CHPs.
Affiliation(s)
- Janaina Macedo-da-Silva
  - GlycoProteomics Laboratory, Department of Parasitology, ICB, University of São Paulo, Sao Paulo, Brazil
- Simon Ngao Mule
  - GlycoProteomics Laboratory, Department of Parasitology, ICB, University of São Paulo, Sao Paulo, Brazil
- Livia Rosa-Fernandes
  - GlycoProteomics Laboratory, Department of Parasitology, ICB, University of São Paulo, Sao Paulo, Brazil; Centre for Motor Neuron Disease Research, Faculty of Medicine, Health & Human Sciences, Macquarie Medical School, Sydney, NSW, Australia
- Giuseppe Palmisano
  - GlycoProteomics Laboratory, Department of Parasitology, ICB, University of São Paulo, Sao Paulo, Brazil; School of Natural Sciences, Macquarie University, Sydney, NSW, Australia
7. Rappsilber J. A dive into the unknome. Trends Genet 2024; 40:15-16. [PMID: 37968205 DOI: 10.1016/j.tig.2023.10.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Accepted: 10/23/2023] [Indexed: 11/17/2023]
Abstract
We may never understand the function of all genes, findings by Freeman, Munro and colleagues suggest, unless we rethink our approaches. They make a thorough attempt at quantifying the unknownness of protein-coding genes and experimentally prove that many neglected genes hold the seed of important discoveries.
Affiliation(s)
- Juri Rappsilber
  - Technische Universität Berlin, Chair of Bioanalytics, 10623 Berlin, Germany; Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh, EH9 3BF, UK; Si-M/'Der Simulierte Mensch', a Science Framework of Technische Universität Berlin and Charité - Universitätsmedizin Berlin, Berlin, Germany
8. Brechtmann F, Bechtler T, Londhe S, Mertes C, Gagneur J. Evaluation of input data modality choices on functional gene embeddings. NAR Genom Bioinform 2023; 5:lqad095. [PMID: 37942285 PMCID: PMC10629286 DOI: 10.1093/nargab/lqad095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 09/07/2023] [Accepted: 09/28/2023] [Indexed: 11/10/2023] Open
Abstract
Functional gene embeddings, numerical vectors capturing gene function, provide a promising way to integrate functional gene information into machine learning models. These embeddings are learnt by applying self-supervised machine-learning algorithms on various data types including quantitative omics measurements, protein-protein interaction networks and literature. However, downstream evaluations comparing alternative data modalities used to construct functional gene embeddings have been lacking. Here we benchmarked functional gene embeddings obtained from various data modalities for predicting disease-gene lists, cancer drivers, phenotype-gene associations and scores from genome-wide association studies. Off-the-shelf predictors trained on precomputed embeddings matched or outperformed dedicated state-of-the-art predictors, demonstrating their high utility. Embeddings based on literature and protein-protein interactions inferred from low-throughput experiments outperformed embeddings derived from genome-wide experimental data (transcriptomics, deletion screens and protein sequence) when predicting curated gene lists. In contrast, they did not perform better when predicting genome-wide association signals and were biased towards highly-studied genes. These results indicate that embeddings derived from literature and low-throughput experiments appear favourable in many existing benchmarks because they are biased towards well-studied genes and should therefore be considered with caution. Altogether, our study and precomputed embeddings will facilitate the development of machine-learning models in genetics and related fields.
Affiliation(s)
- Felix Brechtmann
  - TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Munich Center for Machine Learning, Munich, Germany
- Thibault Bechtler
  - TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Shubhankar Londhe
  - TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Christian Mertes
  - TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Munich Data Science Institute, Technical University of Munich, Garching, Germany
  - Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany
- Julien Gagneur
  - TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany
  - Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
9. Rodríguez-López M, Bordin N, Lees J, Scholes H, Hassan S, Saintain Q, Kamrad S, Orengo C, Bähler J. Broad functional profiling of fission yeast proteins using phenomics and machine learning. eLife 2023; 12:RP88229. [PMID: 37787768 PMCID: PMC10547477 DOI: 10.7554/elife.88229] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 10/04/2023] Open
Abstract
Many proteins remain poorly characterized even in well-studied organisms, presenting a bottleneck for research. We applied phenomics and machine-learning approaches with Schizosaccharomyces pombe for broad cues on protein functions. We assayed colony-growth phenotypes to measure the fitness of deletion mutants for 3509 non-essential genes in 131 conditions with different nutrients, drugs, and stresses. These analyses exposed phenotypes for 3492 mutants, including 124 mutants of 'priority unstudied' proteins conserved in humans, providing varied functional clues. For example, over 900 proteins were newly implicated in the resistance to oxidative stress. Phenotype-correlation networks suggested roles for poorly characterized proteins through 'guilt by association' with known proteins. For complementary functional insights, we predicted Gene Ontology (GO) terms using machine learning methods exploiting protein-network and protein-homology data (NET-FF). We obtained 56,594 high-scoring GO predictions, of which 22,060 also featured high information content. Our phenotype-correlation data and NET-FF predictions showed a strong concordance with existing PomBase GO annotations and protein networks, with integrated analyses revealing 1675 novel GO predictions for 783 genes, including 47 predictions for 23 priority unstudied proteins. Experimental validation identified new proteins involved in cellular aging, showing that these predictions and phenomics data provide a rich resource to uncover new protein functions.
Affiliation(s)
- María Rodríguez-López
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Nicola Bordin
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- Jon Lees
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
  - University of Bristol, Bristol, United Kingdom
- Harry Scholes
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- Shaimaa Hassan
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
  - Helwan University, Faculty of Pharmacy, Cairo, Egypt
- Quentin Saintain
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Stephan Kamrad
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Christine Orengo
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- Jürg Bähler
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
10. Tantoso E, Eisenhaber B, Sinha S, Jensen LJ, Eisenhaber F. Did the early full genome sequencing of yeast boost gene function discovery? Biol Direct 2023; 18:46. [PMID: 37574542 PMCID: PMC10424406 DOI: 10.1186/s13062-023-00403-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2023] [Accepted: 08/01/2023] [Indexed: 08/15/2023] Open
Abstract
BACKGROUND: Although the genome of Saccharomyces cerevisiae (S. cerevisiae) was the first genome of a eukaryotic organism to be fully sequenced (in 1996), a complete understanding of the potential of encoded biomolecular mechanisms has not yet been achieved. Here, we wish to quantify how far away the goal of a full list of S. cerevisiae gene functions still is. RESULTS: The scientific literature about S. cerevisiae protein-coding genes has been mapped onto the yeast genome via the mentioning of names for genomic regions in scientific publications. The match was quantified with the ratio of a given gene name's occurrences to those of any gene names in the article. We find that ~230 elite genes with ≥75 full publication equivalents (FPEs; FPE = 1 is an idealized publication referring to just a single gene) command ~45% of all literature. At the same time, about two thirds of the genes (each with fewer than 10 FPEs) are described in just 12% of the literature (on average, each such gene has just ~1.5% of the literature of an elite gene). About 600 genes have not been mentioned in any dedicated article. Compared with other groups of genes, the literature growth rates were highest for uncharacterized or understudied genes until the late nineties of the twentieth century. Yet, these growth rates deteriorated and became negative thereafter. Thus, yeast function discovery for previously uncharacterized genes has returned to the level of ~1980. At the same time, literature for anyhow well-studied genes (with a threshold T10 (≥10 FPEs) and higher) remains steadily growing. CONCLUSIONS: Did the early full genome sequencing of yeast boost gene function discovery? The data proves that the moment of publishing the full genome in reality coincides with the onset of decline of gene function discovery for previously uncharacterized genes. If the current status of literature about yeast molecular mechanisms can be extrapolated into the future, it will take about another 50 years to complete the yeast gene function list. We found that a small group of scientific journals contributed extraordinarily to publishing early reports relevant to yeast gene function discoveries.
Affiliation(s)
- Erwin Tantoso
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore.
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore.
- Birgit Eisenhaber
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore.
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore.
  - LASA - Lausitz Advanced Scientific Applications gGmbH, Straße Der Einheit 2-24, 02943, Weißwasser, Federal Republic of Germany.
- Swati Sinha
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Lars Juhl Jensen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Frank Eisenhaber
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore.
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore.
  - LASA - Lausitz Advanced Scientific Applications gGmbH, Straße Der Einheit 2-24, 02943, Weißwasser, Federal Republic of Germany.
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Republic of Singapore.
11. Potter A, Hangas A, Goffart S, Huynen MA, Cabrera-Orefice A, Spelbrink JN. Uncharacterized protein C17orf80 - a novel interactor of human mitochondrial nucleoids. J Cell Sci 2023; 136:jcs260822. [PMID: 37401363 PMCID: PMC10445727 DOI: 10.1242/jcs.260822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Accepted: 06/26/2023] [Indexed: 07/05/2023] Open
Abstract
Molecular functions of many human proteins remain unstudied, despite the demonstrated association with diseases or pivotal molecular structures, such as mitochondrial DNA (mtDNA). This small genome is crucial for the proper functioning of mitochondria, the energy-converting organelles. In mammals, mtDNA is arranged into macromolecular complexes called nucleoids that serve as functional stations for its maintenance and expression. Here, we aimed to explore an uncharacterized protein C17orf80, which was previously detected close to the nucleoid components by proximity labelling mass spectrometry. To investigate the subcellular localization and function of C17orf80, we took advantage of immunofluorescence microscopy, interaction proteomics and several biochemical assays. We demonstrate that C17orf80 is a mitochondrial membrane-associated protein that interacts with nucleoids even when mtDNA replication is inhibited. In addition, we show that C17orf80 is not essential for mtDNA maintenance and mitochondrial gene expression in cultured human cells. These results provide a basis for uncovering the molecular function of C17orf80 and the nature of its association with nucleoids, possibly leading to new insights about mtDNA and its expression.
Collapse
Affiliation(s)
- Alisa Potter
- Department of Pediatrics, Amalia Children's Hospital, Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
- Radboud Center for Mitochondrial Medicine (RCMM), Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
| | - Anu Hangas
- Department of Environmental and Biological Sciences, University of Eastern Finland, Joensuu, 80101, Finland
| | - Steffi Goffart
- Department of Environmental and Biological Sciences, University of Eastern Finland, Joensuu, 80101, Finland
| | - Martijn A. Huynen
- Department of Medical BioSciences, Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
| | - Alfredo Cabrera-Orefice
- Radboud Center for Mitochondrial Medicine (RCMM), Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
- Department of Medical BioSciences, Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
| | - Johannes N. Spelbrink
- Department of Pediatrics, Amalia Children's Hospital, Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
- Radboud Center for Mitochondrial Medicine (RCMM), Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
| |
|
12
|
Rocha JJ, Jayaram SA, Stevens TJ, Muschalik N, Shah RD, Emran S, Robles C, Freeman M, Munro S. Functional unknomics: Systematic screening of conserved genes of unknown function. PLoS Biol 2023; 21:e3002222. [PMID: 37552676 PMCID: PMC10409296 DOI: 10.1371/journal.pbio.3002222] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 01/12/2023] [Accepted: 06/27/2023] [Indexed: 08/10/2023] Open
Abstract
The human genome encodes approximately 20,000 proteins, many still uncharacterised. It has become clear that scientific research tends to focus on well-studied proteins, leading to a concern that poorly understood genes are unjustifiably neglected. To address this, we have developed a publicly available and customisable "Unknome database" that ranks proteins based on how little is known about them. We applied RNA interference (RNAi) in Drosophila to 260 unknown genes that are conserved between flies and humans. Knockdown of some genes resulted in loss of viability, and functional screening of the rest revealed hits for fertility, development, locomotion, protein quality control, and resilience to stress. CRISPR/Cas9 gene disruption validated a component of Notch signalling and 2 genes contributing to male fertility. Our work illustrates the importance of poorly understood genes, provides a resource to accelerate future research, and highlights a need to support database curation to ensure that misannotation does not erode our awareness of our own ignorance.
Affiliation(s)
- João J. Rocha
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
| | | | - Tim J. Stevens
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
| | | | - Rajen D. Shah
- Centre for Mathematical Sciences, University of Cambridge, Cambridge, United Kingdom
| | - Sahar Emran
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
| | - Cristina Robles
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
| | - Matthew Freeman
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Sir William Dunn School of Pathology, University of Oxford, Oxford, United Kingdom
| | - Sean Munro
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
| |
|
13
|
Tantoso E, Eisenhaber B, Sinha S, Jensen LJ, Eisenhaber F. About the dark corners in the gene function space of Escherichia coli remaining without illumination by scientific literature. Biol Direct 2023; 18:7. [PMID: 36855185 PMCID: PMC9976479 DOI: 10.1186/s13062-023-00362-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/11/2022] [Accepted: 02/21/2023] [Indexed: 03/02/2023] Open
Abstract
BACKGROUND Although Escherichia coli (E. coli) is the most studied prokaryote organism in the history of life sciences, many molecular mechanisms and gene functions encoded in its genome remain to be discovered. This work aims at quantifying the illumination of the E. coli gene function space by the scientific literature and how close we are towards the goal of a complete list of E. coli gene functions. RESULTS The scientific literature about E. coli protein-coding genes has been mapped onto the genome via the mentioning of names for genomic regions in scientific articles both for the case of the strain K-12 MG1655 as well as for the 95%-threshold softcore genome of 1324 E. coli strains with known complete genome. The article match was quantified with the ratio of a given gene name's occurrence to the mentioning of any gene names in the paper. The various genome regions have an extremely uneven literature coverage. A group of elite genes with ≥ 100 full publication equivalents (FPEs, FPE = 1 is an idealized publication devoted to just a single gene) attracts the lion share of the papers. For K-12, ~ 65% of the literature covers just 342 elite genes; for the softcore genome, ~ 68% of the FPEs is about only 342 elite gene families (GFs). We also find that most genes/GFs have at least one mentioning in a dedicated scientific article (with the exception of at least 137 protein-coding transcripts for K-12 and 26 GFs from the softcore genome). Whereas the literature growth rates were highest for uncharacterized or understudied genes until 2005-2010 compared with other groups of genes, they became negative thereafter. At the same time, literature for anyhow well-studied genes started to grow explosively with threshold T10 (≥ 10 FPEs). Typically, a body of ~ 20 actual articles generated over ~ 15 years of research effort was necessary to reach T10. Lineage-specific co-occurrence analysis of genes belonging to the accessory genome of E. coli together with genomic co-localization and sequence-analytic exploration hints previously completely uncharacterized genes yahV and yddL being associated with osmotic stress response/motility mechanisms. CONCLUSION If the numbers of scientific articles about uncharacterized and understudied genes remain at least at present levels, full gene function lists for the strain K-12 MG1655 and the E. coli softcore genome are in reach within the next 25-30 years. Once the literature body for a gene crosses 10 FPEs, most of the critical fundamental research risk appears overcome and steady incremental research becomes possible.
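The fractional counting described in this abstract (a gene's share of all gene-name mentions in an article, summed over articles) can be illustrated with a minimal Python sketch. The function name `fpe_scores`, the input layout, and the toy corpus are assumptions made for illustration only; the abstract does not describe the authors' actual text-mining pipeline.

```python
from collections import Counter, defaultdict


def fpe_scores(papers):
    """Sum per-gene fractional mention counts over a corpus.

    `papers` is an iterable of lists; each list holds the gene names
    mentioned in one article, repeated once per mention (hypothetical
    input format, not taken from the paper).
    """
    totals = defaultdict(float)
    for mentions in papers:
        if not mentions:
            continue  # an article without gene names contributes nothing
        counts = Counter(mentions)
        n_mentions = len(mentions)
        for gene, k in counts.items():
            # Share of this article attributed to `gene`
            totals[gene] += k / n_mentions
    return dict(totals)


# Toy corpus of three articles (invented mention lists)
corpus = [
    ["lacZ", "lacZ", "lacY", "lacZ"],  # lacZ gets 3/4, lacY gets 1/4
    ["yahV"],                          # a paper devoted to a single gene
    ["lacZ", "yddL"],                  # split evenly between two genes
]
scores = fpe_scores(corpus)
```

Under this scheme every article that mentions at least one gene distributes exactly one unit of score, so an article devoted to a single gene contributes one full publication equivalent to that gene.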
Affiliation(s)
- Erwin Tantoso
- Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore.,Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
| | - Birgit Eisenhaber
- Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore.,Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
| | - Swati Sinha
- Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore.,Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore.,European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Frank Eisenhaber
- Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore.,Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore.,School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Republic of Singapore.
| |
|
14
|
Byrne JA, Park Y, Richardson RAK, Pathmendra P, Sun M, Stoeger T. Protection of the human gene research literature from contract cheating organizations known as research paper mills. Nucleic Acids Res 2022; 50:12058-12070. [PMID: 36477580 PMCID: PMC9757046 DOI: 10.1093/nar/gkac1139] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/06/2022] [Revised: 11/08/2022] [Accepted: 11/14/2022] [Indexed: 12/12/2022] Open
Abstract
Human gene research generates new biology insights with translational potential, yet few studies have considered the health of the human gene literature. The accessibility of human genes for targeted research, combined with unreasonable publication pressures and recent developments in scholarly publishing, may have created a market for low-quality or fraudulent human gene research articles, including articles produced by contract cheating organizations known as paper mills. This review summarises the evidence that paper mills contribute to the human gene research literature at scale and outlines why targeted gene research may be particularly vulnerable to systematic research fraud. To raise awareness of targeted gene research from paper mills, we highlight features of problematic manuscripts and publications that can be detected by gene researchers and/or journal staff. As improved awareness and detection could drive the further evolution of paper mill-supported publications, we also propose changes to academic publishing to more effectively deter and correct problematic publications at scale. In summary, the threat of paper mill-supported gene research highlights the need for all researchers to approach the literature with a more critical mindset, and demand publications that are underpinned by plausible research justifications, rigorous experiments and fully transparent reporting.
Affiliation(s)
- Jennifer A Byrne
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, NSW, Australia
- NSW Health Statewide Biobank, NSW Health Pathology, Camperdown, NSW, Australia
| | - Yasunori Park
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, NSW, Australia
| | - Reese A K Richardson
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, USA
| | - Pranujan Pathmendra
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, NSW, Australia
| | - Mengyi Sun
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, USA
| | - Thomas Stoeger
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, USA
- Successful Clinical Response in Pneumonia Therapy (SCRIPT) Systems Biology Center, Northwestern University, Evanston, USA
- Center for Genetic Medicine, Northwestern University School of Medicine, Chicago, USA
| |
|
15
|
|
16
|
Kustatscher G, Collins T, Gingras AC, Guo T, Hermjakob H, Ideker T, Lilley KS, Lundberg E, Marcotte EM, Ralser M, Rappsilber J. An open invitation to the Understudied Proteins Initiative. Nat Biotechnol 2022; 40:815-817. [PMID: 35534555 DOI: 10.1038/s41587-022-01316-z] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Indexed: 01/01/2023]
Affiliation(s)
- Georg Kustatscher
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, UK.
| | | | - Anne-Claude Gingras
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Sinai Health System, Toronto, Ontario, Canada.,Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| | - Tiannan Guo
- Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, China
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - Trey Ideker
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Kathryn S Lilley
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, UK
| | - Emma Lundberg
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH-Royal Institute of Technology, Stockholm, Sweden.,Department of Bioengineering, Stanford University, Stanford, CA, USA.,Department of Pathology, Stanford University, Stanford, CA, USA.,Chan Zuckerberg Biohub, San Francisco, CA, USA
| | - Edward M Marcotte
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, TX, USA
| | - Markus Ralser
- Department of Biochemistry, Charité University Medicine, Berlin, Germany.,The Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, UK
| | - Juri Rappsilber
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, UK.,Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, Berlin, Germany.,Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh, UK.
| |
|
17
|
Tantoso E, Eisenhaber B, Eisenhaber F. Optimizing the Parametrization of Homologue Classification in the Pan-Genome Computation for a Bacterial Species: Case Study Streptococcus pyogenes. Methods Mol Biol 2022; 2449:299-324. [PMID: 35507269 DOI: 10.1007/978-1-0716-2095-3_13] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/14/2023]
Abstract
The paradigm shift associated with the introduction of the pan-genome concept has drawn the attention from singular reference genomes toward the actual sequence diversity within organism populations, strain collections, clades, etc. A single genome is no longer sufficient to describe bacteria of interest, but instead, the genomic repertoire of all existing strains is the key to the metabolic, evolutionary, or pathogenic potential of a species. The classification of orthologous genes derived from a collection of taxonomically related genome sequences is central to bacterial pan-genome computational analysis. In this work, we present a review of methods for computing pan-genome gene clusters including their comparative analysis for the case of Streptococcus pyogenes strain genomes. We exhaustively scanned the parametrization space of the homologue searching procedures and find optimal parameters (sequence identity (60%) and coverage (50-60%) in the pairwise alignment) for the orthologous clustering of gene sequences. We find that the sequence identity threshold influences the number of gene families ~3 times stronger than the sequence coverage threshold.
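As a rough illustration of how identity and coverage thresholds shape gene-family counts, the following Python sketch groups genes by single-linkage over pairwise hits that pass both cut-offs. Everything here is an assumption for illustration: the function `cluster_genes`, the hit tuple layout, the coverage definition (aligned length over the longer sequence), and the single-linkage rule are not taken from the chapter, which compares several clustering methods.

```python
def cluster_genes(hits, lengths, min_identity=60.0, min_coverage=50.0):
    """Group genes into families using thresholded pairwise hits.

    hits:    iterable of (gene_a, gene_b, percent_identity, aligned_length)
    lengths: dict mapping gene name -> sequence length
    Coverage is taken relative to the longer sequence (an assumption).
    """
    parent = {g: g for g in lengths}

    def find(g):
        # Union-find root lookup with path halving
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b, identity, aligned_len in hits:
        coverage = 100.0 * aligned_len / max(lengths[a], lengths[b])
        if identity >= min_identity and coverage >= min_coverage:
            parent[find(a)] = find(b)  # merge the two families

    families = {}
    for g in lengths:
        families.setdefault(find(g), set()).add(g)
    return sorted(sorted(members) for members in families.values())


# Invented toy data: four genes and three pairwise alignments
lengths = {"g1": 300, "g2": 310, "g3": 290, "g4": 500}
hits = [
    ("g1", "g2", 85.0, 295),  # high identity, near-full coverage
    ("g2", "g3", 62.0, 200),  # passes only at the looser identity cut-off
    ("g3", "g4", 95.0, 120),  # high identity but low coverage
]
loose = cluster_genes(hits, lengths, min_identity=60.0, min_coverage=50.0)
strict = cluster_genes(hits, lengths, min_identity=70.0, min_coverage=50.0)
```

In this toy case, raising the identity threshold from 60% to 70% splits one family into two, which is the kind of sensitivity the abstract quantifies.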
Affiliation(s)
- Erwin Tantoso
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
| | - Birgit Eisenhaber
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Genome Institute Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
| | - Frank Eisenhaber
- Genome Institute and Bioinformatics Institute, Singapore, Singapore.
| |
|
18
|
Affiliation(s)
- Frank Eisenhaber
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore.,Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore.,School of Biological Sciences, Nanyang Technological University (NTU), Singapore
| | - Chandra Verma
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore.,School of Biological Sciences, Nanyang Technological University (NTU), Singapore.,Department of Biological Sciences, National University of Singapore, Singapore
| | - Tom Blundell
- Department of Biochemistry, University of Cambridge, Cambridge, UK
| |
|
19
|
Niska-Blakie J, Gopinathan L, Low KN, Kien YL, Goh CMF, Caldez MJ, Pfeiffenberger E, Jones OS, Ong CB, Kurochkin IV, Coppola V, Tessarollo L, Choi H, Kanagasundaram Y, Eisenhaber F, Maurer-Stroh S, Kaldis P. Knockout of the non-essential gene SUGCT creates diet-linked, age-related microbiome disbalance with a diabetes-like metabolic syndrome phenotype. Cell Mol Life Sci 2020; 77:3423-3439. [PMID: 31722069 PMCID: PMC7426296 DOI: 10.1007/s00018-019-03359-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 06/03/2019] [Revised: 10/23/2019] [Accepted: 10/29/2019] [Indexed: 02/07/2023]
Abstract
SUGCT (C7orf10) is a mitochondrial enzyme that synthesizes glutaryl-CoA from glutarate in tryptophan and lysine catabolism, but it has not been studied in vivo. Although mutations in Sugct lead to Glutaric Aciduria Type 3 disease in humans, patients remain largely asymptomatic despite high levels of glutarate in the urine. To study the disease mechanism, we generated SugctKO mice and uncovered imbalanced lipid and acylcarnitine metabolism in kidney in addition to changes in the gut microbiome. After SugctKO mice were treated with antibiotics, metabolites were comparable to WT, indicating that the microbiome affects metabolism in SugctKO mice. SUGCT loss of function contributes to gut microbiota dysbiosis, leading to age-dependent pathological changes in kidney, liver, and adipose tissue. This is associated with an obesity-related phenotype that is accompanied by lipid accumulation in kidney and liver, as well as "crown-like" structures in adipocytes. Furthermore, we show that the SugctKO kidney pathology is accelerated and exacerbated by a high-lysine diet. Our study highlights the importance of non-essential genes with no readily detectable early phenotype, but with substantial contributions to the development of age-related pathologies, which result from an interplay between genetic background, microbiome, and diet in the health of mammals.
Affiliation(s)
- Joanna Niska-Blakie
- Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
- Bioinformatics Institute (BII), A*STAR, Singapore, 138671, Republic of Singapore
| | - Lakshmi Gopinathan
- Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
| | - Kia Ngee Low
- Bioinformatics Institute (BII), A*STAR, Singapore, 138671, Republic of Singapore
| | - Yang Lay Kien
- Bioinformatics Institute (BII), A*STAR, Singapore, 138671, Republic of Singapore
| | - Christine M F Goh
- Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
| | - Matias J Caldez
- Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
- Department of Biochemistry, National University of Singapore (NUS), Singapore, 117597, Republic of Singapore
| | - Elisabeth Pfeiffenberger
- Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
| | - Oliver S Jones
- Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
| | - Chee Bing Ong
- Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
| | - Igor V Kurochkin
- Bioinformatics Institute (BII), A*STAR, Singapore, 138671, Republic of Singapore
| | - Vincenzo Coppola
- Department of Cancer Biology and Genetics, The Ohio State University, 988 Biomedical Research Tower, 460 West 12th Ave, Columbus, OH, 43210, USA
| | - Lino Tessarollo
- Mouse Cancer Genetics Program, National Cancer Institute, NCI-Frederick, Bldg. 560, 1050 Boyles Street, Frederick, MD, 21702-1201, USA
| | - Hyungwon Choi
- Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore
- Department of Medicine, National University of Singapore (NUS), Singapore, 117597, Republic of Singapore
| | | | - Frank Eisenhaber
- Bioinformatics Institute (BII), A*STAR, Singapore, 138671, Republic of Singapore
- School of Computer Science and Engineering (SCSE), Nanyang Technological University (NTU), Singapore, 637553, Republic of Singapore
| | - Sebastian Maurer-Stroh
- Bioinformatics Institute (BII), A*STAR, Singapore, 138671, Republic of Singapore.
- Department of Biological Sciences (DBS), National University of Singapore (NUS), 14 Science Drive 4, Singapore, 117597, Republic of Singapore.
| | - Philipp Kaldis
- Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), 61 Biopolis Drive, Proteos #3-09, Singapore, 138673, Republic of Singapore.
- Department of Biochemistry, National University of Singapore (NUS), Singapore, 117597, Republic of Singapore.
- Department of Clinical Sciences, Lund University, Clinical Research Centre (CRC), Box 50332, 202 13, Malmö, Sweden.
| |
|
20
|
Uversky VN. Bringing Darkness to Light: Intrinsic Disorder as a Means to Dig into the Dark Proteome. Proteomics 2019; 18:e1800352. [PMID: 30334344 DOI: 10.1002/pmic.201800352] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Affiliation(s)
- Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA.,Laboratory of New Methods in Biology, Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, 142290, Moscow Region, Russia
| |
|
21
|
Tantoso E, Wong WC, Tay WH, Lee J, Sinha S, Eisenhaber B, Eisenhaber F. Hypocrisy Around Medical Patient Data: Issues of Access for Biomedical Research, Data Quality, Usefulness for the Purpose and Omics Data as Game Changer. Asian Bioeth Rev 2019; 11:189-207. [PMID: 33717311 PMCID: PMC7747340 DOI: 10.1007/s41649-019-00085-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/04/2018] [Revised: 04/23/2019] [Accepted: 04/30/2019] [Indexed: 11/14/2022] Open
Abstract
Whether due to simplicity or hypocrisy, the question of access to patient data for biomedical research is widely seen in the public discourse only from the angle of patient privacy. At the same time, the desire to live and to live without disability is of much higher value to the patients. This goal can only be achieved by extracting research insight from patient data in addition to working on model organisms, something that is well understood by many patients. Yet, most biomedical researchers working outside of clinics and hospitals are denied access to patient records when, at the same time, clinicians who guard the patient data are not optimally prepared for the data’s analysis. Medical data collection is a time- and cost-intensive process that is most of all tedious, with few elements of intellectual and emotional satisfaction on its own. In this process, clinicians and bioinformaticians, each group with their own interests, have to join forces with the goal to generate medical data sets both from clinical trials and from routinely collected electronic health records that are, as much as possible, free from errors and obvious inconsistencies. The data cleansing effort as we have learned during curation of Singaporean clinical trial data is not a trivial task. The introduction of omics and sophisticated imaging modalities into clinical practice that are only partially interpreted in terms of diagnosis and therapy with today’s level of knowledge warrant the creation of clinical databases with full patient history. This opens up opportunities for re-analyses and cross-trial studies at future time points with more sophisticated analyses of the same data, the collection of which is very expensive.
Affiliation(s)
- Erwin Tantoso
- Bioinformatics Institute (BII), Agency for Science and Technology (ASTAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671 Singapore
| | - Wing-Cheong Wong
- Bioinformatics Institute (BII), Agency for Science and Technology (ASTAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671 Singapore
| | - Wei Hong Tay
- Bioinformatics Institute (BII), Agency for Science and Technology (ASTAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671 Singapore
| | - Joanne Lee
- Bioinformatics Institute (BII), Agency for Science and Technology (ASTAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671 Singapore
| | - Swati Sinha
- Bioinformatics Institute (BII), Agency for Science and Technology (ASTAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671 Singapore
| | - Birgit Eisenhaber
- Bioinformatics Institute (BII), Agency for Science and Technology (ASTAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671 Singapore
| | - Frank Eisenhaber
- Bioinformatics Institute (BII), Agency for Science and Technology (ASTAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671 Singapore.,School of Computer Science and Engineering (SCSE), Nanyang Technological University (NTU), 50 Nanyang Drive, Singapore, 637553 Singapore
| |
|
22
|
Sinha S, Eisenhaber B, Jensen LJ, Kalbuaji B, Eisenhaber F. Darkness in the Human Gene and Protein Function Space: Widely Modest or Absent Illumination by the Life Science Literature and the Trend for Fewer Protein Function Discoveries Since 2000. Proteomics 2018; 18:e1800093. [PMID: 30265449 PMCID: PMC6282819 DOI: 10.1002/pmic.201800093] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 07/30/2018] [Revised: 09/07/2018] [Indexed: 12/15/2022]
Abstract
The mentioning of gene names in the body of the scientific literature 1901-2017 and their fractional counting is used as a proxy to assess the level of biological function discovery. A literature score of one has been defined as full publication equivalent (FPE), the amount of literature necessary to achieve one publication solely dedicated to a gene. It has been found that less than 5000 human genes have each at least 100 FPEs in the available literature corpus. This group of elite genes (4817 protein-coding genes, 119 non-coding RNAs) attracts the overwhelming majority of the scientific literature about genes. Yet, thousands of proteins have never been mentioned at all, ≈2000 further proteins have not even one FPE of literature and, for ≈4600 additional proteins, the FPE count is below 10. The protein function discovery rate measured as numbers of proteins first mentioned or crossing a threshold of accumulated FPEs in a given year has grown until 2000 but is in decline thereafter. This drop is partially offset by function discoveries for non-coding RNAs. The full human genome sequencing does not boost the function discovery rate. Since 2000, the fastest growing group in the literature is that with at least 500 FPEs per gene.
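The discovery-rate measure mentioned in this abstract (how many genes cross a threshold of accumulated FPEs in a given year) can be sketched in Python as follows. The function `crossing_years`, the nested-dict input, and the toy numbers are hypothetical and serve only to make the definition concrete; they do not reproduce the authors' data or code.

```python
from collections import Counter


def crossing_years(yearly_fpe, threshold):
    """Count, per year, the genes whose cumulative FPE first reaches `threshold`.

    yearly_fpe: dict mapping gene -> {year: FPE gained in that year}
    Returns a Counter mapping year -> number of genes crossing in that year.
    """
    per_year = Counter()
    for gene, by_year in yearly_fpe.items():
        accumulated = 0.0
        for year in sorted(by_year):
            accumulated += by_year[year]
            if accumulated >= threshold:
                per_year[year] += 1
                break  # only the first crossing counts for each gene
    return per_year


# Invented per-year FPE gains for three genes
yearly_fpe = {
    "geneA": {1998: 0.6, 1999: 0.5},  # reaches 1.1 FPE in 1999
    "geneB": {1999: 1.2},             # crosses 1 FPE in 1999
    "geneC": {2001: 0.3},             # never reaches 1 FPE
}
rate = crossing_years(yearly_fpe, threshold=1.0)
```

Plotting such counts by year for several thresholds (for example 1, 10, or 100 FPEs) would give curves of the kind the abstract describes as rising until 2000 and declining afterwards.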
Affiliation(s)
- Swati Sinha
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, 138671, Singapore
| | - Birgit Eisenhaber
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, 138671, Singapore
| | - Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark
| | - Bharata Kalbuaji
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, 138671, Singapore
| | - Frank Eisenhaber
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, 138671, Singapore
- School of Computer Science and Engineering (SCSE), Nanyang Technological University (NTU), 637553, Singapore
| |
|