1
Wang Z, Wei Z. PT-KGNN: A framework for pre-training biomedical knowledge graphs with graph neural networks. Comput Biol Med 2024; 178:108768. [PMID: 38936076 DOI: 10.1016/j.compbiomed.2024.108768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Revised: 05/23/2024] [Accepted: 06/15/2024] [Indexed: 06/29/2024]
Abstract
Biomedical knowledge graphs (KGs) serve as comprehensive data repositories that contain rich information about nodes and edges, providing modeling capabilities for complex relationships among biological entities. Many approaches either learn node features through traditional machine learning methods, or leverage graph neural networks (GNNs) to directly learn features of target nodes in the biomedical KGs and utilize them for downstream tasks. Motivated by the pre-training technique in natural language processing (NLP), we propose a framework named PT-KGNN (Pre-Training the biomedical KG with GNNs) to learn embeddings of nodes in a broader context by applying GNNs on the biomedical KG. We design several experiments to evaluate the effectiveness of our proposed framework and the impact of the scale of KGs. The results of tasks consistently improve as the scale of the biomedical KG used for pre-training increases. Pre-training on large-scale biomedical KGs significantly enhances the drug-drug interaction (DDI) and drug-disease association (DDA) prediction performance on the independent dataset. The embeddings derived from a larger biomedical KG have demonstrated superior performance compared to those obtained from a smaller KG. By applying pre-training techniques on biomedical KGs, rich semantic and structural information can be learned, leading to enhanced performance on downstream tasks. It is evident that pre-training techniques hold tremendous potential and wide-ranging applications in bioinformatics.
Affiliation(s)
- Zhenxing Wang
- School of Data Science, Fudan University, 220 Handan Rd., Shanghai, 200433, China.
- Zhongyu Wei
- School of Data Science, Fudan University, 220 Handan Rd., Shanghai, 200433, China.
2
Devarakonda MV, Mohanty S, Sunkishala RR, Mallampalli N, Liu X. Clinical trial recommendations using Semantics-Based inductive inference and knowledge graph embeddings. J Biomed Inform 2024; 154:104627. [PMID: 38561170 DOI: 10.1016/j.jbi.2024.104627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 02/06/2024] [Accepted: 03/20/2024] [Indexed: 04/04/2024]
Abstract
OBJECTIVE Designing a new clinical trial entails many decisions, such as defining a cohort and setting the study objectives, to name a few, and can therefore benefit from recommendations based on exhaustive mining of past clinical trial records. This study proposes an approach based on knowledge graph embeddings and semantics-driven inductive inference for generating such recommendations. METHOD The proposed recommendation methodology is based on neural embeddings trained on a first-of-its-kind knowledge graph constructed from clinical trials data. The methodology includes design of a knowledge graph for clinical trial data, evaluation of various knowledge graph embedding techniques for it, application of a novel inductive inference method using these embeddings, and generation of recommendations for clinical trial design. The study uses freely available data from clinicaltrials.gov and related sources. RESULTS The proposed approach for recommendations obtained relevance scores ranging from 70% to 83%. These scores were determined by evaluating the text similarity of recommended elements to actual elements used in clinical trials that are in progress. Furthermore, the most pertinent recommendations were consistently located towards the top of the list, indicating the effectiveness of our method. CONCLUSION Our study suggests that inductive inference using node semantics is a viable approach for generating recommendations using graph neural embeddings, and that there is potential for improvement in training graph embeddings using node semantics.
Affiliation(s)
- Xiong Liu
- Biomedical Research, Novartis, Cambridge, MA, USA
3
Gualdi F, Oliva B, Piñero J. Predicting gene disease associations with knowledge graph embeddings for diseases with curtailed information. NAR Genom Bioinform 2024; 6:lqae049. [PMID: 38745993 PMCID: PMC11091931 DOI: 10.1093/nargab/lqae049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 03/08/2024] [Accepted: 04/24/2024] [Indexed: 05/16/2024] Open
Abstract
Knowledge graph embeddings (KGE) are a powerful technique used in the biomedical domain to represent biological knowledge in a low dimensional space. However, a deep understanding of these methods is still missing, particularly regarding their application to prioritizing genes associated with complex diseases with reduced genetic information. In this contribution, we built a knowledge graph (KG) by integrating heterogeneous biomedical data and generated KGE by implementing state-of-the-art methods and two novel algorithms: Dlemb and BioKG2vec. Extensive testing of the embeddings with unsupervised clustering and supervised methods showed that KGE can be successfully implemented to predict genes associated with diseases and that our novel approaches outperform most existing algorithms in both scenarios. Our findings underscore the significance of data quality, preprocessing, and integration in achieving accurate predictions. Additionally, we applied KGE to predict genes linked to Intervertebral Disc Degeneration (IDD) and illustrated that functions pertinent to the disease are enriched within the prioritized gene set.
Affiliation(s)
- Francesco Gualdi
- Integrative Biomedical Informatics, Research Programme on Biomedical Informatics (IBI-GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain
- Structural Bioinformatics Lab, Research Programme on Biomedical Informatics (SBI-GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain
- Baldomero Oliva
- Structural Bioinformatics Lab, Research Programme on Biomedical Informatics (SBI-GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain
- Janet Piñero
- Integrative Biomedical Informatics, Research Programme on Biomedical Informatics (IBI-GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain
- Medbioinformatics Solutions SL, Barcelona, Spain
4
Offensperger F, Tin G, Duran-Frigola M, Hahn E, Dobner S, Ende CWA, Strohbach JW, Rukavina A, Brennsteiner V, Ogilvie K, Marella N, Kladnik K, Ciuffa R, Majmudar JD, Field SD, Bensimon A, Ferrari L, Ferrada E, Ng A, Zhang Z, Degliesposti G, Boeszoermenyi A, Martens S, Stanton R, Müller AC, Hannich JT, Hepworth D, Superti-Furga G, Kubicek S, Schenone M, Winter GE. Large-scale chemoproteomics expedites ligand discovery and predicts ligand behavior in cells. Science 2024; 384:eadk5864. [PMID: 38662832 DOI: 10.1126/science.adk5864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 03/22/2024] [Indexed: 05/04/2024]
Abstract
Chemical modulation of proteins enables a mechanistic understanding of biology and represents the foundation of most therapeutics. However, despite decades of research, 80% of the human proteome lacks functional ligands. Chemical proteomics has advanced fragment-based ligand discovery toward cellular systems, but throughput limitations have stymied the scalable identification of fragment-protein interactions. We report proteome-wide maps of protein-binding propensity for 407 structurally diverse small-molecule fragments. We verified that identified interactions can be advanced to active chemical probes of E3 ubiquitin ligases, transporters, and kinases. Integrating machine learning binary classifiers further enabled interpretable predictions of fragment behavior in cells. The resulting resource of fragment-protein interactions and predictive models will help to elucidate principles of molecular recognition and expedite ligand discovery efforts for hitherto undrugged proteins.
Affiliation(s)
- Fabian Offensperger
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Gary Tin
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Miquel Duran-Frigola
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Ersilia Open Source Initiative, Cambridge CB1 3DE, UK
- Elisa Hahn
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Sarah Dobner
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Andrea Rukavina
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Vincenth Brennsteiner
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Kevin Ogilvie
- Medicine Design, Pfizer Worldwide Research and Development, Groton, CT 06340, USA
- Nara Marella
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Katharina Kladnik
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Rodolfo Ciuffa
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Ariel Bensimon
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Luca Ferrari
- Max Perutz Labs, Vienna Biocenter Campus (VBC), Vienna Biocenter 5, 1030 Vienna, Austria
- University of Vienna, Max Perutz Labs, Vienna Biocenter 5, 1030 Vienna, Austria
- Evandro Ferrada
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Amanda Ng
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Zhechun Zhang
- Molecular Informatics, Machine Learning and Computational Sciences, Early Clinical Development, Pfizer, Cambridge, MA 02139, USA
- Gianluca Degliesposti
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Andras Boeszoermenyi
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Sascha Martens
- Max Perutz Labs, Vienna Biocenter Campus (VBC), Vienna Biocenter 5, 1030 Vienna, Austria
- University of Vienna, Max Perutz Labs, Vienna Biocenter 5, 1030 Vienna, Austria
- Robert Stanton
- Molecular Informatics, Machine Learning and Computational Sciences, Early Clinical Development, Pfizer, Cambridge, MA 02139, USA
- André C Müller
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- J Thomas Hannich
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Giulio Superti-Furga
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Center for Physiology and Pharmacology, Medical University of Vienna, 1090 Vienna, Austria
- Stefan Kubicek
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Georg E Winter
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
5
Li Q, Hu Z, He J, Liu X, Liu Y, Wei J, Wu B, Lu X, He H, Zhang Y, He J, Li M, Wu C, Lv L, Wang Y, Zhou L, Zhang Q, Zhang J, Cheng X, Shao H, Lu X. Deciphering the comprehensive knowledgebase landscape featuring infertility with IDDB Xtra. Comput Biol Med 2024; 170:108105. [PMID: 38330823 DOI: 10.1016/j.compbiomed.2024.108105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 01/15/2024] [Accepted: 02/04/2024] [Indexed: 02/10/2024]
Abstract
Infertility affects ∼15% of couples globally and half of cases are related to genetic disorders. Despite growing data and unprecedented improvements in high-throughput sequencing technologies, accumulated fertility-related issues concerning genetic diagnosis and potential treatment urgently need to be solved. However, there is a lack of comprehensive platforms that characterise various infertility-related records to provide research applications for exploring infertility in depth and for genetic counselling of infertile couples. To solve this problem, we provide IDDB Xtra by further integrating phenotypic manifestations, genomic datasets, epigenetics, and modulators, in collaboration with numerous interactive tools, into our previous infertility database, IDDB. IDDB Xtra houses 2369 manually curated genes of human and nine model organisms, 273 chromosomal abnormalities, 884 phenotypes, 60 genomic datasets, 464 epigenetic records, and 1144 modulators relevant to infertility diagnosis and treatment. Additionally, IDDB Xtra incorporates customized graphical applications for researchers and clinicians to decipher in-depth disease mechanisms from the perspectives of developmental atlas, mutation effects, and clinical manifestations. Users can browse genes across developmental stages of human and mouse, filter candidate genes, mine potential variants, and retrieve the infertility biomedical network in an intuitive web interface. In summary, IDDB Xtra not only captures valuable research and data, but also provides useful applications to facilitate the genetic counselling and drug discovery of infertility. IDDB Xtra is freely available at https://mdl.shsmu.edu.cn/IDDB/ and http://www.allostery.net/IDDB.
Affiliation(s)
- Qian Li
- Department of Assisted Reproduction, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200011, China; Medicinal Bioinformatics Center, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200025, China
- Zhijie Hu
- Department of Assisted Reproduction, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200011, China
- Jiayin He
- Department of Assisted Reproduction, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200011, China; Medicinal Bioinformatics Center, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200025, China
- Xinyi Liu
- Medicinal Bioinformatics Center, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200025, China
- Yini Liu
- Department of Assisted Reproduction, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200011, China; Medicinal Bioinformatics Center, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200025, China
- Jiale Wei
- Medicinal Bioinformatics Center, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200025, China
- Binjian Wu
- Medicinal Bioinformatics Center, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200025, China
- Xun Lu
- Medicinal Bioinformatics Center, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200025, China
- Hongxi He
- Department of Assisted Reproduction, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200011, China; School of Pharmaceutical Sciences, Southern Medical University, Guangzhou, 510000, China
- Yuqi Zhang
- Department of Assisted Reproduction, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200011, China
- Jixiao He
- Medicinal Bioinformatics Center, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200025, China
- Mingyu Li
- Medicinal Bioinformatics Center, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200025, China
- Chengwei Wu
- Medicinal Bioinformatics Center, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200025, China
- Lijun Lv
- Medicinal Bioinformatics Center, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200025, China
- Yang Wang
- Medicinal Bioinformatics Center, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200025, China
- Linxuan Zhou
- Medicinal Bioinformatics Center, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200025, China
- Quan Zhang
- Medicinal Bioinformatics Center, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200025, China
- Jian Zhang
- Medicinal Bioinformatics Center, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200025, China; School of Pharmaceutical Sciences, Southern Medical University, Guangzhou, 510000, China.
- Xiaoyue Cheng
- Center for Reproductive Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200135, China.
- Hongfang Shao
- Center of Reproductive Medicine, Department of Gynecology and Obstetrics, Shanghai Jiao Tong University School of Medicine Affiliated Sixth People's Hospital, Shanghai, 200233, China.
- Xuefeng Lu
- Department of Assisted Reproduction, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200011, China.
6
Lin CX, Guan Y, Li HD. Artificial intelligence approaches for molecular representation in drug response prediction. Curr Opin Struct Biol 2024; 84:102747. [PMID: 38091924 DOI: 10.1016/j.sbi.2023.102747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 11/26/2023] [Accepted: 11/26/2023] [Indexed: 02/09/2024]
Abstract
Drug response prediction is essential for drug development and disease treatment. One key question in predicting drug response is the representation of molecules, which has been greatly advanced by artificial intelligence (AI) techniques in recent years. In this review, we first describe different types of representation methods, pinpointing their key principles and discussing their limitations. Thereafter, we discuss potential ways in which these methods could be further developed. We expect that this review will provide useful guidance for researchers in the community.
Affiliation(s)
- Cui-Xiang Lin
- School of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, Hunan Province, PR China
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.
- Hong-Dong Li
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, PR China.
7
James T, Hennig H. Knowledge Graphs and Their Applications in Drug Discovery. Methods Mol Biol 2024; 2716:203-221. [PMID: 37702941 DOI: 10.1007/978-1-0716-3449-3_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2023]
Abstract
Knowledge graphs represent information in the form of entities and relationships between those entities. Such a representation has multiple potential applications in drug discovery, including democratizing access to biomedical data, contextualizing or visualizing that data, and generating novel insights through the application of machine learning approaches. Knowledge graphs put data into context and therefore offer the opportunity to generate explainable predictions, which is a key topic in contemporary artificial intelligence. In this chapter, we outline some of the factors that need to be considered when constructing biomedical knowledge graphs, examine recent advances in mining such systems to gain insights for drug discovery, and identify potential future areas for further development.
Affiliation(s)
- Tim James
- Evotec (UK) Ltd., Abingdon, Oxfordshire, UK.
8
Mizuno T, Kusuhara H. Investigation of normalization procedures for transcriptome profiles of compounds oriented toward practical study design. J Toxicol Sci 2024; 49:249-259. [PMID: 38825484 DOI: 10.2131/jts.49.249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
The transcriptome profile is a representative phenotype-based descriptor of compounds, widely acknowledged for its ability to effectively capture compound effects. However, the presence of batch differences is inevitable. Despite the existence of sophisticated statistical methods, many of them presume a substantial sample size. How should we design a transcriptome analysis to obtain robust compound profiles, particularly in the context of small datasets frequently encountered in practical scenarios? This study addresses this question by investigating the normalization procedures for transcriptome profiles, focusing on the baseline distribution employed in deriving biological responses as profiles. Firstly, we investigated two large GeneChip datasets, comparing the impact of different normalization procedures. Through an evaluation of the similarity between response profiles of biological replicates within each dataset and the similarity between response profiles of the same compound across datasets, we revealed that the baseline distribution defined by all samples within each batch under batch-corrected condition is a good choice for large datasets. Subsequently, we conducted a simulation to explore the influence of the number of control samples on the robustness of response profiles across datasets. The results offer insights into determining the suitable quantity of control samples for diminutive datasets. It is crucial to acknowledge that these conclusions stem from constrained datasets. Nevertheless, we believe that this study enhances our understanding of how to effectively leverage transcriptome profiles of compounds and promotes the accumulation of essential knowledge for the practical application of such profiles.
Affiliation(s)
- Tadahaya Mizuno
- Laboratory of Molecular Pharmacokinetics, Graduate School of Pharmaceutical Sciences, The University of Tokyo
- Hiroyuki Kusuhara
- Laboratory of Molecular Pharmacokinetics, Graduate School of Pharmaceutical Sciences, The University of Tokyo
9
Chen J, Gu Z, Lai L, Pei J. In silico protein function prediction: the rise of machine learning-based approaches. MEDICAL REVIEW (2021) 2023; 3:487-510. [PMID: 38282798 PMCID: PMC10808870 DOI: 10.1515/mr-2023-0038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 10/11/2023] [Indexed: 01/30/2024]
Abstract
Proteins function as integral actors in essential life processes, rendering the realm of protein research a fundamental domain that possesses the potential to propel advancements in pharmaceuticals and disease investigation. Within the context of protein research, an imperious demand arises to uncover protein functionalities and untangle intricate mechanistic underpinnings. Due to the exorbitant costs and limited throughput inherent in experimental investigations, computational models offer a promising alternative to accelerate protein function annotation. In recent years, protein pre-training models have exhibited noteworthy advancement across multiple prediction tasks. This advancement highlights a notable prospect for effectively tackling the intricate downstream task associated with protein function prediction. In this review, we elucidate the historical evolution and research paradigms of computational methods for predicting protein function. Subsequently, we summarize the progress in protein and molecule representation as well as feature extraction techniques. Furthermore, we assess the performance of machine learning-based algorithms across various objectives in protein function prediction, thereby offering a comprehensive perspective on the progress within this field.
Affiliation(s)
- Jiaxiao Chen
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Zhonghui Gu
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Luhua Lai
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Research Unit of Drug Design Method, Chinese Academy of Medical Sciences (2021RU014), Beijing, China
- Jianfeng Pei
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Research Unit of Drug Design Method, Chinese Academy of Medical Sciences (2021RU014), Beijing, China
10
Sánchez-Valle J, Valencia A. Molecular bases of comorbidities: present and future perspectives. Trends Genet 2023; 39:773-786. [PMID: 37482451 DOI: 10.1016/j.tig.2023.06.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/13/2023] [Revised: 06/12/2023] [Accepted: 06/12/2023] [Indexed: 07/25/2023]
Abstract
Co-occurrence of diseases decreases patient quality of life, complicates treatment choices, and increases mortality. Analyses of electronic health records present a complex scenario of comorbidity relationships that vary by age, sex, and cohort under study. The study of similarities between diseases using 'omics data, such as genes altered in diseases, gene expression, proteome, and microbiome, is fundamental to uncovering the origin of, and potential treatment for, comorbidities. Recent studies have produced a first generation of genetic interpretations for as much as 46% of the comorbidities described in large cohorts. Integrating different sources of molecular information and using artificial intelligence (AI) methods are promising approaches for the study of comorbidities. They may help to improve the treatment of comorbidities, including the potential repositioning of drugs.
Affiliation(s)
- Jon Sánchez-Valle
- Life Sciences Department, Barcelona Supercomputing Center, Barcelona, 08034, Spain.
- Alfonso Valencia
- Life Sciences Department, Barcelona Supercomputing Center, Barcelona, 08034, Spain; ICREA, Barcelona, 08010, Spain.
11
Fernández-Torras A, Locatelli M, Bertoni M, Aloy P. BQsupports: systematic assessment of the support and novelty of new biomedical associations. Bioinformatics 2023; 39:btad581. [PMID: 37725353 PMCID: PMC10521632 DOI: 10.1093/bioinformatics/btad581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 09/04/2023] [Accepted: 09/15/2023] [Indexed: 09/21/2023] Open
Abstract
MOTIVATION Living in a Big Data era in Biomedicine, there is an unmet need to systematically assess experimental observations in the context of available information. This assessment would offer a means for a comprehensive and robust validation of biomedical data results and provide an initial estimate of the potential novelty of the findings. RESULTS Here we present BQsupports, a web-based tool built upon the Bioteque biomedical descriptors that systematically analyzes and quantifies the current support for a given set of observations. The tool relies on over 1000 distinct types of biomedical descriptors, covering over 11 different biological and chemical entities, including genes, cell lines, diseases, and small molecules. By exploring hundreds of descriptors, BQsupports provides support scores for each observation across a wide variety of biomedical contexts. These scores are then aggregated to summarize the biomedical support of the assessed dataset as a whole. Finally, BQsupports also suggests predictive features of the given dataset, which can be exploited in downstream machine learning applications. AVAILABILITY AND IMPLEMENTATION The web application and underlying data are available online (https://bqsupports.irbbarcelona.org).
Affiliation(s)
- Adrià Fernández-Torras
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Martina Locatelli
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Martino Bertoni
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Patrick Aloy
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia, Spain
12
Renaux A, Terwagne C, Cochez M, Tiddi I, Nowé A, Lenaerts T. A knowledge graph approach to predict and interpret disease-causing gene interactions. BMC Bioinformatics 2023; 24:324. [PMID: 37644440 PMCID: PMC10463539 DOI: 10.1186/s12859-023-05451-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 08/22/2023] [Indexed: 08/31/2023] Open
Abstract
BACKGROUND Understanding the impact of gene interactions on disease phenotypes is increasingly recognised as a crucial aspect of genetic disease research. This trend is reflected by the growing amount of clinical research on oligogenic diseases, where disease manifestations are influenced by combinations of variants on a few specific genes. Although statistical machine-learning methods have been developed to identify relevant genetic variant or gene combinations associated with oligogenic diseases, they rely on abstract features and black-box models, posing challenges to interpretability for medical experts and impeding their ability to comprehend and validate predictions. In this work, we present a novel, interpretable predictive approach based on a knowledge graph that not only provides accurate predictions of disease-causing gene interactions but also offers explanations for these results. RESULTS We introduce BOCK, a knowledge graph constructed to explore disease-causing genetic interactions, integrating curated information on oligogenic diseases from clinical cases with relevant biomedical networks and ontologies. Using this graph, we developed a novel predictive framework based on heterogenous paths connecting gene pairs. This method trains an interpretable decision set model that not only accurately predicts pathogenic gene interactions, but also unveils the patterns associated with these diseases. A unique aspect of our approach is its ability to offer, along with each positive prediction, explanations in the form of subgraphs, revealing the specific entities and relationships that led to each pathogenic prediction. CONCLUSION Our method, built with interpretability in mind, leverages heterogenous path information in knowledge graphs to predict pathogenic gene interactions and generate meaningful explanations. This not only broadens our understanding of the molecular mechanisms underlying oligogenic diseases, but also presents a novel application of knowledge graphs in creating more transparent and insightful predictors for genetic research.
Affiliation(s)
- Alexandre Renaux
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Artificial Intelligence lab, Vrije Universiteit Brussel, Brussels, Belgium
| | - Chloé Terwagne
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
| | - Michael Cochez
- Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Discovery Lab, Elsevier, Amsterdam, The Netherlands
| | - Ilaria Tiddi
- Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - Ann Nowé
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Artificial Intelligence lab, Vrije Universiteit Brussel, Brussels, Belgium
| | - Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Artificial Intelligence lab, Vrije Universiteit Brussel, Brussels, Belgium
|
13
|
Lobentanzer S, Aloy P, Baumbach J, Bohar B, Carey VJ, Charoentong P, Danhauser K, Doğan T, Dreo J, Dunham I, Farr E, Fernandez-Torras A, Gyori BM, Hartung M, Hoyt CT, Klein C, Korcsmaros T, Maier A, Mann M, Ochoa D, Pareja-Lorente E, Popp F, Preusse M, Probul N, Schwikowski B, Sen B, Strauss MT, Turei D, Ulusoy E, Waltemath D, Wodke JAH, Saez-Rodriguez J. Democratizing knowledge representation with BioCypher. Nat Biotechnol 2023; 41:1056-1059. [PMID: 37337100 DOI: 10.1038/s41587-023-01848-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2023]
Affiliation(s)
- Sebastian Lobentanzer
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany.
| | - Patrick Aloy
- Institute for Research in Biomedicine (IRB Barcelona), the Barcelona Institute of Science and Technology, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| | - Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Balazs Bohar
- Earlham Institute, Norwich, UK
- Biological Research Centre, Szeged, Hungary
| | - Vincent J Carey
- Channing Division of Network Medicine, Mass General Brigham, Harvard Medical School, Boston, MA, USA
| | - Pornpimol Charoentong
- Centre for Quantitative Analysis of Molecular and Cellular Biosystems (Bioquant), Heidelberg University, Heidelberg, Germany
- Department of Medical Oncology, National Centre for Tumour Diseases (NCT), Heidelberg University Hospital (UKHD), Heidelberg, Germany
| | - Katharina Danhauser
- Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, LMU Munich, Munich, Germany
| | - Tunca Doğan
- Biological Data Science Lab, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
- Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
| | - Johann Dreo
- Computational Systems Biomedicine Lab, Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, France
- Bioinformatics and Biostatistics Hub, Institut Pasteur, Université Paris Cité, Paris, France
| | - Ian Dunham
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Open Targets, Wellcome Genome Campus, Hinxton, UK
| | - Elias Farr
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Adrià Fernandez-Torras
- Institute for Research in Biomedicine (IRB Barcelona), the Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Benjamin M Gyori
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA
| | - Michael Hartung
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | | | - Christoph Klein
- Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, LMU Munich, Munich, Germany
| | - Tamas Korcsmaros
- Earlham Institute, Norwich, UK
- Imperial College London, London, UK
- Quadram Institute Bioscience, Norwich, UK
| | - Andreas Maier
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Matthias Mann
- Proteomics Program, Novo Nordisk Foundation Centre for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
| | - David Ochoa
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Open Targets, Wellcome Genome Campus, Hinxton, UK
| | - Elena Pareja-Lorente
- Institute for Research in Biomedicine (IRB Barcelona), the Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Ferdinand Popp
- Applied Tumour Immunity Clinical Cooperation Unit, National Centre for Tumour Diseases (NCT), German Cancer Research Centre (DKFZ), Heidelberg, Germany
| | - Martin Preusse
- German Centre for Diabetes Research (DZD), Neuherberg, Germany
| | - Niklas Probul
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Benno Schwikowski
- Computational Systems Biomedicine Lab, Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, France
| | - Bünyamin Sen
- Biological Data Science Lab, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
- Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
| | - Maximilian T Strauss
- Proteomics Program, Novo Nordisk Foundation Centre for Protein Research, University of Copenhagen, Copenhagen, Denmark
| | - Denes Turei
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Erva Ulusoy
- Biological Data Science Lab, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
- Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
| | - Dagmar Waltemath
- Medical Informatics Laboratory, University Medicine Greifswald, Greifswald, Germany
| | - Judith A H Wodke
- Medical Informatics Laboratory, University Medicine Greifswald, Greifswald, Germany
| | - Julio Saez-Rodriguez
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany.
|
14
|
Mittal A, Ahuja G. Advancing chemical carcinogenicity prediction modeling: opportunities and challenges. Trends Pharmacol Sci 2023; 44:400-410. [PMID: 37183054 DOI: 10.1016/j.tips.2023.04.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 04/11/2023] [Accepted: 04/18/2023] [Indexed: 05/16/2023]
Abstract
Carcinogenicity assessment of any compound is a laborious and expensive exercise with several associated ethical and practical concerns. While artificial intelligence (AI) offers promising solutions, unfortunately, it is contingent on several challenges concerning the inadequacy of available experimentally validated (non)carcinogen datasets and variabilities within bioassays, which contribute to the compromised model training. Existing AI solutions that leverage classical chemistry-driven descriptors do not provide adequate biological interpretability involved in imparting carcinogenicity. This highlights the urgency to devise alternative AI strategies. We propose multiple strategies, including implementing data-driven (integrated databases) and known carcinogen-characteristic-derived features to overcome these apparent shortcomings. In summary, these next-generation approaches will continue facilitating robust chemical carcinogenicity prediction, concomitant with deeper mechanistic insights.
Affiliation(s)
- Aayushi Mittal
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi, 110020, India.
| | - Gaurav Ahuja
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi, 110020, India.
|
15
|
Juan H, Huang H. Quantitative analysis of high‐throughput biological data. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2023. [DOI: 10.1002/wcms.1658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Affiliation(s)
- Hsueh‐Fen Juan
- Department of Life Science, Institute of Biomedical Electronics and Bioinformatics, and Center for Systems Biology, National Taiwan University, Taipei, Taiwan
- Taiwan AI Labs, Taipei, Taiwan
| | - Hsuan‐Cheng Huang
- Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
|
16
|
Duran-Frigola M, Cigler M, Winter GE. Advancing Targeted Protein Degradation via Multiomics Profiling and Artificial Intelligence. J Am Chem Soc 2023; 145:2711-2732. [PMID: 36706315 PMCID: PMC9912273 DOI: 10.1021/jacs.2c11098] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 01/28/2023]
Abstract
Only around 20% of the human proteome is considered to be druggable with small-molecule antagonists. This leaves some of the most compelling therapeutic targets outside the reach of ligand discovery. The concept of targeted protein degradation (TPD) promises to overcome some of these limitations. In brief, TPD is dependent on small molecules that induce the proximity between a protein of interest (POI) and an E3 ubiquitin ligase, causing ubiquitination and degradation of the POI. In this perspective, we want to reflect on current challenges in the field, and discuss how advances in multiomics profiling, artificial intelligence, and machine learning (AI/ML) will be vital in overcoming them. The presented roadmap is discussed in the context of small-molecule degraders but is equally applicable for other emerging proximity-inducing modalities.
Affiliation(s)
- Miquel Duran-Frigola
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Ersilia Open Source Initiative, 28 Belgrave Road, CB1 3DE, Cambridge, United Kingdom
| | - Marko Cigler
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
| | - Georg E. Winter
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
|
17
|
Himmelstein DS, Zietz M, Rubinetti V, Kloster K, Heil BJ, Alquaddoomi F, Hu D, Nicholson DN, Hao Y, Sullivan BD, Nagle MW, Greene CS. Hetnet connectivity search provides rapid insights into how two biomedical entities are related. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.05.522941. [PMID: 36711546 PMCID: PMC9882000 DOI: 10.1101/2023.01.05.522941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023]
Abstract
Hetnets, short for "heterogeneous networks", contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes - including genes, diseases, drugs, pathways, and anatomical structures - with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious not only how metformin is related to breast cancer, but also how the GJA1 gene might be involved in insomnia. We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any two nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). We find that predictions are broadly similar to those from previously described supervised approaches for certain node type pairs. Scoring of individual paths is based on the most specific paths of a given type. Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open source implementation of these methods in our new Python package named hetmatpy.
Affiliation(s)
- Daniel S. Himmelstein
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Related Sciences
| | - Michael Zietz
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Department of Biomedical Informatics, Columbia University, New York, New York, United States of America
| | - Vincent Rubinetti
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
| | - Kyle Kloster
- Carbon, Inc.; Department of Computer Science, North Carolina State University, Raleigh, North Carolina, United States of America
| | - Benjamin J. Heil
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania
| | - Faisal Alquaddoomi
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America; Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
| | - Dongbo Hu
- Department of Pathology, Perelman School of Medicine University of Pennsylvania, Philadelphia PA, USA
| | - David N. Nicholson
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine University of Pennsylvania, Philadelphia PA, USA
| | - Yun Hao
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia PA, USA
| | | | - Michael W. Nagle
- Integrative Biology, Internal Medicine Research Unit, Worldwide Research, Development, and Medicine, Pfizer Inc, Cambridge, Massachusetts, United States of America; Neurogenomics, Translational Sciences, Neurology Business Group, Eisai Inc, Cambridge, Massachusetts, United States of America
| | - Casey S. Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America; Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
|
18
|
Thakur M, Bateman A, Brooksbank C, Freeberg M, Harrison M, Hartley M, Keane T, Kleywegt G, Leach A, Levchenko M, Morgan S, McDonagh E, Orchard S, Papatheodorou I, Velankar S, Vizcaino J, Witham R, Zdrazil B, McEntyre J. EMBL's European Bioinformatics Institute (EMBL-EBI) in 2022. Nucleic Acids Res 2023; 51:D9-D17. [PMID: 36477213 PMCID: PMC9825486 DOI: 10.1093/nar/gkac1098] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 10/13/2022] [Revised: 10/21/2022] [Accepted: 10/31/2022] [Indexed: 12/13/2022] Open
Abstract
The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the status of services that EMBL-EBI data resources provide to scientific communities globally. The scale, openness, rich metadata and extensive curation of EMBL-EBI added-value databases make them particularly well-suited as training sets for deep learning, machine learning and artificial intelligence applications, a selection of which are described here. The data resources at EMBL-EBI can catalyse such developments because they offer sustainable, high-quality data, collected in some cases over decades and made openly available to any researcher, globally. Our aim is for EMBL-EBI data resources to keep providing the foundations for tools and research insights that transform fields across the life sciences.
Affiliation(s)
| | - Alex Bateman
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Cath Brooksbank
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Mallory Freeberg
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Melissa Harrison
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Matthew Hartley
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Thomas Keane
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Gerard Kleywegt
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Andrew Leach
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Mariia Levchenko
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Sarah Morgan
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Ellen M McDonagh
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- OpenTargets, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Sandra Orchard
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Irene Papatheodorou
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Sameer Velankar
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Juan Antonio Vizcaino
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Rick Witham
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Barbara Zdrazil
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
|