1
Sidorenko D, Pushkov S, Sakip A, Leung GHD, Lok SWY, Urban A, Zagirova D, Veviorskiy A, Tihonova N, Kalashnikov A, Kozlova E, Naumov V, Pun FW, Aliper A, Ren F, Zhavoronkov A. Precious2GPT: the combination of multiomics pretrained transformer and conditional diffusion for artificial multi-omics multi-species multi-tissue sample generation. NPJ AGING 2024; 10:37. [PMID: 39117678 PMCID: PMC11310469 DOI: 10.1038/s41514-024-00163-3] [Received: 03/19/2024] [Accepted: 07/22/2024] [Indexed: 08/10/2024]
Abstract
Synthetic data generation in omics mimics real-world biological data, providing alternatives for training and evaluation of genomic analysis tools, controlling differential expression, and exploring data architecture. We previously developed Precious1GPT, a multimodal transformer trained on transcriptomic and methylation data, along with metadata, for predicting biological age and identifying dual-purpose therapeutic targets potentially implicated in aging and age-associated diseases. In this study, we introduce Precious2GPT, a multimodal architecture that integrates Conditional Diffusion (CDiffusion) and decoder-only Multi-omics Pretrained Transformer (MoPT) models trained on gene expression and DNA methylation data. Precious2GPT excels in synthetic data generation, outperforming Conditional Generative Adversarial Networks (CGANs), CDiffusion, and MoPT. We demonstrate that Precious2GPT is capable of generating representative synthetic data that captures tissue- and age-specific information from real transcriptomics and methylomics data. Notably, Precious2GPT surpasses other models in age prediction accuracy using the generated data, and it can generate data beyond 120 years of age. Furthermore, we showcase the potential of using this model in identifying gene signatures and potential therapeutic targets in a colorectal cancer case study.
Affiliation(s)
- Denis Sidorenko
  - Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China
- Stefan Pushkov
  - Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China
- Akhmed Sakip
  - Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China
- Geoffrey Ho Duen Leung
  - Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China
- Sarah Wing Yan Lok
  - Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China
- Anatoly Urban
  - Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China
- Diana Zagirova
  - Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China
- Alexander Veviorskiy
  - Insilico Medicine AI Limited, Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City, Abu Dhabi, UAE
- Nina Tihonova
  - Insilico Medicine AI Limited, Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City, Abu Dhabi, UAE
- Aleksandr Kalashnikov
  - Insilico Medicine AI Limited, Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City, Abu Dhabi, UAE
- Ekaterina Kozlova
  - Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China
- Vladimir Naumov
  - Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China
- Frank W Pun
  - Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China
- Alex Aliper
  - Insilico Medicine AI Limited, Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City, Abu Dhabi, UAE
- Feng Ren
  - Insilico Medicine Shanghai Ltd., Suite 902, Tower C, Changtai Plaza, 2889 Jinke Road, Pudong, Shanghai, 201203, China
- Alex Zhavoronkov
  - Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China
  - Insilico Medicine AI Limited, Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City, Abu Dhabi, UAE
  - Buck Institute for Research on Aging, Novato, CA, 94945, USA
2
Xin J, Wang M, Qu L, Chen Q, Wang W, Wang Z. BIC-LP: A Hybrid Higher-Order Dynamic Bayesian Network Score Function for Gene Regulatory Network Reconstruction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:188-199. [PMID: 38127613 DOI: 10.1109/tcbb.2023.3345317] [Indexed: 12/23/2023]
Abstract
Reconstructing gene regulatory networks (GRNs) is an increasingly active topic in bioinformatics. The dynamic Bayesian network (DBN) is a stochastic graph model commonly used for GRN reconstruction. However, the probabilistic nature of biological networks and the presence of data noise make GRN reconstruction challenging and often lead to many false positive and false negative edges. ScoreLasso is a hybrid DBN score function that combines DBN and linear regression and performs well. Its performance is, however, limited by its first-order assumption and its neglect of the initial network of the DBN. In this article, an integrated model based on a higher-order DBN model, a higher-order Lasso linear regression model, and a Pearson correlation model is proposed. Based on this model, a hybrid higher-order DBN score function for GRN reconstruction, namely BIC-LP, is proposed. The BIC-LP score function is constructed by adding terms based on Lasso linear regression coefficients and Pearson correlation coefficients to the classical BIC score function. It can therefore capture more information from the dataset and curb information loss, compared with both many existing Bayesian-family score functions and many state-of-the-art methods for GRN reconstruction. Experimental results show that BIC-LP can reasonably eliminate some false positive edges while retaining most true positive edges, thereby achieving better GRN reconstruction performance.
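The abstract gives only the general shape of the score (a classical BIC term plus terms derived from Lasso coefficients and Pearson correlations), not its exact formula. The sketch below illustrates that general shape under loudly stated assumptions: the additive form, the use of absolute values, the weights `w_lasso` and `w_pearson`, and all function names are hypothetical and are not taken from the paper. Lasso coefficients are assumed to be precomputed elsewhere, and the lagged correlation stands in for the higher-order (lag greater than one) dependence a higher-order DBN would consider.

```python
import math


def bic_score(log_likelihood, num_params, num_samples):
    """Classical BIC in 'higher is better' form: fit minus a complexity penalty."""
    return log_likelihood - 0.5 * num_params * math.log(num_samples)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def lagged_pearson(parent, target, lag):
    """Correlate the parent gene at time t - lag with the target gene at time t."""
    return pearson(parent[:-lag], target[lag:])


def hybrid_score(log_likelihood, num_params, num_samples,
                 lasso_coefs, pearson_rs, w_lasso=1.0, w_pearson=1.0):
    """Hypothetical hybrid score: BIC plus weighted edge-evidence terms.

    ASSUMPTION: the additive form and the weights are illustrative only;
    the published BIC-LP formula may differ.
    """
    lasso_term = sum(abs(c) for c in lasso_coefs)
    pearson_term = sum(abs(r) for r in pearson_rs)
    return (bic_score(log_likelihood, num_params, num_samples)
            + w_lasso * lasso_term
            + w_pearson * pearson_term)


# Toy time series: the target follows the parent with a one-step delay.
parent = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [0.0, 2.0, 4.0, 6.0, 8.0]
r_lag1 = lagged_pearson(parent, target, lag=1)

# Made-up likelihood and Lasso coefficients, for illustration only.
score = hybrid_score(-10.0, 2, 100, lasso_coefs=[0.5, -0.2], pearson_rs=[r_lag1])
```

In this form, a candidate parent set supported by large regression coefficients and strong lagged correlations scores higher than the BIC term alone would allow, which is one plausible way such added terms could help retain true edges while the BIC penalty still discourages spurious ones.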