1
Rajan S, Schwarz E. Network-based artificial intelligence approaches for advancing personalized psychiatry. Am J Med Genet B Neuropsychiatr Genet 2024; 195:e32997. [PMID: 39031613; DOI: 10.1002/ajmg.b.32997] [Received: 12/14/2023; Revised: 05/24/2024; Accepted: 06/06/2024; Indexed: 07/22/2024]
Abstract
Psychiatric disorders have a complex biological underpinning likely involving an interplay of genetic and environmental risk contributions. Substantial efforts are being made to use artificial intelligence approaches to integrate features within and across data types to increase our etiological understanding and advance personalized psychiatry. Network science offers a conceptual framework for exploring the often complex relationships across different levels of biological organization, from cellular mechanistic to brain-functional and phenotypic networks. Utilizing such network information effectively as part of artificial intelligence approaches is a promising route toward a more in-depth understanding of illness biology, the deciphering of patient heterogeneity, and the identification of signatures that may be sufficiently predictive to be clinically useful. Here, we present examples of how network information has been used as part of artificial intelligence within psychiatry and beyond and outline future perspectives on how personalized psychiatry approaches may profit from a closer integration of psychiatric research, artificial intelligence development, and network science.
Affiliation(s)
- Sivanesan Rajan
- Hector Institute for Artificial Intelligence in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Emanuel Schwarz
- Hector Institute for Artificial Intelligence in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- German Center for Mental Health (DZPG), partner site Mannheim-Heidelberg-Ulm, Mannheim, Germany
2
Yang X, Liu G, Feng G, Bu D, Wang P, Jiang J, Chen S, Yang Q, Miao H, Zhang Y, Man Z, Liang Z, Wang Z, Li Y, Li Z, Liu Y, Tian Y, Liu W, Li C, Li A, Dong J, Hu Z, Fang C, Cui L, Deng Z, Jiang H, Cui W, Zhang J, Yang Z, Li H, He X, Zhong L, Zhou J, Wang Z, Long Q, Xu P, Wang H, Meng Z, Wang X, Wang Y, Wang Y, Zhang S, Guo J, Zhao Y, Zhou Y, Li F, Liu J, Chen Y, Yang G, Li X. GeneCompass: deciphering universal gene regulatory mechanisms with a knowledge-informed cross-species foundation model. Cell Res 2024:10.1038/s41422-024-01034-y. [PMID: 39375485; DOI: 10.1038/s41422-024-01034-y] [Received: 01/27/2024; Accepted: 09/13/2024; Indexed: 10/09/2024]
Abstract
Deciphering universal gene regulatory mechanisms in diverse organisms holds great potential for advancing our knowledge of fundamental life processes and facilitating clinical applications. However, the traditional research paradigm primarily focuses on individual model organisms and does not integrate various cell types across species. Recent breakthroughs in single-cell sequencing and deep learning techniques present an unprecedented opportunity to address this challenge. In this study, we built an extensive dataset of over 120 million human and mouse single-cell transcriptomes. After data preprocessing, we obtained 101,768,420 single-cell transcriptomes and developed a knowledge-informed cross-species foundation model, named GeneCompass. During pre-training, GeneCompass effectively integrated four types of prior biological knowledge to enhance our understanding of gene regulatory mechanisms in a self-supervised manner. By fine-tuning for multiple downstream tasks, GeneCompass outperformed state-of-the-art models in diverse applications for a single species and unlocked new realms of cross-species biological investigations. We also employed GeneCompass to search for key factors associated with cell fate transition and showed that the predicted candidate genes could successfully induce the differentiation of human embryonic stem cells into the gonadal fate. Overall, GeneCompass demonstrates the advantages of using artificial intelligence technology to decipher universal gene regulatory mechanisms and shows tremendous potential for accelerating the discovery of critical cell fate regulators and candidate drug targets.
Affiliation(s)
- Xiaodong Yang
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Guole Liu
- State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Guihai Feng
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Institute for Stem Cell and Regenerative Medicine, Chinese Academy of Sciences, Beijing, China
- Beijing Institute for Stem Cell and Regenerative Medicine, Beijing, China
- Dechao Bu
- Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Pengfei Wang
- University of Chinese Academy of Sciences, Beijing, China
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
- Jie Jiang
- Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Shubai Chen
- Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Qinmeng Yang
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
- Hefan Miao
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Yiyang Zhang
- University of Chinese Academy of Sciences, Beijing, China
- CEMS, NCMIS, HCMS, MDIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Zhenpeng Man
- University of Chinese Academy of Sciences, Beijing, China
- CEMS, NCMIS, HCMS, MDIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Zhongming Liang
- University of Chinese Academy of Sciences, Beijing, China
- CEMS, NCMIS, HCMS, MDIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Zichen Wang
- State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Yaning Li
- Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Zheng Li
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
- Yana Liu
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Yao Tian
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Wenhao Liu
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Cong Li
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Ao Li
- State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Jingxi Dong
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Zhilong Hu
- University of Chinese Academy of Sciences, Beijing, China
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
- Chen Fang
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Lina Cui
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Zixu Deng
- Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Haiping Jiang
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Wentao Cui
- University of Chinese Academy of Sciences, Beijing, China
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
- Jiahao Zhang
- University of Chinese Academy of Sciences, Beijing, China
- CEMS, NCMIS, HCMS, MDIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Zhaohui Yang
- Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Handong Li
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Xingjian He
- Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Liqun Zhong
- State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Jiaheng Zhou
- State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Zijian Wang
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
- Qingqing Long
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
- Ping Xu
- University of Chinese Academy of Sciences, Beijing, China
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
- Hongmei Wang
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Institute for Stem Cell and Regenerative Medicine, Chinese Academy of Sciences, Beijing, China
- Beijing Institute for Stem Cell and Regenerative Medicine, Beijing, China
- Zhen Meng
- University of Chinese Academy of Sciences, Beijing, China
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
- Xuezhi Wang
- University of Chinese Academy of Sciences, Beijing, China
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
- Yangang Wang
- University of Chinese Academy of Sciences, Beijing, China
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
- Yong Wang
- University of Chinese Academy of Sciences, Beijing, China
- CEMS, NCMIS, HCMS, MDIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Shihua Zhang
- University of Chinese Academy of Sciences, Beijing, China
- CEMS, NCMIS, HCMS, MDIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Jingtao Guo
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Institute for Stem Cell and Regenerative Medicine, Chinese Academy of Sciences, Beijing, China
- Beijing Institute for Stem Cell and Regenerative Medicine, Beijing, China
- Yi Zhao
- Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Yuanchun Zhou
- University of Chinese Academy of Sciences, Beijing, China
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
- Fei Li
- University of Chinese Academy of Sciences, Beijing, China
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
- Jing Liu
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Yiqiang Chen
- Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Ge Yang
- State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Xin Li
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Institute for Stem Cell and Regenerative Medicine, Chinese Academy of Sciences, Beijing, China
- Beijing Institute for Stem Cell and Regenerative Medicine, Beijing, China
3
Dabouei A, Mishra I, Kapur K, Cao C, Bridges AA, Xu M. Deep Video Analysis for Bacteria Genotype Prediction. bioRxiv [Preprint] 2024:2024.09.16.613253. [PMID: 39345538; PMCID: PMC11429917; DOI: 10.1101/2024.09.16.613253] [Indexed: 10/01/2024]
Abstract
Genetic modification of microbes is central to many biotechnology fields, such as industrial microbiology, bioproduction, and drug discovery. Understanding how specific genetic modifications influence observable bacterial behaviors is crucial for advancing these fields. In this study, we propose a supervised model to classify bacteria harboring single gene modifications to draw connections between phenotype and genotype. In particular, we demonstrate that the spatiotemporal patterns of Vibrio cholerae growth, recorded in terms of low-resolution bright-field microscopy videos, are highly predictive of the genotype class. Additionally, we introduce a weakly supervised approach to identify key moments in culture growth that significantly contribute to prediction accuracy. By focusing on the temporal expressions of bacterial behavior, our findings offer valuable insights into the underlying mechanisms and developmental stages by which specific genes control observable phenotypes. This research opens new avenues for automating the analysis of phenotypes, with potential applications for drug discovery, disease management, etc. Furthermore, this work highlights the potential of using machine learning techniques to explore the functional roles of specific genes using a low-resolution light microscope.
4
Liu J, Chen Y, Huang K, Guan X. Enhancing Missense Variant Pathogenicity Prediction with MissenseNet: Integrating Structural Insights and ShuffleNet-Based Deep Learning Techniques. Biomolecules 2024; 14:1105. [PMID: 39334871; PMCID: PMC11429773; DOI: 10.3390/biom14091105] [Received: 06/24/2024; Revised: 07/17/2024; Accepted: 07/22/2024; Indexed: 09/30/2024]
Abstract
The classification of missense variant pathogenicity continues to pose significant challenges in human genetics, necessitating precise predictions of functional impacts for effective disease diagnosis and personalized treatment strategies. Traditional methods, often compromised by suboptimal feature selection and limited generalizability, are outpaced by the enhanced classification model, MissenseNet (Missense Classification Network). This model, advancing beyond standard predictive features, incorporates structural insights from AlphaFold2 protein predictions, thus optimizing structural data utilization. MissenseNet, built on the ShuffleNet architecture, incorporates an encoder-decoder framework and a Squeeze-and-Excitation (SE) module designed to adaptively adjust channel weights and enhance feature fusion and interaction. The model's efficacy in classifying pathogenicity has been validated through superior accuracy compared to conventional methods and by achieving the highest areas under the Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves (Area Under the Curve and Area Under the Precision-Recall Curve) in an independent test set, thus underscoring its superiority.
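The abstract attributes MissenseNet's channel reweighting to a Squeeze-and-Excitation (SE) module. As a generic illustration of that mechanism only, and not MissenseNet's configuration (its layer sizes, reduction ratio and placement are not stated here), a standard SE forward pass over one feature map of shape (C, H, W) can be sketched in NumPy. The function name `se_block`, the weight shapes and the random placeholder weights are assumptions made for the sketch.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Generic Squeeze-and-Excitation forward pass (illustrative sketch).

    x  : feature map, shape (C, H, W)
    w1 : (C // r, C), b1 : (C // r,)   reduction layer; r is an assumed ratio
    w2 : (C, C // r), b2 : (C,)        expansion layer
    Returns the recalibrated feature map and the per-channel gate.
    """
    z = x.mean(axis=(1, 2))                      # squeeze: global average per channel, shape (C,)
    s = np.maximum(0.0, w1 @ z + b1)             # excitation, step 1: fully connected layer + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))  # excitation, step 2: fully connected layer + sigmoid
    return x * gate[:, None, None], gate         # scale every channel by its own gate value

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, r = 8, 4                                  # hypothetical channel count and reduction ratio
    x = rng.standard_normal((C, 5, 5))
    w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
    w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
    y, gate = se_block(x, w1, b1, w2, b2)
    print(y.shape, gate.round(3))
```

In a trained network the two weight matrices are learned; the sketch only shows that each channel is multiplied by one scalar computed from a summary of all channels, which is what "adaptively adjusting channel weights" refers to.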
Affiliation(s)
- Jing Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Yingying Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Kai Huang
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- National Grain Industry (Urban Grain and Oil Security) Technology Innovation Center, Shanghai 200093, China
- Xiao Guan
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- National Grain Industry (Urban Grain and Oil Security) Technology Innovation Center, Shanghai 200093, China
5
Kang M, Kim DK, Le VV, Ko SR, Lee JJ, Choi IC, Shin Y, Kim K, Ahn CY. Microcystis abundance is predictable through ambient bacterial communities: A data-oriented approach. J Environ Manage 2024; 368:122128. [PMID: 39126846; DOI: 10.1016/j.jenvman.2024.122128] [Received: 12/20/2023; Revised: 08/03/2024; Accepted: 08/05/2024; Indexed: 08/12/2024]
Abstract
The number of cyanobacterial harmful algal blooms (cyanoHABs) has increased, leading to the widespread development of prediction models for cyanoHABs. Although bacteria interact closely with cyanobacteria and directly affect cyanoHABs occurrence, related modeling studies have rarely utilized microbial community data compared to environmental data such as water quality. In this study, we built a machine learning model, the multilayer perceptron (MLP), for the prediction of Microcystis dynamics using both bacterial community and weekly water quality data from the Daechung Reservoir and Nakdong River, South Korea. The modeling performance, indicated by the R² value, improved to 0.97 in the model combining bacterial community data with environmental factors, compared to 0.78 in the model using only environmental factors. This underscores the importance of microbial communities in cyanoHABs prediction. Through the post-hoc analysis of the MLP models, we revealed that nitrogen sources played a more critical role than phosphorus sources in Microcystis blooms, whereas the bacterial amplicon sequence variants did not have significant differences in their contribution to each other. Similar to the MLP model results, bacterial data also had higher predictability in multiple linear regression (MLR) than environmental data. In both the MLP and MLR models, Microscillaceae showed the strongest association with Microcystis. This modeling approach provides a better understanding of the interactions between bacteria and cyanoHABs, facilitating the development of more accurate and reliable models for cyanoHABs prediction using ambient bacterial data.
Affiliation(s)
- Mingyeong Kang
- Cell Factory Research Center, Korea Research Institute of Bioscience and Biotechnology, 125 Gwahak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea; Department of Environmental Biotechnology, KRIBB School of Biotechnology, Korea University of Science and Technology (UST), 217 Gajeong-ro, Yuseong-gu, Daejeon, 34113, Republic of Korea
- Dong-Kyun Kim
- K-water Research Institute, 169 Yuseong-daero, Yuseong-gu, Daejeon, 34045, Republic of Korea
- Ve Van Le
- Cell Factory Research Center, Korea Research Institute of Bioscience and Biotechnology, 125 Gwahak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea; Department of Environmental Biotechnology, KRIBB School of Biotechnology, Korea University of Science and Technology (UST), 217 Gajeong-ro, Yuseong-gu, Daejeon, 34113, Republic of Korea
- So-Ra Ko
- Cell Factory Research Center, Korea Research Institute of Bioscience and Biotechnology, 125 Gwahak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Jay Jung Lee
- Geum River Environment Research Center, National Institute of Environmental Research, Chungbuk, 29027, Republic of Korea
- In-Chan Choi
- Geum River Environment Research Center, National Institute of Environmental Research, Chungbuk, 29027, Republic of Korea
- Yuna Shin
- Water Quality Assessment Research Division, National Institute of Environmental Research, Incheon, 22689, Republic of Korea
- Kyunghyun Kim
- Water Quality Assessment Research Division, National Institute of Environmental Research, Incheon, 22689, Republic of Korea
- Chi-Yong Ahn
- Cell Factory Research Center, Korea Research Institute of Bioscience and Biotechnology, 125 Gwahak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea; Department of Environmental Biotechnology, KRIBB School of Biotechnology, Korea University of Science and Technology (UST), 217 Gajeong-ro, Yuseong-gu, Daejeon, 34113, Republic of Korea
6
Lu H, Xiao L, Liao W, Yan X, Nielsen J. Cell factory design with advanced metabolic modelling empowered by artificial intelligence. Metab Eng 2024; 85:61-72. [PMID: 39038602; DOI: 10.1016/j.ymben.2024.07.003] [Received: 05/14/2024; Revised: 07/06/2024; Accepted: 07/06/2024; Indexed: 07/24/2024]
Abstract
Advances in synthetic biology and artificial intelligence (AI) have provided new opportunities for modern biotechnology. High-performance cell factories, the backbone of industrial biotechnology, are ultimately responsible for determining whether a bio-based product succeeds or fails in the fierce competition with petroleum-based products. To date, one of the greatest challenges in synthetic biology is the creation of high-performance cell factories in a consistent and efficient manner. As so-called white-box models, numerous metabolic network models have been developed and used in computational strain design. Moreover, great progress has been made in AI-powered strain engineering in recent years. Both approaches have advantages and disadvantages. Therefore, the deep integration of AI with metabolic models is crucial for the construction of superior cell factories with higher titres, yields and production rates. The detailed applications of the latest advanced metabolic models and AI in computational strain design are summarized in this review. Additionally, approaches for the deep integration of AI and metabolic models are discussed. It is anticipated that advanced mechanistic metabolic models powered by AI will pave the way for the efficient construction of powerful industrial chassis strains in the coming years.
Affiliation(s)
- Hongzhong Lu
- State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China
- Luchi Xiao
- State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China
- Wenbin Liao
- State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China; Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, 200237, PR China
- Xuefeng Yan
- Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, 200237, PR China
- Jens Nielsen
- BioInnovation Institute, Ole Måløes Vej, DK2200, Copenhagen N, Denmark; Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
7
Septiandri AA, Constantinides M, Quercia D. The potential impact of AI innovations on US occupations. PNAS Nexus 2024; 3:pgae320. [PMID: 39319327; PMCID: PMC11421150; DOI: 10.1093/pnasnexus/pgae320] [Received: 02/12/2024; Accepted: 07/09/2024; Indexed: 09/26/2024]
Abstract
An occupation is comprised of interconnected tasks, and it is these tasks, not occupations themselves, that are affected by Artificial Intelligence (AI). To evaluate how tasks may be impacted, previous approaches utilized manual annotations or coarse-grained matching. Leveraging recent advancements in machine learning, we replace coarse-grained matching with more precise deep learning approaches. Introducing the AI Impact measure, we employ Deep Learning Natural Language Processing to automatically identify AI patents that may impact various occupational tasks at scale. Our methodology relies on a comprehensive dataset of 17,879 task descriptions and quantifies AI's potential impact through analysis of 24,758 AI patents filed with the United States Patent and Trademark Office between 2015 and 2022. Our results reveal that some occupations will potentially be impacted, and that impact is intricately linked to specific skills. These include not only routine tasks (codified as a series of steps), as previously thought but also nonroutine ones (e.g. diagnosing health conditions, programming computers, and tracking flight routes). However, AI's impact on labor is limited by the fact that some of the occupations affected are augmented rather than replaced (e.g. neurologists, software engineers, air traffic controllers), and the sectors affected are experiencing labor shortages (e.g. IT, Healthcare, Transport).
Affiliation(s)
- Daniele Quercia
- Nokia Bell Labs, Cambridge CB3 0FA, United Kingdom
- King’s College London, London WC2R 2LS, United Kingdom
8
Paz-Ruza J, Freitas AA, Alonso-Betanzos A, Guijarro-Berdiñas B. Positive-Unlabelled learning for identifying new candidate Dietary Restriction-related genes among ageing-related genes. Comput Biol Med 2024; 180:108999. [PMID: 39137672; DOI: 10.1016/j.compbiomed.2024.108999] [Received: 06/14/2024; Revised: 07/25/2024; Accepted: 08/05/2024; Indexed: 08/15/2024]
Abstract
Dietary Restriction (DR) is one of the most popular anti-ageing interventions; recently, Machine Learning (ML) has been explored to identify potential DR-related genes among ageing-related genes, aiming to minimize costly wet lab experiments needed to expand our knowledge on DR. However, to train a model from positive (DR-related) and negative (non-DR-related) examples, the existing ML approach naively labels genes without known DR relation as negative examples, assuming that lack of DR-related annotation for a gene represents evidence of absence of DR-relatedness, rather than absence of evidence. This hinders the reliability of the negative examples (non-DR-related genes) and the method's ability to identify novel DR-related genes. This work introduces a novel gene prioritization method based on the two-step Positive-Unlabelled (PU) Learning paradigm: using a similarity-based, KNN-inspired approach, our method first selects reliable negative examples among the genes without known DR associations. Then, these reliable negatives and all known positives are used to train a classifier that effectively differentiates DR-related and non-DR-related genes, which is finally employed to generate a more reliable ranking of promising genes for novel DR-relatedness. Our method significantly outperforms (p<0.05) the existing state-of-the-art approach in three predictive accuracy metrics with up to ∼40% lower computational cost in the best case, and we identify 4 new promising DR-related genes (PRKAB1, PRKAB2, IRS2, PRKAG1), all with evidence from the existing literature supporting their potential DR-related role.
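The two-step Positive-Unlabelled procedure summarized above (select reliable negatives among unlabelled genes, then train a positive-versus-negative classifier and rank candidates) can be sketched generically in NumPy. This is not the authors' implementation: the reliable-negative rule used here (mean Euclidean distance to the k nearest known positives), the number of negatives kept (`n_neg`), the logistic-regression classifier and all function names are illustrative assumptions, and the toy matrix stands in for real gene features.

```python
import numpy as np

def select_reliable_negatives(X, is_positive, k=3, n_neg=1):
    """Step 1 (assumed KNN-style rule): score each unlabelled row by its mean
    Euclidean distance to its k nearest known positives and return the
    indices of the n_neg rows lying farthest from the positives."""
    pos = X[is_positive]
    unl_idx = np.flatnonzero(~is_positive)
    U = X[unl_idx]
    dist = np.linalg.norm(U[:, None, :] - pos[None, :, :], axis=2)  # (n_unlabelled, n_positive)
    k = min(k, pos.shape[0])
    mean_knn = np.sort(dist, axis=1)[:, :k].mean(axis=1)
    farthest_first = np.argsort(-mean_knn)
    return unl_idx[farthest_first[:n_neg]]

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Stand-in classifier for step 2: batch gradient-descent logistic regression."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def pu_rank(X, is_positive, k=3, n_neg=1):
    """Train on known positives vs. reliable negatives, then rank every
    unlabelled row from most to least positive-like."""
    neg = select_reliable_negatives(X, is_positive, k, n_neg)
    pos = np.flatnonzero(is_positive)
    train = np.concatenate([pos, neg])
    y = np.concatenate([np.ones(pos.size), np.zeros(neg.size)])
    w = fit_logistic(X[train], y)
    unl = np.flatnonzero(~is_positive)
    scores = predict_proba(w, X[unl])
    order = np.argsort(-scores)
    return unl[order], scores[order]

if __name__ == "__main__":
    # Toy data: rows 0-2 are "known positives"; rows 3-7 are unlabelled.
    X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [0.15, 0.15], [5.0, 5.0], [5.1, 4.9], [4.8, 5.2], [0.3, 0.2]])
    is_pos = np.array([True, True, True, False, False, False, False, False])
    print(pu_rank(X, is_pos, k=3, n_neg=3))
```

The key design point the sketch mirrors is that unlabelled examples are not all treated as negatives: only those least similar to the known positives are used as negative training data, and the remaining unlabelled rows are left to be ranked.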
Affiliation(s)
- Jorge Paz-Ruza
- LIDIA Group, CITIC, Universidade da Coruña, Campus de Elviña s/n, A Coruña 15071, Spain
- Alex A Freitas
- School of Computing, University of Kent, Canterbury CT2 7FS, United Kingdom
- Amparo Alonso-Betanzos
- LIDIA Group, CITIC, Universidade da Coruña, Campus de Elviña s/n, A Coruña 15071, Spain
9
Masuda K, Abdullah AA, Pflughaupt P, Sahakyan AB. Quantum mechanical electronic and geometric parameters for DNA k-mers as features for machine learning. Sci Data 2024; 11:911. [PMID: 39174574; PMCID: PMC11341866; DOI: 10.1038/s41597-024-03772-5] [Received: 03/25/2024; Accepted: 08/13/2024; Indexed: 08/24/2024]
Abstract
We are witnessing a steep increase in model development initiatives in genomics that employ high-end machine learning methodologies. Of particular interest are models that predict certain genomic characteristics based solely on DNA sequence. These models, however, treat DNA as a mere string of four letters (A, T, G and C), dismissing past scientific advances that enable the use of more intricate information from nucleic acid sequences. Here, we provide a comprehensive database of quantum mechanical (QM) and geometric features for all permutations of 7-meric DNA in their representative B, A and Z conformations. The database was generated using the applicable QM methodologies, which are computationally costly and time-consuming. This makes it straightforward to associate a wealth of novel molecular features with any DNA sequence by scanning it with a matching k-meric window and pulling the pre-computed values from our database for further use in modelling. We demonstrate the usefulness of the deposited features by using them exclusively to develop a model of A→C mutation rates.
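The window-scanning lookup described above can be sketched as follows. This sketch assumes the pre-computed features are available as a simple mapping from k-mer string to a list of numbers. The actual file format and column layout of the deposited database are not described here, and the toy table values are made up. Function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"

def all_kmers(k=7):
    """Every DNA k-mer; for k = 7 this is 4**7 = 16,384 sequences."""
    return ["".join(p) for p in product(BASES, repeat=k)]

def kmer_windows(seq, k=7):
    """Slide a window of width k along the sequence, one base at a time."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def featurize(seq, table, k=7):
    """Replace each k-mer window by its pre-computed feature vector.

    `table` maps k-mer -> list of floats (hypothetical layout); windows that
    are absent (e.g. containing N) yield None so the caller can impute them.
    """
    return [table.get(w) for w in kmer_windows(seq, k)]

# Toy 3-mer table with two made-up features per k-mer (illustrative values only)
toy_table = {kmer: [float(i), float(i % 7)] for i, kmer in enumerate(all_kmers(3))}
features = featurize("ACGTN", toy_table, k=3)
```

With k = 7 the same code scans real sequences. Each position then receives the features of the 7-mer window starting there, which is one possible alignment convention (centring the window is another).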
Affiliation(s)
- Kairi Masuda
- MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, UK
- Adib A Abdullah
- MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, UK
- Patrick Pflughaupt
- MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, UK
- Aleksandr B Sahakyan
- MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, UK.
10
Fu DS, Adili A, Chen X, Li JZ, Muheremu A. Abnormal genes and pathways that drive muscle contracture from brachial plexus injuries: Towards machine learning approach. SLAS Technol 2024; 29:100166. [PMID: 39033877 DOI: 10.1016/j.slast.2024.100166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2024] [Revised: 06/24/2024] [Accepted: 07/18/2024] [Indexed: 07/23/2024]
Abstract
To clarify the pathways closely linked to denervated muscle contracture, this work uses IoMT-enabled healthcare strategies to examine changes in gene expression patterns in atrophic muscles following brachial plexus injury. The Gene Expression Omnibus (GEO) database was searched to locate dataset GSE137606, which relates to brachial plexus injuries. Differentially expressed genes (DEGs) were extracted using strict criteria (|logFC| ≥ 2 and adjusted p < 0.05). To identify dysregulated activities and pathways in denervated muscles, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis, and Gene Set Enrichment Analysis (GSEA) were used. Hub genes were identified with Cytoscape algorithms based on measures such as proximity, degree, and maximum neighborhood component (MNC), and their expression, enriched pathways, and correlations were then examined. The 316 DEGs were predominantly enriched in muscle-related processes such as tissue formation and contraction pathways. Of these, 297 DEGs were upregulated in denervated muscles, whereas 19 were downregulated. GSEA showed enrichment of striated and skeletal muscle contraction. In addition, Myod1, Myog, Myh7, Myl2, Tnnt2, and Tnni1 were identified as upregulated hub genes in denervated muscles, with enriched pathways such as adrenergic signaling and tight junction. These results point to Myod1, Myog, Myh7, Myl2, Tnnt2, and Tnni1 as possible therapeutic targets for denervated muscle contracture and highlight treatment options for this condition, which may also improve patients' mental state.
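The DEG filter above (|logFC| ≥ 2 and adjusted p < 0.05) can be expressed in a few lines. The abstract does not name the multiple-testing correction. Benjamini-Hochberg is assumed here because it is the default in common GEO workflows such as limma's `topTable`. With logFC on a log2 scale, |logFC| ≥ 2 means at least a four-fold change. Gene names and values below are hypothetical.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Step-up: each adjusted value is the minimum over itself and all larger ranks
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

def select_degs(genes, log_fc, pvals, lfc_cut=2.0, q_cut=0.05):
    """Split genes passing |logFC| >= lfc_cut and adjusted p < q_cut into up/down."""
    q = bh_adjust(pvals)
    up = [g for g, fc, qv in zip(genes, log_fc, q) if fc >= lfc_cut and qv < q_cut]
    down = [g for g, fc, qv in zip(genes, log_fc, q) if fc <= -lfc_cut and qv < q_cut]
    return up, down

# Hypothetical genes: g3 fails the fold-change cut, g4 fails the FDR cut
up, down = select_degs(["g1", "g2", "g3", "g4"],
                       [3.0, -2.5, 1.0, 4.0],
                       [0.001, 0.002, 0.001, 0.5])
```

Applied to the study's data, the `up` and `down` lists would correspond to the 297 highly expressed and 19 weakly expressed DEGs (297 + 19 = 316).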
Affiliation(s)
- Dong-Sheng Fu
- Department of Hand and foot microsurgery, The sixth affiliated hospital of Xinjiang Medical University, Urumqi, Xinjiang, 830002, China
- Alimujiang Adili
- Department of Hand and foot microsurgery, The sixth affiliated hospital of Xinjiang Medical University, Urumqi, Xinjiang, 830002, China
- Xuan Chen
- Department of Hand and foot microsurgery, The sixth affiliated hospital of Xinjiang Medical University, Urumqi, Xinjiang, 830002, China
- Jian-Zhu Li
- Department of Hand and foot microsurgery, The sixth affiliated hospital of Xinjiang Medical University, Urumqi, Xinjiang, 830002, China
- Aikeremu Muheremu
- Department of Hand and foot microsurgery, The sixth affiliated hospital of Xinjiang Medical University, Urumqi, Xinjiang, 830002, China.
11
Salam A, Ullah F, Amin F, Ahmad Khan I, Garcia Villena E, Kuc Castilla A, de la Torre I. Efficient prediction of anticancer peptides through deep learning. PeerJ Comput Sci 2024; 10:e2171. [PMID: 39145253 PMCID: PMC11323142 DOI: 10.7717/peerj-cs.2171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Accepted: 06/11/2024] [Indexed: 08/16/2024]
Abstract
Background Cancer remains one of the leading causes of mortality globally, with conventional chemotherapy often resulting in severe side effects and limited effectiveness. Recent advancements in bioinformatics and machine learning, particularly deep learning, offer promising new avenues for cancer treatment through the prediction and identification of anticancer peptides. Objective This study aimed to develop and evaluate a deep learning model utilizing a two-dimensional convolutional neural network (2D CNN) to enhance the prediction accuracy of anticancer peptides, addressing the complexities and limitations of current prediction methods. Methods A diverse dataset of peptide sequences with annotated anticancer activity labels was compiled from various public databases and experimental studies. The sequences were preprocessed and encoded using one-hot encoding and additional physicochemical properties. The 2D CNN model was trained and optimized using this dataset, with performance evaluated through metrics such as accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). Results The proposed 2D CNN model achieved superior performance compared to existing methods, with an accuracy of 0.87, precision of 0.85, recall of 0.89, F1-score of 0.87, and an AUC-ROC value of 0.91. These results indicate the model's effectiveness in accurately predicting anticancer peptides and capturing intricate spatial patterns within peptide sequences. Conclusion The findings demonstrate the potential of deep learning, specifically 2D CNNs, in advancing the prediction of anticancer peptides. The proposed model significantly improves prediction accuracy, offering a valuable tool for identifying effective peptide candidates for cancer treatment. Future Work Further research should focus on expanding the dataset, exploring alternative deep learning architectures, and validating the model's predictions through experimental studies. 
Efforts should also aim at optimizing computational efficiency and translating these predictions into clinical applications.
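The input encoding described in the Methods (one-hot encoding plus physicochemical properties, fed to a 2D CNN) can be sketched as a fixed-size matrix per peptide. The choices here are illustrative assumptions, not the authors' exact design:

- a maximum length of 50 with zero-padding;
- Kyte-Doolittle hydropathy as the single extra property column;
- a leading channel axis so the matrix can be treated as a one-channel image.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
# Kyte-Doolittle hydropathy, used here as one example physicochemical property
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def encode_peptide(seq, max_len=50):
    """Return a (max_len, 21) matrix: 20 one-hot columns plus hydropathy.

    Longer sequences are truncated, shorter ones zero-padded; non-standard
    residues are left as all-zero rows."""
    mat = np.zeros((max_len, len(AMINO_ACIDS) + 1), dtype=np.float32)
    for pos, aa in enumerate(seq.upper()[:max_len]):
        if aa in AA_INDEX:
            mat[pos, AA_INDEX[aa]] = 1.0
            mat[pos, -1] = KD_HYDROPATHY[aa]
    return mat

x = encode_peptide("GLFDIVKK", max_len=10)
image = x[None, :, :]  # add a channel axis: (channels, length, features)
```

Treating the position-by-feature matrix as an image is what lets 2D convolutions pick up local patterns that span both neighbouring residues and feature columns.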
Affiliation(s)
- Abdu Salam
- Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
- Faizan Ullah
- Department of Computer Science, Bacha Khan University, Charsadda, Pakistan
- Farhan Amin
- School of Computer Science and Engineering, Yeungnam University, Gyeongsan, Republic of Korea
- Izaz Ahmad Khan
- Department of Computer Science, Bacha Khan University, Charsadda, Pakistan
12
Ahmed FS, Aly S, Liu X. EPI-Trans: an effective transformer-based deep learning model for enhancer promoter interaction prediction. BMC Bioinformatics 2024; 25:216. [PMID: 38890584 PMCID: PMC11184834 DOI: 10.1186/s12859-024-05784-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 04/15/2024] [Indexed: 06/20/2024]
Abstract
BACKGROUND Recognition of enhancer-promoter interactions (EPIs) is crucial for understanding human development, as EPIs in the genome play a key role in regulating transcription. However, experimental approaches for classifying EPIs are too expensive in terms of effort, time, and resources. Consequently, a growing number of studies are developing computational techniques, particularly deep learning and other machine learning methods, to address this problem. Unfortunately, most current computational methods are based on convolutional neural networks, recurrent neural networks, or a combination of the two, which do not take into account contextual details and the long-range interactions between enhancer and promoter sequences. To overcome these limitations, this study presents EPI-Trans, a new transformer-based model. The multi-head attention mechanism in the transformer automatically learns features that represent long-range relationships between enhancer and promoter sequences. Furthermore, a transferable generic model is created that can be used as a pre-trained model for various cell lines, and its parameters are then fine-tuned on a specific cell-line dataset to improve performance. RESULTS On six benchmark cell lines, the average AUROC for the specific, generic, and best models is 94.2%, 95%, and 95.7%, respectively, while the average AUPR is 80.5%, 66.1%, and 79.6%, respectively. CONCLUSIONS This study proposed a transformer-based deep learning model for EPI prediction. Comparative results on certain cell lines show that EPI-Trans outperforms other state-of-the-art techniques and can provide superior performance in recognizing EPIs.
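The multi-head attention mechanism mentioned above is what lets any position in the enhancer attend directly to any position in the promoter, regardless of distance. A minimal NumPy sketch of standard scaled dot-product multi-head self-attention follows. It is not the EPI-Trans code. The toy input, random weights, dimensions and the 4 + 4 token split are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product self-attention over one sequence X of shape (L, d_model)."""
    L, d_model = X.shape
    assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
    d_head = d_model // n_heads

    def split(M):  # (L, d_model) -> (n_heads, L, d_head)
        return M.reshape(L, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, L, L)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    heads = weights @ V  # (n_heads, L, d_head)
    out = heads.transpose(1, 0, 2).reshape(L, d_model) @ Wo
    return out, weights

# Toy input: 4 "enhancer" tokens followed by 4 "promoter" tokens, d_model = 8
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 8))
Wq, Wk, Wv, Wo = (rng.normal(size=(8, 8)) for _ in range(4))
out, weights = multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads=2)
cross = weights[:, :4, 4:]  # attention from enhancer positions to promoter positions
```

Unlike a convolution, whose receptive field grows only with depth, every entry of `weights` directly links two positions. This is the property the abstract relies on for modelling long-range enhancer-promoter relationships.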
Affiliation(s)
- Fatma S Ahmed
- Department of Computer Science and Technology, Xiamen University, Xiamen, 361005, China.
- Department of Electrical Engineering, Aswan University, Aswan, 81542, Egypt.
- Saleh Aly
- Department of Electrical Engineering, Aswan University, Aswan, 81542, Egypt.
- Department of Information Technology, Majmaah University, 11952, Majmaah, Saudi Arabia.
- Xiangrong Liu
- Department of Computer Science and Technology, Xiamen University, Xiamen, 361005, China
13
Qiu C, Su K, Luo Z, Tian Q, Zhao L, Wu L, Deng H, Shen H. Developing and comparing deep learning and machine learning algorithms for osteoporosis risk prediction. Front Artif Intell 2024; 7:1355287. [PMID: 38919268 PMCID: PMC11196804 DOI: 10.3389/frai.2024.1355287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 05/31/2024] [Indexed: 06/27/2024]
Abstract
Introduction Osteoporosis, characterized by low bone mineral density (BMD), is an increasingly serious public health issue. Several traditional regression models and machine learning (ML) algorithms have been proposed for predicting osteoporosis risk, but they have shown relatively low accuracy in clinical implementation. Recently proposed deep learning (DL) approaches, such as deep neural networks (DNNs), which can learn from complex hidden interactions, offer a new opportunity to improve predictive performance. In this study, we aimed to assess whether a DNN can achieve better performance in osteoporosis risk prediction. Methods Using hip BMD and extensive demographic and routine clinical data from 8,134 subjects older than 40 years in the Louisiana Osteoporosis Study (LOS), we developed a novel DNN framework for predicting osteoporosis risk and compared its performance with four conventional ML models, namely random forest (RF), artificial neural network (ANN), k-nearest neighbor (KNN), and support vector machine (SVM), as well as a traditional regression model, the osteoporosis self-assessment tool (OST). Model performance was assessed by the area under the receiver operating characteristic curve (AUC) and accuracy. Results Using 16 discriminative variables, the DNN approach achieved the best predictive performance (AUC = 0.848) among the compared approaches in classifying subjects with osteoporosis risk (hip BMD T-score ≤ -1.0) versus those without (hip BMD T-score > -1.0). Feature importance analysis showed that the top 10 most important variables identified by the DNN model were weight, age, gender, grip strength, height, beer drinking, diastolic pressure, alcohol drinking, years of smoking, and economic level.
Furthermore, we performed a subsampling analysis to assess how varying the sample size and the number of variables affects the predictive performance of the tested models. Notably, the DNN model performed comparably well (AUC = 0.846) using only the top 10 most important variables, and it still achieved high predictive performance (AUC = 0.826) when the sample size was reduced to 50% of the original dataset. Conclusion We developed a novel DNN model that we consider an effective algorithm for the early diagnosis of, and intervention in, osteoporosis in the aging population.
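The outcome definition and the AUC metric used above can be made concrete with a short sketch. It labels subjects with the study's T-score cutoff of -1.0. It then computes AUC through its probabilistic (Mann-Whitney) interpretation: the chance that a randomly chosen at-risk subject receives a higher model score than a randomly chosen not-at-risk subject. The subjects and scores are hypothetical. This is not the authors' evaluation code.

```python
import numpy as np

def risk_label(t_scores, cutoff=-1.0):
    """1 = at risk (hip T-score <= cutoff), 0 otherwise."""
    return (np.asarray(t_scores, dtype=float) <= cutoff).astype(int)

def auc_mann_whitney(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative; ties count as one half."""
    y = np.asarray(y_true)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y == 1], s[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

t_scores = [-2.6, -1.2, -0.5, 0.3, -1.0]  # hypothetical subjects
y = risk_label(t_scores)  # -> [1, 1, 0, 0, 1]
auc = auc_mann_whitney(y, [0.9, 0.4, 0.5, 0.1, 0.7])  # 5 of 6 pairs ordered correctly
```

The pairwise form is O(n_pos × n_neg) in memory. For a cohort the size of LOS, a rank-based formula or a library routine would be the practical choice, but both give the same value.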
Affiliation(s)
- Hongwen Deng
- Tulane Center for Biomedical Informatics and Genomics, Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, United States
- Hui Shen
- Tulane Center for Biomedical Informatics and Genomics, Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, United States
14
Thorp EB, Karlstaedt A. Intersection of Immunology and Metabolism in Myocardial Disease. Circ Res 2024; 134:1824-1840. [PMID: 38843291 DOI: 10.1161/circresaha.124.323660] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/15/2024] [Accepted: 04/15/2024] [Indexed: 06/12/2024]
Abstract
Immunometabolism is an emerging field at the intersection of immunology and metabolism. Immune cell activation plays a critical role in the pathogenesis of cardiovascular diseases and is integral to regeneration during cardiac injury. We currently have a limited understanding of the processes governing metabolic interactions between immune cells and cardiomyocytes. This intercellular crosstalk can alter the steady-state flux of metabolites and affect cardiac contractile function. Although much of our knowledge derives from studies of the acute inflammatory response, recent work emphasizes heterogeneity and flexibility in metabolism between cardiomyocytes and immune cells during pathological states, including ischemic, cardiometabolic, and cancer-associated disease. Metabolic adaptation is crucial because it influences immune cell activation, cytokine release, and potential therapeutic vulnerabilities. This review describes current concepts of immunometabolic regulation in the heart, focusing on intercellular crosstalk and intrinsic factors driving cellular regulation. We discuss experimental approaches to measure cardio-immunologic crosstalk, which are necessary to uncover unknown mechanisms at the immune-cardiac interface. Deeper insight into these axes holds promise for therapeutic strategies that optimize cardio-immunologic crosstalk for cardiac health.
Affiliation(s)
- Edward B Thorp
- Department of Pathology, Feinberg School of Medicine, Northwestern University, Chicago, IL (E.B.T.)
- Anja Karlstaedt
- Department of Cardiology, Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, CA (A.K.)
15
Rivero-Garcia I, Torres M, Sánchez-Cabo F. Deep generative models in single-cell omics. Comput Biol Med 2024; 176:108561. [PMID: 38749321 DOI: 10.1016/j.compbiomed.2024.108561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 04/30/2024] [Accepted: 05/05/2024] [Indexed: 05/31/2024]
Abstract
Deep generative models (DGMs) are becoming instrumental for inferring the probability distributions inherent to complex processes, such as those underlying most questions in biomedical research. For many years, mathematical methods that would allow this inference were lacking for the data-scarce scenario of biomedical research. The advent of single-cell omics has finally made the so-called "skinny matrix" square, allowing mathematical methods already used extensively in other areas to be applied. Moreover, it is now possible to integrate data at different molecular levels across thousands or even millions of samples, thanks to the growing number of single-cell atlases being generated collaboratively. Additionally, DGMs have proven useful in other frequent tasks in single-cell analysis pipelines, from dimensionality reduction and cell type annotation to RNA velocity inference. Despite their promise, DGMs need to be used with caution in biomedical research, paying special attention to using them to answer the right questions and to defining appropriate error metrics and validation checkpoints that confirm not only their correct use but also their relevance. All in all, DGMs provide an exciting tool that opens a bright future for the integrative analysis of single-cell omics to understand health and disease.
Affiliation(s)
- Inés Rivero-Garcia
- Universidad Politécnica de Madrid, Madrid, 28040, Spain; Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, 28029, Spain
- Miguel Torres
- Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, 28029, Spain
- Fátima Sánchez-Cabo
- Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, 28029, Spain.
16
Armingol E, Baghdassarian HM, Lewis NE. The diversification of methods for studying cell-cell interactions and communication. Nat Rev Genet 2024; 25:381-400. [PMID: 38238518 PMCID: PMC11139546 DOI: 10.1038/s41576-023-00685-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Accepted: 12/01/2023] [Indexed: 05/20/2024]
Abstract
No cell lives in a vacuum, and the molecular interactions between cells define most phenotypes. Transcriptomics provides rich information to infer cell-cell interactions and communication, thus accelerating the discovery of the roles of cells within their communities. Such research relies heavily on algorithms that infer which cells are interacting and the ligands and receptors involved. Specific pressures on different research niches are driving the evolution of next-generation computational tools, enabling new conceptual opportunities and technological advances. More sophisticated algorithms now account for the heterogeneity and spatial organization of cells, multiple ligand types and intracellular signalling events, and enable the use of larger and more complex datasets, including single-cell and spatial transcriptomics. Similarly, new high-throughput experimental methods are increasing the number and resolution of interactions that can be analysed simultaneously. Here, we explore recent progress in cell-cell interaction research and highlight how next-generation tools have diversified into a rich ecosystem for different applications that is enabling invaluable discoveries.
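The core inference step described above, scoring which cell types may communicate through a given ligand-receptor pair, can be illustrated with one of the simplest expression-based scores. It multiplies the mean ligand expression in a sender cell type by the mean receptor expression in a receiver cell type. Real tools differ: some average the two values instead, add permutation tests, or model multi-subunit receptor complexes. This sketch is therefore illustrative only. It assumes a dense cells × genes matrix, and the gene and cell-type names are hypothetical.

```python
import numpy as np

def lr_scores(expr, genes, cell_types, pairs):
    """Score every (ligand, receptor, sender, receiver) combination.

    expr: cells x genes expression matrix; genes: column names;
    cell_types: one label per cell; pairs: list of (ligand, receptor) names.
    Score = mean ligand expression in the sender type
            x mean receptor expression in the receiver type.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(cell_types)
    gidx = {g: i for i, g in enumerate(genes)}
    types = sorted(set(cell_types))
    # Mean expression of each gene within each cell type: (n_types, n_genes)
    means = np.vstack([expr[labels == t].mean(axis=0) for t in types])
    scores = {}
    for lig, rec in pairs:
        for i, sender in enumerate(types):
            for j, receiver in enumerate(types):
                scores[(lig, rec, sender, receiver)] = (
                    means[i, gidx[lig]] * means[j, gidx[rec]]
                )
    return scores

# Hypothetical toy data: 4 cells, one ligand gene L1 and one receptor gene R1
expr = [[2, 0], [4, 0], [0, 3], [0, 1]]
scores = lr_scores(expr, ["L1", "R1"], ["T", "T", "M", "M"], [("L1", "R1")])
```

Here T cells express the ligand and M cells the receptor, so only the T → M direction gets a non-zero score. The more sophisticated methods surveyed in the review refine this by adding spatial proximity, intracellular signalling and statistical testing.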
Affiliation(s)
- Erick Armingol
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA.
- Department of Paediatrics, University of California, San Diego, La Jolla, CA, USA.
- Hratch M Baghdassarian
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA
- Department of Paediatrics, University of California, San Diego, La Jolla, CA, USA
- Nathan E Lewis
- Department of Paediatrics, University of California, San Diego, La Jolla, CA, USA.
- Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA.
17
Su Z, Dhusia K, Wu Y. Encoding the space of protein-protein binding interfaces by artificial intelligence. Comput Biol Chem 2024; 110:108080. [PMID: 38643609 DOI: 10.1016/j.compbiolchem.2024.108080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 04/03/2024] [Accepted: 04/17/2024] [Indexed: 04/23/2024]
Abstract
The physical interactions between proteins are largely determined by the structural properties at their binding interfaces. Binding interfaces in distinct protein complexes have been found to be highly similar, and we hypothesized that the structural properties underlying different binding interfaces could be captured by artificial intelligence. To test this hypothesis, we broke protein-protein binding interfaces into pairs of interacting fragments and employed a generative model to encode these interface fragment pairs in a low-dimensional latent space. After training, new conformations of interface fragment pairs were generated. Using only a small number of these AI-generated interface fragment pairs, we were able to guide the assembly of protein complexes into their native conformations. These results demonstrate that the conformational space of fragment pairs at protein-protein binding interfaces is highly degenerate and that features in this degenerate space can be well characterized by artificial intelligence. In summary, our machine learning method may be useful for searching for and predicting the conformations of unknown protein-protein interactions.
Affiliation(s)
- Zhaoqian Su
- Data Science Institute, Vanderbilt University, 1001 19th Ave S, Nashville, TN 37212, USA
- Kalyani Dhusia
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Yinghao Wu
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA.
18
Chen J, Ji Y, Liu Y, Cen Z, Chen Y, Zhang Y, Li X, Li X. Exhaled volatolomics profiling facilitates personalized screening for gastric cancer. Cancer Lett 2024; 590:216881. [PMID: 38614384 DOI: 10.1016/j.canlet.2024.216881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 04/02/2024] [Accepted: 04/09/2024] [Indexed: 04/15/2024]
Abstract
Gastric cancer (GC) is one of the deadliest cancers, characterized by non-specific early symptoms and difficult detection, yet no validated non-invasive screening tools are available. Here we establish a non-invasive method that employs exhaled volatolomics and ensemble learning to detect GC. We developed a comprehensive mass spectrometry-based procedure and profiled a wide range of volatile metabolites in 314 breath samples. Through discovery, identification and verification phases, we screened a biomarker panel that distinguishes GC from controls. This panel achieved an accuracy of 0.90 (95% CI 0.87-0.94) with an area under the curve (AUC) of 0.92 (95% CI 0.89-0.94) in the discovery cohort, and an accuracy of 0.88 (95% CI 0.83-0.91) with an AUC of 0.91 (95% CI 0.87-0.93) in the replication cohort, outperforming traditional serum markers. Single-cell sequencing and gene set enrichment analysis revealed that these exhaled markers originated from aldehyde oxidation and pyruvate metabolism. Our approach advances the design of exhaled-breath analysis for GC detection and holds promise as a non-invasive method for clinical use.
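Performance figures like those above are reported with 95% confidence intervals. The abstract does not say how these were computed. The sketch below shows one common option, a percentile bootstrap over subjects, assumed here purely for illustration. The cohort and predictions are hypothetical.

```python
import numpy as np

def accuracy_with_bootstrap_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Point accuracy plus a percentile-bootstrap (1 - alpha) confidence interval."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = y_true.size
    rng = np.random.default_rng(seed)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        boot[b] = np.mean(y_true[idx] == y_pred[idx])
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return float(np.mean(y_true == y_pred)), (float(lo), float(hi))

# Hypothetical cohort: 100 breath samples, 90 classified correctly
y_true = np.array([1] * 50 + [0] * 50)
y_pred = y_true.copy()
y_pred[:10] = 0  # ten cancer cases missed
acc, (lo, hi) = accuracy_with_bootstrap_ci(y_true, y_pred)
```

Resampling whole subjects, rather than individual predictions from repeated measurements, keeps the interval honest when a subject contributes more than one breath sample.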
Affiliation(s)
- Jian Chen
- Department of Environmental Science & Engineering, Fudan University, Shanghai, 200438, PR China
- Yongyan Ji
- Department of Environmental Science & Engineering, Fudan University, Shanghai, 200438, PR China
- Yongqian Liu
- Department of Environmental Science & Engineering, Fudan University, Shanghai, 200438, PR China
- Zhengnan Cen
- Department of Environmental Science & Engineering, Fudan University, Shanghai, 200438, PR China
- Yuanwen Chen
- Department of Gastroenterology, Huadong Hospital Affiliated to Fudan University, Shanghai, 200040, PR China
- Yixuan Zhang
- Department of Gastroenterology, Huadong Hospital Affiliated to Fudan University, Shanghai, 200040, PR China
- Xiaowen Li
- Department of Gastroenterology, Huadong Hospital Affiliated to Fudan University, Shanghai, 200040, PR China.
- Xiang Li
- Department of Environmental Science & Engineering, Fudan University, Shanghai, 200438, PR China.
19
Goles M, Daza A, Cabas-Mora G, Sarmiento-Varón L, Sepúlveda-Yañez J, Anvari-Kazemabad H, Davari MD, Uribe-Paredes R, Olivera-Nappa Á, Navarrete MA, Medina-Ortiz D. Peptide-based drug discovery through artificial intelligence: towards an autonomous design of therapeutic peptides. Brief Bioinform 2024; 25:bbae275. [PMID: 38856172 PMCID: PMC11163380 DOI: 10.1093/bib/bbae275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 04/23/2024] [Accepted: 06/04/2024] [Indexed: 06/11/2024]
Abstract
With their diverse biological activities, including antimicrobial, antitumour and hormonal signalling capabilities, peptides are promising candidates for therapeutic applications. Despite these advantages, therapeutic peptides face challenges such as short half-life, limited oral bioavailability and susceptibility to plasma degradation. The rise of computational tools and artificial intelligence (AI) in peptide research has spurred the development of advanced methodologies and databases that are pivotal to exploring these complex macromolecules. This perspective examines the integration of AI into peptide development, encompassing classifier methods, predictive systems and the avant-garde design enabled by deep generative models such as generative adversarial networks and variational autoencoders. Challenges remain, such as the need for processing optimization and careful validation of predictive models. This work outlines traditional strategies for machine learning model construction and training and proposes a comprehensive AI-assisted peptide design and validation pipeline. The evolving landscape of AI-driven peptide design is emphasized, showcasing how practical these methods are for expediting the development and discovery of novel peptides in peptide-based drug discovery.
Affiliation(s)
- Montserrat Goles
- Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
- Departamento de Ingeniería Química, Biotecnología y Materiales, Universidad de Chile, Beauchef 851, 8370456, Santiago, Chile
- Anamaría Daza
- Centre for Biotechnology and Bioengineering, CeBiB, Universidad de Chile, Beauchef 851, 8370456, Santiago, Chile
- Gabriel Cabas-Mora
- Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
- Lindybeth Sarmiento-Varón
- Centro Asistencial de Docencia e Investigación, CADI, Universidad de Magallanes, Av. Los Flamencos 01364, 6210005, Punta Arenas, Chile
- Julieta Sepúlveda-Yañez
- Facultad de Ciencias de la Salud, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
- Hoda Anvari-Kazemabad
- Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
- Mehdi D Davari
- Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120, Halle, Germany
- Roberto Uribe-Paredes
- Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
- Álvaro Olivera-Nappa
- Centre for Biotechnology and Bioengineering, CeBiB, Universidad de Chile, Beauchef 851, 8370456, Santiago, Chile
- Marcelo A Navarrete
- Centro Asistencial de Docencia e Investigación, CADI, Universidad de Magallanes, Av. Los Flamencos 01364, 6210005, Punta Arenas, Chile
- Escuela de Medicina, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
- David Medina-Ortiz
- Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
- Centre for Biotechnology and Bioengineering, CeBiB, Universidad de Chile, Beauchef 851, 8370456, Santiago, Chile
20
Yuan Y, Tang X, Li H, Lang X, Song Y, Yang Y, Zhou Z. BiLSTM- and CNN-Based m6A Modification Prediction Model for circRNAs. Molecules 2024; 29:2429. [PMID: 38893304 PMCID: PMC11173551 DOI: 10.3390/molecules29112429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 05/13/2024] [Accepted: 05/20/2024] [Indexed: 06/21/2024]
Abstract
N6-methyladenosine (m6A) methylation, a ubiquitous modification on circRNAs, exerts a profound influence on RNA function, intracellular behavior, and diverse biological processes, including disease development. Although prediction algorithms exist for m6A modifications on mRNA, a critical gap remains in predicting m6A modifications on circRNAs, even though accurate identification and prediction of m6A sites are imperative for understanding RNA function and regulation. This study presents a novel hybrid model combining a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM) for precise m6A methylation site prediction in circular RNAs (circRNAs), based on data from HEK293 cells. The model exploits the synergy between the CNN's ability to extract intricate sequence features and the BiLSTM's strength in capturing long-range dependencies. Furthermore, an integrated attention mechanism enables the model to pinpoint biological information critical for studying circRNA m6A methylation. With over 78% prediction accuracy on independent datasets, the model offers both a valuable tool for scientific research and a strong foundation for future biomedical applications. This work furthers our understanding of gene expression regulation and opens new avenues for exploring circRNA methylation in biological research.
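Site-level predictors like the one above typically score fixed-length sequence windows centred on a candidate adenosine. m6A commonly occurs within the DRACH consensus motif (D = A/G/U, R = A/G, H = A/C/U). The sketch below extracts such candidate windows. The window length, the motif filter and the N-padding are illustrative assumptions, not the study's documented input design. The sketch also treats the sequence as linear and ignores the wrap-around across a circRNA's back-splice junction.

```python
import re

# Zero-width lookahead so overlapping DRACH motifs are all reported
DRACH = re.compile(r"(?=[AGU][AG]AC[ACU])")

def candidate_windows(seq, flank=20):
    """Return (index, window) for the central A of every DRACH motif.

    Each window has length 2 * flank + 1 and is padded with N at the ends."""
    seq = seq.upper().replace("T", "U")
    out = []
    for m in DRACH.finditer(seq):
        centre = m.start() + 2  # the A in D-R-A-C-H
        left = seq[max(0, centre - flank):centre]
        right = seq[centre + 1:centre + 1 + flank]
        window = left.rjust(flank, "N") + seq[centre] + right.ljust(flank, "N")
        out.append((centre, window))
    return out
```

Each window can then be one-hot encoded and passed to a CNN-BiLSTM. Windows from experimentally confirmed sites would serve as positives and the remaining DRACH windows as candidate negatives.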
Affiliation(s)
- Yuqian Yuan
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Xiaozhu Tang
- School of Medicine & Holistic Integrative Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Hongyan Li
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Xufeng Lang
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Yihua Song
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Ye Yang
- School of Medicine & Holistic Integrative Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Zuojian Zhou
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China
21
Rennie S. Deep Learning for Elucidating Modifications to RNA-Status and Challenges Ahead. Genes (Basel) 2024; 15:629. [PMID: 38790258 PMCID: PMC11121098 DOI: 10.3390/genes15050629] [Received: 04/15/2024] [Revised: 05/11/2024] [Accepted: 05/11/2024] [Indexed: 05/26/2024]
Abstract
RNA-binding proteins (RBPs) and chemical modifications to RNA play vital roles in the co- and post-transcriptional regulation of genes. To fully decipher their biological roles, it is essential to catalogue their precise target locations along with their preferred contexts and sequence-based determinants. Deep learning approaches have recently driven significant advances in this field. These methods can predict the presence or absence of a modification at specific genomic regions from diverse features, particularly sequence and secondary structure, allowing us to decipher the highly non-linear sequence patterns and structures that underlie site preferences. This article provides an overview of how deep learning is being applied to this area, with a particular focus on mRNA-RBP binding, while also considering other types of chemical modification to RNA. It discusses how different types of model can handle sequence-based and/or secondary-structure-based inputs and the process of model training, including the choice of negative regions and the separation of training and test sets, and offers recommendations for developing biologically relevant models. Finally, it highlights four key areas that are crucial for advancing the field.
Affiliation(s)
- Sarah Rennie
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
22
Chen C, Zhou Y, Tong L, Pang Y, Xu J. Emerging 2D Ferroelectric Devices for In-Sensor and In-Memory Computing. Adv Mater 2024:e2400332. [PMID: 38739927 DOI: 10.1002/adma.202400332] [Received: 01/08/2024] [Revised: 04/19/2024] [Indexed: 05/16/2024]
Abstract
The number of sensor nodes in current computing systems is rapidly increasing in tandem with the volume of sensing data. A data-transmission bottleneck between the sensing, computing, and memory units limits system efficiency and speed. To minimize the latency of data transmission between units, novel in-memory and in-sensor computing architectures have been proposed as alternatives to the conventional von Neumann architecture for data-intensive sensing and computing applications. Integrating 2D materials with 2D ferroelectric materials is expected to enable these novel sensing and computing architectures, owing to the dangling-bond-free surfaces, ultra-fast polarization flipping, and ultra-low power consumption of 2D ferroelectrics. Here, the recent progress of 2D ferroelectric devices for in-sensor and in-memory neuromorphic computing is reviewed. Experimental and theoretical progress on 2D ferroelectric devices, including passive ferroelectrics-integrated 2D devices and active ferroelectrics-integrated 2D devices, is reviewed, followed by applications that integrate perception, memory, and computing. Notably, 2D ferroelectric devices have been used to simulate synaptic weights, neuronal model functions, and neural networks for image processing. As an emerging device configuration, 2D ferroelectric devices have the potential to expand into integrated sensor-memory-computing applications, opening new possibilities for modern electronics.
Affiliation(s)
- Chunsheng Chen
- Department of Electronic Engineering and Materials Science and Technology Research Center, The Chinese University of Hong Kong, Hong Kong SAR, China
- Yaoqiang Zhou
- Department of Electronic Engineering and Materials Science and Technology Research Center, The Chinese University of Hong Kong, Hong Kong SAR, China
- Lei Tong
- Department of Electronic Engineering and Materials Science and Technology Research Center, The Chinese University of Hong Kong, Hong Kong SAR, China
- Yue Pang
- Department of Electronic Engineering and Materials Science and Technology Research Center, The Chinese University of Hong Kong, Hong Kong SAR, China
- Jianbin Xu
- Department of Electronic Engineering and Materials Science and Technology Research Center, The Chinese University of Hong Kong, Hong Kong SAR, China
23
Tian D, Li Q, Liu F, Khan J, Abbas MQ, Du Z. VOC data-driven evaluation of vehicle cabin odor: from ANN to CNN-BiLSTM. Environ Sci Pollut Res Int 2024; 31:32826-32841. [PMID: 38668943 DOI: 10.1007/s11356-024-33293-y] [Received: 02/07/2024] [Accepted: 04/08/2024] [Indexed: 05/29/2024]
Abstract
Emissions of volatile organic compounds (VOCs) in vehicles are a significant problem because they cause unpleasant odors. To mitigate VOCs and odors in vehicles, it is critical to choose interior parts with low odor and VOC emissions. However, prevailing odor evaluation methods are subjective, costly, and potentially harmful to the health of evaluators. In this study, we analyzed 139 automotive interior parts and 92 vehicles, establishing a cost-effective, data-driven method for odor evaluation. The contents of benzene, toluene, ethylbenzene, xylene, styrene, formaldehyde, acetaldehyde, acrolein, and total volatile organic compounds (TVOC) were measured by thermal desorption gas chromatography-mass spectrometry (TD-GC/MS) and high-performance liquid chromatography with an ultraviolet detector (HPLC-UV). Professional odor evaluators assessed the odors, assigning intensity levels from 2.0 to 4.5 for interior parts and from 2.5 to 3.5 for whole vehicles. Using these data, we applied four supervised learning algorithms to develop predictive models for the odor intensity of both interior parts and entire vehicles. During model training, we applied early stopping to the artificial neural network (ANN) and convolutional neural network-bidirectional long short-term memory (CNN-BiLSTM) models, and optimized the support vector machine (SVM) and extreme gradient boosting (XGBoost) models by grid search. The evaluation results show that the CNN-BiLSTM model performs best, achieving an average accuracy of 89% on unknown samples within an odor intensity tolerance of 0.5, with a root mean square error (RMSE) of 0.24 and a mean absolute error (MAE) of 0.08. In sevenfold cross-validation, the model achieved an accuracy of 83.43%. Additionally, we employed SHapley Additive exPlanations (SHAP) for interpretive analysis of the model, which confirmed that each VOC's odor contribution is consistent with human olfactory rules.
By predicting odors from VOCs through supervised learning, this study reduces the cost and improves the efficiency and applicability of odor assessment across various vehicle interiors.
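The reported metrics (RMSE, MAE, and accuracy within a 0.5 intensity tolerance) can be computed as in the minimal Python sketch below. It assumes that a prediction counts as correct when its absolute error is at most 0.5. The function names and the sample ratings are hypothetical and are not data from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def tolerance_accuracy(y_true, y_pred, tol=0.5):
    """Fraction of predictions with absolute error <= tol (assumed criterion)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tol)
    return hits / len(y_true)

# Hypothetical odor-intensity ratings and model predictions (illustrative only).
ratings = [2.0, 2.5, 3.0, 3.5]
predictions = [2.0, 3.0, 3.0, 4.5]
print(round(rmse(ratings, predictions), 4), mae(ratings, predictions),
      tolerance_accuracy(ratings, predictions))  # prints: 0.559 0.375 0.75
```

RMSE penalizes the single 1.0-level miss more heavily than MAE does. Tolerance accuracy only counts how many predictions fall inside the band.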
Affiliation(s)
- Dingwei Tian
- College of Chemistry, Beijing University of Chemical Technology, Beijing, 100029, People's Republic of China
- Qi Li
- China Automotive Engineering Research Institute Co. Ltd., Chongqing, 401122, People's Republic of China
- Fang Liu
- Beijing Chehejia Automobile Technology Co. Ltd., Beijing, 101399, People's Republic of China
- Jehangir Khan
- College of Chemistry, Beijing University of Chemical Technology, Beijing, 100029, People's Republic of China
- Muhammad Qamer Abbas
- College of Chemistry, Beijing University of Chemical Technology, Beijing, 100029, People's Republic of China
- Zhenxia Du
- College of Chemistry, Beijing University of Chemical Technology, Beijing, 100029, People's Republic of China
24
Yu YW. On Minimizers and Convolutional Filters: Theoretical Connections and Applications to Genome Analysis. J Comput Biol 2024; 31:381-395. [PMID: 38687333 DOI: 10.1089/cmb.2024.0483] [Indexed: 05/02/2024]
Abstract
Minimizers and convolutional neural networks (CNNs) are two distinct, popular techniques that have both been employed to analyze categorical biological sequences. At face value, the methods seem entirely dissimilar. Minimizers use min-wise hashing on a rolling window to extract a single important k-mer feature per window. CNNs start with a wide array of randomly initialized convolutional filters, paired with a pooling operation and followed by multiple additional neural layers, and learn both the filters themselves and how they can be used to classify the sequence. In this study, our main result is a careful mathematical analysis of hash function properties. It shows that, for sequences over a categorical alphabet, random Gaussian initialization of convolutional filters with max-pooling is equivalent to choosing a minimizer ordering in which the selected k-mers are (in Hamming distance) far from the k-mers within the sequence but close to other minimizers. In empirical experiments, we find that this property manifests as decreased density in repetitive regions, both in simulation and on real human telomeres. We additionally train from scratch a CNN embedding of synthetic short reads from the SARS-CoV-2 genome into 3D Euclidean space that locally recapitulates the linear sequence distance of the read origins. This is a modest step toward building a deep learning assembler, although it is at present too slow to be practical. In total, this article provides a partial explanation for the effectiveness of CNNs in categorical sequence analysis.
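For reference, the minimizer rule described above (min-wise hashing over a rolling window, one k-mer selected per window) can be sketched in plain Python. This is a minimal illustration, not the paper's code. It assumes a SHA-256-derived key as the random k-mer ordering and breaks ties by leftmost position. `kmer_hash`, `minimizer_positions`, and `density` are hypothetical names.

```python
import hashlib

def kmer_hash(kmer: str, seed: int = 0) -> int:
    """Deterministic pseudo-random 64-bit ordering key for a k-mer (assumed stand-in for a random hash)."""
    digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def minimizer_positions(seq: str, k: int, w: int, seed: int = 0) -> list:
    """Start positions of the (w, k)-minimizers of seq.

    Each window of w consecutive k-mers contributes the k-mer with the smallest
    hash (ties broken by the leftmost position); repeats across windows are merged.
    """
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        return []
    hashes = [kmer_hash(seq[i:i + k], seed) for i in range(n_kmers)]
    selected = set()
    for start in range(n_kmers - w + 1):
        best = min(range(start, start + w), key=lambda i: (hashes[i], i))
        selected.add(best)
    return sorted(selected)

def density(seq: str, k: int, w: int, seed: int = 0) -> float:
    """Fraction of k-mer positions that are selected as minimizers."""
    return len(minimizer_positions(seq, k, w, seed)) / (len(seq) - k + 1)

seq = "ACGTTGCAACGTAGCTAGGCTAACGT"
print(minimizer_positions(seq, k=5, w=4))
print(round(density(seq, k=5, w=4), 3))
```

Density, as computed here, is the quantity the abstract refers to. Changing the ordering (the hash function or `seed`) changes which positions are picked, and this choice of ordering is what the paper relates to randomly initialized convolutional filters.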
Affiliation(s)
- Yun William Yu
- Department of Mathematics, University of Toronto, Toronto, Ontario, Canada
- Department of Ray and Stephanie Lane Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
25
Yoshikawa C, Nguyen DA, Nakaji-Hirabayashi T, Takigawa I, Mamitsuka H. Graph Network-Based Simulation of Multicellular Dynamics Driven by Concentrated Polymer Brush-Modified Cellulose Nanofibers. ACS Biomater Sci Eng 2024; 10:2165-2176. [PMID: 38546298 DOI: 10.1021/acsbiomaterials.3c01888] [Indexed: 04/09/2024]
Abstract
Manipulating the three-dimensional (3D) structures of cells is important for tissue repair and regeneration. A self-assembly system of cells with cellulose nanofibers (CNFs) and concentrated polymer brushes (CPBs) has been developed to fabricate various 3D cell structures. Generating tissues at an implantable level would require a large number of experiments using different cell culture conditions and material properties; however, this is practically intractable. To address this issue, we present a graph neural network-based simulator (GNS) that can be trained on assembly-process images to predict the assembly status at future time steps. A total of 24 image time series (25 time steps each) were recorded (four repeats for each of six different conditions). Each image was transformed into a graph by treating the cells as nodes and the connections between neighboring cells as edges. Using the obtained data, the performance of the GNS was examined under three scenarios (i.e., different pairings of training and testing data) to verify whether the GNS can be used to predict further time steps. The GNS reasonably reproduced the assembly process, even under the toughest scenario, in which the experimental conditions differed between the training and testing data. Practically, this means that a GNS trained on images from the first 24 h could predict the cell types obtained 3 weeks later. This result could reduce the number of experiments required to find the optimal conditions for generating cells with desired 3D structures. Ultimately, our approach could accelerate progress in regenerative medicine.
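The image-to-graph step described above (cells as nodes, neighboring cells as edges) can be sketched as follows. The abstract does not say how neighbors are defined, so this minimal Python example assumes a fixed distance threshold between cell centroids. `build_cell_graph`, the radius, and the coordinates are all hypothetical.

```python
import math
from itertools import combinations

def build_cell_graph(centroids, radius):
    """One node per cell; an undirected edge joins every pair of cells whose
    centroids lie within `radius` of each other (assumed neighbor criterion).

    Returns an adjacency dict mapping node index -> sorted list of neighbors.
    """
    adjacency = {i: [] for i in range(len(centroids))}
    for i, j in combinations(range(len(centroids)), 2):
        if math.dist(centroids[i], centroids[j]) <= radius:
            adjacency[i].append(j)
            adjacency[j].append(i)
    for neighbors in adjacency.values():
        neighbors.sort()
    return adjacency

# Hypothetical cell centroids (x, y) segmented from one image, in arbitrary units.
cells = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
print(build_cell_graph(cells, radius=1.5))  # prints: {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: []}
```

Each image in a time series would give one such graph. A graph network simulator is then trained to map the graph at one step to the next. Alternatives such as Delaunay triangulation or k-nearest neighbors are equally plausible neighbor rules.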
Affiliation(s)
- Chiaki Yoshikawa
- Research Center for Functional Materials, National Institute for Materials Science (NIMS), Tsukuba, Ibaraki 305-0047, Japan
- Duc Anh Nguyen
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Tadashi Nakaji-Hirabayashi
- Graduate School of Science and Engineering, University of Toyama, Toyama, Toyama 930-8555, Japan
- Graduate School of Innovative Life Science, University of Toyama, Toyama, Toyama 930-0194, Japan
- Ichigaku Takigawa
- Center for Innovative Research and Education in Data Science (CIREDS), Institute for Liberal Arts and Sciences, Kyoto University, Kyoto, Kyoto 606-8315, Japan
- Hiroshi Mamitsuka
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
26
Su Z, Griffin B, Emmons S, Wu Y. Prediction of interactions between cell surface proteins by machine learning. Proteins 2024; 92:567-580. [PMID: 38050713 DOI: 10.1002/prot.26648] [Received: 05/25/2023] [Revised: 11/15/2023] [Accepted: 11/20/2023] [Indexed: 12/06/2023]
Abstract
Cells detect changes in their external environments or communicate with each other through proteins on their surfaces. These cell surface proteins form a complicated network of interactions in order to fulfill their functions. The interactions between cell surface proteins are highly dynamic and, thus, challenging to detect using traditional experimental techniques. Here, we tackle this challenge using a computational framework. The primary focus of the framework is to develop new tools to identify interactions between domains in the immunoglobulin (Ig) fold, which is the most abundant domain family in cell surface proteins. These interactions could be formed between ligands and receptors from different cells or between proteins on the same cell surface. In practice, we collected all structural data on Ig domain interactions and transformed them into an interface fragment pair library. A high-dimensional profile can then be constructed from the library for a given pair of query protein sequences. Multiple machine learning models were used to read this profile so that the probability of interaction between the query proteins could be predicted. We tested our models on an experimentally derived dataset that contains 564 cell surface proteins in humans. The cross-validation results show that we can achieve higher than 70% accuracy in identifying the PPIs within this dataset. We then applied this method to a group of 46 cell surface proteins in Caenorhabditis elegans. We screened every possible interaction between these proteins. Many interactions recognized by our machine learning classifiers have been experimentally confirmed in the literature. In conclusion, our computational platform serves as a useful tool to help identify potential new interactions between cell surface proteins in addition to current state-of-the-art experimental techniques. The tool is freely accessible for use by the scientific community. 
Moreover, the general framework of the machine learning classification can also be extended to study the interactions of proteins in other domain superfamilies.
Affiliation(s)
- Zhaoqian Su
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA
- Brian Griffin
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, USA
- Scott Emmons
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, USA
- Yinghao Wu
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA
27
Achudhan AB, Kannan P, Gupta A, Saleena LM. A Review of Web-Based Metagenomics Platforms for Analysing Next-Generation Sequence Data. Biochem Genet 2024; 62:621-632. [PMID: 37507643 DOI: 10.1007/s10528-023-10467-w] [Received: 05/01/2023] [Accepted: 07/18/2023] [Indexed: 07/30/2023]
Abstract
Metagenomics has evolved into a promising technology for understanding microbial populations in the environment. Through metagenomics, many extreme and complex environments have been explored for their microbial populations. Using this technology, researchers have discovered novel genes and their potential characteristics, which have robust applications in food, pharmaceutical, scientific research, and other biotechnological fields. A sequencing platform can sequence the microbial population of any given environment, but the resulting sequences need to be analysed computationally to derive meaningful information. It is presumed that only bioinformaticians with extensive computational skills can process sequencing data through to downstream analysis. However, numerous open-source software packages and online servers developed for biologists with limited computational skills are available for analysing metagenomic data. This review focuses on bioinformatics tools such as Galaxy, CSI-NGS portal, ANASTASIA and SHAMAN, EBI Metagenomics, IDseq, and MG-RAST for analysing metagenomic data.
Affiliation(s)
- Arunmozhi Bharathi Achudhan
- Department of Biotechnology, School of Bioengineering, College of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India
- Priya Kannan
- Department of Biotechnology, School of Bioengineering, College of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India
- Annapurna Gupta
- Department of Biotechnology, School of Bioengineering, College of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India
- Lilly M Saleena
- Department of Biotechnology, School of Bioengineering, College of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India
28
Rosa LAS, Brugnago EL, Delben GJ, Rost JM, Beims MW. The influence of hyperchaoticity, synchronization, and Shannon entropy on the performance of a physical reservoir computer. Chaos 2024; 34:043120. [PMID: 38579146 DOI: 10.1063/5.0175001] [Received: 09/04/2023] [Accepted: 03/21/2024] [Indexed: 04/07/2024]
Abstract
In this paper, we analyze how the dynamics of a reservoir computer (RC) affect its performance. Modified Kuramoto coupled oscillators are used to model the RC. Synchronization, the Lyapunov spectrum (and dimension), Shannon entropy, and the upper bound of the Kolmogorov-Sinai entropy are employed to characterize the RC's dynamics. The performance of the RC is analyzed by reproducing the distributions of random, Gaussian, and quantum-jump series (shelved states), since a replica of the time evolution of a completely random series cannot be generated. We demonstrate that hyperchaotic motion, moderate Shannon entropy, and a higher degree of synchronization of the Kuramoto oscillators lead to the best RC performance. Therefore, an appropriate balance of irregularity and order in the oscillators' dynamics leads to better performance.
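As background for the oscillator-based reservoir, the sketch below integrates the standard Kuramoto model, dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j − θ_i), with forward Euler steps. It also computes the synchronization order parameter r = |(1/N) Σ_j e^{iθ_j}|. The paper uses a modified Kuramoto model whose changes the abstract does not describe, so this sketch shows only the textbook form. The parameter values are hypothetical.

```python
import cmath
import math

def order_parameter(theta):
    """Kuramoto order parameter r = |mean(exp(i*theta))|; r = 1 means full synchrony."""
    return abs(sum(cmath.exp(1j * t) for t in theta) / len(theta))

def kuramoto_step(theta, omega, coupling, dt):
    """One forward-Euler step of d(theta_i)/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)."""
    n = len(theta)
    return [
        theta[i] + dt * (omega[i] + coupling / n * sum(math.sin(tj - theta[i]) for tj in theta))
        for i in range(n)
    ]

def simulate(theta0, omega, coupling, dt=0.01, steps=2000):
    """Integrate from initial phases theta0 and return the final phases."""
    theta = list(theta0)
    for _ in range(steps):
        theta = kuramoto_step(theta, omega, coupling, dt)
    return theta

# Hypothetical setup: four identical oscillators with spread-out initial phases.
theta0 = [0.0, 1.0, 2.0, 2.5]
omega = [1.0, 1.0, 1.0, 1.0]
print(round(order_parameter(theta0), 3))
print(round(order_parameter(simulate(theta0, omega, coupling=2.0)), 3))
```

With positive coupling and identical natural frequencies, r rises toward 1. With zero coupling, all phases rotate together and r stays at its initial value. In a reservoir computer, the oscillator states (not only r) would serve as features for a trained readout.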
Affiliation(s)
- Lucas A S Rosa
- Departamento de Física, Universidade Federal do Paraná, 81531-980 Curitiba, Paraná, Brazil
- Eduardo L Brugnago
- Instituto de Física, Universidade de São Paulo, 05508-090 São Paulo, SP, Brazil
- Guilherme J Delben
- Departamento de Ciências Naturais e Sociais, Universidade Federal de Santa Catarina, 89520-000 Curitibanos, SC, Brazil
- Jan-Michael Rost
- Max-Planck Institute for the Physics of Complex Systems, Nöthnitzerstr.38, 01187 Dresden, Germany
- Marcus W Beims
- Departamento de Física, Universidade Federal do Paraná, 81531-980 Curitiba, Paraná, Brazil
- Max-Planck Institute for the Physics of Complex Systems, Nöthnitzerstr.38, 01187 Dresden, Germany
29
Eledkawy A, Hamza T, El-Metwally S. Precision cancer classification using liquid biopsy and advanced machine learning techniques. Sci Rep 2024; 14:5841. [PMID: 38462648 PMCID: PMC10925597 DOI: 10.1038/s41598-024-56419-1] [Received: 11/09/2023] [Accepted: 03/06/2024] [Indexed: 03/12/2024]
Abstract
Cancer presents a significant global health burden, resulting in millions of deaths annually. Timely detection is critical for improving survival rates, offering a crucial window for medical intervention. Liquid biopsy has emerged as a tool for early detection. It analyzes genetic variations and mutations in circulating cell-free or circulating tumor DNA (cfDNA/ctDNA), or molecular biomarkers. This study focuses on cancer detection using mutations in plasma cfDNA/ctDNA and protein biomarker concentrations. The proposed system first calculates correlation coefficients to identify correlated features, while mutual information assesses each feature's relevance to the target variable; redundant features are then eliminated to improve efficiency. The eXtreme Gradient Boosting (XGBoost) feature importance method iteratively selects the top ten features, reducing dataset dimensionality by 60%. A Light Gradient Boosting Machine (LGBM) model is employed for classification, with its hyper-parameters optimized by random search. Final predictions are obtained by ensembling the LGBM models from tenfold cross-validation, weighting each by its balanced accuracy, and averaging their outputs. Using this methodology, the proposed system achieves 99.45% accuracy and 99.95% AUC for detecting the presence of cancer, and 93.94% accuracy and 97.81% AUC for cancer-type classification. Our methodology leads to enhanced healthcare outcomes for cancer patients.
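The final ensembling step can be illustrated independently of LightGBM. The minimal sketch below assumes that each cross-validation fold model outputs class probabilities, and that each model's weight is its balanced accuracy (the mean per-class recall) on its own validation fold. The function names and numbers are hypothetical, and the trained LGBM models are not shown.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

def weighted_average_probs(fold_probs, fold_weights):
    """Weighted average of per-fold probability matrices.

    fold_probs[f][s][c] is fold f's probability for class c on sample s.
    Weights are normalized, so each averaged row still sums to 1.
    """
    total = sum(fold_weights)
    n_samples, n_classes = len(fold_probs[0]), len(fold_probs[0][0])
    averaged = [[0.0] * n_classes for _ in range(n_samples)]
    for probs, weight in zip(fold_probs, fold_weights):
        for s in range(n_samples):
            for c in range(n_classes):
                averaged[s][c] += weight / total * probs[s][c]
    return averaged

def predict_labels(averaged):
    """Choose the class with the highest averaged probability for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in averaged]

# Hypothetical: two fold models scored on their validation folds, then applied to one new sample.
w1 = balanced_accuracy([0, 0, 0, 1], [0, 0, 1, 1])  # (2/3 + 1) / 2
w2 = balanced_accuracy([0, 1, 1, 1], [0, 1, 0, 0])  # (1 + 1/3) / 2
probs = weighted_average_probs([[[0.8, 0.2]], [[0.3, 0.7]]], [w1, w2])
print(predict_labels(probs))  # prints: [0]
```

Balanced accuracy is used as the weight rather than plain accuracy, so that a fold model is not rewarded for simply favoring the majority class. scikit-learn's `balanced_accuracy_score` computes the same quantity.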
Affiliation(s)
- Amr Eledkawy
- Department of Computer Science, Faculty of Computers and Information, Mansoura University, P.O. Box: 35516, Mansoura, Egypt
- Taher Hamza
- Department of Computer Science, Faculty of Computers and Information, Mansoura University, P.O. Box: 35516, Mansoura, Egypt
- Sara El-Metwally
- Department of Computer Science, Faculty of Computers and Information, Mansoura University, P.O. Box: 35516, Mansoura, Egypt
- Biomedical Informatics Department, Faculty of Computer Science and Engineering, New Mansoura University, Gamasa, 35712, Egypt
30
Malakar S, Sutaoney P, Madhyastha H, Shah K, Chauhan NS, Banerjee P. Understanding gut microbiome-based machine learning platforms: A review on therapeutic approaches using deep learning. Chem Biol Drug Des 2024; 103:e14505. [PMID: 38491814 DOI: 10.1111/cbdd.14505] [Received: 10/08/2023] [Revised: 02/21/2024] [Accepted: 03/04/2024] [Indexed: 03/18/2024]
Abstract
Human beings host trillions of microbial cells in a symbiotic relationship that has long benefited both partners. The gut microbiota supports many bodily functions, from harvesting energy from digested food to strengthening the biochemical barriers of the gut and intestine. However, changes in microbiota composition, as well as bacteria that enter the gastrointestinal tract, can cause infection. Culture-independent techniques, such as high-throughput and meta-omics projects based on 16S ribosomal RNA (rRNA) sequencing, are popular methods for investigating the composition of the human gastrointestinal microbiota and taxonomically characterizing microbial communities. The composition and diversity of the microbiota should be characterized by whole-genome shotgun metagenomic sequencing of site-specific community DNA, combined with genome mapping, gene inventories, and metabolic remodelling, to ease functional study of the human microbiota. Preliminary examination of therapeutic potency in dysbiosis-associated diseases permits investigation of pharmacokinetic-pharmacodynamic changes in microbial communities to guide treatment escalation and dosage plans. Gut microbiome research integrates metagenomics, which has shaped the field over the last two decades. The incorporation of artificial intelligence and deep learning through "omics-based" methods and microfluidic evaluation has enhanced the ability to identify thousands of microbes.
Affiliation(s)
- Shilpa Malakar
- Department of Microbiology, Kalinga University, Raipur, Chhattisgarh, India
- Priya Sutaoney
- Department of Microbiology, Kalinga University, Raipur, Chhattisgarh, India
- Harishkumar Madhyastha
- Department of Cardiovascular Physiology, Faculty of Medicine, University of Miyazaki, Miyazaki, Japan
- Kamal Shah
- Institute of Pharmaceutical Research, GLA University, Mathura, Uttar Pradesh, India
- Nagendra Singh Chauhan
- Department of Medical Education, Drugs Testing Laboratory Avam Anusandhan Kendra, Raipur, Chhattisgarh, India
- Paromita Banerjee
- Department of Cardiology, AIIMS Rishikesh, Rishikesh, Uttarakhand, India
31
Zhang S, Li YD, Cai YR, Kang XP, Feng Y, Li YC, Chen YH, Li J, Bao LL, Jiang T. Compositional features analysis by machine learning in genome represents linear adaptation of monkeypox virus. Front Genet 2024; 15:1361952. [PMID: 38495668 PMCID: PMC10940399 DOI: 10.3389/fgene.2024.1361952] [Received: 12/27/2023] [Accepted: 02/21/2024] [Indexed: 03/19/2024]
Abstract
Introduction: Global headlines have been dominated by the sudden and widespread outbreak of monkeypox, a rare and endemic zoonotic disease caused by the monkeypox virus (MPXV). Genomic composition-based machine learning (ML) methods have recently shown promise in identifying the host adaptability and evolutionary patterns of viruses. Our study aimed to analyze the genomic characteristics and evolutionary patterns of MPXV using ML methods. Methods: The open reading frame (ORF) regions of full-length MPXV genomes were filtered, and 165 ORFs were selected as clusters with the highest homology. The unsupervised machine learning methods t-distributed stochastic neighbor embedding (t-SNE), principal component analysis (PCA), and hierarchical clustering were applied to observe the DCR characteristics of the selected ORF clusters. Results: MPXV sequences post-2022 showed clear linear adaptive evolution, indicating that the virus has become more adapted to the human host after accumulating mutations. For a more precise analysis, the ORF regions with larger variations were filtered by ranking their homology differences to narrow down the key ORF clusters, which led to the same conclusion of linear adaptability. Key differential protein structures were then predicted by AlphaFold 2. The predictions suggested that differences in the main domains might be one of the internal reasons for linear adaptive evolution. Discussion: Understanding the process of linear adaptation is critical in the constant evolutionary struggle between viruses and their hosts, and it plays a significant role in crafting effective measures to tackle viral diseases. The present study therefore provides valuable insights into the evolutionary patterns of MPXV in 2022 from the perspective of genomic composition characteristics analyzed through ML methods.
Affiliation(s)
- Sen Zhang
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
- Ya-Dan Li
- College of Basic Medical Sciences, Anhui Medical University, Hefei, China
- Yu-Rong Cai
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
- College of the First Clinical Medical, Inner Mongolia Medical University, Hohhot, China
- Xiao-Ping Kang
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
- Ye Feng
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
- Yu-Chang Li
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
- Yue-Hong Chen
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
- Jing Li
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
- College of Basic Medical Sciences, Anhui Medical University, Hefei, China
- Li-Li Bao
- College of Basic Medical Sciences, Inner Mongolia Medical University, Hohhot, China
- Tao Jiang
- College of Basic Medical Sciences, Anhui Medical University, Hefei, China
32
DeLuca M, Sensale S, Lin PA, Arya G. Prediction and Control in DNA Nanotechnology. ACS Appl Bio Mater 2024; 7:626-645. [PMID: 36880799 DOI: 10.1021/acsabm.2c01045] [Indexed: 03/08/2023]
Abstract
DNA nanotechnology is a rapidly developing field that uses DNA as a building material for nanoscale structures. Key to the field's development has been the ability to accurately describe the behavior of DNA nanostructures using simulations and other modeling techniques. In this Review, we present aspects of prediction and control in DNA nanotechnology, including molecular simulation at various scales, statistical mechanics, kinetic modeling, continuum mechanics, and other prediction methods. We also address the current uses of artificial intelligence and machine learning in DNA nanotechnology. We discuss how experiments and modeling are synergistically combined to provide control over device behavior, allowing scientists to design molecular structures and dynamic devices with confidence that they will function as intended. Finally, we identify processes and scenarios where DNA nanotechnology lacks sufficient predictive ability and suggest possible solutions for these weak areas.
Affiliation(s)
- Marcello DeLuca
- Thomas Lord Department of Mechanical Engineering and Materials Science, Duke University, Durham, North Carolina 27708, United States
- Sebastian Sensale
- Department of Physics, Cleveland State University, Cleveland, Ohio 44115, United States
- Po-An Lin
- Thomas Lord Department of Mechanical Engineering and Materials Science, Duke University, Durham, North Carolina 27708, United States
- Gaurav Arya
- Thomas Lord Department of Mechanical Engineering and Materials Science, Duke University, Durham, North Carolina 27708, United States
33
Wu W, Peng Y, Xu M, Yan T, Zhang D, Chen Y, Mei K, Chen Q, Wang X, Qiao Z, Wang C, Wu S, Zhang Q. Deep-Learning-Based Nanomechanical Vibration for Rapid and Label-Free Assay of Epithelial Mesenchymal Transition. ACS Nano 2024; 18:3480-3496. [PMID: 38169507 DOI: 10.1021/acsnano.3c10811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
Cancer poses a serious threat to human life and health. The classification and related study of the epithelial and mesenchymal phenotypes of cancer cells are key scientific questions in cancer research. Here, we investigated cancer cell colonies from a mechanical perspective and developed an assay for classifying epithelial/mesenchymal cancer cell colonies using a biomechanical fingerprint in the form of "nanovibration" combined with deep learning. The classification method requires only 1 s of vibration data and achieves a classification accuracy of nearly 92.5%. The method has also been validated for screening anticancer drugs. Compared with traditional methods, it has the advantages of being nondestructive, label-free, and highly sensitive. Furthermore, we proposed that subcellular structure influences the amplitude and spectrum of nanovibrations and demonstrated this using experiments and numerical simulation. These findings show that internal changes in the cell colony are manifested in nanovibrations. This work provides a perspective and an ancillary method for cancer cell phenotype diagnosis and promotes the study of the biomechanical mechanisms of cancer progression.
Affiliation(s)
- Wenjie Wu
- CAS Key Laboratory of Mechanical Behavior and Design of Material, Department of Modern Mechanics, University of Science and Technology of China, Hefei, Anhui 230027, People's Republic of China
- Yongpei Peng
- CAS Key Laboratory of Mechanical Behavior and Design of Material, Department of Modern Mechanics, University of Science and Technology of China, Hefei, Anhui 230027, People's Republic of China
- Mengjun Xu
- Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, Anhui 230027, People's Republic of China
- Tianhao Yan
- Department of Cell Biology, College of Basic Medical Sciences, Jilin University, Changchun 130021, People's Republic of China
- Duo Zhang
- CAS Key Laboratory of Mechanical Behavior and Design of Material, Department of Modern Mechanics, University of Science and Technology of China, Hefei, Anhui 230027, People's Republic of China
- Ye Chen
- CAS Key Laboratory of Mechanical Behavior and Design of Material, Department of Modern Mechanics, University of Science and Technology of China, Hefei, Anhui 230027, People's Republic of China
- Kainan Mei
- CAS Key Laboratory of Mechanical Behavior and Design of Material, Department of Modern Mechanics, University of Science and Technology of China, Hefei, Anhui 230027, People's Republic of China
- Qiubo Chen
- CAS Key Laboratory of Mechanical Behavior and Design of Material, Department of Modern Mechanics, University of Science and Technology of China, Hefei, Anhui 230027, People's Republic of China
- Xiapeng Wang
- CAS Key Laboratory of Mechanical Behavior and Design of Material, Department of Modern Mechanics, University of Science and Technology of China, Hefei, Anhui 230027, People's Republic of China
- Zihan Qiao
- CAS Key Laboratory of Mechanical Behavior and Design of Material, Department of Modern Mechanics, University of Science and Technology of China, Hefei, Anhui 230027, People's Republic of China
- Chen Wang
- CAS Key Laboratory of Mechanical Behavior and Design of Material, Department of Modern Mechanics, University of Science and Technology of China, Hefei, Anhui 230027, People's Republic of China
- Shangquan Wu
- CAS Key Laboratory of Mechanical Behavior and Design of Material, Department of Modern Mechanics, University of Science and Technology of China, Hefei, Anhui 230027, People's Republic of China
- Qingchuan Zhang
- CAS Key Laboratory of Mechanical Behavior and Design of Material, Department of Modern Mechanics, University of Science and Technology of China, Hefei, Anhui 230027, People's Republic of China
34
Wang H, Huang T, Wang D, Zeng W, Sun Y, Zhang L. MSCAN: multi-scale self- and cross-attention network for RNA methylation site prediction. BMC Bioinformatics 2024; 25:32. [PMID: 38233745 PMCID: PMC10795237 DOI: 10.1186/s12859-024-05649-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Accepted: 01/11/2024] [Indexed: 01/19/2024] [Open access]
Abstract
BACKGROUND Epi-transcriptome regulation through post-transcriptional RNA modifications is essential for all RNA types. Precise recognition of RNA modifications is critical for understanding their functions and regulatory mechanisms. However, wet-lab experimental methods are often costly and time-consuming, which limits their application. Therefore, recent research has focused on developing computational methods, particularly deep learning (DL). Bidirectional long short-term memory (BiLSTM) networks, convolutional neural networks (CNNs), and transformers have achieved success in modification site prediction. However, a BiLSTM cannot be parallelized across sequence positions, leading to long training times; a CNN cannot learn long-range dependencies within the sequence; and the transformer lacks information interaction between sequence representations at different scales. These limitations underscore the need for continued research and development in natural language processing (NLP) and DL to devise an enhanced prediction framework that can effectively address these challenges. RESULTS This study presents a multi-scale self- and cross-attention network (MSCAN) that identifies RNA methylation sites using NLP and DL techniques. Experimental results on twelve RNA modification types (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um) show that MSCAN achieves areas under the receiver operating characteristic curve of 98.34%, 85.41%, 97.29%, 96.74%, 99.04%, 79.94%, 76.22%, 65.69%, 92.92%, 92.03%, 95.77%, and 89.66%, respectively, which is better than the state-of-the-art prediction model. This indicates that the model has strong generalization capabilities. Furthermore, MSCAN reveals a strong association among different types of RNA modifications from an experimental perspective. A user-friendly web server for predicting twelve widely occurring human RNA modification sites (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um) is available at http://47.242.23.141/MSCAN/index.php.
CONCLUSIONS A predictor framework has been developed through binary classification to predict RNA methylation sites.
Affiliation(s)
- Honglei Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- School of Information Engineering, Xuzhou College of Industrial Technology, Xuzhou, 221400, China
- Tao Huang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Dong Wang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
- Wenliang Zeng
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Yanjing Sun
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Lin Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
35
Mu L, Song J, Akutsu T, Mori T. DiCleave: a deep learning model for predicting human Dicer cleavage sites. BMC Bioinformatics 2024; 25:13. [PMID: 38195423 PMCID: PMC10775615 DOI: 10.1186/s12859-024-05638-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 01/03/2024] [Indexed: 01/11/2024] [Open access]
Abstract
BACKGROUND MicroRNAs (miRNAs) are a class of non-coding RNAs that play a pivotal role as gene expression regulators. These miRNAs are typically about 20 to 25 nucleotides long. The maturation of miRNAs requires Dicer cleavage at specific sites within the precursor miRNAs (pre-miRNAs). Machine learning-based approaches for cleavage site prediction, such as PHDcleav and LBSizeCleav, have recently been reported. ReCGBM, a gradient boosting-based model, demonstrates superior performance compared with existing methods. Nonetheless, ReCGBM operates solely as a binary classifier, even though a typical pre-miRNA contains two cleavage sites. Previous approaches have used only a fraction of the structural information in pre-miRNAs, often overlooking comprehensive secondary structure information. A novel model is therefore needed to address these limitations. RESULTS In this study, we developed a deep learning model for predicting the presence of a Dicer cleavage site within a pre-miRNA segment. This model was enhanced by an autoencoder that learned secondary structure embeddings of pre-miRNAs. Benchmarking experiments demonstrated that the performance of our model was comparable to that of ReCGBM in binary classification tasks. In addition, our model excelled in multi-class classification tasks, making it a more versatile and practical solution than ReCGBM. CONCLUSIONS Our proposed model exhibited superior performance compared with the current state-of-the-art model, underscoring the effectiveness of a deep learning approach in predicting Dicer cleavage sites. Furthermore, our model could be trained using only sequence and secondary structure information. Its capacity to accommodate multi-class classification tasks enhances its practical utility.
Affiliation(s)
- Lixuan Mu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, 611-0011, Japan
- Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, 3800, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, 611-0011, Japan
- Tomoya Mori
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, 611-0011, Japan
36
Trost J, Haag J, Höhler D, Jacob L, Stamatakis A, Boussau B. Simulations of Sequence Evolution: How (Un)realistic They Are and Why. Mol Biol Evol 2024; 41:msad277. [PMID: 38124381 PMCID: PMC10768886 DOI: 10.1093/molbev/msad277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 11/17/2023] [Accepted: 12/08/2023] [Indexed: 12/23/2023] [Open access]
Abstract
MOTIVATION Simulating multiple sequence alignments (MSAs) using probabilistic models of sequence evolution plays an important role in the evaluation of phylogenetic inference tools and is crucial to the development of novel learning-based approaches for phylogenetic reconstruction, such as neural networks. These models and the resulting simulated data need to be as realistic as possible to be indicative of the performance of the developed tools on empirical data and to ensure that neural networks trained on simulations perform well on empirical data. Over the years, numerous models of evolution have been published with the goal of representing the sequence evolution process as faithfully as possible and thus simulating empirical-like data. In this study, we simulated DNA and protein MSAs under increasingly complex models of evolution with and without insertion/deletion (indel) events using a state-of-the-art sequence simulator. We assessed their realism by quantifying how accurately supervised learning methods are able to predict whether a given MSA is simulated or empirical. RESULTS Our results show that we can distinguish between empirical and simulated MSAs with high accuracy using two distinct and independently developed classification approaches across all tested models of sequence evolution. Our findings suggest that the current state-of-the-art models fail to accurately replicate several aspects of empirical MSAs, including site-wise rates as well as amino acid and nucleotide composition.
Affiliation(s)
- Johanna Trost
- Biometry and Evolutionary Biology Laboratory (LBBE), University Claude Bernard Lyon 1, Lyon, France
- Julia Haag
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Dimitri Höhler
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Laurent Jacob
- CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Sorbonne Université, Paris 75005, France
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Biodiversity Computing Group, Institute of Computer Science, Foundation for Research and Technology - Hellas, Heraklion, Crete, Greece
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Bastien Boussau
- Biometry and Evolutionary Biology Laboratory (LBBE), University Claude Bernard Lyon 1, Lyon, France
37
Li J, Varghese RS, Ressom HW. RNA-Seq Data Analysis. Methods Mol Biol 2024; 2822:263-290. [PMID: 38907924 DOI: 10.1007/978-1-0716-3918-4_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/24/2024]
Abstract
RNA-seq data analysis is a vital part of genomics research, turning vast and complex datasets into meaningful biological insights. It is a field marked by rapid evolution and ongoing innovation, requiring a thorough understanding from anyone seeking to unlock the potential of RNA-seq data. In this chapter, we describe the intricate landscape of RNA-seq data analysis, presenting a comprehensive pipeline that covers the entire process. Beginning with quality control, the chapter underscores the paramount importance of ensuring the integrity of RNA-seq data, which lays the groundwork for subsequent analyses. Preprocessing is then addressed, in which the raw sequence data undergo necessary modifications and enhancements, setting the stage for the alignment phase. This phase involves mapping the processed sequences to a reference genome, a step pivotal for decoding the origins and functions of these sequences. Venturing into the heart of RNA-seq analysis, the chapter then explores differential expression analysis, the process of identifying genes that exhibit different expression levels across conditions or sample groups. Because recognizing the biological context of these differentially expressed genes is pivotal, the chapter then turns to functional analysis. Here, methods and tools such as Gene Ontology and pathway analyses help contextualize the roles and interactions of the identified genes within broader biological frameworks. However, the chapter does not stop at conventional analysis methods. Embracing the evolving paradigms of data science, it delves into machine learning applications for RNA-seq data, introducing advanced techniques in dimension reduction and both unsupervised and supervised learning. These approaches can reveal patterns and relationships in the data that might be imperceptible through traditional methods.
Affiliation(s)
- James Li
- Genomics & Epigenomics Shared Resource, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA
- Rency S Varghese
- Genomics & Epigenomics Shared Resource, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA
- Habtom W Ressom
- Genomics & Epigenomics Shared Resource, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA
38
Hashizume T, Ying BW. Challenges in developing cell culture media using machine learning. Biotechnol Adv 2024; 70:108293. [PMID: 37984683 DOI: 10.1016/j.biotechadv.2023.108293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Revised: 10/17/2023] [Accepted: 11/14/2023] [Indexed: 11/22/2023]
Abstract
Microbial and mammalian cells are widely used in the food, pharmaceutical, and medical industries. Developing or optimizing culture media, a critical technology in cell culture engineering, is essential for improving cell culture performance. Methodologies for media optimization, such as the one-factor-at-a-time (OFAT) approach and response surface methodology (RSM), have been developed extensively. This review introduces emerging machine learning (ML) technology in cell culture engineering, which, combined with high-throughput experimental technologies, can be used to develop highly efficient and effective culture media. The commonly used ML algorithms and successful applications of ML in medium optimization are summarized. This review highlights the benefits of ML-assisted medium development and guides the selection of a media optimization method appropriate for various cell culture purposes.
Affiliation(s)
- Takamasa Hashizume
- School of Life and Environmental Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, 305-8572 Ibaraki, Japan
- Bei-Wen Ying
- School of Life and Environmental Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, 305-8572 Ibaraki, Japan
39
Okay S. Fine-Tuning Gene Expression in Bacteria by Synthetic Promoters. Methods Mol Biol 2024; 2844:179-195. [PMID: 39068340 DOI: 10.1007/978-1-0716-4063-0_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
Promoters are key genetic elements in the initiation and regulation of gene expression. A limited number of natural promoters have been described for controlling gene expression in synthetic biology applications. Therefore, synthetic promoters have been developed to fine-tune transcription to obtain the desired amount of gene product. Synthetic promoters are mostly characterized using promoter libraries constructed via mutagenesis of promoter sequences. The strength of promoters in the library is determined from the expression of a reporter gene, such as gfp encoding green fluorescent protein. Gene expression can be controlled using inducers. Most studies on Gram-negative bacteria use the expression system of the model organism Escherichia coli, whereas studies on Gram-positive bacteria mostly use that of the model organism Bacillus subtilis. Synthetic promoters for cyanobacteria, which are phototrophic microorganisms, are also evaluated, especially in the model cyanobacterium Synechocystis sp. PCC 6803. Moreover, a variety of algorithms based on machine learning methods have been developed to characterize the features of promoter elements. Some of these in silico models have been verified using in vitro or in vivo experiments. The identification of novel synthetic promoters with improved features compared with natural ones contributes substantially to synthetic biology approaches for fine-tuning gene expression.
Affiliation(s)
- Sezer Okay
- Department of Vaccine Technology, Vaccine Institute, Hacettepe University, Ankara, Türkiye
40
Matsumoto H, Ogura H, Oda J. Analysis of comprehensive biomolecules in critically ill patients via bioinformatics technologies. Acute Med Surg 2024; 11:e944. [PMID: 38596160 PMCID: PMC11002317 DOI: 10.1002/ams2.944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 02/23/2024] [Accepted: 03/10/2024] [Indexed: 04/11/2024] [Open access]
Abstract
Each patient with a critical illness such as sepsis or severe trauma has a different genetic background, comorbidities, age, and sex. Moreover, pathophysiology changes dynamically over time, even within the same patient. Therefore, individualized treatment is necessary to account for heterogeneity in patient backgrounds. Recently, the analysis of comprehensive biomolecular information from clinical specimens has revealed novel molecular pathological classifications called subtypes. In addition, such information has enabled reverse translational research, a data-driven approach to identifying drug target molecules. The development of these methods is expected to make the heterogeneity of patient backgrounds visible and lead to personalized therapy.
Affiliation(s)
- Hisatake Matsumoto
- Department of Traumatology and Acute Critical Medicine, Osaka University Graduate School of Medicine, Suita, Osaka, Japan
- Hiroshi Ogura
- Department of Traumatology and Acute Critical Medicine, Osaka University Graduate School of Medicine, Suita, Osaka, Japan
- Jun Oda
- Department of Traumatology and Acute Critical Medicine, Osaka University Graduate School of Medicine, Suita, Osaka, Japan
41
Bordukova M, Makarov N, Rodriguez-Esteban R, Schmich F, Menden MP. Generative artificial intelligence empowers digital twins in drug discovery and clinical trials. Expert Opin Drug Discov 2024; 19:33-42. [PMID: 37887266 DOI: 10.1080/17460441.2023.2273839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Accepted: 10/18/2023] [Indexed: 10/28/2023]
Abstract
INTRODUCTION The concept of Digital Twins (DTs), translated to drug development and clinical trials, describes virtual representations of systems of various complexities, ranging from individual cells to entire humans, and enables in silico simulations and experiments. DTs increase the efficiency of drug discovery and development by digitalizing processes associated with high economic, ethical, or social burden. The impact is multifaceted: DT models sharpen disease understanding, support biomarker discovery, and accelerate drug development, thus advancing precision medicine. One way to realize DTs is through generative artificial intelligence (AI), a cutting-edge technology that enables the creation of novel, realistic, and complex data with desired properties. AREAS COVERED The authors provide a brief introduction to generative AI and describe how it facilitates the modeling of DTs. In addition, they compare existing implementations of generative AI for DTs in drug discovery and clinical trials. Finally, they discuss technical and regulatory challenges that should be addressed before DTs can transform drug discovery and clinical trials. EXPERT OPINION The current state of DTs in drug discovery and clinical trials does not yet exploit the full power of generative AI and is limited to simulating a small number of characteristics. Nonetheless, generative AI has the potential to transform the field by leveraging recent developments in deep learning and customizing models for the needs of scientists, physicians, and patients.
Affiliation(s)
- Maria Bordukova
- Data & Analytics, Pharmaceutical Research and Early Development, Roche Innovation Center Munich (RICM), Penzberg, Germany
- Institute of Computational Biology, Computational Health Center, Helmholtz Munich, Munich, Germany
- Department of Biology, Ludwig-Maximilians University Munich, Munich, Germany
- Nikita Makarov
- Data & Analytics, Pharmaceutical Research and Early Development, Roche Innovation Center Munich (RICM), Penzberg, Germany
- Institute of Computational Biology, Computational Health Center, Helmholtz Munich, Munich, Germany
- Department of Biology, Ludwig-Maximilians University Munich, Munich, Germany
- Raul Rodriguez-Esteban
- Data & Analytics, Pharmaceutical Research and Early Development, Roche Innovation Center Basel (RICB), Basel, Switzerland
- Fabian Schmich
- Data & Analytics, Pharmaceutical Research and Early Development, Roche Innovation Center Munich (RICM), Penzberg, Germany
- Michael P Menden
- Institute of Computational Biology, Computational Health Center, Helmholtz Munich, Munich, Germany
- Department of Biology, Ludwig-Maximilians University Munich, Munich, Germany
- Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Australia
- German Center for Diabetes Research (DZD e.V.), Munich, Germany
42
Aslam I, Shah S, Jabeen S, ELAffendi M, A Abdel Latif A, Ul Haq N, Ali G. A CNN based m5c RNA methylation predictor. Sci Rep 2023; 13:21885. [PMID: 38081880 PMCID: PMC10713599 DOI: 10.1038/s41598-023-48751-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Accepted: 11/29/2023] [Indexed: 12/18/2023] [Open access]
Abstract
Post-transcriptional modifications of RNA play a key role in a variety of biological processes, such as stability and immune tolerance, RNA splicing, protein translation, and RNA degradation. One of these RNA modifications, m5C, participates in various cellular functions such as RNA structural stability and translation efficiency, and it has gained popularity among biologists. Detecting RNA m5C methylation sites with biological experiments requires considerable effort, time, and money. Most researchers use pre-processed RNA sequences of 41 nucleotides with the methylated cytosine at the center; therefore, some information surrounding these motifs may be lost. Conventional methods are unable to process RNA sequences directly because of their high dimensionality and thus need optimized techniques for better feature extraction. To address these challenges, this study employs an end-to-end, 1D CNN-based model to classify and interpret m5C methylation sites. Moreover, we aim to analyze sequences at their full length, where the methylated cytosine may not be at the center. Evaluation of the proposed architecture showed promising results, outperforming state-of-the-art techniques in terms of sensitivity and accuracy. Our model achieves 96.70% sensitivity and 96.21% accuracy for 41-nucleotide sequences and 96.10% accuracy for full-length sequences.
Affiliation(s)
- Irum Aslam
- Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, 22060, KPK, Pakistan
- Sajid Shah
- EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Rafha, Riyadh, 12435, Saudi Arabia
- Saima Jabeen
- College of Engineering, AI Research Center, Alfaisal University, Riyadh, 50927, Saudi Arabia
- Mohammed ELAffendi
- EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Rafha, Riyadh, 12435, Saudi Arabia
- Asmaa A Abdel Latif
- Public Health and Community Medicine Department (Industrial Medicine and Occupational Health Specialty), Faculty of Medicine, Menoufia University, Shibîn el Kôm, Egypt
- Nuhman Ul Haq
- Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, 22060, KPK, Pakistan
- Gauhar Ali
- EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Rafha, Riyadh, 12435, Saudi Arabia
43
Hennessey M, Barnett T. Method in limbo? Theoretical and empirical considerations in using thematic analysis by veterinary and One Health researchers. Prev Vet Med 2023; 221:106061. [PMID: 37944192 DOI: 10.1016/j.prevetmed.2023.106061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 10/23/2023] [Accepted: 10/24/2023] [Indexed: 11/12/2023]
Abstract
This article spans a number of theoretical, empirical and practice junctures at the intersection of human and animal medicine and the social sciences. We discuss the way thematic analysis, a qualitative method borrowed from the social sciences, is being increasingly used by veterinary and One Health researchers to investigate a range of complex issues. By considering theoretical aspects of thematic analysis, we expand our discussion to question whether this tool, as well as other social science methods, is currently being used appropriately by veterinary and human health researchers. We suggest that additional engagement with social science theory would enrich research practices and improve findings. We argue that considerations of 'big theory' - ontological and epistemological positionings of the researcher - and 'small(er)' theory, the specific social theory in which research is situated, are both necessary. Our point of departure is that scientific discourse is not merely construction or ideology but a unique and continuing arena of debate, in part at least because of the elevation of self-criticism to a central tenet of its practice. We argue for further engagement with the core ideas and concepts outlined above and discuss them in what follows. In particular, and by way of focusing the point, we suggest that for veterinary, One Health, and human medical researchers to use thematic analysis to its maximum potential they should be encouraged to engage with both broader socio-economic theories and with questions of ontology and epistemology.
Affiliation(s)
- Mathew Hennessey
- Veterinary Epidemiology, Economics and Public Health Group, Department of Pathobiology and Population Sciences, Royal Veterinary College, UK
- Tony Barnett
- Veterinary Epidemiology, Economics and Public Health Group, Department of Pathobiology and Population Sciences, Royal Veterinary College, UK; Firoz Lalji Institute for Africa, London School of Economics, UK
44
Morozov A, Taratkin M, Bazarkin A, Rivas JG, Puliatti S, Checcucci E, Belenchon IR, Kowalewski KF, Shpikina A, Singla N, Teoh JYC, Kozlov V, Rodler S, Piazza P, Fajkovic H, Yakimov M, Abreu AL, Cacciamani GE, Enikeev D. A systematic review and meta-analysis of artificial intelligence diagnostic accuracy in prostate cancer histology identification and grading. Prostate Cancer Prostatic Dis 2023; 26:681-692. [PMID: 37185992 DOI: 10.1038/s41391-023-00673-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/15/2023] [Accepted: 04/17/2023] [Indexed: 05/17/2023]
Abstract
BACKGROUND Artificial intelligence (AI) is a promising tool in pathology, including cancer diagnosis, subtyping, grading, and prognostic prediction. METHODS The aim of the study was to assess AI applications in prostate cancer (PCa) histology. We carried out a systematic literature search in 3 databases. The primary outcome was AI accuracy in differentiating between PCa and benign hyperplasia. Secondary outcomes were AI accuracy in determining Gleason grade and agreement between AI and pathologists. RESULTS Our final sample consisted of 24 studies conducted from 2007 to 2021, aggregating data from roughly 8000 cases of prostate biopsy and 458 cases of radical prostatectomy (RP). Sensitivity for PCa diagnosis mostly exceeded 90% (range, 87% to 100%), and specificity varied from 68% to 99%. Overall accuracy ranged from 83.7% to 98.3%, with AUC reaching 0.99. The meta-analysis using the Mantel-Haenszel method showed a pooled sensitivity of 0.96 (I2 = 80.7%) and a pooled specificity of 0.95 (I2 = 86.1%). The pooled positive likelihood ratio was 15.3 (I2 = 87.3%), and the pooled negative likelihood ratio was 0.04 (I2 = 78.6%). The SROC (symmetric receiver operating characteristics) curve yielded an AUC of 0.99. For grading, AI accuracy was lower: sensitivity for Gleason grading ranged from 77% to 87%, and specificity from 82% to 90%. CONCLUSIONS The accuracy of AI for PCa identification and grading is comparable to that of expert pathologists. This is a promising approach with several possible clinical applications that could expedite and optimize pathology reporting. Introduction of AI into routine practice may be limited by difficult and time-consuming convolutional neural network training and tuning.
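The accuracy measures summarized above all derive from a 2x2 table of AI calls against the pathologist reference standard. The minimal Python sketch below shows how they relate, assuming per-study counts of true/false positives and negatives are available. The function name `diagnostic_metrics` and the example counts are hypothetical and do not come from the review.

```python
import math


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard diagnostic accuracy measures from one 2x2 table.

    tp / fn: cancer cases the model called positive / negative.
    fp / tn: benign cases the model called positive / negative.
    """
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # LR+ = sensitivity / (1 - specificity); infinite when specificity == 1
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    # LR- = (1 - sensitivity) / specificity; infinite when specificity == 0
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else math.inf
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    metrics = diagnostic_metrics(tp=96, fp=5, fn=4, tn=95)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

With these hypothetical counts, LR+ is 0.96 / 0.05 = 19.2. The review pooled each measure across studies separately, so its pooled likelihood ratios (15.3 and 0.04) need not equal values recomputed from the pooled sensitivity and specificity.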
Affiliation(s)
- Andrey Morozov
- Institute for Urology and Reproductive Health, Sechenov University, Moscow, Russia
- Mark Taratkin
- Institute for Urology and Reproductive Health, Sechenov University, Moscow, Russia
- Andrey Bazarkin
- Institute for Clinical Medicine, Sechenov University, Moscow, Russia
- Juan Gomez Rivas
- Department of Urology, Clinico San Carlos University Hospital, Madrid, Spain
- Stefano Puliatti
- Urology Department, University of Modena and Reggio Emilia, Modena, Italy
- Enrico Checcucci
- Department of Surgery, Candiolo Cancer Institute, FPO-IRCCS, Candiolo, Turin, Italy
- Ines Rivero Belenchon
- Department of Uro-Nephrology, Virgen del Rocío University Hospital, Seville, Spain; Seville Biomedicine Institute (IBiS), Virgen del Rocío University Hospital/CSIC/Seville University, Seville, Spain
- Karl-Friedrich Kowalewski
- Department of Urology, University Medical Center Mannheim, Heidelberg University, Heidelberg, Germany
- Anastasia Shpikina
- Institute for Urology and Reproductive Health, Sechenov University, Moscow, Russia
- Nirmish Singla
- Department of Urology, James Buchanan Brady Urological Institute, Johns Hopkins University School of Medicine, Baltimore, USA
- Jeremy Y C Teoh
- Department of Surgery, S.H. Ho Urology Centre, The Chinese University of Hong Kong, Hong Kong, China
- Vasiliy Kozlov
- Department of Public Health and Healthcare, Sechenov University, Moscow, Russia
- Severin Rodler
- Department of Urology, Ludwig-Maximilian-University Munich, Munich, Germany
- Pietro Piazza
- Division of Urology, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Harun Fajkovic
- Karl Landsteiner Institute of Urology and Andrology, Vienna, Austria
- Department of Urology, Medical University of Vienna, Vienna, Austria
- Maxim Yakimov
- Pathology Department, Rabin Medical Center, Petach Tikwa, Israel
- Andre Luis Abreu
- USC Institute of Urology and Catherine & Joseph Aresty Department of Urology, Keck School of Medicine, Los Angeles, CA, USA
- Artificial Intelligence Center at USC Urology, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Giovanni E Cacciamani
- USC Institute of Urology and Catherine & Joseph Aresty Department of Urology, Keck School of Medicine, Los Angeles, CA, USA
- Artificial Intelligence Center at USC Urology, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Dmitry Enikeev
- Institute for Urology and Reproductive Health, Sechenov University, Moscow, Russia
- Karl Landsteiner Institute of Urology and Andrology, Vienna, Austria
- Department of Urology, Medical University of Vienna, Vienna, Austria
45
Esmaili F, Pourmirzaei M, Ramazi S, Shojaeilangari S, Yavari E. A Review of Machine Learning and Algorithmic Methods for Protein Phosphorylation Site Prediction. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:1266-1285. [PMID: 37863385 PMCID: PMC11082408 DOI: 10.1016/j.gpb.2023.03.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/16/2022] [Revised: 01/16/2023] [Accepted: 03/23/2023] [Indexed: 10/22/2023]
Abstract
Post-translational modifications (PTMs) play key roles in extending the functional diversity of proteins and, as a result, in regulating diverse cellular processes in prokaryotic and eukaryotic organisms. Phosphorylation is a vital PTM that occurs in most proteins and plays a significant role in many biological processes. Disorders in the phosphorylation process lead to multiple diseases, including neurological disorders and cancers. The purpose of this review is to organize the body of knowledge associated with phosphorylation site (p-site) prediction to facilitate future research in this field. First, we comprehensively review the related databases and introduce all steps of dataset creation, data preprocessing, and method evaluation in p-site prediction. Next, we investigate p-site prediction methods, which fall into two computational groups: algorithmic and machine learning (ML). We further show that there are two main ML approaches to p-site prediction, conventional and end-to-end deep learning methods, and give an overview of both. Moreover, this review introduces the feature extraction techniques most commonly used in p-site prediction. Finally, we create three test sets, based on general and human species data, from new proteins in the 2022 release of the database of protein post-translational modifications (dbPTM). Evaluating online p-site prediction tools on the proteins newly added in the dbPTM 2022 release, distinct from those in the dbPTM 2019 release, reveals their limitations: the actual performance of these tools on unseen proteins is notably lower than the results reported in their respective research papers.
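A common dataset-creation step in p-site prediction is to cut a fixed-length peptide window centred on every candidate serine, threonine, or tyrosine residue, which then feeds feature extraction or a deep learning model. The Python sketch below illustrates that step under stated assumptions. The window half-width of 7 (a 15-residue window), the `X` padding character, and the function name `candidate_windows` are hypothetical choices, not taken from the review or any specific tool.

```python
def candidate_windows(protein: str, half_width: int = 7,
                      residues: str = "STY", pad: str = "X") -> list:
    """Return (1-based position, window) for each candidate residue.

    Each window has length 2 * half_width + 1 and is centred on the
    candidate residue. Both sequence ends are padded with `pad` so that
    residues near the termini still yield full-length windows.
    """
    padded = pad * half_width + protein + pad * half_width
    windows = []
    for i, aa in enumerate(protein):
        if aa in residues:
            # Residue i of `protein` sits at index i + half_width in `padded`,
            # so its window spans padded[i : i + 2 * half_width + 1].
            windows.append((i + 1, padded[i:i + 2 * half_width + 1]))
    return windows


if __name__ == "__main__":
    # Toy sequence, for illustration only.
    for pos, win in candidate_windows("MKSPY", half_width=2):
        print(pos, win)
```

Labelling each window as phosphorylated or not (for example, from dbPTM annotations) would be a separate step, not shown here.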
Affiliation(s)
- Farzaneh Esmaili
- Department of Information Technology, Tarbiat Modares University, Tehran 14115-111, Iran
- Mahdi Pourmirzaei
- Department of Information Technology, Tarbiat Modares University, Tehran 14115-111, Iran
- Shahin Ramazi
- Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran 14115-111, Iran
- Seyedehsamaneh Shojaeilangari
- Biomedical Engineering Group, Department of Electrical Engineering and Information Technology, Iranian Research Organization for Science and Technology (IROST), Tehran 33535-111, Iran
- Elham Yavari
- Department of Information Technology, Tarbiat Modares University, Tehran 14115-111, Iran
46
Singh C, Askari A, Caruana R, Gao J. Augmenting interpretable models with large language models during training. Nat Commun 2023; 14:7913. [PMID: 38036543 PMCID: PMC10689442 DOI: 10.1038/s41467-023-43713-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/28/2023] [Accepted: 11/17/2023] [Indexed: 12/02/2023]
Abstract
Recent large language models (LLMs), such as ChatGPT, have demonstrated remarkable prediction performance for a growing array of tasks. However, their proliferation into high-stakes domains and compute-limited settings has created a burgeoning need for interpretability and efficiency. We address this need by proposing Aug-imodels, a framework for leveraging the knowledge learned by LLMs to build extremely efficient and interpretable prediction models. Aug-imodels use LLMs during fitting but not during inference, allowing complete transparency and often a speed/memory improvement of more than 1000x for inference compared to LLMs. We explore two instantiations of Aug-imodels in natural-language processing: Aug-Linear, which augments a linear model with decoupled embeddings from an LLM, and Aug-Tree, which augments a decision tree with LLM feature expansions. Across a variety of text-classification datasets, both outperform their non-augmented, interpretable counterparts. Aug-Linear can even outperform much larger models, e.g. a 6-billion-parameter GPT-J model, despite having 10,000x fewer parameters and being fully transparent. We further explore Aug-imodels in a natural-language fMRI study, where they generate interesting interpretations from scientific data.
Affiliation(s)
- Armin Askari
- University of California, Berkeley, Berkeley, CA, USA
47
Motoche-Monar C, Ordoñez JE, Chang O, Gonzales-Zubiate FA. gRNA Design: How Its Evolution Impacted on CRISPR/Cas9 Systems Refinement. Biomolecules 2023; 13:1698. [PMID: 38136570 PMCID: PMC10741458 DOI: 10.3390/biom13121698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 06/05/2023] [Accepted: 06/12/2023] [Indexed: 12/24/2023]
Abstract
Over the past decade, genetic engineering has witnessed a revolution with the emergence of a relatively new gene editing tool based on RNA-guided nucleases: the CRISPR/Cas9 system. Since its first report in 1987 and its characterization in 2007 as a bacterial defense mechanism, this system has garnered immense interest and research attention. CRISPR systems provide bacteria with immunity against invading genetic material; however, with specific modifications in sequence and structure, they become precise editing systems capable of modifying the genomes of a wide range of organisms. The refinement of these modifications encompasses diverse approaches, including the development of more accurate nucleases, a better understanding of the cellular context and epigenetic conditions, and the redesign of guide RNAs (gRNAs). Given how critical correct performance is for CRISPR/Cas9 systems, our scope emphasizes the latter approach. Hence, we present an overview of past and recent web-based gRNA design tools, highlighting how their computational architecture and the gRNA characteristics they consider have evolved over the years. We explain computational approaches that use machine learning techniques, neural networks, and gRNA/target interaction data to enable predictions and classifications. This review could open the door to a dynamic community that uses up-to-date algorithms to optimize and create promising gRNAs suitable for modern CRISPR/Cas9 engineering.
Affiliation(s)
- Cristofer Motoche-Monar
- School of Biological Sciences and Engineering, Yachay Tech University, Urcuquí 100119, Ecuador
- Julián E. Ordoñez
- School of Biological Sciences and Engineering, Yachay Tech University, Urcuquí 100119, Ecuador
- Oscar Chang
- Departamento de Electrónica, Universidad Simon Bolivar, Caracas 1080, Venezuela
- MIND Research Group, Model Intelligent Networks Development, Urcuquí 100119, Ecuador
- Fernando A. Gonzales-Zubiate
- School of Biological Sciences and Engineering, Yachay Tech University, Urcuquí 100119, Ecuador
- MIND Research Group, Model Intelligent Networks Development, Urcuquí 100119, Ecuador
48
Sadria M, Layton A, Bader GD. Adversarial training improves model interpretability in single-cell RNA-seq analysis. BIOINFORMATICS ADVANCES 2023; 3:vbad166. [PMID: 38099262 PMCID: PMC10719216 DOI: 10.1093/bioadv/vbad166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 09/28/2023] [Accepted: 11/22/2023] [Indexed: 12/17/2023]
Abstract
Motivation Predictive computational models must be accurate, robust, and interpretable to be considered reliable in important areas such as biology and medicine. A sufficiently robust model should not have its output affected significantly by a slight change in the input. Also, these models should be able to explain how a decision is made to support user trust in the results. Efforts have been made to improve the robustness and interpretability of predictive computational models independently; however, the interaction of robustness and interpretability is poorly understood. Results As an example task, we explore the computational prediction of cell type based on single-cell RNA-seq data and show that it can be made more robust by adversarially training a deep learning model. Surprisingly, we find this also leads to improved model interpretability, as measured by identifying genes important for classification using a range of standard interpretability methods. Our results suggest that adversarial training may be generally useful to improve deep learning robustness and interpretability and that it should be evaluated on a range of tasks. Availability and implementation Our Python implementation of all analysis in this publication can be found at: https://github.com/MehrshadSD/robustness-interpretability. The analysis was conducted using numPy 0.2.5, pandas 2.0.3, scanpy 1.9.3, tensorflow 2.10.0, matplotlib 3.7.1, seaborn 0.12.2, sklearn 1.1.1, shap 0.42.0, lime 0.2.0.1, matplotlib_venn 0.11.9.
Affiliation(s)
- Mehrshad Sadria
- Department of Applied Mathematics, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Anita Layton
- Department of Applied Mathematics, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Department of Biology, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- School of Pharmacy, University of Waterloo, Waterloo, Ontario N2G 1C5, Canada
- Gary D Bader
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
- The Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario M5G 1X5, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario M5G 2M9, Canada
49
Yin ZN, Lai FL, Gao F. Unveiling human origins of replication using deep learning: accurate prediction and comprehensive analysis. Brief Bioinform 2023; 25:bbad432. [PMID: 38008420 PMCID: PMC10676776 DOI: 10.1093/bib/bbad432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/11/2023] [Accepted: 11/06/2023] [Indexed: 11/28/2023]
Abstract
Accurate identification of replication origins (ORIs) is crucial for a comprehensive investigation into the progression of human cell growth and for cancer therapy. Here, we propose a computational approach, Ori-FinderH, which can efficiently and precisely predict human ORIs of various lengths by combining the Z-curve method with a deep learning approach. Compared with existing methods, Ori-FinderH exhibits superior performance, achieving an area under the receiver operating characteristic curve (AUC) of 0.9616 for the K562 cell line in 10-fold cross-validation. In addition, we established a cross-cell-line predictive model, which yielded a further improved AUC of 0.9706. The model was subsequently employed as the fitness function of a genetic algorithm for generating artificial ORIs. Sequence analysis with iORI-Euk revealed that the vast majority of the generated sequences (98% or more) contain at least one ORI for three cell lines (HeLa, MCF7 and K562). This innovative approach could provide more efficient, accurate and comprehensive information for experimental investigation, thereby further advancing the field.
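The Z-curve method mentioned above maps a DNA sequence to a three-dimensional cumulative walk, with one coordinate per pair of nucleotide classes: purine/pyrimidine, amino/keto, and weak/strong hydrogen bonding. The Python sketch below shows only this basic transform. The abstract does not describe how Ori-FinderH windows or normalizes these coordinates or feeds them to its deep learning model, and the function name `z_curve` is hypothetical.

```python
def z_curve(seq: str) -> list:
    """Cumulative Z-curve coordinates (x_n, y_n, z_n) for a DNA sequence.

    With A_n, C_n, G_n, T_n the counts of each base in the first n positions:
      x_n = (A_n + G_n) - (C_n + T_n)   purine vs pyrimidine
      y_n = (A_n + C_n) - (G_n + T_n)   amino vs keto
      z_n = (A_n + T_n) - (G_n + C_n)   weak vs strong hydrogen bonding
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base not in counts:
            # Ambiguous symbols such as N are rejected rather than guessed.
            raise ValueError(f"unsupported base: {base!r}")
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)))
    return points


if __name__ == "__main__":
    print(z_curve("ATGC"))  # [(1, 1, 1), (0, 0, 2), (1, -1, 1), (0, 0, 0)]
```

This sketch raises an error on ambiguous bases. A real pipeline would need an explicit policy for them, for example masking or skipping such positions.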
Affiliation(s)
- Zhen-Ning Yin
- Department of Physics, School of Science, Tianjin University, Tianjin 300072, China
- Fei-Liao Lai
- Department of Physics, School of Science, Tianjin University, Tianjin 300072, China
- Feng Gao
- Department of Physics, School of Science, Tianjin University, Tianjin 300072, China
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
- SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin 300072, China
50
Cornwell A, Zhang Y, Thondamal M, Johnson DW, Thakar J, Samuelson AV. The C. elegans Myc-family of transcription factors coordinate a dynamic adaptive response to dietary restriction. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.22.568222. [PMID: 38045350 PMCID: PMC10690244 DOI: 10.1101/2023.11.22.568222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Dietary restriction (DR), the process of decreasing overall food consumption over an extended period of time, has been shown to increase longevity across evolutionarily diverse species and to delay the onset of age-associated diseases in humans. In Caenorhabditis elegans, the Myc-family transcription factors (TFs) MXL-2 (Mlx) and MML-1 (MondoA/ChREBP), which function as obligate heterodimers, and PHA-4 (orthologous to forkhead box transcription factor A) are both necessary for the full physiological benefits of DR. However, the adaptive transcriptional response to DR and the roles of MML-1::MXL-2 and PHA-4 remain elusive. We identified the transcriptional signature of C. elegans DR using the eat-2 genetic model and demonstrate broad changes in metabolic gene expression in eat-2 DR animals, which require both mxl-2 and pha-4. Although the requirements for these factors in DR gene expression overlap, we found that many DR genes exhibit an opposing change in relative gene expression in eat-2;mxl-2 animals compared to wild-type, which was not observed in eat-2 animals with pha-4 loss. We further show functional deficiencies of mxl-2 loss in DR beyond lifespan: eat-2;mxl-2 animals exhibit substantially smaller brood sizes and lay a proportion of dead eggs, indicating that MML-1::MXL-2 has a role in maintaining the balance between resource allocation to the soma and to reproduction under conditions of chronic food scarcity. The metabolic rate of eat-2 animals does not differ significantly from that of wild-type animals, and we also find that loss of mxl-2 in DR does not affect the rate of oxygen consumption in young animals. The gene expression signature of eat-2 mutant animals is consistent with optimization of energy utilization and resource allocation, rather than induction of canonical gene expression changes associated with acute metabolic stress, such as induction of autophagy after TORC1 inhibition.
Consistently, eat-2 animals are not substantially resistant to stress. This provides further support for the idea that chronic DR may benefit healthspan and lifespan through efficient use of limited resources rather than broad upregulation of stress responses, and it also indicates that MML-1::MXL-2 and PHA-4 may have different roles in promoting benefits in response to different pro-longevity stimuli.
Affiliation(s)
- Adam Cornwell
- Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA
- Yun Zhang
- Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA
- Manjunatha Thondamal
- Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA
- Department of Biological Sciences, GITAM University, Andhra Pradesh, India
- David W Johnson
- Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA
- Department of Math and Science, Genesee Community College, One College Rd, Batavia, NY 14020, USA
- Juilee Thakar
- Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA
- Department of Microbiology and Immunology, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA
- Andrew V Samuelson
- Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY 14642, USA