1
Yadav S, Vora DS, Sundar D, Dhanjal JK. TCR-ESM: Employing protein language embeddings to predict TCR-peptide-MHC binding. Comput Struct Biotechnol J 2024; 23:165-173. [PMID: 38146434 PMCID: PMC10749252 DOI: 10.1016/j.csbj.2023.11.037] [Received: 09/10/2023] [Revised: 11/19/2023] [Accepted: 11/20/2023] [Indexed: 12/27/2023]
Abstract
Cognate target identification for T-cell receptors (TCRs) is a significant barrier in T-cell therapy development, which may be overcome by accurately predicting TCR interaction with peptide-bound major histocompatibility complex (pMHC). In this study, we employed peptide embeddings learned from a large protein language model, Evolutionary Scale Modeling (ESM), to predict TCR-pMHC binding. The resulting TCR-ESM model outperforms existing predictors. The complementarity-determining region 3 (CDR3) of the hypervariable TCR is located at the center of the paratope and plays a crucial role in peptide recognition. TCR-ESM trained on paired TCR data containing both CDR3α and CDR3β chain information performs significantly better than models trained on CDR3β alone. This suggests that both TCR chains contribute to specificity, although their relative importance depends on the specific peptide-MHC targeted. The study sheds light on the importance of MHC information in TCR-peptide binding, which had so far remained inconclusive and was thought to depend on dataset characteristics. TCR-ESM outperforms existing approaches on external datasets, suggesting that it generalizes beyond its training data. Overall, the study highlights the potential of deep learning for predicting TCR-pMHC interactions and for improving understanding of the factors driving TCR specificity. The prediction model is available as an online tool at http://tcresm.dhanjal-lab.iiitd.edu.in/.
Affiliation(s)
- Shashank Yadav
- Department of Biomedical Engineering, University of Arizona, Tucson 85721, AZ, USA
- Dhvani Sandip Vora
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, New Delhi 110016, India
- Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, New Delhi 110020, India
- Durai Sundar
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, New Delhi 110016, India
- Jaspreet Kaur Dhanjal
- Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, New Delhi 110020, India
2
Li S, Hamdi M, Dutta K, Fraum TJ, Luo J, Laforest R, Shoghi KI. FAST (fast analytical simulator of tracer)-PET: an accurate and efficient PET analytical simulation tool. Phys Med Biol 2024; 69:165020. [PMID: 39047765 DOI: 10.1088/1361-6560/ad6743] [Received: 04/23/2024] [Accepted: 07/23/2024] [Indexed: 07/27/2024]
Abstract
Objective. Simulation of positron emission tomography (PET) images is an essential tool in the development and validation of quantitative imaging workflows and advanced image processing pipelines. Existing Monte Carlo and analytical PET simulators often compromise on either efficiency or accuracy. We aimed to develop and validate the fast analytical simulator of tracer (FAST)-PET, a novel analytical framework for simulating PET images accurately and efficiently.
Approach. FAST-PET simulates PET images by performing precise forward projection, scatter estimation and randoms estimation that match the scanner geometry and statistics. The same process should be applicable to other scanner models, but this work focuses on the Siemens Biograph Vision-600. FAST-PET was calibrated and validated against an experimental scan of a National Electrical Manufacturers Association (NEMA) Image Quality (IQ) phantom. It was further validated quantitatively against Geant4 Application for Tomographic Emission (GATE) in clinical image simulations, using intensity-based features, texture-based features and task-based tumor segmentation.
Main results. In the NEMA IQ phantom simulation, FAST-PET's simulated images showed partial volume effects and noise levels comparable to the experimental images. The relative bias of the recovery coefficient (RC) was within 10% for all spheres, and the coefficient of variation for the background region was within 6% across various acquisition times. Clinical PET images generated by FAST-PET showed high quantitative accuracy and texture comparable to GATE (correlation coefficients above 0.95 for all features), at ∼100-fold lower computation time. Tumor segmentation masks from the two methods showed substantial overlap and shape similarity, with high concordance (CCC > 0.97) across measures.
Significance. FAST-PET generates PET images with quantitative accuracy comparable to GATE. This makes it ideal for applications that require extensive PET image simulation, such as virtual imaging trials and the development and validation of image processing pipelines.
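The FAST-PET abstract reports a recovery coefficient (RC) bias, a background coefficient of variation and a concordance coefficient. The sketch below gives minimal textbook definitions of these three quantities. It assumes that CCC denotes Lin's concordance correlation coefficient and that RC is the measured mean activity concentration in a sphere divided by its true concentration. It uses population (1/n) moments; the estimators the FAST-PET authors actually used (for example, sample versus population variance) are not stated in the abstract.

```python
from statistics import mean, pstdev


def recovery_coefficient(measured_mean, true_concentration):
    """RC: measured mean activity concentration in a sphere over its true value."""
    return measured_mean / true_concentration


def relative_bias(simulated, reference):
    """Relative difference of a simulated quantity (e.g. an RC) from a reference value."""
    return (simulated - reference) / reference


def coefficient_of_variation(values):
    """Background noise metric: population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between paired measurements.

    Penalizes both scatter (low covariance) and systematic offset (mean shift)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)


# Hypothetical paired feature values, e.g. one method versus another.
print(lins_ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # perfectly correlated but offset by 1
```

Unlike Pearson's correlation, the CCC in the usage line is well below 1 (4/7). The two series are perfectly correlated but shifted, and agreement, not just correlation, is what a simulator validation needs.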
Affiliation(s)
- Suya Li
- Mallinckrodt Institute of Radiology, Washington University School of Medicine, St Louis, MO, United States of America
- Imaging Science Program, McKelvey School of Engineering, Washington University in St Louis, St Louis, MO, United States of America
- Mahdjoub Hamdi
- Mallinckrodt Institute of Radiology, Washington University School of Medicine, St Louis, MO, United States of America
- Kaushik Dutta
- Mallinckrodt Institute of Radiology, Washington University School of Medicine, St Louis, MO, United States of America
- Imaging Science Program, McKelvey School of Engineering, Washington University in St Louis, St Louis, MO, United States of America
- Tyler J Fraum
- Mallinckrodt Institute of Radiology, Washington University School of Medicine, St Louis, MO, United States of America
- Jingqin Luo
- Department of Surgery, Washington University School of Medicine, St Louis, MO, United States of America
- Richard Laforest
- Mallinckrodt Institute of Radiology, Washington University School of Medicine, St Louis, MO, United States of America
- Imaging Science Program, McKelvey School of Engineering, Washington University in St Louis, St Louis, MO, United States of America
- Kooresh I Shoghi
- Mallinckrodt Institute of Radiology, Washington University School of Medicine, St Louis, MO, United States of America
- Imaging Science Program, McKelvey School of Engineering, Washington University in St Louis, St Louis, MO, United States of America
- Department of Biomedical Engineering, Washington University in St Louis, St Louis, MO, United States of America
3
Helander M. "Dead or Alive?" Assessment of the Binary End-of-Event Outcome Indicator for the NEMSIS Public Research Dataset. Prehosp Emerg Care 2024:1-15. [PMID: 39106451 DOI: 10.1080/10903127.2024.2389551] [Received: 01/18/2024] [Revised: 07/23/2024] [Accepted: 07/26/2024] [Indexed: 08/09/2024]
Abstract
OBJECTIVES The National Emergency Medical Services Information System (NEMSIS) provides a robust set of data for evaluating prehospital care. However, a major limitation is that the vast majority of records lack a definitive outcome. We aimed to evaluate the performance of a recently proposed method (the 'MLB' method) for imputing missing end-of-EMS-event outcomes ("dead" or "alive") for patient care reports in the NEMSIS public research dataset. METHODS This study reproduced the recently published method for patient outcome imputation in the NEMSIS database and replicated the results for the years 2017 through 2022 (n = 686,075). We performed statistical analyses using an array of established performance metrics for binary classification from the machine learning literature. Evaluation metrics included overall accuracy, true positive rate, true negative rate, balanced accuracy, precision, F1 score, Cohen's kappa coefficient, Matthews correlation coefficient, Hamming loss, the Jaccard similarity score, and the area under the receiver operating characteristic curve. RESULTS Extended metrics show consistently good imputation performance from year to year but reveal weakness in correctly identifying the minority class. For example, after adjustment for conflicting labels, "dead" prediction accuracy was 77.7% for 2018 and 61.8% over the six-year NEMSIS sub-sample, even though overall accuracy was 98.8%. Slight overfitting is also present. CONCLUSIONS We found that the recently published MLB method produced reasonably good "dead" or "alive" indicators. When the imputation method is applied in analyses of NEMSIS data, we recommend reporting the true positive rate ("dead" prediction accuracy) and the true negative rate ("alive" prediction accuracy). More complete documentation of the target NEMSIS elements by EMS clinicians can further improve the method's performance.
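All of the count-based metrics listed in the Helander abstract come from one 2×2 confusion matrix. The sketch below is a minimal, dependency-free illustration of their standard definitions, with "dead" as the positive (minority) class, matching the abstract's use of the true positive rate. It is not the MLB method or the author's evaluation code. The counts in the usage lines are hypothetical, chosen only to mimic a rare positive class, and are not NEMSIS results. AUC is left out because it needs per-record scores rather than counts.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics from a 2x2 confusion matrix.

    Assumes every denominator below is non-zero."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    tpr = tp / (tp + fn)            # sensitivity / recall on the positive class
    tnr = tn / (tn + fp)            # specificity
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for agreement expected by chance,
    # where chance agreement uses predicted and actual class frequencies.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "tpr": tpr,
        "tnr": tnr,
        "balanced_accuracy": (tpr + tnr) / 2,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
        "jaccard": tp / (tp + fp + fn),   # Jaccard score of the positive class
        "hamming_loss": (fp + fn) / n,    # equals 1 - accuracy for single-label binary tasks
    }


# Hypothetical counts: 100 positives among 10,000 records.
metrics = binary_metrics(tp=60, fn=40, fp=20, tn=9880)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts, overall accuracy is 0.994 while the true positive rate is only 0.6, and balanced accuracy is about 0.80. This is the same qualitative gap between overall and minority-class performance that the abstract describes, and it is why reporting TPR and TNR separately is informative.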
Affiliation(s)
- Mary E Helander
- Syracuse University, Syracuse, New York, United States of America; Maxwell School of Citizenship and Public Affairs, Department of Social Science; and Falk College, Department of Public Health
4
Zhang Y, Tian Y, Yan A. A SAR and QSAR study on 3CLpro inhibitors of SARS-CoV-2 using machine learning methods. SAR QSAR Environ Res 2024:1-33. [PMID: 39077983 DOI: 10.1080/1062936x.2024.2375513] [Received: 05/13/2024] [Accepted: 06/27/2024] [Indexed: 07/31/2024]
Abstract
The 3C-like proteinase (3CLpro) of novel coronaviruses is intricately linked to viral replication, making it a crucial target for antiviral agents. In this study, we employed two fingerprint descriptors (ECFP_4 and MACCS) to characterize the 889 compounds in our dataset. We constructed 24 classification models using machine learning algorithms, including support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost) and deep neural networks (DNN). Among these models, the DNN- and ECFP_4-based Model 1D_2 performed best. It achieved a Matthews correlation coefficient (MCC) of 0.796 in 5-fold cross-validation and 0.722 on the test set. The applicability domains of the models were analysed using dSTD-PRO calculations. The 889 compounds were clustered with the K-means algorithm into 10 subsets. For each subset, we analysed the relationships between structural fragments and the inhibitory activities of the highly active compounds. In addition, 27 QSAR models were constructed from 464 3CLpro inhibitors using three machine learning algorithms, with a minimum root mean square error (RMSE) of 0.509 on the test set. The applicability domains of these models and the structure-activity relationships reflected in the descriptors were also analysed.
Affiliation(s)
- Y Zhang
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, Beijing, P. R. China
- Y Tian
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, Beijing, P. R. China
- A Yan
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, Beijing, P. R. China
5
Susanty M, Mursalim MKN, Hertadi R, Purwarianti A, LE Rajab T. Leveraging protein language model embeddings and logistic regression for efficient and accurate in-silico acidophilic proteins classification. Comput Biol Chem 2024; 112:108163. [PMID: 39098138 DOI: 10.1016/j.compbiolchem.2024.108163] [Received: 04/11/2024] [Revised: 07/02/2024] [Accepted: 07/24/2024] [Indexed: 08/06/2024]
Abstract
The increasing demand for eco-friendly technologies in biotechnology necessitates effective and sustainable catalysts. Acidophilic proteins, which function optimally in highly acidic environments, hold immense promise for applications including food production, biofuels and bioremediation. However, limited knowledge about these proteins hinders their exploration. This study addresses this gap with in silico methods based on computational tools and machine learning. We propose a novel approach to predicting acidophilic proteins using protein language models (PLMs), which accelerates discovery without extensive lab work. Our investigation highlights the potential of PLMs for understanding and harnessing acidophilic proteins for scientific and industrial advancement. We introduce the ACE model, which combines a simple logistic regression model with embeddings derived from protein sequences processed by the ProtT5 PLM. On an independent test set, this model achieves an accuracy of 0.91, an F1-score of 0.93 and a Matthews correlation coefficient of 0.76. To our knowledge, this is the first application of pre-trained PLM embeddings to acidophilic protein classification. The ACE model serves as a powerful tool for exploring protein acidophilicity, paving the way for future advances in protein design and engineering.
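The ACE abstract pairs fixed PLM embeddings with a plain logistic regression classifier. The sketch below illustrates that general pattern under stated assumptions; it is not the authors' pipeline. It assumes that per-residue embeddings are averaged (mean pooling) into one fixed-length vector per protein, which is a common choice but is not confirmed by the abstract. It also fits the model with plain full-batch gradient descent rather than whatever solver the authors used. Real ProtT5 embeddings are high-dimensional (1024 values per residue for the commonly used ProtT5-XL model), so the 2-d "embeddings" here are hypothetical toy data.

```python
import math


def mean_pool(per_residue):
    """Average per-residue embedding vectors into one fixed-length protein vector."""
    dim = len(per_residue[0])
    return [sum(vec[d] for vec in per_residue) / len(per_residue) for d in range(dim)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=500, l2=0.0):
    """Fit logistic regression by full-batch gradient descent on the mean log-loss."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [l2 * wj for wj in w]  # gradient of the optional (l2 / 2) * ||w||^2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of the positive (acidophilic) class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy data: each "protein" is a list of 2-d per-residue vectors.
acidophilic = [[[1.0, 0.2], [0.8, 0.1]], [[0.9, 0.0], [1.1, 0.3]]]
other = [[[-1.0, 0.1], [-0.8, 0.2]], [[-0.9, 0.0], [-1.2, 0.1]]]
X = [mean_pool(p) for p in acidophilic + other]
y = [1, 1, 0, 0]
w, b = train_logreg(X, y)
print([round(predict_proba(w, b, x), 3) for x in X])
```

Keeping the language model frozen and training only a linear classifier on its outputs is what makes this kind of approach cheap. Only `dim + 1` parameters are learned, however large the PLM is.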
Affiliation(s)
- Meredita Susanty
- Institut Teknologi Bandung School of Electrical Engineering and Informatics, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia; Universitas Pertamina, School of Computer Science, Jl Teuku Nyak Arief Jakarta Selatan DKI, Jakarta, Indonesia
- Muhammad Khaerul Naim Mursalim
- Institut Teknologi Bandung School of Electrical Engineering and Informatics, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia; Universitas Universal, Kompleks Maha Vihara Duta Maitreya Bukit Beruntung, Sei Panas, Batam, Kepulauan Riau 29456, Indonesia
- Rukman Hertadi
- Institut Teknologi Bandung Faculty of Math and Natural Sciences, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia
- Ayu Purwarianti
- Institut Teknologi Bandung School of Electrical Engineering and Informatics, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia; Center for Artificial Intelligence (U-CoE AI-VLB), Institut Teknologi Bandung, Bandung, Indonesia
- Tati LE Rajab
- Institut Teknologi Bandung School of Electrical Engineering and Informatics, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia
6
Abbasian Ardakani A, Airom O, Khorshidi H, Bureau NJ, Salvi M, Molinari F, Acharya UR. Interpretation of Artificial Intelligence Models in Healthcare: A Pictorial Guide for Clinicians. J Ultrasound Med 2024. [PMID: 39032010 DOI: 10.1002/jum.16524] [Received: 04/13/2024] [Revised: 06/19/2024] [Accepted: 07/01/2024] [Indexed: 07/22/2024]
Abstract
With the explosion of digital health records in the healthcare industry, artificial intelligence (AI) models can play an increasingly effective role in managing patients. Machine learning (ML) and deep learning (DL) are two approaches used to develop predictive models that improve clinical processes in the healthcare industry. These models are also implemented in medical imaging machines, giving them an intelligent decision system that aids physicians in their decisions and increases the efficiency of routine clinical practice. Physicians who work with these machines need insight into how the implemented models work in the background. More importantly, they need to be able to interpret the models' predictions, assess their performance, and compare them to find the one with the best performance and the fewest errors. This review aims to provide an accessible overview of key evaluation metrics for physicians without AI expertise. We developed four real-world diagnostic AI models (two ML and two DL models) for breast cancer diagnosis using ultrasound images. We then reviewed 23 of the most commonly used evaluation metrics in plain terms for physicians. Finally, all metrics were calculated and applied to interpret and evaluate the outputs of the models. Accessible explanations and practical applications empower physicians to interpret, evaluate and optimize AI models effectively, helping ensure safety and efficacy when the models are integrated into clinical practice.
Affiliation(s)
- Ali Abbasian Ardakani
- Department of Radiology Technology, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Omid Airom
- Department of Mathematics, University of Padova, Padova, Italy
- Hamid Khorshidi
- Department of Information Engineering, University of Padova, Padova, Italy
- Nathalie J Bureau
- Department of Radiology, Centre Hospitalier de l'Université de Montréal, Montreal, Quebec, Canada
- Massimo Salvi
- Biolab, PolitoBIOMedLab, Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
- Filippo Molinari
- Biolab, PolitoBIOMedLab, Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
- U Rajendra Acharya
- School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield, Queensland, Australia
- Centre for Health Research, University of Southern Queensland, Springfield, Queensland, Australia
7
Kehrein J, Bunker A, Luxenhofer R. POxload: Machine Learning Estimates Drug Loadings of Polymeric Micelles. Mol Pharm 2024; 21:3356-3374. [PMID: 38805643 DOI: 10.1021/acs.molpharmaceut.4c00086] [Indexed: 05/30/2024]
Abstract
Block copolymers composed of poly(2-oxazoline)s and poly(2-oxazine)s can serve as drug delivery systems; they form micelles that carry poorly water-soluble drugs. Many recent studies have investigated how structural changes in the polymer and in the hydrophobic cargo affect drug loading. In this work, we combine these data to establish an extended formulation database. Different molecular properties and fingerprints are tested for their suitability as formulation-specific mixture descriptors. A variety of classification and regression models are built for different descriptor subsets and for different thresholds of loading efficiency and loading capacity. The best models achieve good overall statistics in both cross-validation and external validation (balanced accuracies of 0.8). Important features are then dissected for interpretation, and DrugBank is screened for potential therapeutic use cases in which these polymers could be used to develop novel formulations of hydrophobic drugs. The most promising models are provided as an open-source software tool that other researchers can use to test the applicability of these delivery systems to potential new drug candidates.
Affiliation(s)
- Josef Kehrein
- Soft Matter Chemistry, Department of Chemistry, Faculty of Science, University of Helsinki, A. I. Virtasen aukio 1, 00014 Helsinki, Finland
- Drug Research Program, Division of Pharmaceutical Biosciences Faculty of Pharmacy, University of Helsinki, Viikinkaari 5 E, 00014 Helsinki, Finland
- Alex Bunker
- Drug Research Program, Division of Pharmaceutical Biosciences Faculty of Pharmacy, University of Helsinki, Viikinkaari 5 E, 00014 Helsinki, Finland
- Robert Luxenhofer
- Soft Matter Chemistry, Department of Chemistry, Faculty of Science, University of Helsinki, A. I. Virtasen aukio 1, 00014 Helsinki, Finland
8
Eken A, Nassehi F, Eroğul O. Diagnostic machine learning applications on clinical populations using functional near infrared spectroscopy: a review. Rev Neurosci 2024; 35:421-449. [PMID: 38308531 DOI: 10.1515/revneuro-2023-0117] [Received: 09/23/2023] [Accepted: 01/12/2024] [Indexed: 02/04/2024]
Abstract
Functional near-infrared spectroscopy (fNIRS) combined with machine learning (ML) is a popular research topic for the diagnostic classification of clinical disorders, owing to the lack of robust and objective biomarkers. This review provides an overview of research on psychiatric diseases using fNIRS and ML. A literature search was carried out, and 45 studies were evaluated in terms of their sample sizes, features used, ML methodology and reported accuracy. To the best of our knowledge, this is the first review of diagnostic ML applications using fNIRS. We found an increasing trend in ML applications for fNIRS-based biomarker research since 2010. The most studied populations are schizophrenia (n = 12), attention-deficit/hyperactivity disorder (n = 7) and autism spectrum disorder (n = 6). There is a significant negative correlation between sample size (>21) and accuracy. Support vector machines (SVM, n = 20) and deep learning (DL, n = 10) were the most popular classifier approaches. Eight of these studies recruited more than 100 participants for classification. Features based on concentration changes in oxy-hemoglobin (ΔHbO) were used more often than those based on concentration changes in deoxy-hemoglobin (ΔHb). The most popular ΔHbO-based features were mean ΔHbO (n = 11) and ΔHbO-based functional connections (n = 11). Using ML on fNIRS data may be a promising approach for revealing specific biomarkers for diagnostic classification.
Affiliation(s)
- Aykut Eken
- Department of Biomedical Engineering, Faculty of Engineering, TOBB University of Economics and Technology, Sogutozu, 06510, Ankara, Türkiye
- Farhad Nassehi
- Department of Biomedical Engineering, Faculty of Engineering, TOBB University of Economics and Technology, Sogutozu, 06510, Ankara, Türkiye
- Osman Eroğul
- Department of Biomedical Engineering, Faculty of Engineering, TOBB University of Economics and Technology, Sogutozu, 06510, Ankara, Türkiye
9
Parvanovova P, Hnilicova P, Kolisek M, Tatarkova Z, Halasova E, Kurca E, Holubcikova S, Koprusakova MT, Baranovicova E. Disturbances in Muscle Energy Metabolism in Patients with Amyotrophic Lateral Sclerosis. Metabolites 2024; 14:356. [PMID: 39057679 PMCID: PMC11278632 DOI: 10.3390/metabo14070356] [Received: 05/22/2024] [Revised: 06/17/2024] [Accepted: 06/19/2024] [Indexed: 07/28/2024]
Abstract
Amyotrophic lateral sclerosis (ALS) is a fatal neuromuscular disease, a type of motor neuron disorder characterized by degeneration of the upper and lower motor neurons, resulting in dysfunction of the body's somatic muscles. ALS manifests as progressive skeletal muscle atrophy and spasticity and leads to death, mostly due to respiratory failure. Muscle energy metabolism appears to be an important part of the disease's pathophysiology. In our study, we used blood plasma from 25 patients with ALS, diagnosed according to the definitive El Escorial criteria and the ALSFRS-R (Revised Amyotrophic Lateral Sclerosis Functional Rating Scale), and from 25 age- and sex-matched subjects. In addition to standard clinical biochemical parameters, we used an NMR (nuclear magnetic resonance) metabolomics approach to determine relative plasma levels of metabolites. We observed a decrease in total blood protein level. However, despite the accelerated skeletal muscle catabolism characteristic of ALS patients, we did not detect changes in plasma levels of essential amino acids. Regarding energy metabolism within muscle, compromised creatine uptake was accompanied by decreased plasma creatinine. We did not observe changes in plasma levels of BCAAs (branched-chain amino acids: leucine, isoleucine and valine). However, the observed decrease in plasma levels of all three BCKAs (branched-chain alpha-keto acids derived from BCAAs) suggests enhanced utilization of BCKAs as an energy substrate. Glutamine was increased in the blood plasma of ALS patients. Besides serving in ammonia detoxification, it could also be considered a potential TCA (tricarboxylic acid) cycle contributor in times of decreased pyruvate utilization.
A cross-validated Random Forest model using relative plasma metabolite levels as input variables achieved an AUC of 0.92, an out-of-bag (OOB) error of 8% and a Matthews correlation coefficient (MCC) of 0.84. Although the discriminatory power of this system was promising, additional features are needed to build a robust discriminatory model.
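The ALS study reports an AUC for its classifier. The sketch below illustrates what that number measures, using the rank (Mann-Whitney) interpretation of ROC AUC. AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as half. It works on any classifier's scores, such as Random Forest vote fractions. It does not reproduce the authors' model, cross-validation scheme or OOB error, and the example scores are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: 1 for the positive class (e.g. ALS), 0 for the negative class.
    Runs in O(P * N) time, which is fine for cohorts of a few dozen subjects."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # a tie counts as half a correctly ordered pair
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for five subjects.
print(roc_auc([0.9, 0.8, 0.4, 0.35, 0.2], [1, 1, 0, 1, 0]))
```

In the usage line, 5 of the 6 positive-negative pairs are ordered correctly, so the AUC is 5/6. An AUC of 0.92 therefore means that about 92% of patient-control pairs are ranked correctly. Unlike MCC, it does not depend on a chosen decision threshold.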
Affiliation(s)
- Petra Parvanovova
- Department of Medical Biochemistry, Jessenius Faculty of Medicine, Comenius University in Bratislava, Mala Hora 4, 036 01 Martin, Slovakia
- Petra Hnilicova
- Biomedical Centre Martin, Jessenius Faculty of Medicine, Comenius University in Bratislava, Mala Hora 4, 036 01 Martin, Slovakia
- Martin Kolisek
- Biomedical Centre Martin, Jessenius Faculty of Medicine, Comenius University in Bratislava, Mala Hora 4, 036 01 Martin, Slovakia
- Zuzana Tatarkova
- Department of Medical Biochemistry, Jessenius Faculty of Medicine, Comenius University in Bratislava, Mala Hora 4, 036 01 Martin, Slovakia
- Erika Halasova
- Biomedical Centre Martin, Jessenius Faculty of Medicine, Comenius University in Bratislava, Mala Hora 4, 036 01 Martin, Slovakia
- Egon Kurca
- Department of Neurology, University Hospital Martin, Jessenius Faculty of Medicine, Comenius University in Bratislava, Kollarova 2, 036 01 Martin, Slovakia
- Simona Holubcikova
- Department of Medical Biochemistry, Jessenius Faculty of Medicine, Comenius University in Bratislava, Mala Hora 4, 036 01 Martin, Slovakia
- Monika Turcanova Koprusakova
- Department of Neurology, University Hospital Martin, Jessenius Faculty of Medicine, Comenius University in Bratislava, Kollarova 2, 036 01 Martin, Slovakia
- Eva Baranovicova
- Biomedical Centre Martin, Jessenius Faculty of Medicine, Comenius University in Bratislava, Mala Hora 4, 036 01 Martin, Slovakia
10
Almotairi S, Badr E, Abdelbaky I, Elhakeem M, Abdul Salam M. Hybrid transformer-CNN model for accurate prediction of peptide hemolytic potential. Sci Rep 2024; 14:14263. [PMID: 38902287 PMCID: PMC11190137 DOI: 10.1038/s41598-024-63446-5] [Received: 04/02/2024] [Accepted: 05/29/2024] [Indexed: 06/22/2024]
Abstract
Hemolysis is a crucial factor in various biomedical and pharmaceutical contexts, which drives our interest in developing advanced computational techniques for predicting it precisely. Our proposed approach takes advantage of the unique capabilities of convolutional neural networks (CNNs) and transformers to detect complex patterns inherent in the data. Integrating CNNs with the transformers' attention mechanisms allows relevant information to be extracted, leading to accurate predictions of hemolytic potential. The proposed method was trained on three distinct datasets of peptide sequences, known as recurrent neural network-hemolytic (RNN-Hem), Hlppredfuse and Combined. Our computational results demonstrated the superior efficacy of our models compared with existing methods. The models achieved Matthews correlation coefficients of 0.5962, 0.9111 and 0.7788 on the three datasets, respectively, indicating their effectiveness in predicting hemolytic activity. With its potential to guide experimental efforts in peptide design and drug development, this method holds great promise for practical applications. Integrating CNNs and transformers proves to be a powerful tool in bioinformatics and therapeutic research, with the potential to drive advances in this area.
Affiliation(s)
- Sultan Almotairi
- Department of Computer Science, College of Computer and Information Sciences, Majmaah University, 11952, Majmaah, Saudi Arabia
- Department of Computer Science, Faculty of Computer and Information Systems, Islamic University of Madinah, 42351, Medinah, Saudi Arabia
- Elsayed Badr
- Scientific Computing Department, Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
- The Egyptian School of Data Science (ESDS), Benha, Egypt
- Ibrahim Abdelbaky
- Artificial Intelligence Department, Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
- Mohamed Elhakeem
- Artificial Intelligence Department, Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
- Mustafa Abdul Salam
- Artificial Intelligence Department, Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
- Department of Computer Science, College of Arts and Science, Wadi Addawasir, Prince Sattam Bin Abdulaziz University, 16273, Al-Kharj, Saudi Arabia
11
Salvatti BA, Chagas MA, Fernandes PO, Ladeira YFX, Bozzi AS, Valadares VS, Valente AP, de Miranda AS, Rocha WR, Maltarollo VG, Moraes AH. Understanding the Enzyme (S)-Norcoclaurine Synthase Promiscuity to Aldehydes and Ketones. J Chem Inf Model 2024; 64:4462-4474. [PMID: 38776464 DOI: 10.1021/acs.jcim.3c01773] [Indexed: 05/25/2024]
Abstract
The (S)-norcoclaurine synthase from Thalictrum flavum (TfNCS) stereoselectively catalyzes the Pictet-Spengler reaction between dopamine and 4-hydroxyphenylacetaldehyde to give (S)-norcoclaurine. TfNCS can catalyze the Pictet-Spengler reaction with various aldehydes and ketones, leading to diverse tetrahydroisoquinolines. This substrate promiscuity makes TfNCS a highly promising enzyme for synthesizing fine chemicals. Understanding the structural and electronic signatures of carbonyl-containing substrates that influence TfNCS activity can help expand its applications in the synthesis of different compounds and aid protein optimization strategies. In this study, we investigated how the molecular properties of aldehydes and ketones influence their reactivity in the TfNCS-catalyzed Pictet-Spengler reaction. First, we compiled a library of reactive and unreactive compounds from previous publications. We also performed enzymatic assays monitored by nuclear magnetic resonance to identify further reactive and unreactive carbonyl compounds, which were then added to the library. We then employed QSAR and DFT calculations to establish correlations between substrate-candidate structures and reactivity. Our findings highlight correlations between structural and stereoelectronic features, including the electrophilicity of the carbonyl group, and the reactivity of aldehydes and ketones in the TfNCS-catalyzed Pictet-Spengler reaction. Interestingly, the experimental data for seven of the fifty-three compounds did not correlate with the electrophilicity of the carbonyl group. For these seven compounds, we identified unfavorable interactions with TfNCS. Our results demonstrate the application of in silico techniques to understanding enzyme promiscuity and specificity, with particular emphasis on machine learning methodologies, DFT electronic structure calculations and molecular dynamics (MD) simulations.
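The TfNCS abstract links reactivity to the electrophilicity of the carbonyl group but does not say which descriptor was computed. One widely used DFT quantity is Parr's global electrophilicity index, ω = μ²/(2η). The sketch below computes it with the Koopmans-type approximations μ ≈ (E_HOMO + E_LUMO)/2 and η ≈ E_LUMO − E_HOMO. This is an illustrative assumption only: the authors may instead have used a local, carbonyl-centred measure. The orbital energies in the usage lines are hypothetical, not values from the study.

```python
HARTREE_TO_EV = 27.211386245988  # CODATA 2018 value of the hartree energy in eV


def global_electrophilicity(e_homo, e_lumo):
    """Parr electrophilicity index omega = mu**2 / (2 * eta).

    Uses mu ~ (E_HOMO + E_LUMO) / 2 (electronic chemical potential) and
    eta ~ E_LUMO - E_HOMO (chemical hardness). The result has the same
    energy unit as the inputs."""
    mu = (e_homo + e_lumo) / 2.0
    eta = e_lumo - e_homo
    if eta <= 0:
        raise ValueError("LUMO energy must lie above HOMO energy")
    return mu ** 2 / (2.0 * eta)


# Hypothetical frontier-orbital energies (hartree) for two carbonyl substrates.
omega_a = global_electrophilicity(-0.25, -0.05)
omega_b = global_electrophilicity(-0.25, -0.08)  # lower-lying LUMO
print(f"A: {omega_a * HARTREE_TO_EV:.3f} eV, B: {omega_b * HARTREE_TO_EV:.3f} eV")
```

In this toy comparison, substrate B's lower-lying LUMO raises ω. That matches the usual reading that a larger index means a stronger electrophile, which is the kind of trend a reactivity correlation would test.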
Affiliation(s)
- Brunno A Salvatti
- Departamento de Química, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Marcelo A Chagas
- Departamento de Ciências Exatas, Universidade do Estado de Minas Gerais, João Monlevade, Minas Gerais 35930-314, Brazil
- Phillipe O Fernandes
- Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Yan F X Ladeira
- Departamento de Química, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Aline S Bozzi
- Departamento de Química, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Veronica S Valadares
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Ana Paula Valente
- Centro Nacional de Ressonância Magnética Nuclear, Instituto de Bioquímica Médica Leopoldo de Meis, Centro de Ciências da Saúde, Universidade Federal do Rio de Janeiro, Rio de Janeiro 21.941-902, Brazil
- Amanda S de Miranda
- Departamento de Química, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Willian R Rocha
- Departamento de Química, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Vinicius G Maltarollo
- Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Adolfo H Moraes
- Departamento de Química, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
12
Bian Z, Bao T, Sun X, Wang N, Mu Q, Jiang T, Yu Z, Ding J, Wang T, Zhou Q. Machine Learning Tools to Assist the Synthesis of Antibacterial Carbon Dots. Int J Nanomedicine 2024; 19:5213-5226. [PMID: 38855729 PMCID: PMC11162209 DOI: 10.2147/ijn.s451680] [Received: 11/24/2023] [Accepted: 05/03/2024] [Indexed: 06/11/2024]
Abstract
Introduction The emergence and rapid spread of multidrug-resistant bacteria (MRB), driven by the excessive use of antibiotics and the development of biofilms, have become a growing threat to global public health. Nanoparticles, as substitutes for antibiotics, have been shown to have a substantial ability to tackle MRB infections via new antimicrobial mechanisms. In particular, carbon dots (CDs), with their unique (bio)physicochemical characteristics, have received considerable attention for combating MRB by damaging the bacterial wall, binding to DNA or enzymes, inducing local hyperthermia, or generating reactive oxygen species. Methods Herein, we investigate, with the assistance of machine learning (ML) tools, how the physicochemical features of various CDs affect their antimicrobial capacity. Results The synthetic conditions and intrinsic properties of CDs from 121 samples were first gathered to form the raw dataset, with the minimum inhibitory concentration (MIC) as the output. Four classification algorithms (KNN, SVM, RF, and XGBoost) were trained and validated on the input data, and the ensemble learning methods performed best on our data. In addition, ε-poly(L-lysine) CDs (PL-CDs) were developed in the laboratory to validate the practical applicability of the trained ML models, with two ensemble models making the predictions. Discussion Our results demonstrate that ML-based high-throughput theoretical calculation could be used to predict and decode the relationship between CD properties and antibacterial effect, accelerating the development of high-performance nanoparticles and their potential clinical translation.
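KNN, one of the four classifiers named above, is simple enough to sketch in full. This is a plain-Python illustration, not the study's implementation; the function name, the two pre-scaled numeric features, and the MIC class labels are hypothetical stand-ins for the CD synthesis conditions and properties.

```python
import math
from collections import Counter
from typing import Sequence


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[str],
                query: Sequence[float], k: int = 3) -> str:
    """Classify `query` by majority vote among its k nearest training samples."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    # Euclidean distance from the query to every training sample, nearest first
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    # Count labels among the k nearest; on a tie, the label met first (nearer) wins
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, pre-scaled features (e.g. two synthesis conditions) and MIC classes
train_x = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
train_y = ["low_MIC", "low_MIC", "high_MIC", "high_MIC"]
print(knn_predict(train_x, train_y, (0.15, 0.15)))  # low_MIC
```

Because KNN relies on raw distances, features on very different scales (temperatures vs. concentrations, say) would normally be standardized first.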
Affiliation(s)
- Zirui Bian
- Department of Bone, Huangdao District Central Hospital, Qingdao, People’s Republic of China
- Tianzhe Bao
- Qingdao Key Laboratory of Materials for Tissue Repair and Rehabilitation, School of Rehabilitation Sciences and Engineering, University of Health and Rehabilitation Sciences, Qingdao, People’s Republic of China
- Xuequan Sun
- Weifang Eye Institute, Weifang Eye Hospital, Zhengda Guangming Eye Group, Weifang, People’s Republic of China
- Zhengda Guangming International Eye Research Center, Qingdao Zhengda Guangming Eye Hospital, Qingdao University, Qingdao, People’s Republic of China
- Ning Wang
- Department of Bone, Huangdao District Central Hospital, Qingdao, People’s Republic of China
- Qian Mu
- Department of Biomaterials, LongScience Biological (Qingdao) Co, LTD, Qingdao, People’s Republic of China
- Ting Jiang
- Heart Center, Qingdao Hiser Hospital Affiliated of Qingdao University (Qingdao Traditional Chinese Medicine Hospital), Qingdao University, Qingdao, People’s Republic of China
- Zhongxiang Yu
- Heart Center, Qingdao Hiser Hospital Affiliated of Qingdao University (Qingdao Traditional Chinese Medicine Hospital), Qingdao University, Qingdao, People’s Republic of China
- Junhang Ding
- Qingdao Key Laboratory of Materials for Tissue Repair and Rehabilitation, School of Rehabilitation Sciences and Engineering, University of Health and Rehabilitation Sciences, Qingdao, People’s Republic of China
- Ting Wang
- Department of Orthopaedic Surgery, The Affiliated Hospital of Qingdao University, Qingdao, People’s Republic of China
- Qihui Zhou
- Qingdao Key Laboratory of Materials for Tissue Repair and Rehabilitation, School of Rehabilitation Sciences and Engineering, University of Health and Rehabilitation Sciences, Qingdao, People’s Republic of China
13
Niu Y, Li Z, Chen Z, Huang W, Tan J, Tian F, Yang T, Fan Y, Wei J, Mu J. Efficient screening of pharmacological broad-spectrum anti-cancer peptides utilizing advanced bidirectional Encoder representation from Transformers strategy. Heliyon 2024; 10:e30373. [PMID: 38765108 PMCID: PMC11101728 DOI: 10.1016/j.heliyon.2024.e30373] [Received: 12/12/2023] [Revised: 04/24/2024] [Accepted: 04/24/2024] [Indexed: 05/21/2024]
Abstract
This study integrates deep learning into the screening of anticancer peptides (ACPs), a class of broad-spectrum oncolytic therapeutics valued for their targeted antitumor efficacy and specificity. Conventional methods for ACP identification are time-consuming and costly, which impedes progress in precision oncology. To address this, we developed a screening tool that combines natural language processing (NLP) with the pseudo amino acid composition (PseAAC) technique, and built a comprehensive ACP dataset from which key primary and secondary structural features are extracted. The approach is further strengthened by an optimized BERT model tuned for ACP detection, which outperforms existing BERT variants and traditional machine learning algorithms in both accuracy and selectivity. Under five-fold cross-validation, the model achieved an average area under the curve (AUC) of 0.9726 and an F1 score of 0.9385; external validation gave an AUC of 0.9848 and an F1 score of 0.9371. These results indicate that the method can accurately identify and predict ACPs while reducing the cost and time traditionally associated with ACP research and development. This screening approach may therefore accelerate the discovery and clinical application of ACPs, supporting more effective and economical precision oncology interventions.
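The ACP model's metrics are averaged over five-fold cross-validation. A minimal sketch of how such folds can be generated, assuming plain contiguous folds over sample indices; the function name is hypothetical, and real pipelines (probably including this study's) would shuffle, and often stratify by class, before splitting.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every sample lands in exactly one test fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for fold, (train, test) in enumerate(k_fold_indices(12, k=5)):
    print(fold, len(train), test)
```

A model is trained on each `train` split and scored on the matching `test` split; the per-fold scores (e.g. AUC, F1) are then averaged across the k folds.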
Affiliation(s)
- Yupeng Niu
- College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Artificial intelligence laboratory, Sichuan Agricultural University, Ya'an 625000, China
- Zhenghao Li
- College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Artificial intelligence laboratory, Sichuan Agricultural University, Ya'an 625000, China
- Ziao Chen
- College of Law, Sichuan Agricultural University, Ya'an 625000, China
- Artificial intelligence laboratory, Sichuan Agricultural University, Ya'an 625000, China
- Wenyuan Huang
- College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Artificial intelligence laboratory, Sichuan Agricultural University, Ya'an 625000, China
- Jingxuan Tan
- College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Artificial intelligence laboratory, Sichuan Agricultural University, Ya'an 625000, China
- Fa Tian
- College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Tao Yang
- College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Artificial intelligence laboratory, Sichuan Agricultural University, Ya'an 625000, China
- Yamin Fan
- College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Artificial intelligence laboratory, Sichuan Agricultural University, Ya'an 625000, China
- Jiangshu Wei
- College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Jiong Mu
- College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Artificial intelligence laboratory, Sichuan Agricultural University, Ya'an 625000, China
14
Adenis L, Mailler S, Menut L, Achim P, Generoso S. Lagrangian and Eulerian modelling of ¹⁰⁶Ru atmospheric transport in 2017 over northern hemisphere. J Environ Radioact 2024; 275:107416. [PMID: 38520991 DOI: 10.1016/j.jenvrad.2024.107416] [Received: 11/29/2023] [Revised: 03/13/2024] [Accepted: 03/13/2024] [Indexed: 03/25/2024]
Abstract
In September 2017, numerous measurement stations recorded large surface concentrations of ¹⁰⁶Ru in Europe. This event was well recorded by various monitoring stations worldwide and offers a valuable framework for comparing the modelling strategies deployed to quickly evaluate where the plume goes and at what concentrations. In general, the source and its intensity are not known, and hypotheses have to be made. Models have to be fast and accurate: Lagrangian and Eulerian models are often used but rarely compared. In this study, the FLEXPART Lagrangian model and the WRF-CHIMERE Eulerian model are used to simulate the emission, transport, and deposition of the ¹⁰⁶Ru released by this source. First, comparison with surface measurements shows that the hypothesized location, timing, and intensity of the source are realistic. Second, a sensitivity analysis performed with the Eulerian model using several transport schemes showed that this model may provide better results than the Lagrangian one. This opens the door to further developments, including chemistry and mixing with other pollutants during such events.
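The two modelling frames compared above can be contrasted on a toy problem. This sketch is an illustration only and is far simpler than FLEXPART or WRF-CHIMERE (no 3-D winds, turbulence, or deposition): it advects a 1-D tracer pulse with a constant wind u. The Eulerian version updates concentrations on a fixed periodic grid with a first-order upwind scheme; the Lagrangian version moves particles with the flow. Function names, grid size, wind speed, and time step are all assumptions.

```python
import numpy as np


def eulerian_upwind(conc: np.ndarray, u: float, dx: float, dt: float, steps: int) -> np.ndarray:
    """Advect a 1-D concentration field on a fixed periodic grid (first-order upwind, u > 0)."""
    courant = u * dt / dx
    if not 0 < courant <= 1:
        raise ValueError("upwind scheme needs 0 < u*dt/dx <= 1 for stability")
    c = np.asarray(conc, dtype=float).copy()
    for _ in range(steps):
        # Each cell keeps (1 - C) of its mass and receives C of its upwind neighbour's mass
        c = c - courant * (c - np.roll(c, 1))
    return c


def lagrangian_particles(x: np.ndarray, u: float, dt: float, steps: int, length: float) -> np.ndarray:
    """Move tracer particles with the wind; positions wrap around the periodic domain."""
    return (np.asarray(x, dtype=float) + u * dt * steps) % length


grid = np.zeros(100)
grid[10] = 1.0  # unit release in cell 10
c = eulerian_upwind(grid, u=1.0, dx=1.0, dt=0.5, steps=20)
p = lagrangian_particles(np.array([10.0]), u=1.0, dt=0.5, steps=20, length=100.0)
print(c.sum(), (np.arange(100) * c).sum(), c.max(), p[0])
```

Both frames conserve mass and carry the centre of mass to cell 20, but the upwind grid smears the single-cell release over many cells (peak about 0.18), an example of the numerical diffusion that fixed-grid Eulerian schemes can introduce, while the Lagrangian particle position stays exact.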
15
Susanty M, Naim Mursalim MK, Hertadi R, Purwarianti A, Rajab TLE. Classifying alkaliphilic proteins using embeddings from protein language model. Comput Biol Med 2024; 173:108385. [PMID: 38547659 DOI: 10.1016/j.compbiomed.2024.108385] [Received: 10/13/2023] [Revised: 03/22/2024] [Accepted: 03/24/2024] [Indexed: 04/17/2024]
Abstract
Alkaliphilic proteins have great potential as biocatalysts in biotechnology, especially for enzyme engineering. Extensive research has focused on exploring the enzymatic potential of alkaliphiles and characterizing alkaliphilic proteins. However, the current methods for identifying these proteins require wet-lab experiments, which are time-consuming, labor-intensive, and expensive. A computational method for alkaliphilic protein identification would therefore be invaluable for protein engineering and design. In this study, we present a novel approach that uses embeddings from a protein language model, ESM-2 (3B), in a deep learning framework to classify alkaliphilic and non-alkaliphilic proteins. To our knowledge, this is the first attempt to use embeddings from a pre-trained protein language model to classify alkaliphilic proteins. A reliable dataset comprising 1,002 alkaliphilic and 1,866 non-alkaliphilic proteins was constructed for training and testing the proposed model. The proposed model, dubbed ALPACA, achieves an accuracy of 0.88, an F1 score of 0.84, and a Matthews correlation coefficient (MCC) of 0.75 on an independent dataset. ALPACA is likely to serve as a valuable resource for exploring protein alkalinity and its role in protein design and engineering.
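The three scores reported for ALPACA can all be derived from a binary confusion matrix. A minimal sketch using the standard definitions; the function name and the counts are hypothetical (not ALPACA's), and returning an MCC of 0 when a row or column of the matrix is empty is a common convention rather than something the abstract specifies.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, F1 score and Matthews correlation coefficient from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("empty confusion matrix")
    accuracy = (tp + tn) / total
    # F1 = harmonic mean of precision and recall = 2TP / (2TP + FP + FN)
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    # MCC uses all four cells; defined as 0 here when any row/column sum is 0
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical counts on an independent test set
print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

MCC is often preferred on imbalanced data such as this dataset (1,002 vs. 1,866 proteins) because, unlike accuracy, it only approaches 1 when both classes are predicted well.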
Affiliation(s)
- Meredita Susanty
- Institut Teknologi Bandung School of Electrical Engineering and Informatics, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia; Universitas Pertamina, School of Computer Science, Jl Teuku Nyak Arief Jakarta Selatan DKI Jakarta, Indonesia
- Muhammad Khaerul Naim Mursalim
- Institut Teknologi Bandung School of Electrical Engineering and Informatics, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia; Universitas Universal, Kompleks Maha Vihara Duta Maitreya Bukit Beruntung, Sei Panas Batam, 29456, Kepulauan Riau, Indonesia
- Rukman Hertadi
- Institut Teknologi Bandung Faculty of Math and Natural Sciences, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia
- Ayu Purwarianti
- Institut Teknologi Bandung School of Electrical Engineering and Informatics, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia; Center for Artificial Intelligence (U-CoE AI-VLB), Institut Teknologi Bandung, Bandung, Indonesia
- Tati LE Rajab
- Institut Teknologi Bandung School of Electrical Engineering and Informatics, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia
16
Castillo-Mendieta K, Agüero-Chapin G, Marquez E, Perez-Castillo Y, Barigye SJ, Pérez-Cárdenas M, Peréz-Giménez F, Marrero-Ponce Y. Multiquery Similarity Searching Models: An Alternative Approach for Predicting Hemolytic Activity from Peptide Sequence. Chem Res Toxicol 2024; 37:580-589. [PMID: 38501392 DOI: 10.1021/acs.chemrestox.3c00408] [Indexed: 03/20/2024]
Abstract
Their desirable pharmacological properties and broad range of therapeutic activities have made peptides promising drugs compared with small organic molecules and antibody drugs. Nevertheless, toxic effects such as hemolysis have hampered the development of these promising drugs. A reliable computational tool to predict peptide hemolytic toxicity before synthesis and experimental evaluation would therefore be highly useful. Currently, four web servers that predict hemolytic activity using machine learning (ML) algorithms are available; however, they have limitations, such as the need for a reliable negative set and a limited applicability domain. We therefore developed a robust model based on a novel theoretical approach that combines network science and a multiquery similarity searching (MQSS) method. A total of 1152 initial models were constructed from 144 scaffolds generated in a previous report. These were evaluated on external data sets, and the best models were fused and improved. Our best MQSS model, I1, outperformed all state-of-the-art ML-based models and was used to characterize the prevalence of hemolytic toxicity among therapeutic peptides. Based on our model's estimates, the number of hemolytic peptides might be 3.9-fold higher than reported.
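As a rough illustration of the multiquery idea, a query peptide is compared against several reference sequences and the individual similarity scores are fused into a single score. The sketch below assumes k-mer Jaccard similarity, MAX fusion, and a hypothetical 0.5 decision threshold; the abstract does not specify the study's similarity measure, its network-derived scaffolds, or its fusion rule, so all function names, parameters, and sequences here are assumptions (the sequences are toy strings, not curated data).

```python
from typing import Iterable, Set


def kmers(seq: str, k: int = 2) -> Set[str]:
    """Set of overlapping k-mers (empty if the sequence is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mqss_score(query: str, references: Iterable[str], k: int = 2) -> float:
    """Compare the query with every reference and fuse the scores with MAX."""
    q = kmers(query, k)
    return max((jaccard(q, kmers(ref, k)) for ref in references), default=0.0)


def predict_hemolytic(query: str, references: Iterable[str],
                      threshold: float = 0.5, k: int = 2) -> bool:
    # Hypothetical decision rule: flag the query if it is close enough to any reference
    return mqss_score(query, references, k) >= threshold


# Toy reference set standing in for known hemolytic peptides
refs = ["GIGKFLKKAKKF", "FLPLIGRVLSGIL"]
print(predict_hemolytic("GIGKFLKKAKKW", refs))  # True
```

MAX fusion rewards a close match to any single reference; mean or sum fusion would instead reward broad similarity to the whole reference set, which is one of the design choices an MQSS model has to make.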
Affiliation(s)
- Kevin Castillo-Mendieta
- School of Biological Sciences and Engineering, Yachay Tech University, Hda. San José s/n y Proyecto Yachay, Urcuquí 100119, Ecuador
- Guillermin Agüero-Chapin
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, Terminal de Cruzeiros do Porto de Leixões, University of Porto, Av. General Norton de Matos s/n, 4450-208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Edgar Marquez
- Grupo de Investigaciones en Química y Biología, Departamento de Química y Biología, Facultad de Ciencias Básicas, Universidad del Norte, Carrera 51B, Km 5, vía Puerto Colombia, Barranquilla 081007, Colombia
- Yunierkis Perez-Castillo
- Bio-Chemoinformatics Research Group and Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, Quito 170504, Ecuador
- Stephen J Barigye
- Departamento de Química Física Aplicada, Facultad de Ciencias, Universidad Autónoma de Madrid (UAM), 28049 Madrid, Spain
- Mariela Pérez-Cárdenas
- School of Biological Sciences and Engineering, Yachay Tech University, Hda. San José s/n y Proyecto Yachay, Urcuquí 100119, Ecuador
- Facundo Peréz-Giménez
- Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, Valencia 46100, Spain
- Yovani Marrero-Ponce
- Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, Valencia 46100, Spain
- Facultad de Ingeniería, Universidad Panamericana, Augusto Rodin No. 498, Insurgentes Mixcoac, Benito Juárez, CDMX, Mexico 03920, Mexico
- Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas; and Instituto de Simulación Computacional (ISC-USFQ), Diego de Robles y vía Interoceánica, Universidad San Francisco de Quito (USFQ), Quito, Pichincha 170157, Ecuador
17
Giudice L, Mohamed A, Malm T. StellarPath: Hierarchical-vertical multi-omics classifier synergizes stable markers and interpretable similarity networks for patient profiling. PLoS Comput Biol 2024; 20:e1012022. [PMID: 38607982 PMCID: PMC11042724 DOI: 10.1371/journal.pcbi.1012022] [Received: 11/08/2023] [Revised: 04/24/2024] [Accepted: 03/25/2024] [Indexed: 04/14/2024]
Abstract
The patient similarity network (PSN) paradigm involves modeling the similarity between patients based on specific data. The similarity can summarize patients' relationships from high-dimensional data, such as biological omics. The resulting PSN can be used for unsupervised or supervised learning tasks while remaining highly interpretable, is tailored for precision medicine, and is ready to be analyzed with graph-theory methods. However, these benefits are not guaranteed; they depend on the granularity of the summarized data, the clarity of the similarity measure, the complexity of the network's topology, and the methods implemented for analysis. To date, no patient classifier fully leverages the paradigm's inherent benefits, and PSNs remain complex, unexploited, and meaningless. We present StellarPath, a hierarchical-vertical patient classifier that leverages pathway analysis and patient similarity concepts to find meaningful features for both classes and individuals. StellarPath processes omics data, hierarchically integrates them into pathways, and uses a novel similarity measure to quantify how alike patients' pathway activities are. It selects biologically relevant molecules, pathways, and networks, taking molecule stability and topology into account. A graph convolutional neural network then predicts the class of unknown patients based on known cases. StellarPath excels in both classification performance and computational resource usage across sixteen datasets. After its initial training and testing on a local dataset, it also accurately infers the class of new patients described in independent external studies. It advances the PSN paradigm and provides new markers, insights, and tools for in-depth patient profiling.
Affiliation(s)
- Luca Giudice
- A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland
- Ahmed Mohamed
- A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland
- Tarja Malm
- A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland
18
Charest N, Lowe CN, Ramsland C, Meyer B, Samano V, Williams AJ. Improving predictions of compound amenability for liquid chromatography-mass spectrometry to enhance non-targeted analysis. Anal Bioanal Chem 2024; 416:2565-2579. [PMID: 38530399 PMCID: PMC11228616 DOI: 10.1007/s00216-024-05229-5] [Received: 11/13/2023] [Revised: 02/14/2024] [Accepted: 02/16/2024] [Indexed: 03/28/2024]
Abstract
Mass-spectrometry-based non-targeted analysis (NTA), in which mass spectrometric signals are assigned chemical identities based on a systematic collation of evidence, is a growing area of interest for toxicological risk assessment. Successful NTA results in better identification of potentially hazardous pollutants within the environment, facilitating the development of targeted analytical strategies to best characterize risks to human and ecological health. A supporting component of the NTA process involves assessing whether suspected chemicals are amenable to the mass spectrometric method, which is necessary in order to assign an observed signal to a chemical structure. This group previously developed a random forest model for predicting the amenability of 5517 unique chemical structures to liquid chromatography-mass spectrometry (LC-MS). This work improves the interpretability of that model of the same endpoint and integrates 1348 additional data points across negative and positive ionization modes. We enhance interpretability through feature engineering, a machine learning practice that reduces the input dimensionality while attempting to preserve performance statistics. We emphasize the importance of interpretable machine learning models for building confidence in NTA identification. The new data were curated by expert curators who labeled compounds as amenable or unamenable, yielding an enhanced set of chemical compounds that expands the applicability domain of the prior model. The balanced accuracy (BA) of the newly developed models is comparable to previously reported performance (mean cross-validated BA of 0.84 vs. 0.82 in positive mode and 0.85 vs. 0.82 in negative mode), while on a novel external set derived from this work's data, the Matthews correlation coefficients (MCC) of the new models are 0.66 and 0.68 for positive and negative mode, respectively. The group's previously published models scored MCCs of 0.55 and 0.54 on the same external sets. This demonstrates an appreciable improvement over the chemical space captured by the expanded dataset. This work forms part of our ongoing efforts to develop models with higher interpretability and higher performance to support NTA efforts.
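Balanced accuracy (BA), the cross-validation benchmark quoted above, averages sensitivity and specificity, so a model cannot score well simply by favouring the majority class. A minimal sketch from confusion-matrix counts; the function names and counts are hypothetical, with "amenable" treated as the positive class.

```python
def balanced_accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Mean of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    return (tp + tn) / (tp + fp + tn + fn)


# A degenerate classifier that labels every compound "unamenable" on an imbalanced set
counts = dict(tp=0, fp=0, tn=90, fn=10)
print(accuracy(**counts))           # 0.9
print(balanced_accuracy(**counts))  # 0.5
```

Plain accuracy rewards the degenerate classifier with 0.9, whereas BA drops to 0.5, the chance level for a binary task.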
Affiliation(s)
- Nathaniel Charest
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, 27711, USA
- Charles N Lowe
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, 27711, USA
- Brian Meyer
- Senior Environmental Employment Program, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, 27711, USA
- Vicente Samano
- Senior Environmental Employment Program, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, 27711, USA
- Antony J Williams
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, 27711, USA
19
Reiter T, Schoedel R. Never miss a beep: Using mobile sensing to investigate (non-)compliance in experience sampling studies. Behav Res Methods 2024; 56:4038-4060. [PMID: 37932624 PMCID: PMC11133120 DOI: 10.3758/s13428-023-02252-9] [Accepted: 09/16/2023] [Indexed: 11/08/2023]
Abstract
Given the increasing number of studies in various disciplines that use experience sampling methods, it is important to examine compliance biases, because related patterns of missing data could affect the validity of research findings. In the present study, a sample of 592 participants providing more than 25,000 observations was used to examine whether participants responded to each specific questionnaire within an experience sampling framework. More than 400 variables from the three categories of person, behavior, and context, collected multi-methodologically via traditional surveys, experience sampling, and mobile sensing, served as predictors. Comparing different linear (logistic and elastic net regression) and non-linear (random forest) machine learning models, we found indications of compliance bias: response behavior could be successfully predicted. Follow-up analyses revealed that study-related past behavior, such as the previous average response rate to experience sampling questionnaires, was most informative for predicting compliance, followed by physical context variables, such as being at home or at work. Based on our findings, we discuss implications for the design of experience sampling studies in applied research and future directions for methodological research on experience sampling and missing data.
Affiliation(s)
- Thomas Reiter
- Department of Psychology, Ludwig-Maximilians-Universität München, Leopoldstraße 13, 80802, Munich, Germany
- Ramona Schoedel
- Department of Psychology, Ludwig-Maximilians-Universität München, Leopoldstraße 13, 80802, Munich, Germany
20
Winter NR, Blanke J, Leenings R, Ernsting J, Fisch L, Sarink K, Barkhau C, Emden D, Thiel K, Flinkenflügel K, Winter A, Goltermann J, Meinert S, Dohm K, Repple J, Gruber M, Leehr EJ, Opel N, Grotegerd D, Redlich R, Nitsch R, Bauer J, Heindel W, Gross J, Risse B, Andlauer TFM, Forstner AJ, Nöthen MM, Rietschel M, Hofmann SG, Pfarr JK, Teutenberg L, Usemann P, Thomas-Odenthal F, Wroblewski A, Brosch K, Stein F, Jansen A, Jamalabadi H, Alexander N, Straube B, Nenadić I, Kircher T, Dannlowski U, Hahn T. A Systematic Evaluation of Machine Learning-Based Biomarkers for Major Depressive Disorder. JAMA Psychiatry 2024; 81:386-395. [PMID: 38198165 PMCID: PMC10782379 DOI: 10.1001/jamapsychiatry.2023.5083] [Received: 06/27/2023] [Accepted: 11/05/2023] [Indexed: 01/11/2024]
Abstract
Importance Biological psychiatry aims to understand mental disorders in terms of altered neurobiological pathways. However, for one of the most prevalent and disabling mental disorders, major depressive disorder (MDD), no informative biomarkers have been identified. Objective To evaluate whether machine learning (ML) can identify a multivariate biomarker for MDD. Design, Setting, and Participants This study used data from the Marburg-Münster Affective Disorders Cohort Study, a case-control clinical neuroimaging study. Patients with acute or lifetime MDD and healthy controls aged 18 to 65 years were recruited from primary care and the general population in Münster and Marburg, Germany, from September 11, 2014, to September 26, 2018. The Münster Neuroimaging Cohort (MNC) was used as an independent partial replication sample. Data were analyzed from April 2022 to June 2023. Exposure Patients with MDD and healthy controls. Main Outcome and Measure Diagnostic classification accuracy was quantified on an individual level using an extensive ML-based multivariate approach across a comprehensive range of neuroimaging modalities, including structural and functional magnetic resonance imaging and diffusion tensor imaging, as well as a polygenic risk score for depression. Results Of 1801 included participants, 1162 (64.5%) were female, and the mean (SD) age was 36.1 (13.1) years. There were a total of 856 patients with MDD (47.5%) and 945 healthy controls (52.5%). The MNC replication sample included 1198 individuals (362 with MDD [30.1%] and 836 healthy controls [69.9%]). Across a total of 4 million trained and tested ML models, mean (SD) accuracies for diagnostic classification ranged between 48.1% (3.6%) and 62.0% (4.8%). Integrating neuroimaging modalities and stratifying individuals by age, sex, treatment, or remission status did not enhance model performance. Findings were replicated within study sites and were also observed in structural magnetic resonance imaging within the MNC. Under simulated conditions of perfect reliability, performance did not significantly improve. Analysis of model errors suggests that symptom severity could be a potential focus for identifying MDD subgroups. Conclusion and Relevance Despite the improved predictive capability of multivariate compared with univariate neuroimaging markers, no informative individual-level MDD biomarker could be identified, even under extensive ML optimization in a large sample of diagnosed patients.
Affiliation(s)
- Nils R. Winter
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
| | - Julian Blanke
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
| | - Ramona Leenings
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Faculty of Mathematics and Computer Science, University of Münster, Münster, Germany
| | - Jan Ernsting
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Faculty of Mathematics and Computer Science, University of Münster, Münster, Germany
- Institute for Geoinformatics, University of Münster, Münster, Germany
| | - Lukas Fisch
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
| | - Kelvin Sarink
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
| | - Carlotta Barkhau
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Daniel Emden
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Katharina Thiel
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Kira Flinkenflügel
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Alexandra Winter
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Janik Goltermann
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Susanne Meinert
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Institute for Translational Neuroscience, University of Münster, Münster, Germany
- Katharina Dohm
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Jonathan Repple
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, University Hospital Frankfurt, Goethe University, Frankfurt am Main, Germany
- Marius Gruber
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, University Hospital Frankfurt, Goethe University, Frankfurt am Main, Germany
- Elisabeth J. Leehr
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Nils Opel
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Department of Psychiatry and Psychotherapy, University Hospital Jena, Jena, Germany
- Center for Intervention and Research on Adaptive and Maladaptive Brain Circuits Underlying Mental Health, Jena, Germany
- German Center for Mental Health (DZPG), Jena, Germany
- Dominik Grotegerd
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Ronny Redlich
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Center for Intervention and Research on Adaptive and Maladaptive Brain Circuits Underlying Mental Health, Jena, Germany
- Department of Psychology, University of Halle, Halle, Germany
- German Center for Mental Health (DZPG), Halle, Germany
- Robert Nitsch
- Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
- Institute for Translational Neuroscience, University of Münster, Münster, Germany
- Jochen Bauer
- Clinic for Radiology, University of Münster, University Hospital Münster, Münster, Germany
- Walter Heindel
- Clinic for Radiology, University of Münster, University Hospital Münster, Münster, Germany
- Joachim Gross
- Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
- Institute for Biomagnetism and Biosignalanalysis, University of Münster, Münster, Germany
- Benjamin Risse
- Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
- Faculty of Mathematics and Computer Science, University of Münster, Münster, Germany
- Institute for Geoinformatics, University of Münster, Münster, Germany
- Till F. M. Andlauer
- Department of Neurology, Klinikum rechts der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- Andreas J. Forstner
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn, Germany
- Institute of Neuroscience and Medicine (INM-1), Research Centre Jülich, Jülich, Germany
- Markus M. Nöthen
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn, Germany
- Marcella Rietschel
- Department of Genetic Epidemiology, Central Institute of Mental Health, Faculty of Medicine Mannheim, University of Heidelberg, Mannheim, Germany
- Stefan G. Hofmann
- Department of Clinical Psychology, Philipps-University Marburg, Marburg, Germany
- Julia-Katharina Pfarr
- Department of Psychiatry and Psychotherapy, Philipps-University Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior (CMBB), Marburg, Germany
- Lea Teutenberg
- Department of Psychiatry and Psychotherapy, Philipps-University Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior (CMBB), Marburg, Germany
- Paula Usemann
- Department of Psychiatry and Psychotherapy, Philipps-University Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior (CMBB), Marburg, Germany
- Florian Thomas-Odenthal
- Department of Psychiatry and Psychotherapy, Philipps-University Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior (CMBB), Marburg, Germany
- Adrian Wroblewski
- Department of Psychiatry and Psychotherapy, Philipps-University Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior (CMBB), Marburg, Germany
- Katharina Brosch
- Department of Psychiatry and Psychotherapy, Philipps-University Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior (CMBB), Marburg, Germany
- Frederike Stein
- Department of Psychiatry and Psychotherapy, Philipps-University Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior (CMBB), Marburg, Germany
- Andreas Jansen
- Department of Psychiatry and Psychotherapy, Philipps-University Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior (CMBB), Marburg, Germany
- Core Facility Brain Imaging, Faculty of Medicine, Philipps-University Marburg, Marburg, Germany
- Hamidreza Jamalabadi
- Department of Psychiatry and Psychotherapy, Philipps-University Marburg, Marburg, Germany
- Nina Alexander
- Department of Psychiatry and Psychotherapy, Philipps-University Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior (CMBB), Marburg, Germany
- Benjamin Straube
- Department of Psychiatry and Psychotherapy, Philipps-University Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior (CMBB), Marburg, Germany
- Igor Nenadić
- Department of Psychiatry and Psychotherapy, Philipps-University Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior (CMBB), Marburg, Germany
- Tilo Kircher
- Department of Psychiatry and Psychotherapy, Philipps-University Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior (CMBB), Marburg, Germany
- Udo Dannlowski
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
- Tim Hahn
- Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
21
Guan J, Yao L, Xie P, Chung CR, Huang Y, Chiang YC, Lee TY. A two-stage computational framework for identifying antiviral peptides and their functional types based on contrastive learning and multi-feature fusion strategy. Brief Bioinform 2024; 25:bbae208. [PMID: 38706321 PMCID: PMC11070730 DOI: 10.1093/bib/bbae208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Revised: 03/14/2024] [Accepted: 04/17/2024] [Indexed: 05/07/2024] Open
Abstract
Antiviral peptides (AVPs) have shown potential in inhibiting viral attachment, preventing viral fusion with host cells, and disrupting viral replication owing to their unique mechanisms of action. They have now become a promising broad-spectrum antiviral therapy. However, identifying effective AVPs is traditionally slow and costly. This study proposes a new two-stage computational framework for AVP identification. The first stage identifies AVPs from a wide range of peptides, and the second stage recognizes AVPs targeting specific viral families or viruses. The method integrates contrastive learning with a multi-feature fusion strategy, focusing on sequence information and peptide characteristics, which significantly enhances predictive ability and interpretability. The model performs well in evaluation, with an accuracy of 0.9240 and a Matthews correlation coefficient (MCC) of 0.8482 on the non-AVP independent dataset, and an accuracy of 0.9934 and an MCC of 0.9869 on the non-AMP independent dataset. Furthermore, the model can predict the antiviral activity of AVPs against six key viral families (Coronaviridae, Retroviridae, Herpesviridae, Paramyxoviridae, Orthomyxoviridae, Flaviviridae) and eight viruses (FIV, HCV, HIV, HPIV3, HSV1, INFVA, RSV, SARS-CoV). Finally, to facilitate user accessibility, we built a user-friendly web interface deployed at https://awi.cuhk.edu.cn/~dbAMP/AVP/.
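The accuracy and MCC figures above both summarize a binary confusion matrix. Because MCC uses all four cells (true and false positives and negatives), it stays informative when classes are imbalanced, where accuracy alone can look flattering. The Python sketch below computes both metrics from confusion-matrix counts. The counts are hypothetical and are not taken from the study, and returning 0 for an undefined MCC is a common convention assumed here rather than something the abstract specifies.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, from -1 (total disagreement) to +1 (perfect)."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when a row or column of the matrix is empty.
    return numerator / denominator if denominator else 0.0


# Hypothetical confusion-matrix counts, for illustration only.
tp, tn, fp, fn = 90, 95, 5, 10
print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.4f}")
```

An MCC of 0 corresponds to chance-level prediction, which makes it easier to compare across datasets with different class balances.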
Affiliation(s)
- Jiahui Guan
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, 2001 Longxiang Road, 518172 Shenzhen, China
- Lantian Yao
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- School of Science and Engineering, The Chinese University of Hong Kong, 2001 Longxiang Road, 518172 Shenzhen, China
- Peilin Xie
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, 2001 Longxiang Road, 518172 Shenzhen, China
- Chia-Ru Chung
- Department of Computer Science and Information Engineering, National Central University, 320317 Taoyuan, Taiwan
- Yixian Huang
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Ying-Chih Chiang
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, 2001 Longxiang Road, 518172 Shenzhen, China
- Tzong-Yi Lee
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, 300093 Hsinchu, Taiwan
- Center for Intelligent Drug Systems and Smart Bio-devices (IDS2B), National Yang Ming Chiao Tung University, 300093 Hsinchu, Taiwan
22
Román L, Melis-Arcos F, Pröschle T, Saa PA, Garrido D. Genome-scale metabolic modeling of the human milk oligosaccharide utilization by Bifidobacterium longum subsp. infantis. mSystems 2024; 9:e0071523. [PMID: 38363147 PMCID: PMC10949479 DOI: 10.1128/msystems.00715-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 01/10/2024] [Indexed: 02/17/2024] Open
Abstract
Bifidobacterium longum subsp. infantis is a representative and dominant species in the infant gut and is considered a beneficial microbe. This organism displays multiple adaptations to thrive in the infant gut and is regarded as a model for human milk oligosaccharide (HMO) utilization. These carbohydrates are abundant in breast milk and comprise diverse lactose-based molecules containing fucose, sialic acid, and N-acetylglucosamine. Bifidobacterium metabolism is complex, and a systems view of the relevant metabolic pathways and exchange metabolites during HMO consumption is missing. To address this limitation, we present a refined genome-scale network reconstruction of this bacterium, using a previous reconstruction of B. infantis ATCC 15697 as a template. The template was expanded based on an extensive revision of genome annotations, the current literature, and transcriptomic data integration. The metabolic reconstruction (iLR578) accounts for 578 genes, 1,047 reactions, and 924 metabolites. Starting from this reconstruction, we built context-specific genome-scale metabolic models using RNA-seq data from cultures growing on lactose and three HMOs. The models revealed notable differences in HMO metabolism depending on the functional characteristics of the substrates. In particular, fucosyl-lactose showed divergent metabolism due to its fucose moiety. High yields of lactate and acetate were predicted under growth rate maximization in all conditions, whereas predicted yields of formate, ethanol, and 1,2-propanediol were substantially lower. Similar results were obtained under near-optimal growth on each substrate when the empirically observed acetate-to-lactate production ratio was varied. Model predictions showed reasonable agreement between central carbon metabolism fluxes and expression data across all conditions. Flux coupling analysis revealed additional connections between succinate exchange and arginine and sulfate metabolism, as well as a strong coupling between central carbon reactions and adenine metabolism. More importantly, specific networks of coupled reactions were derived and analyzed for each carbon source. Overall, the presented network reconstruction constitutes a valuable platform for probing the metabolism of this prominent infant gut bifidobacterium.
IMPORTANCE This work presents a detailed reconstruction of the metabolism of Bifidobacterium longum subsp. infantis, a prominent member of the infant gut microbiome, providing a systems view of its metabolism of human milk oligosaccharides.
Affiliation(s)
- Loreto Román
- Department of Chemical and Bioprocess Engineering, School of Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
- Felipe Melis-Arcos
- Department of Chemical and Bioprocess Engineering, School of Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
- Tomás Pröschle
- Department of Chemical and Bioprocess Engineering, School of Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
- Pedro A. Saa
- Department of Chemical and Bioprocess Engineering, School of Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
- Institute for Mathematical and Computational Engineering, Pontificia Universidad Católica de Chile, Vicuña Mackenna, Santiago, Chile
- Daniel Garrido
- Department of Chemical and Bioprocess Engineering, School of Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
23
Woods AI, Primrose DM, Paiva J, Blanco AN, Alberto MF, Sánchez-Luceros A. Clinical relevance of genetic variants in the von Willebrand factor according to in-silico methods. Am J Med Genet A 2024; 194:e63430. [PMID: 37872709 DOI: 10.1002/ajmg.a.63430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 09/03/2023] [Accepted: 09/22/2023] [Indexed: 10/25/2023]
Abstract
Clinical interpretation of genetic variants in the context of the patient's phenotype is a time-consuming and costly process. In-silico prediction tools and molecular modeling have been developed to predict the influence of genetic variants on the quality and/or quantity of the translated protein and thereby to alert clinicians to disease likelihood in the absence of previous evidence. Our objectives were to evaluate the success rate of in-silico analysis in predicting disease-causing variants as pathogenic and single-nucleotide variants as neutral, and to establish the reliability of in-silico analysis for determining the pathogenicity or neutrality of genetic variants in the von Willebrand factor gene. Using in-silico analysis, we studied pathogenicity in 31 disease-causing variants and neutrality in 61 single-nucleotide variants from patients previously diagnosed with type 2 von Willebrand disease. Disease-causing variants and non-synonymous single-nucleotide variants were explored with in-silico tools that analyze the amino acid sequence. Intronic and synonymous single-nucleotide variants were analyzed with in-silico methods that evaluate the nucleotide sequence. We found consistent agreement between the predictions of in-silico prediction tools and molecular modeling, both for defining the pathogenicity of disease-causing variants and the neutrality of single-nucleotide variants. Based on our results, in-silico analysis could help define the pathogenicity or neutrality of novel genetic variants observed in patients with clinical and laboratory phenotypes suggestive of von Willebrand disease.
Affiliation(s)
- Adriana Inés Woods
- Laboratorio de Hemostasia y Trombosis, IMEX-CONICET-Academia Nacional de Medicina de Buenos Aires, CABA, Argentina
- Débora Marina Primrose
- Escuela Superior de Ingeniería, Informática y Ciencias Agroalimentarias, Universidad de Morón, Buenos Aires, Argentina
- Juvenal Paiva
- Departamento de Hemostasia y Trombosis, Instituto de Investigaciones Hematológicas, Academia Nacional de Medicina de Buenos Aires, CABA, Argentina
- Alicia Noemí Blanco
- Departamento de Hemostasia y Trombosis, Instituto de Investigaciones Hematológicas, Academia Nacional de Medicina de Buenos Aires, CABA, Argentina
- María Fabiana Alberto
- Departamento de Hemostasia y Trombosis, Instituto de Investigaciones Hematológicas, Academia Nacional de Medicina de Buenos Aires, CABA, Argentina
- Analía Sánchez-Luceros
- Laboratorio de Hemostasia y Trombosis, IMEX-CONICET-Academia Nacional de Medicina de Buenos Aires, CABA, Argentina
- Departamento de Hemostasia y Trombosis, Instituto de Investigaciones Hematológicas, Academia Nacional de Medicina de Buenos Aires, CABA, Argentina
24
Khaleel HA, Alhilfi RA, Rawaf S, Tabche C. Identify future epidemic threshold and intensity for influenza-like illness in Iraq by using the moving epidemic method. IJID REGIONS 2024; 10:126-131. [PMID: 38260712 PMCID: PMC10801321 DOI: 10.1016/j.ijregi.2023.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 12/16/2023] [Accepted: 12/18/2023] [Indexed: 01/24/2024]
Abstract
Objectives Influenza-like illness (ILI) entered the Iraqi surveillance system in 2021. The alert threshold was determined using the cumulative sum 2 method, which provided no other epidemic characteristics. This study uses the moving epidemic method (MEM) to describe the duration of ILI activity and to estimate alert thresholds for ILI in Iraq for 2023-2024. Methods The MEM package with default parameters was used to estimate the 2023-2024 influenza epidemic thresholds. The analysis was repeated using the optimum epidemic-timing parameter for the fixed-criterion method, which is 3.3. The arithmetic mean and the upper limit of its 95% confidence interval were used to estimate the epidemic threshold. The geometric mean and the upper limits of its 40%, 90%, and 97.3% confidence intervals were used to estimate intensity levels. Aggregated Centers for Disease Control and Prevention surveillance data were used to detect epidemic thresholds, epidemic length, sensitivity, and predictive values. Results ILI activity starts at week 30 and lasts 7 weeks. The optimized epidemic threshold is 4513 cases, lower than the default (4540 cases). The optimized medium-intensity level was higher than the default, whereas the high and very high intensity levels were lower. Conclusions MEM is essential for determining an influenza epidemic's threshold and intensity levels. Although MEM normally requires 3-5 years of data, applying it to 2.5 years of data yielded an epidemic threshold slightly higher than that calculated with the cumulative sum 2 method.
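The threshold and intensity estimates described above rest on upper confidence limits of a mean: the arithmetic mean of pre-epidemic values for the epidemic threshold, and the geometric mean of epidemic values for the intensity levels. The Python sketch below illustrates that calculation under stated simplifications: it assumes a two-sided normal-approximation interval for the mean, the weekly counts are hypothetical, and MEM's selection of values from each season is omitted. The R package mem offers several interval types, so its defaults may differ from this sketch, which is not a reimplementation of that package.

```python
import math
from statistics import NormalDist, fmean, stdev


def ci_upper(values, level, geometric=False):
    """Upper limit of a two-sided normal-approximation confidence interval for the mean.

    With geometric=True the interval is computed on log values and
    back-transformed, so it is centered on the geometric mean.
    """
    xs = [math.log(v) for v in values] if geometric else list(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # level=0.95 gives z of about 1.96
    upper = fmean(xs) + z * stdev(xs) / math.sqrt(len(xs))
    return math.exp(upper) if geometric else upper


# Hypothetical weekly ILI case counts, NOT data from the study.
pre_epidemic = [3900, 4100, 4250, 4300, 4050, 4180]
epidemic = [5200, 6100, 7400, 6800, 5900, 8100]

threshold = ci_upper(pre_epidemic, 0.95)
intensity = {lvl: ci_upper(epidemic, lvl, geometric=True) for lvl in (0.40, 0.90, 0.973)}

print(f"epidemic threshold: {threshold:.0f} cases/week")
for lvl, limit in intensity.items():
    print(f"intensity limit ({lvl:.1%} CI): {limit:.0f} cases/week")
```

Wider interval levels give higher upper limits, which is why the 40%, 90%, and 97.3% limits can serve as successively higher (medium, high, very high) intensity thresholds.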
Affiliation(s)
- Salman Rawaf
- WHO Collaborating Centre, Department of Primary Care and Public Health, Imperial College London, UK
- Celine Tabche
- WHO Collaborating Centre, Department of Primary Care and Public Health, Imperial College London, UK
25
Ban D, Housley SN, Matyunina LV, McDonald LD, Bae-Jump VL, Benigno BB, Skolnick J, McDonald JF. A personalized probabilistic approach to ovarian cancer diagnostics. Gynecol Oncol 2024; 182:168-175. [PMID: 38266403 PMCID: PMC10960662 DOI: 10.1016/j.ygyno.2023.12.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 12/18/2023] [Accepted: 12/29/2023] [Indexed: 01/26/2024]
Abstract
OBJECTIVE To identify and develop a machine learning-based classifier that uses metabolic profiles of serum samples to accurately identify individuals with ovarian cancer. METHODS Serum samples collected from 431 ovarian cancer patients and 133 normal women at four geographic locations were analyzed by mass spectrometry. Reliable metabolites were identified using recursive feature elimination coupled with repeated cross-validation and were used to develop a consensus classifier able to distinguish cancer from non-cancer. The probabilities assigned to individuals by the model were used to create a clinical tool that assigns a likelihood that an individual patient sample is cancer or normal. RESULTS Our consensus classification model distinguishes cancer from control samples with 93% accuracy. The frequency distribution of individual patient scores was used to develop a clinical tool that assigns a likelihood that an individual patient does or does not have cancer. CONCLUSIONS An integrative approach using metabolomic profiles and machine learning-based classifiers has been used to develop a clinical tool that assigns a probability that an individual patient does or does not have ovarian cancer. This personalized, probabilistic approach to cancer diagnostics is more clinically informative and accurate than traditional binary (yes/no) tests and represents a promising new direction in the early detection of ovarian cancer.
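The abstract describes turning the distribution of model scores into a per-patient likelihood but does not give the procedure. One simple way to do this, shown here purely as a hypothetical Python sketch, is an empirical lookup table: bin the classifier probabilities of reference patients with known diagnoses, then report the fraction of cancer cases in the bin that a new patient's score falls into. The bin edges, function names, and data are assumptions for illustration and do not describe the authors' tool.

```python
from bisect import bisect_right


def build_likelihood_table(scores, labels, edges):
    """Empirical fraction of cancer cases (label 1) in each score bin.

    `edges` are sorted inner bin boundaries, giving len(edges) + 1 bins.
    Bins that contain no reference patients map to None.
    """
    counts = [[0, 0] for _ in range(len(edges) + 1)]  # [controls, cancers] per bin
    for score, label in zip(scores, labels):
        counts[bisect_right(edges, score)][label] += 1
    return [c[1] / (c[0] + c[1]) if c[0] + c[1] else None for c in counts]


def likelihood_of_cancer(score, table, edges):
    """Look up the empirical cancer fraction for the bin containing `score`."""
    return table[bisect_right(edges, score)]


# Hypothetical reference scores (model probabilities) and diagnoses (1 = cancer).
ref_scores = [0.05, 0.10, 0.30, 0.35, 0.50, 0.55, 0.70, 0.85, 0.90, 0.95]
ref_labels = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
edges = [0.25, 0.50, 0.75]

table = build_likelihood_table(ref_scores, ref_labels, edges)
print(likelihood_of_cancer(0.60, table, edges))
```

Such empirical frequencies reflect the case/control mix of the reference set (431 cancer and 133 control samples in this study), so they approximate clinical likelihoods only if that mix matches the intended population or is reweighted accordingly.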
Affiliation(s)
- Dongjo Ban
- Integrated Cancer Research Center, School of Biological Sciences, Georgia Institute of Technology, 315 Ferst Drive, Atlanta, GA 30332, USA
- Stephen N Housley
- Integrated Cancer Research Center, School of Biological Sciences, Georgia Institute of Technology, 315 Ferst Drive, Atlanta, GA 30332, USA
- Lilya V Matyunina
- Integrated Cancer Research Center, School of Biological Sciences, Georgia Institute of Technology, 315 Ferst Drive, Atlanta, GA 30332, USA
- L DeEtte McDonald
- Integrated Cancer Research Center, School of Biological Sciences, Georgia Institute of Technology, 315 Ferst Drive, Atlanta, GA 30332, USA
- Victoria L Bae-Jump
- Department of Obstetrics and Gynecology, University of North Carolina, 3009 Old Clinic Building, Chapel Hill, NC 27599, USA
- Benedict B Benigno
- Ovarian Cancer Institute, 1266 W. Paces Ferry Rd NW #339, Atlanta, GA 30327, USA
- Jeffrey Skolnick
- Integrated Cancer Research Center, School of Biological Sciences, Georgia Institute of Technology, 315 Ferst Drive, Atlanta, GA 30332, USA
- Ovarian Cancer Institute, 1266 W. Paces Ferry Rd NW #339, Atlanta, GA 30327, USA
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 315 Ferst Drive, Atlanta, GA 30332, USA
- John F McDonald
- Integrated Cancer Research Center, School of Biological Sciences, Georgia Institute of Technology, 315 Ferst Drive, Atlanta, GA 30332, USA
26
Mollura M, Chicco D, Paglialonga A, Barbieri R. Identifying prognostic factors for survival in intensive care unit patients with SIRS or sepsis by machine learning analysis on electronic health records. PLOS DIGITAL HEALTH 2024; 3:e0000459. [PMID: 38489347 PMCID: PMC10942078 DOI: 10.1371/journal.pdig.0000459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Accepted: 02/05/2024] [Indexed: 03/17/2024]
Abstract
BACKGROUND Systemic inflammatory response syndrome (SIRS) and sepsis are the most common causes of in-hospital death. However, the characteristics associated with improvement in patient condition during the intensive care unit (ICU) stay, and the possible differences between the two populations, have not been fully elucidated. GOAL The aim of this study is to highlight the differences between the prognostic clinical features for survival of patients diagnosed with SIRS and those of patients diagnosed with sepsis, using a multivariable predictive modeling approach with a reduced set of easily available measurements collected at admission to the ICU. METHODS Data were collected from 1,257 patients (816 non-sepsis SIRS and 441 sepsis) admitted to the ICU. We compared the performance of five machine learning models in predicting patient survival. Model performance and feature importance were evaluated with the Matthews correlation coefficient (MCC) under Monte Carlo stratified cross-validation. RESULTS Extreme Gradient Boosting (MCC = 0.489) and Logistic Regression (MCC = 0.533) achieved the best results for the SIRS and sepsis cohorts, respectively. In order of importance, APACHE II, mean platelet volume (MPV), eosinophil counts (EoC), and C-reactive protein (CRP) were the most important features for predicting sepsis patient survival, whereas SOFA, APACHE II, platelet counts (PLTC), and CRP were the most important in the SIRS cohort. CONCLUSION Using complete blood count parameters as predictors of ICU patient survival, machine learning models can accurately predict the survival of SIRS and sepsis ICU patients. Interestingly, feature importance highlights the role of CRP and APACHE II in both the SIRS and sepsis populations. In addition, MPV and EoC are important features for the sepsis population only, whereas SOFA and PLTC have higher importance for SIRS patients.
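Monte Carlo stratified cross-validation, as named in the methods above, repeatedly draws random train/test splits while preserving the outcome proportions in each split; model performance (here, MCC) is then typically summarized across the repetitions. The stdlib-only Python sketch below generates such splits. The number of repetitions, the test fraction, and the labels are hypothetical, since the abstract does not give the study's settings. In scikit-learn, `StratifiedShuffleSplit` implements the same splitting scheme.

```python
import random
from collections import defaultdict


def monte_carlo_stratified_splits(labels, n_splits, test_fraction, seed=0):
    """Yield (train_idx, test_idx) lists from repeated random stratified splits.

    Each class is shuffled and split separately, so every test set keeps
    (up to rounding) the class proportions of the full dataset. Unlike
    k-fold CV, test sets from different repetitions may overlap.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for _ in range(n_splits):
        train, test = [], []
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            n_test = round(len(shuffled) * test_fraction)
            test.extend(shuffled[:n_test])
            train.extend(shuffled[n_test:])
        yield sorted(train), sorted(test)


# Hypothetical outcome labels: 80 survivors (0) and 20 non-survivors (1).
labels = [0] * 80 + [1] * 20
for train_idx, test_idx in monte_carlo_stratified_splits(labels, n_splits=3, test_fraction=0.2):
    n_pos = sum(labels[i] for i in test_idx)
    print(f"train={len(train_idx)} test={len(test_idx)} non-survivors in test={n_pos}")
```

Stratifying matters here because the outcome classes are usually imbalanced in ICU cohorts; an unstratified random split could leave a test set with very few non-survivors and an unstable MCC.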
Affiliation(s)
- Maximiliano Mollura
- Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
- Davide Chicco
- Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca, Milan, Italy
- Alessia Paglialonga
- CNR-Istituto di Elettronica e di Ingegneria dell’Informazione e delle Telecomunicazioni (CNR-IEIIT), Milan, Italy
- Riccardo Barbieri
- Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
27
Zacometti C, Sammarco G, Massaro A, Lefevre S, Frégière-Salomon A, Lafeuille JL, Candalino IF, Piro R, Tata A, Suman M. Authenticity assessment of ground black pepper by combining headspace gas-chromatography ion mobility spectrometry and machine learning. Food Res Int 2024; 179:114023. [PMID: 38342542 DOI: 10.1016/j.foodres.2024.114023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 01/08/2024] [Accepted: 01/12/2024] [Indexed: 02/13/2024]
Abstract
Currently, the authentication of ground black pepper is a major concern, creating a need for a rapid, highly sensitive, and specific detection tool to prevent adulterated batches from entering the food chain. To this end, headspace gas-chromatography ion mobility spectrometry (HS-GC-IMS) combined with machine learning is tested in this initial, proof-of-concept study. A broad variety of authentic samples originating from eight countries on three continents were collected and spiked with a range of adulterants, both endogenous sub-products and an assortment of exogenous materials. The method requires no sample preparation and takes 20 min for chromatographic separation and ion mobility data acquisition. After exploratory analysis, the data were submitted to two different machine learning algorithms: partial least squares discriminant analysis (PLS-DA) and support vector machine (SVM). While the PLS-DA model did not provide fully satisfactory performance, the combination of HS-GC-IMS and SVM successfully classified the samples as authentic, exogenously adulterated, or endogenously adulterated with an overall accuracy of 90% and 96% on withheld test set 1 and withheld test set 2, respectively (at a 95% confidence level). Some limitations, expected to be mitigated by further research, were encountered in the correct classification of endogenously adulterated ground black pepper. Correct categorization of the ground black pepper samples was not adversely affected by the operator or the time span of data collection (method development and model challenge were carried out by two operators over the 6 months of the study, using ground black pepper harvested between 2015 and 2019). Therefore, HS-GC-IMS coupled with an intelligent tool is proposed to: (i) aid industrial decision-making before a new batch of ground black pepper is used in the production chain; (ii) reduce the use of time-consuming conventional analyses; and (iii) increase the number of ground black pepper samples analyzed within an industrial quality control framework.
Affiliation(s)
- Carmela Zacometti
- Istituto Zooprofilattico Sperimentale delle Venezie, Laboratory of Experimental Chemistry, Vicenza, Italy
- Giuseppe Sammarco
- Advanced Laboratory Research, Barilla G. e R. Fratelli S.p.A., Via Mantova, 166, 43122 Parma, Italy
- Andrea Massaro
- Istituto Zooprofilattico Sperimentale delle Venezie, Laboratory of Experimental Chemistry, Vicenza, Italy
- Stephane Lefevre
- Food Integrity Laboratory, Global Quality and Food Safety Center of Excellence, McCormick & Co., Inc., 999 avenue des Marchés, 84200 Carpentras, France
- Aline Frégière-Salomon
- Food Integrity Laboratory, Global Quality and Food Safety Center of Excellence, McCormick & Co., Inc., 999 avenue des Marchés, 84200 Carpentras, France
- Jean-Louis Lafeuille
- Global Quality and Food Safety Center of Excellence, McCormick & Co., Inc., 999 avenue des Marchés, 84200 Carpentras, France
- Ingrid Fiordaliso Candalino
- Global Quality and Food Safety Center of Excellence, McCormick & Co., Inc., Viale Iotti Nilde, 50038 San Piero (FI), Italy
- Roberto Piro
- Istituto Zooprofilattico Sperimentale delle Venezie, Laboratory of Experimental Chemistry, Vicenza, Italy
- Alessandra Tata
- Istituto Zooprofilattico Sperimentale delle Venezie, Laboratory of Experimental Chemistry, Vicenza, Italy
- Michele Suman
- Advanced Laboratory Research, Barilla G. e R. Fratelli S.p.A., Via Mantova, 166, 43122 Parma, Italy
- Catholic University Sacred Heart, Department for Sustainable Food Process, Piacenza, Italy
28
Faadiya AN, Widyaningrum R, Arindra PK, Diba SF. The diagnostic performance of impacted third molars in the mandible: A review of deep learning on panoramic radiographs. Saudi Dent J 2024; 36:404-412. [PMID: 38525176 PMCID: PMC10960107 DOI: 10.1016/j.sdentj.2023.11.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 11/21/2023] [Accepted: 11/23/2023] [Indexed: 03/26/2024] Open
Abstract
Background The mandibular third molar is prone to impaction, which prevents it from erupting into the oral cavity. Radiographic examination is required to support odontectomy of impacted teeth. Computer-aided diagnosis based on deep learning is emerging in medicine and dentistry with the advancement of artificial intelligence (AI) technology. This review describes the performance and prospects of deep learning for the detection, classification, and evaluation of third molar-mandibular canal relationships on panoramic radiographs. Methods This work was conducted using three databases: PubMed, Google Scholar, and Science Direct. Following literature selection, 49 articles were reviewed, of which 12 main articles are discussed in this review. Results Several deep learning models are currently used for segmentation and classification of third molar impaction, with or without the combination of other techniques. Deep learning has demonstrated high diagnostic performance in identifying mandibular impacted third molars (ITM) on panoramic radiographs, with accuracies ranging from 78.91% to 90.23%. Meanwhile, the accuracy of deep learning in determining the relationship between ITM and the mandibular canal (MC) ranges from 72.32% to 99%. Conclusion High-performance deep learning-based AI for the detection, classification, and evaluation of the relationship of ITM to the MC on panoramic radiographs has been developed over the past decade. However, deep learning models must be improved using large datasets, and the evaluation of their diagnostic performance should be aligned with medical diagnostic test protocols. Future studies involving collaboration among oral radiologists, clinicians, and computer scientists are required to identify appropriate AI development models that are accurate, efficient, and applicable to clinical services.
Affiliation(s)
- Amalia Nur Faadiya
- Dental Medicine Study Program, Faculty of Dentistry, Universitas Gadjah Mada, Yogyakarta, Indonesia
- Rini Widyaningrum
- Department of Dentomaxillofacial Radiology, Faculty of Dentistry, Universitas Gadjah Mada, Yogyakarta, Indonesia
- Pingky Krisna Arindra
- Department of Oral and Maxillofacial Surgery, Faculty of Dentistry, Universitas Gadjah Mada, Yogyakarta, Indonesia
- Silviana Farrah Diba
- Department of Dentomaxillofacial Radiology, Faculty of Dentistry, Universitas Gadjah Mada, Yogyakarta, Indonesia
29
Singhal V, Chou N, Lee J, Yue Y, Liu J, Chock WK, Lin L, Chang YC, Teo EML, Aow J, Lee HK, Chen KH, Prabhakar S. BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis. Nat Genet 2024; 56:431-441. [PMID: 38413725 PMCID: PMC10937399 DOI: 10.1038/s41588-024-01664-3] [Received: 04/03/2023] [Accepted: 01/16/2024] [Indexed: 02/29/2024]
Abstract
Spatial omics data are clustered to define both cell types and tissue domains. We present Building Aggregates with a Neighborhood Kernel and Spatial Yardstick (BANKSY), an algorithm that unifies these two spatial clustering problems by embedding cells in a product space of their own transcriptome and their local neighborhood transcriptome, representing cell state and microenvironment, respectively. BANKSY's spatial feature augmentation strategy improved performance on both tasks when tested on diverse RNA (imaging, sequencing) and protein (imaging) datasets. BANKSY revealed unexpected niche-dependent cell states in the mouse brain and outperformed competing methods on domain segmentation and cell typing benchmarks. BANKSY can also be used for quality control of spatial transcriptomics data and for spatially aware batch effect correction. Importantly, it is substantially faster and more scalable than existing methods, enabling the processing of datasets with millions of cells. In summary, BANKSY provides an accurate, biologically motivated, scalable and versatile framework for analyzing spatially resolved omics data.
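The core idea described here, augmenting each cell's own expression with a summary of its spatial neighborhood before clustering, can be sketched in a few lines of NumPy. This is a simplified illustration, not the published BANKSY implementation: the Gaussian kernel over the k nearest neighbors, the per-cell bandwidth (distance to the k-th neighbor), the mixing parameter `lam`, the square-root scaling, and the function name `neighborhood_augment` are all assumptions chosen for clarity.

```python
import numpy as np


def neighborhood_augment(expr, coords, k=5, lam=0.2):
    """Concatenate each cell's own expression with a Gaussian-weighted mean of
    its k nearest spatial neighbors (hypothetical, simplified sketch)."""
    expr = np.asarray(expr, dtype=float)      # shape (n_cells, n_genes)
    coords = np.asarray(coords, dtype=float)  # shape (n_cells, n_dims)
    n = expr.shape[0]
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n_cells - 1")

    # Pairwise Euclidean distances between cell centroids.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor

    nbr_idx = np.argsort(dist, axis=1)[:, :k]
    nbr_dist = np.take_along_axis(dist, nbr_idx, axis=1)

    # Per-cell bandwidth = distance to the k-th neighbor (assumption).
    sigma = nbr_dist[:, -1:]
    sigma = np.where(sigma > 0, sigma, 1.0)
    w = np.exp(-(nbr_dist ** 2) / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)

    # Weighted neighborhood mean expression, shape (n_cells, n_genes).
    nbr_mean = np.einsum("ij,ijg->ig", w, expr[nbr_idx])

    # lam trades off own cell state (lam=0) against microenvironment.
    return np.hstack([np.sqrt(1 - lam) * expr, np.sqrt(lam) * nbr_mean])
```

The augmented matrix can then be passed to any standard clustering step; small `lam` values emphasize cell typing, larger values emphasize domain segmentation.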
Affiliation(s)
- Vipul Singhal
- Spatial and Single Cell Systems Domain, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Nigel Chou
- Spatial and Single Cell Systems Domain, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Joseph Lee
- Faculty of Science, National University of Singapore, Singapore, Republic of Singapore
- Yifei Yue
- Department of Chemical and Biomolecular Engineering, National University of Singapore, Singapore, Republic of Singapore
- Jinyue Liu
- Spatial and Single Cell Systems Domain, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Wan Kee Chock
- Spatial and Single Cell Systems Domain, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Li Lin
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Jonathan Aow
- Spatial and Single Cell Systems Domain, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Hwee Kuan Lee
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- School of Computing, National University of Singapore, Singapore, Republic of Singapore
- Singapore Eye Research Institute, Singapore, Republic of Singapore
- International Research Laboratory on Artificial Intelligence, Singapore, Republic of Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, Republic of Singapore
- Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research, Singapore, Republic of Singapore
- Kok Hao Chen
- Spatial and Single Cell Systems Domain, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Shyam Prabhakar
- Spatial and Single Cell Systems Domain, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Population and Global Health, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Republic of Singapore
- Cancer Science Institute of Singapore, National University of Singapore, Singapore, Republic of Singapore
30
Alipour M, Seok S, Mednick SC, Malerba P. A classification-based generative approach to selective targeting of global slow oscillations during sleep. Front Hum Neurosci 2024; 18:1342975. [PMID: 38415278 PMCID: PMC10896842 DOI: 10.3389/fnhum.2024.1342975] [Received: 11/22/2023] [Accepted: 01/30/2024] [Indexed: 02/29/2024]
Abstract
Background Given sleep's crucial role in health and cognition, numerous sleep-based brain interventions are being developed that aim to enhance cognitive function, particularly memory consolidation, by improving sleep. Research has shown that transcranial alternating current stimulation (tACS) during sleep can enhance memory performance, especially when used in a closed-loop (cl-tACS) mode that is coordinated with sleep slow oscillations (SOs, 0.5-1.5 Hz). However, sleep tACS research is characterized by mixed results across individuals, which are often attributed to individual variability. Objective/Hypothesis This study targets a specific type of SO that spreads across the electrode manifold within a short delay ("global SOs"), because of its close relationship with long-term memory consolidation. We propose a model-based approach to optimize cl-tACS paradigms that targets global SOs by considering not only their temporal properties but also their spatial profile. Methods We introduce selective targeting of global SOs using a classification-based approach. We first estimate the current elicited by various stimulation paradigms and optimize parameters to match the currents found in natural sleep during a global SO. Then, we employ an ensemble classifier trained on sleep data to identify effective paradigms. Finally, the best stimulation protocol is determined based on classification performance. Results Our study introduces a model-driven cl-tACS approach that specifically targets global SOs, with the potential to extend to other brain dynamics. This method establishes a connection between brain dynamics and stimulation optimization. Conclusion Our research presents a novel approach to optimize cl-tACS during sleep, with a focus on targeting global SOs. This approach holds promise for improving cl-tACS not only for global SOs but also for other physiological events, benefiting both research and clinical applications in sleep and cognition.
Affiliation(s)
- Mahmoud Alipour
- Center for Biobehavioral Health, Abigail Wexner Research Institute, Nationwide Children’s Hospital, Columbus, OH, United States
- The Ohio State University School of Medicine, Columbus, OH, United States
- SangCheol Seok
- Center for Gene Therapy, Abigail Wexner Research Institute, Nationwide Children’s Hospital, Columbus, OH, United States
- Sara C. Mednick
- Department of Cognitive Sciences, University of California, Irvine, Irvine, CA, United States
- Paola Malerba
- Center for Biobehavioral Health, Abigail Wexner Research Institute, Nationwide Children’s Hospital, Columbus, OH, United States
- The Ohio State University School of Medicine, Columbus, OH, United States
31
Harkany T, Tretiakov E, Varela L, Jarc J, Rebernik P, Newbold S, Keimpema E, Verkhratsky A, Horvath T, Romanov R. Molecularly stratified hypothalamic astrocytes are cellular foci for obesity. RESEARCH SQUARE 2024:rs.3.rs-3748581. [PMID: 38405925 PMCID: PMC10889077 DOI: 10.21203/rs.3.rs-3748581/v1] [Indexed: 02/27/2024]
Abstract
Astrocytes safeguard the homeostasis of the central nervous system[1,2]. Despite their prominent morphological plasticity under conditions that challenge the brain's adaptive capacity[3-5], the classification of astrocytes, and relating their molecular make-up to spatially devolved neuronal operations that specify behavior or metabolism, remained mostly futile[6,7]. Although it seems unexpected in the era of single-cell biology, the lack of a major advance in stratifying astrocytes under physiological conditions rests on the incompatibility of 'neurocentric' algorithms that rely on stable developmental endpoints, lifelong transcriptional, neurotransmitter, and neuropeptide signatures for classification[6-8] with the dynamic functional states, anatomic allocation, and allostatic plasticity of astrocytes[1]. Simplistically, therefore, astrocytes are still grouped as 'resting' vs. 'reactive', the latter referring to pathological states marked by various inducible genes[3,9,10]. Here, we introduced a machine learning-based feature recognition algorithm that benefits from the cumulative power of published single-cell RNA-seq data on astrocytes as a reference map to stepwise eliminate pleiotropic and inducible cellular features. For the healthy hypothalamus, this walk-back approach revealed gene regulatory networks (GRNs) that specified subsets of astrocytes and could be used as landmarking tools for their anatomical assignment. The core molecular censuses retained by astrocyte subsets were sufficient to stratify them by allostatic competence, chiefly their signaling and metabolic interplay with neurons. Particularly, we found differentially expressed mitochondrial genes in insulin-sensing astrocytes and demonstrated their reciprocal signaling with neurons that work antagonistically within the food intake circuitry.
As a proof-of-concept, we showed that disrupting Mfn2 expression in astrocytes reduced their ability to support dynamic circuit reorganization, a time-locked feature of satiety in the hypothalamus, thus leading to obesity in mice. Overall, our results suggest that astrocytes in the healthy brain are fundamentally more heterogeneous than previously thought and topologically mirror the specificity of local neurocircuits.
Affiliation(s)
- Tibor Harkany
- Center for Brain Research, Medical University of Vienna
- Jasna Jarc
- Center for Brain Research, Medical University of Vienna
- Erik Keimpema
- Medical University of Vienna, Center for Brain Research
32
Coverdell TC, Sampson M, Zubirán R, Wolska A, Donato LJ, Meeusen JW, Jaffe AS, Remaley AT. An improved method for estimating low LDL-C based on the enhanced Sampson-NIH equation. Lipids Health Dis 2024; 23:43. [PMID: 38331834 PMCID: PMC10851542 DOI: 10.1186/s12944-024-02018-y] [Received: 12/01/2023] [Accepted: 01/13/2024] [Indexed: 02/10/2024]
Abstract
BACKGROUND The accurate measurement of low-density lipoprotein cholesterol (LDL-C) is critical in the decision to use new lipid-lowering therapies such as PCSK9 inhibitors (PCSK9i) for high-risk cardiovascular disease patients who do not achieve sufficiently low LDL-C on statin therapy. OBJECTIVE To improve the estimation of low LDL-C by developing a new equation that includes apolipoprotein B (apoB) as an independent variable, along with the standard lipid panel test results. METHODS Using β-quantification (BQ) as the reference method, performed on a large dyslipidemic population (N = 24,406), the following enhanced Sampson-NIH equation (eS LDL-C) was developed by least-squares regression analysis: [Formula: see text] RESULTS: The eS LDL-C equation was the most accurate equation across a broad range of LDL-C values based on regression-related parameters and the mean absolute difference (mg/dL) from the BQ reference method (eS LDL-C: 4.51; Sampson-NIH equation [S LDL-C]: 6.07; extended Martin equation [eM LDL-C]: 6.64; Friedewald equation [F LDL-C]: 8.3). It also had the best area-under-the-curve accuracy score by Regression Error Characteristic plots for LDL-C < 100 mg/dL (eS LDL-C: 0.953; S LDL-C: 0.920; eM LDL-C: 0.915; F LDL-C: 0.874), and by kappa score analysis against BQ LDL-C for TG < 800 mg/dL it was the best equation for categorizing patients as below or above the 70 mg/dL LDL-C treatment threshold for adding new lipid-lowering drugs (eS LDL-C: 0.870 (0.853-0.887); S LDL-C: 0.763 (0.749-0.776); eM LDL-C: 0.706 (0.690-0.722); F LDL-C: 0.687 (0.672-0.701)). Approximately a third of patients with an F LDL-C < 70 mg/dL had falsely low test results, but about 80% of them were correctly reclassified as higher (≥ 70 mg/dL) by the eS LDL-C equation, making them potentially eligible for PCSK9i treatment.
The M LDL-C and S LDL-C equations had fewer falsely low results below 70 mg/dL than the F LDL-C equation, but reclassification by the eS LDL-C equation still increased the net number of correctly classified patients. CONCLUSIONS Using the eS LDL-C equation as a confirmatory test improves the identification of high-risk cardiovascular disease patients who could benefit from new lipid-lowering therapies but have falsely low LDL-C as determined by the standard LDL-C equations used in current practice.
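The eS LDL-C formula itself is not reproduced in this abstract, so it is not sketched here. The comparison logic can still be illustrated: the classic Friedewald estimate (LDL-C = TC − HDL-C − TG/5, all in mg/dL) and Cohen's kappa for agreement with a reference method once patients are dichotomized at the 70 mg/dL threshold. This is a minimal sketch; the helper names, category labels, and example values are hypothetical.

```python
THRESHOLD = 70.0  # mg/dL treatment threshold cited in the abstract


def friedewald_ldl(tc, hdl, tg):
    """Friedewald LDL-C estimate in mg/dL: TC - HDL-C - TG/5."""
    # The Friedewald equation is generally not considered valid at TG >= 400 mg/dL.
    if tg >= 400:
        raise ValueError("Friedewald estimate not applicable for TG >= 400 mg/dL")
    return tc - hdl - tg / 5.0


def categorize(values, threshold=THRESHOLD):
    """Dichotomize LDL-C values at the treatment threshold."""
    return [">=70" if v >= threshold else "<70" for v in values]


def cohen_kappa(ref, test):
    """Cohen's kappa for two equal-length lists of category labels."""
    if len(ref) != len(test) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(ref)
    cats = set(ref) | set(test)
    p_obs = sum(r == t for r, t in zip(ref, test)) / n
    # Chance agreement from the marginal frequencies of each category.
    p_exp = sum((ref.count(c) / n) * (test.count(c) / n) for c in cats)
    if p_exp == 1.0:
        return float("nan")  # undefined when both raters use one identical category
    return (p_obs - p_exp) / (1 - p_exp)
```

For example, `cohen_kappa(categorize(bq_values), categorize(estimated_values))` would score how well an estimating equation reproduces the reference categorization, where `bq_values` and `estimated_values` are hypothetical lists of paired measurements.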
Affiliation(s)
- Tatiana C Coverdell
- Clinical Center, Department of Laboratory Medicine, National Institutes of Health, Bethesda, MD, USA
- Maureen Sampson
- Clinical Center, Department of Laboratory Medicine, National Institutes of Health, Bethesda, MD, USA
- Rafael Zubirán
- Lipoprotein Metabolism Laboratory, Translational Vascular Medicine Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Anna Wolska
- Lipoprotein Metabolism Laboratory, Translational Vascular Medicine Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Leslie J Donato
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
- Jeff W Meeusen
- Cardiovascular Laboratory Medicine, Mayo Clinic, Rochester, MN, USA
- Allan S Jaffe
- Division of Clinical Core Laboratory Services, Mayo Clinic, Rochester, MN, USA
- Alan T Remaley
- Lipoprotein Metabolism Laboratory, Translational Vascular Medicine Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
33
Unlu O, Shin J, Mailly CJ, Oates MF, Tucci MR, Varugheese M, Wagholikar K, Wang F, Scirica BM, Blood AJ, Aronson SJ. Retrieval Augmented Generation Enabled Generative Pre-Trained Transformer 4 (GPT-4) Performance for Clinical Trial Screening. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.02.08.24302376. [PMID: 38370719 PMCID: PMC10871450 DOI: 10.1101/2024.02.08.24302376] [Indexed: 02/20/2024]
Abstract
Background Subject screening is a key aspect of all clinical trials; however, it is traditionally a labor-intensive and error-prone task that demands significant time and resources. With the advent of large language models (LLMs) and related technologies, a paradigm shift in natural language processing capabilities offers a promising avenue for increasing both the quality and efficiency of screening efforts. This study aimed to test whether Generative Pre-trained Transformer 4 (GPT-4), enabled by a Retrieval-Augmented Generation (RAG) process, can accurately identify and report on the inclusion and exclusion criteria for a clinical trial. Methods The Co-Operative Program for Implementation of Optimal Therapy in Heart Failure (COPILOT-HF) trial aims to recruit patients with symptomatic heart failure. As part of the screening process, a list of potentially eligible patients is created through an electronic health record (EHR) query. Currently, structured EHR data can be used to determine only 5 of the 6 inclusion criteria and 5 of the 17 exclusion criteria. Trained, but non-licensed, study staff complete a manual chart review to determine patient eligibility and record their assessment of the inclusion and exclusion criteria. We obtained the structured assessments completed by the study staff and the clinical notes for the past two years, and developed a clinical note-based question-answering workflow powered by a RAG architecture and GPT-4, which we named RECTIFIER (RAG-Enabled Clinical Trial Infrastructure for Inclusion Exclusion Review). We used notes from 100 patients as a development dataset, 282 patients as a validation dataset, and 1894 patients as a test set. An expert clinician completed a blinded review of patients' charts to answer the eligibility questions and determine the "gold standard" answers. We calculated the sensitivity, specificity, accuracy, and Matthews correlation coefficient (MCC) for each question and screening method.
We also performed bootstrapping to calculate confidence intervals for each statistic. Results Both RECTIFIER and study staff answers aligned closely with the expert clinician answers across criteria, with accuracy ranging from 97.9% to 100% (MCC 0.837 to 1) for RECTIFIER and from 91.7% to 100% (MCC 0.644 to 1) for study staff. RECTIFIER performed better than study staff in determining the inclusion criterion of "symptomatic heart failure," with an accuracy of 97.9% vs 91.7% and an MCC of 0.924 vs 0.721, respectively. Overall, the sensitivity and specificity of determining eligibility were 92.3% (CI) and 93.9% (CI) for RECTIFIER and 90.1% (CI) and 83.6% (CI) for study staff. Conclusion GPT-4-based solutions have the potential to improve efficiency and reduce costs in clinical trial screening. When incorporating new tools such as RECTIFIER, it is important to consider the potential hazards of automating the screening process and to set up appropriate mitigation strategies, such as a final clinician review before patient engagement.
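The evaluation described here combines a per-question Matthews correlation coefficient with bootstrapped confidence intervals. A minimal sketch of both follows, assuming binary labels (1 = criterion met) and a percentile bootstrap that resamples patients with replacement; the abstract does not state the bootstrap variant or the number of resamples, so `n_boot`, `alpha`, the percentile method, and the function names are assumptions.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a paired-label metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi
```

MCC is useful for screening questions because it stays informative when eligible and ineligible patients are highly imbalanced, where accuracy can be misleadingly high.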
Affiliation(s)
- Ozan Unlu
- Accelerator for Clinical Transformation, Brigham and Women's Hospital, Boston, MA
- Division of Cardiovascular Medicine, Brigham and Women's Hospital, Boston, MA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA
- Harvard Medical School, Boston, MA
- Jiyeon Shin
- Accelerator for Clinical Transformation, Brigham and Women's Hospital, Boston, MA
- Mass General Brigham Personalized Medicine, Cambridge, MA
- Charlotte J Mailly
- Accelerator for Clinical Transformation, Brigham and Women's Hospital, Boston, MA
- Mass General Brigham Personalized Medicine, Cambridge, MA
- Michael F Oates
- Accelerator for Clinical Transformation, Brigham and Women's Hospital, Boston, MA
- Mass General Brigham Personalized Medicine, Cambridge, MA
- Michela R Tucci
- Accelerator for Clinical Transformation, Brigham and Women's Hospital, Boston, MA
- Matthew Varugheese
- Accelerator for Clinical Transformation, Brigham and Women's Hospital, Boston, MA
- Kavishwar Wagholikar
- Accelerator for Clinical Transformation, Brigham and Women's Hospital, Boston, MA
- Research Information Science and Computing, Mass General Brigham, Somerville, MA
- Fei Wang
- Accelerator for Clinical Transformation, Brigham and Women's Hospital, Boston, MA
- Mass General Brigham Personalized Medicine, Cambridge, MA
- Benjamin M Scirica
- Accelerator for Clinical Transformation, Brigham and Women's Hospital, Boston, MA
- Division of Cardiovascular Medicine, Brigham and Women's Hospital, Boston, MA
- Harvard Medical School, Boston, MA
- Alexander J Blood
- Accelerator for Clinical Transformation, Brigham and Women's Hospital, Boston, MA
- Division of Cardiovascular Medicine, Brigham and Women's Hospital, Boston, MA
- Harvard Medical School, Boston, MA
- Samuel J Aronson
- Accelerator for Clinical Transformation, Brigham and Women's Hospital, Boston, MA
- Mass General Brigham Personalized Medicine, Cambridge, MA
34
Guha S, Ibrahim A, Wu Q, Geng P, Chou Y, Yang H, Ma J, Lu L, Wang D, Schwartz LH, Xie CM, Zhao B. Machine learning-based identification of contrast-enhancement phase of computed tomography scans. PLoS One 2024; 19:e0294581. [PMID: 38306329 PMCID: PMC10836663 DOI: 10.1371/journal.pone.0294581] [Received: 04/28/2023] [Accepted: 11/04/2023] [Indexed: 02/04/2024]
Abstract
Contrast-enhanced computed tomography (CECT) scans are routinely used in the evaluation of various clinical scenarios, including the detection and characterization of hepatocellular carcinoma (HCC). Quantitative medical image analysis has been an exponentially growing scientific field. A number of studies have reported on the effects of variations in the contrast-enhancement phase on the reproducibility of quantitative imaging features extracted from CT scans. Identifying and labeling the enhancement phase is a time-consuming task, and there is a current need for an accurate automated algorithm to identify the enhancement phase of CT scans. In this study, we investigated the ability of machine learning algorithms to label the phases in a dataset of 59 HCC patients scanned with a dynamic contrast-enhanced CT protocol. The ground-truth labels were provided by expert radiologists. Regions of interest were defined within the aorta, the portal vein, and the liver. Mean density values were extracted from those regions of interest and used for machine learning modeling. Models were evaluated using accuracy, the area under the curve (AUC), and the Matthews correlation coefficient (MCC). We tested the algorithms on an external dataset (76 patients). Our results indicate that several supervised learning algorithms (logistic regression, random forest, etc.) performed similarly, and that our algorithms can accurately classify the phase of contrast enhancement.
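The modeling pipeline reduces each scan to three features, the mean density inside ROIs in the aorta, portal vein, and liver, and then classifies the phase. The sketch below illustrates that feature extraction with NumPy boolean masks. As a plainly swapped-in stand-in for the logistic regression and random forest models named in the abstract, it uses a simple nearest-centroid classifier; the ROI keys, phase names, class name, and the toy HU values in the usage line are hypothetical, not data from the study.

```python
import numpy as np

ROI_ORDER = ("aorta", "portal_vein", "liver")  # hypothetical ROI keys


def roi_features(volume, masks):
    """Mean density (HU) inside each boolean ROI mask, in a fixed order."""
    return np.array([volume[masks[name]].mean() for name in ROI_ORDER])


class NearestCentroidPhase:
    """Stand-in classifier: assigns the phase whose mean feature vector is closest."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Euclidean distance from each sample to each class centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=-1)
        return self.classes_[d.argmin(axis=1)]
```

Usage on toy values: `NearestCentroidPhase().fit([[300, 150, 80], [150, 180, 110]], ["arterial", "portal"]).predict([[290, 155, 85]])` assigns the arterial label, because aortic enhancement dominates that feature vector.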
Affiliation(s)
- Siddharth Guha
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, United States of America
- Abdalla Ibrahim
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, United States of America
- Qian Wu
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, United States of America
- Pengfei Geng
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, United States of America
- Yen Chou
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, United States of America
- Hao Yang
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, United States of America
- Jingchen Ma
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, United States of America
- Lin Lu
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, United States of America
- Delin Wang
- Sun Yat-Sen University Cancer Center, Guangzhou, China
- Lawrence H. Schwartz
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, United States of America
- Binsheng Zhao
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, United States of America
35
Jia J, Wu G, Li M. iGly-IDN: Identifying Lysine Glycation Sites in Proteins Based on Improved DenseNet. J Comput Biol 2024; 31:161-174. [PMID: 38016151 DOI: 10.1089/cmb.2023.0112] [Indexed: 11/30/2023]
Abstract
Lysine glycation is one of the most significant protein post-translational modifications; it changes the properties of proteins and makes them dysfunctional. Accurately identifying glycation sites helps to understand the biological function and potential mechanisms of glycation in disease treatment. However, experimental methods are typically inefficient and costly, so effective computational methods need to be developed. In this study, we proposed a new model called iGly-IDN based on improved densely connected convolutional networks (DenseNet). First, one-hot encoding was adopted to obtain the original feature maps. Then, the improved DenseNet was used to capture feature information, together with its degree of importance, during feature learning. On the independent test dataset, the model reached an accuracy (Acc) of 66% and a Matthews correlation coefficient (MCC) of 0.33, which indicates that iGly-IDN can identify glycation sites more effectively than current predictors.
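The first step described, one-hot encoding a sequence window around a candidate lysine into a feature map, can be sketched as follows. The abstract does not give the window length or padding scheme, so the 15-residue half-width, the padding symbol `X`, the 21-symbol alphabet, and the function names are assumptions for illustration only.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"                              # assumed padding / unknown-residue symbol
ALPHABET = AMINO_ACIDS + PAD
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def lysine_window(seq, pos, half_width=15):
    """Extract a window centered on the lysine at 0-based `pos`, padding with X."""
    if seq[pos] != "K":
        raise ValueError(f"position {pos} is not a lysine")
    left = seq[max(0, pos - half_width):pos].rjust(half_width, PAD)
    right = seq[pos + 1:pos + 1 + half_width].ljust(half_width, PAD)
    return left + "K" + right


def one_hot(window):
    """Encode a window as a (length, 21) matrix with one 1.0 per row."""
    mat = np.zeros((len(window), len(ALPHABET)), dtype=np.float32)
    for row, aa in enumerate(window):
        # Unrecognized residues fall back to the padding column.
        mat[row, INDEX.get(aa, INDEX[PAD])] = 1.0
    return mat
```

The resulting matrix is the kind of 2-D input a convolutional network such as DenseNet can consume; the abstract's improved DenseNet itself is not reproduced here.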
Affiliation(s)
- Jianhua Jia
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
- Genqiang Wu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
- College of Modern Economics and Management, Jiangxi University of Finance and Economics, Nanchang, China
- Meifang Li
- School of Computer Information Engineering, Nanchang Institute of Technology, Nanchang, China
36
Lee S, Lee I. Comprehensive assessment of machine learning methods for diagnosing gastrointestinal diseases through whole metagenome sequencing data. Gut Microbes 2024; 16:2375679. [PMID: 38972064 PMCID: PMC11229738 DOI: 10.1080/19490976.2024.2375679] [Received: 01/29/2024] [Accepted: 06/28/2024] [Indexed: 07/09/2024]
Abstract
The gut microbiome, linked significantly to host diseases, offers potential for disease diagnosis through machine learning (ML) pipelines. These pipelines, crucial in modeling diseases using high-dimensional microbiome data, involve selecting profile modalities, data preprocessing techniques, and classification algorithms, each impacting the model accuracy and generalizability. Despite whole metagenome shotgun sequencing (WMS) gaining popularity for human gut microbiome profiling, a consensus on the optimal methods for ML pipelines in disease diagnosis using WMS data remains elusive. Addressing this gap, we comprehensively evaluated ML methods for diagnosing Crohn's disease and colorectal cancer, using 2,553 fecal WMS samples from 21 case-control studies. Our study uncovered crucial insights: gut-specific, species-level taxonomic features proved to be the most effective for profiling; batch correction was not consistently beneficial for model performance; compositional data transformations markedly improved the models; and while nonlinear ensemble classification algorithms typically offered superior performance, linear models with proper regularization were found to be more effective for diseases that are linearly separable based on microbiome data. An optimal ML pipeline, integrating the most effective methods, was validated for generalizability using holdout data. This research offers practical guidelines for constructing reliable disease diagnostic ML models with fecal WMS data.
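The abstract reports that compositional data transformations markedly improved the models but does not name which ones were used. One widely used option for relative-abundance profiles is the centered log-ratio (CLR) transform, sketched below as an assumption; the pseudocount added to handle zero counts is also an assumed choice.

```python
import numpy as np


def clr(counts, pseudocount=1e-6):
    """Centered log-ratio transform applied to each sample (row)."""
    x = np.asarray(counts, dtype=float) + pseudocount  # avoid log(0)
    x = x / x.sum(axis=1, keepdims=True)  # close each sample to proportions
    logx = np.log(x)
    # Subtracting the per-sample mean log equals dividing by the geometric mean.
    return logx - logx.mean(axis=1, keepdims=True)
```

Because CLR values depend only on ratios between taxa within a sample, differences in sequencing depth between samples cancel out (exactly so when no pseudocount is needed), which is one reason such transforms help linear and ensemble classifiers alike.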
Affiliation(s)
- Sungho Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- POSTECH Biotech Center, Pohang University of Science and Technology (POSTECH), Pohang, Republic of Korea
37
Karthikeyan S, Vazquez-Zapien GJ, Martinez-Cuazitl A, Delgado-Macuil RJ, Rivera-Alatorre DE, Garibay-Gonzalez F, Delgado-Gonzalez J, Valencia-Trujillo D, Guerrero-Ruiz M, Atriano-Colorado C, Lopez-Reyes A, Lopez-Mezquita DJ, Mata-Miranda MM. Two-trace two-dimensional correlation spectra (2T2D-COS) analysis using FTIR spectra to monitor the immune response by COVID-19. J Mol Med (Berl) 2024; 102:53-67. [PMID: 37947852 DOI: 10.1007/s00109-023-02390-9] [Received: 05/10/2023] [Revised: 09/22/2023] [Accepted: 10/20/2023] [Indexed: 11/12/2023]
Abstract
There is a growing trend toward using saliva for SARS-CoV-2 detection with reasonable accuracy. We studied the responses of IgA, IgG, and IgM in human saliva by directly comparing COVID-19 and control samples through two-trace two-dimensional correlation spectra (2T2D-COS) derived from Fourier transform infrared (FTIR) spectra. The analysis explores the molecular-level variation between control and COVID-19 saliva samples. The advantage of 2T2D spectra is that they help discriminate remarkably subtle features between a simple pair of spectra and provide spectral information from highly overlapped bands associated with different systems. The clinical findings from 2T2D show decreased IgG and IgM salivary antibodies in the COVID-19 samples from the 50-, 60-, 65-, and 75-year age groups. Among the various COVID-19 populations studied, the female 30-year group reveals defense mechanisms exhibited by IgM and IgA. Lipids and fatty acids decrease, resulting in lipid oxidation due to SARS-CoV-2 in the samples studied. The study shows that salivary thiocyanate plays a defensive role against SARS-CoV-2 in the male population in the 25- and 35-year age groups. Receiver operating characteristic (ROC) analysis shows a sensitivity of 98% and a specificity of 94% for the samples studied. The accuracy measures computed as F score and G score are high, supporting the validity of our study. Thus, 2T2D-COS analysis can potentially monitor the progression of the immunoglobulin response to COVID-19 with reasonable accuracy, which could support diagnosis in clinical trials. KEY MESSAGES: The molecular profile of salivary antibodies is well resolved and identified from 2T2D-COS FTIR spectra. The IgG antibody plays a significant role in the defense mechanism against SARS-CoV-2 in the 25-40-year age group. 2T2D-COS reveals the absence of salivary thiocyanate in the 40-75-year COVID-19 population. Receiver operating characteristic (ROC) analysis validates our study with high sensitivity and specificity.
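The two-trace analysis compares a single pair of spectra. A minimal sketch, assuming the common 2T2D-COS form in which two raw (not mean-centered) spectra A (e.g., COVID-19 saliva) and B (control), sampled on the same wavenumber grid, give a synchronous map Φ(ν1, ν2) = A(ν1)A(ν2) + B(ν1)B(ν2) and an asynchronous map Ψ(ν1, ν2) = A(ν1)B(ν2) − B(ν1)A(ν2), with constant prefactors omitted. The exact scaling and the function name are assumptions; check them against the original 2T2D-COS literature before comparing magnitudes.

```python
import numpy as np


def two_trace_2dcos(a, b):
    """Synchronous and asynchronous 2T2D correlation maps (prefactors omitted)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape or a.ndim != 1:
        raise ValueError("a and b must be 1-D spectra on the same wavenumber grid")
    # Phi[i, j] = A(v_i) A(v_j) + B(v_i) B(v_j): symmetric about the diagonal.
    sync = np.outer(a, a) + np.outer(b, b)
    # Psi[i, j] = A(v_i) B(v_j) - B(v_i) A(v_j): antisymmetric, zero diagonal.
    asyn = np.outer(a, b) - np.outer(b, a)
    return sync, asyn
```

Nonzero asynchronous peaks appear only where the two spectra change out of proportion to each other; if A and B are identical, Ψ vanishes everywhere, which is why the asynchronous map highlights subtle differences between the pair.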
Collapse
Affiliation(s)
- Sivakumaran Karthikeyan
- Department of Physics, Dr. Ambedkar Government Arts College, Chennai, Tamil Nadu, 600039, India.
- Gustavo J Vazquez-Zapien
- Centro de Investigación y Desarrollo del Ejército y Fuerza Aérea Mexicanos, Secretaría de la Defensa Nacional, Mexico City, 11400, Mexico.
- Escuela Militar de Medicina, Centro Militar de Ciencias de la Salud, Secretaría de la Defensa Nacional, Mexico City, 11200, Mexico.
- Adriana Martinez-Cuazitl
- Escuela Militar de Medicina, Centro Militar de Ciencias de la Salud, Secretaría de la Defensa Nacional, Mexico City, 11200, Mexico
- Escuela Nacional de Medicina y Homeopatía, Instituto Politécnico Nacional, Mexico City, 07320, Mexico
- Raul J Delgado-Macuil
- Centro de Investigación en Biotecnología Aplicada, Instituto Politécnico Nacional, Tlaxcala, 90700, Mexico
- Daniel E Rivera-Alatorre
- Centro de Investigación y Desarrollo del Ejército y Fuerza Aérea Mexicanos, Secretaría de la Defensa Nacional, Mexico City, 11400, Mexico
- Francisco Garibay-Gonzalez
- Escuela Militar de Medicina, Centro Militar de Ciencias de la Salud, Secretaría de la Defensa Nacional, Mexico City, 11200, Mexico
- Josemaria Delgado-Gonzalez
- Escuela Militar de Medicina, Centro Militar de Ciencias de la Salud, Secretaría de la Defensa Nacional, Mexico City, 11200, Mexico
- Daniel Valencia-Trujillo
- Servicio de Microbiología Clínica, Instituto Nacional de Enfermedades Respiratorias, Mexico City, 14080, Mexico
- Melissa Guerrero-Ruiz
- Escuela Militar de Medicina, Centro Militar de Ciencias de la Salud, Secretaría de la Defensa Nacional, Mexico City, 11200, Mexico
- Consuelo Atriano-Colorado
- Escuela Militar de Medicina, Centro Militar de Ciencias de la Salud, Secretaría de la Defensa Nacional, Mexico City, 11200, Mexico
- Alberto Lopez-Reyes
- Laboratorio de Gerociencias, Instituto Nacional de Rehabilitación Luis Guillermo Ibarra Ibarra, Secretaría de Salud, Mexico City, 14389, Mexico
- Monica M Mata-Miranda
- Escuela Militar de Medicina, Centro Militar de Ciencias de la Salud, Secretaría de la Defensa Nacional, Mexico City, 11200, Mexico.
38
Almeida RL, Maltarollo VG, Coelho FGF. Overcoming class imbalance in drug discovery problems: Graph neural networks and balancing approaches. J Mol Graph Model 2024; 126:108627. [PMID: 37801808 DOI: 10.1016/j.jmgm.2023.108627] [Received: 06/27/2023] [Revised: 09/12/2023] [Accepted: 09/12/2023] [Indexed: 10/08/2023]
Abstract
This research investigates the application of Graph Neural Networks (GNNs) to make drug development more cost-effective, addressing its cost and time limitations. Class imbalance in classification datasets, such as the discrepancy between the numbers of active and inactive compounds, causes difficulties that can be addressed through strategies such as oversampling, undersampling, and manipulation of the loss function. Three distinct datasets are compared using three different GNN architectures. This benchmarking research can steer future investigations and enhance the efficacy of GNNs in drug discovery and design. Three hundred models for each combination of architecture and dataset were trained using hyperparameter tuning techniques and evaluated using a range of metrics. Notably, the oversampling technique performed best in eight experiments, showcasing its potential. While balancing techniques improve models trained on imbalanced datasets, their efficacy depends on dataset specifics and problem type. Although oversampling aids molecular graph datasets, more research is needed to optimize its usage and explore other solutions to class imbalance.
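Random oversampling, the simplest form of the oversampling strategy mentioned above, duplicates minority-class examples until the classes are balanced. The sketch below is a generic illustration, assuming whole molecular graphs are resampled as opaque items; `random_oversample` and the molecule IDs are hypothetical, and the abstract does not say which oversampling variant the authors used.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen items of each smaller class until every
    class has as many items as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))  # sample with replacement
            out_y.append(cls)
    return out_x, out_y

# Hypothetical molecule IDs: 8 inactive (0) vs 2 active (1) compounds.
mols = [f"mol{i}" for i in range(10)]
y = [0] * 8 + [1] * 2
xb, yb = random_oversample(mols, y)
print(Counter(yb))  # Counter({0: 8, 1: 8})
```

Resampling of this kind belongs on the training split only; oversampling before the split lets duplicated molecules leak into the evaluation set and inflates the metrics.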
Affiliation(s)
- Rafael Lopes Almeida
- Graduate Program in Electrical Engineering - Universidade Federal de Minas Gerais, Av. Antônio Carlos 6627, Belo Horizonte, 31270-901, MG, Brazil
- Vinícius Gonçalves Maltarollo
- Department of Pharmaceutical Products - Universidade Federal de Minas Gerais, Av. Antônio Carlos 6627, Belo Horizonte, 31270-901, MG, Brazil.
- Frederico Gualberto Ferreira Coelho
- Department of Electronical Engineering - Universidade Federal de Minas Gerais, Av. Antônio Carlos 6627, Belo Horizonte, 31270-901, MG, Brazil
39
Zavorsky GS, Agostoni P. Two is better than one: the double diffusion technique in classifying heart failure. ERJ Open Res 2024; 10:00644-2023. [PMID: 38226067 PMCID: PMC10789268 DOI: 10.1183/23120541.00644-2023] [Received: 09/01/2023] [Accepted: 11/15/2023] [Indexed: 01/17/2024]
Abstract
Background Heart failure (HF) is a chronic condition in which the heart does not pump enough blood to meet the body's demands. Diffusing capacity of the lung for nitric oxide (DLNO) and carbon monoxide (DLCO) may be used to classify patients with HF, as DLNO and DLCO are lung function measurements that reflect pulmonary gas exchange. Our objectives were to determine 1) whether adding DLNO to DLCO testing predicts HF better than DLCO alone and 2) whether binary classification of HF is better when DLNO z-scores are combined with DLCO z-scores than when DLCO z-scores are used alone. Methods This was a retrospective secondary data analysis in 140 New York Heart Association Class II HF patients (ejection fraction <40%) and 50 patients without HF. z-scores for DLNO, DLCO and DLNO+DLCO were calculated from reference equations published in three articles. The model with the lowest Bayesian Information Criterion (BIC) was considered the best predictive model. Binary HF classification was evaluated with the Matthews Correlation Coefficient (MCC). Results The top two of 12 models were combined z-score models. The highest MCC (0.51) came from the combined z-score models. At most, only 32% of the variance in the odds of having HF was explained by combined z-scores. Conclusions Combined z-scores explained 32% of the variation in the likelihood of an individual having HF, more than was explained by models using DLNO or DLCO z-scores alone. Combined z-score models had a moderate ability to classify patients with HF. We recommend using the NO-CO double diffusion technique to assess gas exchange impairment in those suspected of having HF.
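The two criteria named in this abstract reduce to short formulas: BIC = k ln(n) - 2 ln(L) for a model with k parameters and maximized likelihood L fitted to n observations (lower is better), and MCC computed from confusion-matrix counts. The sketch below assumes each model's maximized log-likelihood is already available; the log-likelihood values in the usage example are invented for illustration, and only n = 190 (140 HF plus 50 non-HF patients) comes from the abstract.

```python
import math

def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion; lower values indicate a better
    trade-off between goodness of fit and model complexity."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when any confusion-matrix
    marginal is empty (a common convention for the undefined case)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical comparison of two logistic models on n = 190 patients.
n = 140 + 50
simple = bic(log_likelihood=-100.0, n_params=2, n_obs=n)
richer = bic(log_likelihood=-98.0, n_params=4, n_obs=n)
print(simple < richer)  # True: the small likelihood gain does not pay for 2 extra parameters
```

MCC ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect classification), which is why an MCC of 0.51 reads as moderate.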
Affiliation(s)
- Gerald S. Zavorsky
- Department of Physiology and Membrane Biology, University of California Davis, Sacramento, CA, USA
- Piergiuseppe Agostoni
- Department of Critical Cardiology, Centro Cardiologico Monzino IRCCS, Milan, Italy
- Department of Clinical Sciences and Community Health, Cardiovascular Section, University of Milan, Milan, Italy
40
Guan J, Yao L, Chung CR, Xie P, Zhang Y, Deng J, Chiang YC, Lee TY. Predicting Anti-inflammatory Peptides by Ensemble Machine Learning and Deep Learning. J Chem Inf Model 2023; 63:7886-7898. [PMID: 38054927 DOI: 10.1021/acs.jcim.3c01602] [Indexed: 12/07/2023]
Abstract
Inflammation is a biological response to harmful stimuli that helps maintain tissue homeostasis. However, excessive or persistent inflammation can precipitate a myriad of pathological conditions. Although current treatments such as NSAIDs, corticosteroids, and immunosuppressants are effective, they can cause side effects and face resistance issues. Against this backdrop, anti-inflammatory peptides (AIPs) have emerged as a promising therapeutic approach against inflammation. Machine learning methods offer an opportunity to accelerate the discovery and investigation of AIPs. In this study, we proposed a framework that combines ensemble machine learning and deep learning for AIP prediction. We first constructed three individual models, based on extremely randomized trees (ET), a gated recurrent unit (GRU), and convolutional neural networks (CNNs) with an attention mechanism, and then used a stacking architecture to build the final predictor. By using various sequence encodings and combining the strengths of different algorithms, our predictor performed strongly: on our independent test set, it achieved an accuracy, MCC, and F1-score of 0.757, 0.500, and 0.707, respectively, outperforming other contemporary AIP prediction methods. Additionally, our model offers insights into the features that characterize AIPs, establishing a knowledge foundation for the design and development of future anti-inflammatory strategies.
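The abstract mentions "various sequence encodings" without naming them. As one hypothetical example of such an encoding, amino acid composition (AAC) turns a peptide of any length into a fixed 20-dimensional vector that a tree model such as ET could consume. Whether AAC was among the authors' encodings is an assumption, and `aac` is an illustrative name.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aac(sequence: str) -> List[float]:
    """Amino acid composition: fraction of each standard residue in the
    peptide. Non-standard letters are ignored."""
    seq = sequence.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]

vec = aac("GLFDIVKKVV")  # arbitrary example peptide, not taken from the study
print(len(vec), round(vec[AMINO_ACIDS.index("V")], 2))  # 20 0.3
```

In a stacking design, fixed-length vectors like this feed the base learners, and their out-of-fold predictions then become the inputs of the final meta-learner.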
Affiliation(s)
- Jiahui Guan
- School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Lantian Yao
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
- Chia-Ru Chung
- Department of Computer Science and Information Engineering, National Central University, Taoyuan 320317, Taiwan
- Peilin Xie
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Yilun Zhang
- School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Junyang Deng
- School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Ying-Chih Chiang
- School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Tzong-Yi Lee
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan
- Center for Intelligent Drug Systems and Smart Bio-devices (IDS2B), National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan
41
Cassidy RM, Flores EM, Trinh Nguyen AK, Cheruvu SS, Uribe RA, Krachler AM, Odem MA. Systematic analysis of proximal midgut- and anorectal-originating contractions in larval zebrafish using event feature detection and supervised machine learning algorithms. Neurogastroenterol Motil 2023; 35:e14675. [PMID: 37743702 PMCID: PMC10841157 DOI: 10.1111/nmo.14675] [Received: 08/12/2021] [Revised: 07/16/2023] [Accepted: 08/28/2023] [Indexed: 09/26/2023]
Abstract
BACKGROUND Zebrafish larvae are translucent, allowing in vivo analysis of gut development and physiology, including gut motility. While recent progress has been made in measuring gut motility in larvae, challenges remain that can influence results, such as how data are interpreted, opportunities for technical user error, and inconsistencies in methods. METHODS To overcome these challenges, we noninvasively introduced Nile Red fluorescent dye to fill the intraluminal gut space in zebrafish larvae and collected serial confocal microscopic images of gut fluorescence. We automated the detection of fluorescence-contrasted contraction events against the median-subtracted signal and compared the detected events with manually annotated gut contraction events across anatomically defined gut regions. Supervised machine learning (multiple logistic regression) was then used to discriminate between true contraction events and noise. To demonstrate the approach, we analyzed motility in larvae under control and reserpine-treated conditions. We also used automated event detection analysis to compare unfed and fed larvae. KEY RESULTS Automated analysis retained event features for proximal midgut-originating retrograde and anterograde contractions and anorectal-originating retrograde contractions. While manual annotation showed that reserpine disrupted gut motility, machine learning achieved equivalent contraction discrimination only in controls and failed to accurately identify contractions after reserpine treatment due to insufficient intraluminal fluorescence. Automated analysis also showed that feeding had no effect on the frequency of anorectal-originating contractions. CONCLUSIONS & INFERENCES Automated event detection analysis rapidly and accurately annotated contraction events, including the previously neglected phenomenon of anorectal contractions. However, challenges remain in discriminating contraction events based on intraluminal fluorescence under treatment conditions that disrupt functional motility.
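Detecting events "against the median-subtracted signal", as described above, can be sketched as a robust threshold on a one-dimensional fluorescence trace. The threshold rule here (k times a MAD-based noise estimate, with k = 3) and the function name `detect_events` are assumptions for illustration, not the authors' pipeline; in the study, candidate events were additionally screened by a multiple logistic regression classifier, which is not shown.

```python
import numpy as np

def detect_events(signal, k=3.0):
    """Return indices where the median-subtracted trace first rises above
    k robust standard deviations (one index per contiguous excursion)."""
    x = np.asarray(signal, dtype=float)
    resid = x - np.median(x)                 # median-subtracted signal
    mad = np.median(np.abs(resid))
    sigma = 1.4826 * mad                     # MAD scaled to approximate a Gaussian SD
    if sigma == 0:
        sigma = float(resid.std()) or 1.0    # fall back for flat traces
    above = resid > k * sigma
    prev = np.concatenate(([False], above[:-1]))
    return np.flatnonzero(above & ~prev).tolist()  # rising edges only

# Toy trace: a flat baseline with two fluorescence excursions (not real data).
trace = [1.0, 1.1, 0.9, 1.0, 5.0, 5.2, 1.0, 0.9, 1.1, 1.0, 4.8, 1.0]
print(detect_events(trace))  # [4, 10]
```

Using the median and MAD rather than the mean and standard deviation keeps the large contraction peaks from inflating the noise estimate that the threshold is built on.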
Affiliation(s)
- Ryan M. Cassidy
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School at UTHealth, Houston, TX 77030, USA
- Erika M. Flores
- Department of Microbiology and Molecular Genetics, McGovern Medical School at UTHealth, Houston, TX 77030, USA
- Anh K. Trinh Nguyen
- Department of Microbiology and Molecular Genetics, McGovern Medical School at UTHealth, Houston, TX 77030, USA
- Sai S. Cheruvu
- Department of Integrative Biology and Pharmacology, McGovern Medical School at UTHealth, Houston, TX 77030, USA
- Rosa A. Uribe
- Department of Biosciences, Rice University, Houston, TX 77005, USA
- Anne Marie Krachler
- Department of Microbiology and Molecular Genetics, McGovern Medical School at UTHealth, Houston, TX 77030, USA
- Max A. Odem
- Department of Microbiology and Molecular Genetics, McGovern Medical School at UTHealth, Houston, TX 77030, USA
42
Michelsen C, Jørgensen CC, Heltberg M, Jensen MH, Lucchetti A, Petersen PB, Petersen T, Kehlet H, Madsen F, Hansen TB, Gromov K, Jakobsen T, Varnum C, Overgaard S, Rathsach M, Hansen L. Machine-learning vs. logistic regression for preoperative prediction of medical morbidity after fast-track hip and knee arthroplasty-a comparative study. BMC Anesthesiol 2023; 23:391. [PMID: 38030979 PMCID: PMC10685559 DOI: 10.1186/s12871-023-02354-z] [Received: 04/21/2023] [Accepted: 11/21/2023] [Indexed: 12/01/2023]
Abstract
BACKGROUND Machine-learning models may improve prediction of length of stay (LOS) and morbidity after surgery. However, few studies include fast-track programs, and most rely on administrative coding with limited follow-up and information on perioperative care. This study investigates potential benefits of a machine-learning model for prediction of postoperative morbidity in fast-track total hip (THA) and knee arthroplasty (TKA). METHODS Cohort study of consecutive unselected primary THA/TKA procedures performed between 2014 and 2017 at seven Danish centers with established fast-track protocols. Preoperative comorbidity and prescribed medication were recorded prospectively, and information on length of stay and readmissions was obtained through the Danish National Patient Registry and medical records. We used a machine-learning model based on boosted decision trees with 33 preoperative variables to predict "medical" morbidity leading to LOS > 4 days or 90-day readmission, and compared it with a logistic regression model based on the same variables. We also evaluated two parsimonious models, using the ten most important variables in the full machine-learning and logistic regression models. Data collected between 2014 and 2016 (n = 18,013) were used for model training and data from 2017 (n = 3913) for testing. Model performance was analyzed using precision, the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC), as well as the Matthews correlation coefficient. Variable importance was analyzed using SHapley Additive exPlanations (SHAP) values. RESULTS Using a threshold of 20% "risk-patients" (n = 782), precision, AUROC and AUPRC were 13.6%, 76.3% and 15.5% vs. 12.4%, 74.7% and 15.6% for the machine-learning and logistic regression model, respectively. The parsimonious machine-learning model performed better than the full logistic regression model. Of the top ten variables, eight were shared between the machine-learning and logistic regression models, but there was considerable age-related variation in the importance of specific types of medication. CONCLUSION A machine-learning model using preoperative characteristics and prescriptions slightly improved identification of patients at high risk of "medical" complications after fast-track THA and TKA compared with a logistic regression model. Such algorithms could help find a manageable population of patients who may benefit most from intensified perioperative care.
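The "20% risk-patients" threshold in the results above amounts to flagging the top fifth of patients by predicted risk and measuring precision among them. A minimal sketch, assuming risk scores are already available from either model; `precision_at_fraction` and the ten toy patients are hypothetical. Flooring 20% of the 3913 test patients gives 782, matching the n reported in the abstract, though the authors' exact rounding rule is not stated.

```python
from typing import Sequence, Tuple

def precision_at_fraction(risks: Sequence[float], outcomes: Sequence[int],
                          fraction: float = 0.20) -> Tuple[int, float]:
    """Flag the top `fraction` of patients by predicted risk; return the
    number flagged and the precision (share with the outcome) among them."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    order = sorted(range(len(risks)), key=lambda i: risks[i], reverse=True)
    n_flag = max(1, int(fraction * len(risks)))  # floor, but flag at least one
    hits = sum(outcomes[i] for i in order[:n_flag])
    return n_flag, hits / n_flag

# Ten hypothetical patients: predicted risk and observed "medical" morbidity.
risks = [0.9, 0.8, 0.1, 0.7, 0.2, 0.3, 0.05, 0.6, 0.4, 0.15]
outcomes = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
print(precision_at_fraction(risks, outcomes))  # (2, 0.5)
```

Fixing the flagged fraction rather than a probability cutoff lets two models with differently calibrated scores be compared on the same-sized high-risk group.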
Affiliation(s)
- Christian Michelsen
- The Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, 2100, Copenhagen, Denmark
- Christoffer C Jørgensen
- Department of Anesthesia and Intensive Care, Hospital of Northern Zealand, Dyrehavevej 29 3400, Hillerød, Denmark.
- The Centre for Fast-Track Hip and Knee Replacement, 7621, Rigshospitalet, Blegdamsvej 9, 2100, Copenhagen, Denmark.
- Mathias Heltberg
- The Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, 2100, Copenhagen, Denmark
- Mogens H Jensen
- The Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, 2100, Copenhagen, Denmark
- Alessandra Lucchetti
- The Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, 2100, Copenhagen, Denmark
- Pelle B Petersen
- Department of Anesthesia and Intensive Care, Hospital of Northern Zealand, Dyrehavevej 29 3400, Hillerød, Denmark
- The Centre for Fast-Track Hip and Knee Replacement, 7621, Rigshospitalet, Blegdamsvej 9, 2100, Copenhagen, Denmark
- Troels Petersen
- The Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, 2100, Copenhagen, Denmark
- Henrik Kehlet
- The Centre for Fast-Track Hip and Knee Replacement, 7621, Rigshospitalet, Blegdamsvej 9, 2100, Copenhagen, Denmark
- Section of Surgical Pathophysiology, 7621, Rigshospitalet, Blegdamsvej 9, 2100, Copenhagen, Denmark
43
Zhang Y, Aaronson KD, Gryak J, Wittrup E, Minoccheri C, Golbus JR, Najarian K. Predicting need for heart failure advanced therapies using an interpretable tropical geometry-based fuzzy neural network. PLoS One 2023; 18:e0295016. [PMID: 38015947 PMCID: PMC10684094 DOI: 10.1371/journal.pone.0295016] [Received: 07/06/2023] [Accepted: 11/13/2023] [Indexed: 11/30/2023]
Abstract
BACKGROUND Timely referral for advanced therapies (i.e., heart transplantation, left ventricular assist device) is critical for ensuring optimal outcomes for heart failure patients. Our goal was to use electronic health record data from a single hospitalization to develop an interpretable clinical decision-making system for predicting the need for advanced therapies at the subsequent hospitalization. METHODS Michigan Medicine heart failure patients from 2013-2021 with a left ventricular ejection fraction ≤ 35% and at least two heart failure hospitalizations within one year were used to train an interpretable machine learning model constructed using fuzzy logic and tropical geometry. Clinical knowledge was used to initialize the model. The performance and robustness of the model were evaluated with the mean and standard deviation of the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPRC), and the F1 score of the ensemble. We inferred membership functions from the model for continuous clinical variables, extracted decision rules, and then evaluated their relative importance. RESULTS The model was trained and validated using data from 557 heart failure hospitalizations from 300 patients, of whom 193 received advanced therapies. The mean (standard deviation) AUC, AUPRC, and F1 score of the proposed model initialized with clinical knowledge were 0.747 (0.080), 0.642 (0.080), and 0.569 (0.067), respectively, showing superior predictive performance or increased interpretability compared with other machine learning methods. The model learned critical risk factors predicting the need for advanced therapies in the subsequent hospitalization. Furthermore, our model displayed transparent rule sets composed of these critical concepts to justify its predictions. CONCLUSION These results demonstrate the ability to successfully predict the need for advanced heart failure therapies by generating transparent and accessible clinical rules, although further research is needed to prospectively validate the risk factors identified by the model.
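Fuzzy rules of the kind described above can be illustrated with simple membership functions and min/max operators. This sketch is not the authors' tropical-geometry network: the variables, breakpoints (for example treating an ejection fraction at or below 25% as fully "low"), and rules are hypothetical. It only shows how a continuous clinical value receives a degree of membership and how a rule's firing strength is formed; min and max are also the "addition" operations of the min-plus and max-plus (tropical) semirings.

```python
def low(x: float, full: float, zero: float) -> float:
    """Membership in 'low': 1 at or below `full`, 0 at or above `zero`,
    linear in between (requires full < zero)."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)

def high(x: float, zero: float, full: float) -> float:
    """Membership in 'high': 0 at or below `zero`, 1 at or above `full`."""
    return 1.0 - low(x, zero, full)

def fire(*memberships: float) -> float:
    """Fuzzy AND via the minimum t-norm."""
    return min(memberships)

# Hypothetical patient values and rules for "refer for advanced therapy".
ef, creatinine, sodium = 20.0, 1.8, 134.0
rule1 = fire(low(ef, 25.0, 40.0), high(creatinine, 1.2, 2.0))
rule2 = fire(low(ef, 25.0, 40.0), low(sodium, 130.0, 138.0))
score = max(rule1, rule2)  # fuzzy OR across rules via the maximum
print(round(rule1, 2), round(rule2, 2), round(score, 2))  # 0.75 0.5 0.75
```

Each rule reads as a sentence a clinician can check ("ejection fraction is low AND creatinine is high"), which is the kind of transparency the abstract emphasizes.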
Affiliation(s)
- Yufeng Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Keith D. Aaronson
- Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- Jonathan Gryak
- Department of Computer Science, Queens College, City University of New York, New York, New York, United States of America
- Emily Wittrup
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Cristian Minoccheri
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Jessica R. Golbus
- Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- Kayvan Najarian
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Michigan Institute for Data Science, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Emergency Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
44
Hülpüsch C, Rauer L, Nussbaumer T, Schwierzeck V, Bhattacharyya M, Erhart V, Traidl-Hoffmann C, Reiger M, Neumann AU. Benchmarking MicrobIEM - a user-friendly tool for decontamination of microbiome sequencing data. BMC Biol 2023; 21:269. [PMID: 37996810 PMCID: PMC10666409 DOI: 10.1186/s12915-023-01737-5] [Received: 02/01/2023] [Accepted: 10/16/2023] [Indexed: 11/25/2023]
Abstract
BACKGROUND Microbiome analysis is becoming a standard component of many scientific studies, but it requires extensive quality control of the 16S rRNA gene sequencing data prior to analysis. In particular, when investigating low-biomass microbial environments such as human skin, contaminants distort the true microbiome sample composition and need to be removed bioinformatically. We introduce MicrobIEM, a novel tool to bioinformatically remove contaminants using negative controls. RESULTS We benchmarked MicrobIEM against five established decontamination approaches in four 16S rRNA amplicon sequencing datasets: three serially diluted mock communities (10⁸-10³ cells, 0.4-80% contamination) with even or staggered taxon compositions, and a skin microbiome dataset. Results depended strongly on user-selected algorithm parameters. Overall, sample-based algorithms separated mock and contaminant sequences best in the even mock, whereas control-based algorithms performed better in the two staggered mocks, particularly in low-biomass samples (≤ 10⁶ cells). We show that correct benchmarking of decontamination requires realistic staggered mock communities and unbiased evaluation measures such as Youden's index. In the skin dataset, the Decontam prevalence filter and MicrobIEM's ratio filter effectively reduced common contaminants while keeping skin-associated genera. CONCLUSIONS MicrobIEM's ratio filter for decontamination performs as well as or better than established bioinformatic decontamination tools. In contrast to established tools, MicrobIEM additionally provides interactive plots and supports selecting appropriate filtering parameters via a user-friendly graphical user interface. Therefore, MicrobIEM is the first quality control tool for microbiome experts without coding experience.
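Two ideas from this abstract can be sketched together: a control-based ratio filter, and Youden's index (sensitivity + specificity - 1), the evaluation measure the authors recommend. The exact rule inside MicrobIEM's ratio filter is not given here, so the version below (mean relative abundance in negative controls divided by mean relative abundance in samples, flagged at a hypothetical threshold of 1.0) is an assumption, not the tool's implementation; the taxa and abundances are invented for illustration.

```python
from statistics import mean
from typing import Dict, List, Set

def ratio_filter(sample_abund: Dict[str, List[float]],
                 control_abund: Dict[str, List[float]],
                 threshold: float = 1.0) -> Set[str]:
    """Flag taxa whose mean relative abundance in negative controls is at
    least `threshold` times their mean relative abundance in samples."""
    flagged = set()
    for taxon, values in sample_abund.items():
        in_controls = mean(control_abund.get(taxon, [0.0]))
        in_samples = mean(values)
        if in_controls > 0 and (in_samples == 0 or in_controls / in_samples >= threshold):
            flagged.add(taxon)
    return flagged

def youden_index(tp: int, fn: int, tn: int, fp: int) -> float:
    """Youden's J = sensitivity + specificity - 1 (ranges from -1 to 1)."""
    return tp / (tp + fn) + tn / (tn + fp) - 1.0

samples = {"Cutibacterium": [0.40, 0.50], "Staphylococcus": [0.20, 0.30],
           "Ralstonia": [0.01, 0.02]}
controls = {"Ralstonia": [0.30, 0.50], "Cutibacterium": [0.01, 0.0]}
print(ratio_filter(samples, controls))     # {'Ralstonia'}
print(round(youden_index(9, 1, 8, 2), 2))  # 0.7
```

Youden's index weights missed contaminants and wrongly removed true taxa equally, which is why it stays informative in staggered mocks where plain accuracy would be dominated by the abundant taxa.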
Affiliation(s)
- Claudia Hülpüsch
- Environmental Medicine, Faculty of Medicine, University of Augsburg, Stenglinstr. 2, 86156, Augsburg, Germany
- Chair of Environmental Medicine, Technical University of Munich, Munich, Germany
- CK CARE, Christine Kühne Center for Allergy Research and Education, Davos, Switzerland
- Luise Rauer
- Environmental Medicine, Faculty of Medicine, University of Augsburg, Stenglinstr. 2, 86156, Augsburg, Germany
- Chair of Environmental Medicine, Technical University of Munich, Munich, Germany
- Institute of Environmental Medicine, Helmholtz Munich, Augsburg, Germany
- Thomas Nussbaumer
- Institute of Environmental Medicine, Helmholtz Munich, Augsburg, Germany
- Vera Schwierzeck
- Institute of Environmental Medicine, Helmholtz Munich, Augsburg, Germany
- Institute of Hygiene, University Hospital Muenster, Muenster, Germany
- Madhumita Bhattacharyya
- Environmental Medicine, Faculty of Medicine, University of Augsburg, Stenglinstr. 2, 86156, Augsburg, Germany
- Chair of Environmental Medicine, Technical University of Munich, Munich, Germany
- Veronika Erhart
- Environmental Medicine, Faculty of Medicine, University of Augsburg, Stenglinstr. 2, 86156, Augsburg, Germany
- Claudia Traidl-Hoffmann
- Environmental Medicine, Faculty of Medicine, University of Augsburg, Stenglinstr. 2, 86156, Augsburg, Germany
- Chair of Environmental Medicine, Technical University of Munich, Munich, Germany
- CK CARE, Christine Kühne Center for Allergy Research and Education, Davos, Switzerland
- Institute of Environmental Medicine, Helmholtz Munich, Augsburg, Germany
- ZIEL - Institute for Food & Health, Technical University of Munich, Freising-Weihenstephan, Germany
- Matthias Reiger
- Environmental Medicine, Faculty of Medicine, University of Augsburg, Stenglinstr. 2, 86156, Augsburg, Germany
- Chair of Environmental Medicine, Technical University of Munich, Munich, Germany
- Institute of Environmental Medicine, Helmholtz Munich, Augsburg, Germany
- Avidan U Neumann
- Environmental Medicine, Faculty of Medicine, University of Augsburg, Stenglinstr. 2, 86156, Augsburg, Germany.
- Institute of Environmental Medicine, Helmholtz Munich, Augsburg, Germany.
45
Tsuyuzaki K, Ishii M, Nikaido I. scTensor detects many-to-many cell-cell interactions from single cell RNA-sequencing data. BMC Bioinformatics 2023; 24:420. [PMID: 37936079 PMCID: PMC10631077 DOI: 10.1186/s12859-023-05490-y] [Received: 07/05/2023] [Accepted: 09/21/2023] [Indexed: 11/09/2023]
Abstract
BACKGROUND Complex biological systems can be described as a multitude of cell-cell interactions (CCIs). Recent single-cell RNA-sequencing studies focus on CCIs based on ligand-receptor (L-R) gene co-expression, but existing analytical methods are not suited to detecting many-to-many CCIs. RESULTS In this work, we propose scTensor, a novel method for extracting representative triadic relationships (or hypergraphs), which comprise ligand expression, receptor expression, and the related L-R pairs. CONCLUSIONS Through extensive studies with simulated and empirical datasets, we have shown that scTensor can detect some hypergraphs that cannot be detected using conventional CCI detection methods, especially when they include many-to-many relationships. scTensor is implemented as a freely available R/Bioconductor package.
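The triadic (ligand cell type, receptor cell type, L-R pair) relationships above can be organized as a third-order tensor. A minimal sketch of that tensor construction, assuming cell-type-averaged expression and a simple product score for co-expression; `cci_tensor` and the toy matrices are hypothetical, and the decomposition that scTensor applies to extract hypergraphs is not shown.

```python
import numpy as np

def cci_tensor(ligand_expr: np.ndarray, receptor_expr: np.ndarray) -> np.ndarray:
    """Build a (ligand cell type x receptor cell type x L-R pair) tensor.

    ligand_expr[c, p]   -- mean expression of pair p's ligand gene in cell type c
    receptor_expr[c, p] -- mean expression of pair p's receptor gene in cell type c
    X[i, j, p] = ligand_expr[i, p] * receptor_expr[j, p]
    """
    L = np.asarray(ligand_expr, dtype=float)
    R = np.asarray(receptor_expr, dtype=float)
    if L.shape != R.shape:
        raise ValueError("ligand and receptor matrices must share (cell types, pairs) shape")
    return np.einsum("ip,jp->ijp", L, R)

# Toy data: 3 cell types, 2 hypothetical L-R pairs.
L = np.array([[2.0, 0.0], [0.5, 1.0], [0.0, 3.0]])
R = np.array([[0.0, 1.0], [1.0, 0.0], [4.0, 2.0]])
X = cci_tensor(L, R)
print(X.shape, X[0, 2, 0])  # (3, 3, 2) 8.0
```

Each frontal slice `X[:, :, p]` is a cell-type-by-cell-type matrix for one L-R pair; a many-to-many interaction would appear as a block of large values spanning several ligand cell types, receptor cell types, and pairs at once.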
Affiliation(s)
- Koki Tsuyuzaki
- Laboratory for Bioinformatics Research RIKEN Center for Biosystems Dynamics Research, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan.
- Japan Science and Technology Agency, PRESTO, 7 Gobancho, Chiyoda-ku, Tokyo, 102-0076, Japan.
- Manabu Ishii
- Laboratory for Bioinformatics Research RIKEN Center for Biosystems Dynamics Research, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Itoshi Nikaido
- Laboratory for Bioinformatics Research RIKEN Center for Biosystems Dynamics Research, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan.
- Department of Functional Genome Informatics, Division of Biological Data Science, Medical Research Institute, Tokyo Medical and Dental University (TMDU), 1-5-45 Yushima, Bunkyo-ku, Tokyo, 113-8510, Japan.
46
Phan TV, Nguyen VTV, Le MT, Nguyen BGD, Vu TT, Thai KM. Identification of efflux pump inhibitors for Pseudomonas aeruginosa MexAB-OprM via ligand-based pharmacophores, 2D-QSAR, molecular docking, and molecular dynamics approaches. Mol Divers 2023:10.1007/s11030-023-10758-9. [PMID: 37919619 DOI: 10.1007/s11030-023-10758-9] [Received: 10/04/2023] [Accepted: 10/24/2023] [Indexed: 11/04/2023]
Abstract
Efflux pumps have been reported as one of the significant mechanisms by which bacteria evade the effects of multiple antibiotics. The tripartite efflux pump MexAB-OprM in Pseudomonas aeruginosa is one of the most significant multidrug efflux systems because it confers resistance to a broad range of antibiotics, such as chloramphenicol, fluoroquinolones, lipophilic β-lactam antibiotics, nalidixic acid, novobiocin, rifampicin, and tetracycline. A promising strategy to overcome this resistance mechanism is to combine antibiotics with efflux pump inhibitors (EPIs), which can increase the antibiotics' intracellular concentration and thereby enhance their biological activity. Based on 143 EPIs with chemically diverse skeletons, 3D pharmacophore and 2D-QSAR models were developed and used for virtual screening of 9.2 million compounds from the ZINC15, DrugBank, and Traditional Chinese Medicine databases to identify new EPIs. Molecular docking was also performed to evaluate the binding affinity of potential EPIs to the distal binding pocket of MexB, yielding 611 potential EPIs. Structure-activity relationship analyses suggested that nitrogen heterocyclic compounds, piperazine and pyridine scaffolds, and amide derivatives are the most favorable chemical features for MexAB inhibitory activity. Results from 100-ns molecular dynamics simulations indicated that ZINC009296881 and ZINC009200074 were the most promising MexB inhibitors, with strong binding affinity to the distal pocket and MM/GBSA ΔGbind values of -38.97 and -30.19 kcal mol⁻¹, respectively. The predicted pharmacokinetic properties and toxicity of these compounds indicated their potential as oral drugs. Multistep virtual screening of EPIs for MexAB-OprM, the multidrug efflux pump of P. aeruginosa.
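The 2D-QSAR step above relates computed molecular descriptors to measured inhibitory activity. The abstract does not name the regression method, so the sketch below assumes plain multiple linear regression fitted by least squares; `fit_qsar`, `predict`, and the two descriptors are hypothetical, and the noise-free synthetic data only demonstrate that the fit recovers known coefficients.

```python
import numpy as np

def fit_qsar(descriptors, activity) -> np.ndarray:
    """Ordinary least squares: activity ~ b0 + b1*d1 + ... + bk*dk.
    Returns [b0, b1, ..., bk]."""
    X = np.asarray(descriptors, dtype=float)
    y = np.asarray(activity, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef: np.ndarray, descriptors) -> np.ndarray:
    """Score new compounds with a fitted linear QSAR model."""
    X = np.asarray(descriptors, dtype=float)
    return coef[0] + X @ coef[1:]

# Synthetic data generated from activity = 2 + 0.5*d1 - 1.0*d2 (noise-free).
D = np.array([[1, 2], [2, 1], [3, 5], [4, 3], [5, 4]], dtype=float)
y = 2 + 0.5 * D[:, 0] - 1.0 * D[:, 1]
coef = fit_qsar(D, y)
print(np.allclose(coef, [2.0, 0.5, -1.0]))  # True
```

In a screening workflow, `predict` would score library compounds after the pharmacophore filter, and only the top-ranked ones would move on to docking.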
Affiliation(s)
- Thien-Vy Phan
- Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, 700000, Vietnam
- Faculty of Pharmacy, Nguyen Tat Thanh University, Ho Chi Minh City, 700000, Vietnam
| | - Vu-Thuy-Vy Nguyen
- Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, 700000, Vietnam
- Faculty of Pharmacy, Nguyen Tat Thanh University, Ho Chi Minh City, 700000, Vietnam
| | - Minh-Tri Le
- Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, 700000, Vietnam
- School of Medicine, Vietnam National University Ho Chi Minh City, Linh Trung Ward., Thu Duc Dist, Ho Chi Minh City, 700000, Vietnam
| | | | - Thanh-Thao Vu
- Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, 700000, Vietnam
- Khac-Minh Thai
- Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, 700000, Vietnam
47
Chicco D, Haupt R, Garaventa A, Uva P, Luksch R, Cangelosi D. Computational intelligence analysis of high-risk neuroblastoma patient health records reveals time to maximum response as one of the most relevant factors for outcome prediction. Eur J Cancer 2023; 193:113291. [PMID: 37708628 DOI: 10.1016/j.ejca.2023.113291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Revised: 07/24/2023] [Accepted: 08/09/2023] [Indexed: 09/16/2023]
Abstract
OBJECTIVE To identify new candidate prognostic markers for neuroblastoma outcome and relapse/progression. MATERIALS AND METHODS In this multicentre, retrospective study, Random Forests coupled with recursive feature elimination were applied to the electronic records (55 clinical features) of 3034 neuroblastoma patients. To assess model performance and feature importance, the dataset was split into a training set (80%) and a test set (20%). RESULTS On the test set, the mean Matthews correlation coefficient of the Random Forests models was greater than 0.46. Feature importance analysis revealed that, together with maximum response to first-line treatment (D_MAX_RESP), time to maximum response to first-line treatment (TIME_MAX_RESP.days) is a relevant predictor of both patient outcome and relapse/progression. We showed the prognostic value of the maximum response to first-line treatment in clinically relevant subsets of high-, intermediate-, and low-risk patients for both overall and relapse-free survival (log-rank p < 0.0001). Among high-risk patients older than 18 months with stage 4 tumours who achieved a complete response or very good partial response, those whose TIME_MAX_RESP.days exceeded 9 months had a better prognosis than those who reached maximum response earlier than 9 months (overall survival: hazard ratio 3.3, 95% confidence interval 1.8-5.9, log-rank p < 0.0001; relapse-free survival: hazard ratio 3.2, 95% CI 1.8-5.6, log-rank p < 0.0001). CONCLUSION Our findings highlight the emerging role of TIME_MAX_RESP.days, in addition to D_MAX_RESP, as a relevant predictor of outcome and relapse/progression in neuroblastoma, with potential clinical impact on patient management and treatment.
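The modelling steps in the methods (an 80/20 holdout split followed by recursive feature elimination around a Random Forest) can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' pipeline: the split is a simple unstratified random shuffle (the abstract does not say whether stratification was used), and `importance_fn` is a hypothetical stand-in for refitting a Random Forest on the currently selected features and returning its feature importances.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def holdout_split(records: Sequence, test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[list, list]:
    """Plain random holdout split (no stratification); returns (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def recursive_feature_elimination(
    features: List[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    n_keep: int,
    step: int = 1,
) -> List[str]:
    """Repeatedly refit (via the hypothetical importance_fn) and drop the
    `step` least important features until `n_keep` remain."""
    if not 1 <= n_keep <= len(features) or step < 1:
        raise ValueError("need 1 <= n_keep <= len(features) and step >= 1")
    selected = list(features)
    while len(selected) > n_keep:
        importances = importance_fn(selected)  # refit on current subset
        n_drop = min(step, len(selected) - n_keep)
        # Least important first; sorted() is stable, so ties drop earlier features.
        weakest = set(sorted(selected, key=lambda f: importances[f])[:n_drop])
        selected = [f for f in selected if f not in weakest]
    return selected
```

In practice `importance_fn` would wrap a Random Forest's importances computed on the training split only, so the test split stays untouched for the final MCC estimate; scikit-learn's `RFE` and `RFECV` classes implement the same elimination loop.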
Affiliation(s)
- Davide Chicco
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada; Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca, Milan, Italy
- Riccardo Haupt
- DOPO Clinic, Department of Hematology/Oncology, IRCCS Istituto Giannina Gaslini, Genoa, Italy
- Paolo Uva
- Unità di Bioinformatica Clinica, IRCCS Istituto Giannina Gaslini, Genoa, Italy
- Roberto Luksch
- S.C. Pediatria oncologica, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
- Davide Cangelosi
- Unità di Bioinformatica Clinica, IRCCS Istituto Giannina Gaslini, Genoa, Italy
48
Henriques SC, Paixão P, Almeida L, Silva NE. Predictive Potential of Cmax Bioequivalence in Pilot Bioavailability/Bioequivalence Studies, through the Alternative ƒ2 Similarity Factor Method. Pharmaceutics 2023; 15:2498. [PMID: 37896259 PMCID: PMC10610255 DOI: 10.3390/pharmaceutics15102498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 10/08/2023] [Accepted: 10/18/2023] [Indexed: 10/29/2023] [Open access]
Abstract
Pilot bioavailability/bioequivalence (BA/BE) studies are downsized trials that can be conducted before the definitive pivotal trial. In these trials, 12 to 18 subjects are usually enrolled, although, in principle, the sample size is not formally calculated. In previous work, the authors recommended an alternative to the average bioequivalence methodology for evaluating pilot-study data: the geometric mean (Gmean) ƒ2 factor with a cut-off of 35, which was shown to be an appropriate method for assessing the potential bioequivalence of the maximum observed concentration (Cmax) under the assumptions of a true Test-to-Reference Geometric Mean Ratio (GMR) of 100% and an inter-occasion variability (IOV) in the range of 10% to 45%. In this work, the authors compared the proposed ƒ2 factor with standard average bioequivalence in more extreme scenarios, using a true GMR of 90% or 111% for truly bioequivalent formulations and 80% or 125% for truly bioinequivalent formulations, in order to draw firmer conclusions about the potential of this analysis method. Several scenarios of pilot BA/BE crossover studies were simulated through population pharmacokinetic modelling, accounting for different IOV levels. A redefined decision tree is proposed: a fixed sample size of 20 subjects for pilot studies when the intra-subject coefficient of variation (ISCV%) is > 20% or the variability is unknown, with study results assessed by average bioequivalence analysis and, additionally, by the Gmean ƒ2 factor method when the 90% confidence interval (CI) for the GMR falls outside the regulatory bioequivalence acceptance interval of [80.00-125.00]%. Using this alternative approach, the certainty levels for proceeding to pivotal studies were assessed as a function of the Gmean ƒ2 values and variability scenarios tested (20-60% IOV), which is expected to support the decision to proceed with pivotal bioequivalence studies.
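The ƒ2 similarity factor behind the proposed method is the familiar dissolution-profile metric, ƒ2 = 50·log10(100/√(1 + (1/n)Σ(R_t − T_t)²)), here applied to geometric-mean concentration-time profiles. The sketch below assumes, purely for illustration, that both Gmean profiles are rescaled to percent of the reference Gmean Cmax and that all sampling times are used; the abstract does not state the authors' exact normalisation or time-point selection, and `gmean_f2` is a hypothetical helper name.

```python
import math
from statistics import geometric_mean
from typing import List, Sequence

F2_CUTOFF = 35.0  # Gmean f2 cut-off quoted in the abstract


def gmean_profile(subject_profiles: Sequence[Sequence[float]]) -> List[float]:
    """Geometric mean concentration at each sampling time across subjects.
    Concentrations must be positive (e.g. drop pre-dose zero samples)."""
    return [geometric_mean(column) for column in zip(*subject_profiles)]


def f2_similarity(reference: Sequence[float], test: Sequence[float]) -> float:
    """Classic f2 = 50 * log10(100 / sqrt(1 + mean squared difference))."""
    if len(reference) != len(test) or not reference:
        raise ValueError("profiles must be non-empty and of equal length")
    msd = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + msd))


def gmean_f2(ref_subjects: Sequence[Sequence[float]],
             test_subjects: Sequence[Sequence[float]]) -> float:
    """Hypothetical helper: f2 between Test and Reference Gmean profiles."""
    ref = gmean_profile(ref_subjects)
    tst = gmean_profile(test_subjects)
    # ASSUMPTION: express both profiles as % of the reference Gmean Cmax so
    # differences are on a percentage scale, as in dissolution f2.
    scale = 100.0 / max(ref)
    return f2_similarity([r * scale for r in ref], [t * scale for t in tst])
```

Under the redefined decision tree, a check such as `gmean_f2(ref, test) >= F2_CUTOFF` would only be consulted when the average-bioequivalence 90% CI for the GMR falls outside [80.00-125.00]%; identical profiles give ƒ2 = 100, and ƒ2 falls as the profiles diverge.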
Affiliation(s)
- Sara Carolina Henriques
- Research Institute for Medicines (iMed.ULisboa), Faculty of Pharmacy, Universidade de Lisboa, 1649-003 Lisboa, Portugal
- BlueClinical Ltd., Senhora da Hora, 4460-439 Matosinhos, Portugal
- Paulo Paixão
- Research Institute for Medicines (iMed.ULisboa), Faculty of Pharmacy, Universidade de Lisboa, 1649-003 Lisboa, Portugal
- Luis Almeida
- BlueClinical Ltd., Senhora da Hora, 4460-439 Matosinhos, Portugal
- Nuno Elvas Silva
- Research Institute for Medicines (iMed.ULisboa), Faculty of Pharmacy, Universidade de Lisboa, 1649-003 Lisboa, Portugal
49
Foody GM. Challenges in the real world use of classification accuracy metrics: From recall and precision to the Matthews correlation coefficient. PLoS One 2023; 18:e0291908. [PMID: 37792898 PMCID: PMC10550141 DOI: 10.1371/journal.pone.0291908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Accepted: 09/07/2023] [Indexed: 10/06/2023] [Open access]
Abstract
The accuracy of a classification is fundamental to its interpretation, its use and, ultimately, decision making. Unfortunately, the apparent accuracy assessed can differ greatly from the true accuracy. Misestimation of classification accuracy metrics, and the associated misinterpretations, are often due to variations in prevalence and the use of an imperfect reference standard. The fundamental issues underlying the problems associated with variations in prevalence and reference standard quality are revisited here for binary classifications, with particular attention to the use of the Matthews correlation coefficient (MCC). A key attribute claimed for the MCC is that a high value can only be attained when the classification performs well on both classes of a binary classification. However, it is shown here that the apparent magnitudes of a set of popular accuracy metrics used in fields such as computer science, medicine and environmental science (Recall, Precision, Specificity, Negative Predictive Value, J, F1, likelihood ratios and MCC), and of one key attribute (prevalence), were all greatly influenced by variations in prevalence and the use of an imperfect reference standard. Simulations using realistic values for data quality in applications such as remote sensing showed that each metric varied over the range of possible prevalence and at differing levels of reference standard quality. The direction and magnitude of accuracy metric misestimation were a function of prevalence and of the size and nature of the imperfections in the reference standard. The apparent MCC could clearly be substantially under- or overestimated. Moreover, a high apparent MCC arose from an unquestionably poor classification. As with some other accuracy metrics, the utility of the MCC may be overstated, and apparent values need to be interpreted with caution. Apparent accuracy and prevalence values can be misleading, and calls for these issues to be recognised and addressed should be heeded.
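How an imperfect reference standard distorts the apparent metrics can be illustrated with a small expected-value calculation. The sketch assumes, which the abstract does not state, that the classifier's and the reference standard's errors are conditionally independent given the true class; Foody's simulations may use different error structures, including ones that inflate the apparent MCC. `Rates`, `apparent_confusion` and `apparent_metrics` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Rates:
    sensitivity: float
    specificity: float


def apparent_confusion(prevalence: float, classifier: Rates, reference: Rates):
    """Expected population fractions (TP, FP, FN, TN) when a classification is
    scored against a reference standard. ASSUMPTION: classifier and reference
    errors are conditionally independent given the true class."""
    p, q = prevalence, 1.0 - prevalence
    c, r = classifier, reference
    tp = p * c.sensitivity * r.sensitivity + q * (1 - c.specificity) * (1 - r.specificity)
    fp = p * c.sensitivity * (1 - r.sensitivity) + q * (1 - c.specificity) * r.specificity
    fn = p * (1 - c.sensitivity) * r.sensitivity + q * c.specificity * (1 - r.specificity)
    tn = p * (1 - c.sensitivity) * (1 - r.sensitivity) + q * c.specificity * r.specificity
    return tp, fp, fn, tn


def apparent_metrics(tp: float, fp: float, fn: float, tn: float) -> dict:
    """Recall, precision, MCC and prevalence as they appear against the reference."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "prevalence": tp + fn,  # share labelled positive by the reference
    }
```

For example, a classifier with sensitivity and specificity of 0.9 at 50% prevalence has an MCC of 0.8 when scored against a perfect reference, but an apparent MCC of 0.64 against a reference that is itself 90% sensitive and specific. Sweeping `prevalence` from 0 to 1 shows how each apparent metric drifts even though the classifier itself is unchanged.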
Affiliation(s)
- Giles M. Foody
- School of Geography, University of Nottingham, Nottingham, Nottinghamshire, United Kingdom
50
Ghorbanali Z, Zare-Mirakabad F, Salehi N, Akbari M, Masoudi-Nejad A. DrugRep-HeSiaGraph: when heterogenous siamese neural network meets knowledge graphs for drug repurposing. BMC Bioinformatics 2023; 24:374. [PMID: 37789314 PMCID: PMC10548718 DOI: 10.1186/s12859-023-05479-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 09/12/2023] [Indexed: 10/05/2023] [Open access]
Abstract
BACKGROUND Drug repurposing is a promising approach for identifying new therapeutic uses for existing drugs. Knowledge graphs have recently emerged as significant tools for addressing the challenges of drug repurposing; however, major issues remain in constructing and embedding knowledge graphs. RESULTS This study proposes a two-step method, DrugRep-HeSiaGraph, to address these challenges. The method combines a drug-disease knowledge graph with a heterogeneous siamese neural network. In the first step, a drug-disease knowledge graph named DDKG-V1 is constructed by defining new relationship types, and numerical vector representations of its nodes are then created using a distributional learning method. In the second step, a heterogeneous siamese neural network called HeSiaNet enriches the drug and disease embeddings by bringing them closer together in a new unified latent space, and then predicts potential drug candidates for diseases. DrugRep-HeSiaGraph achieves an AUC-ROC of 91.16%, an AUC-PR of 90.32%, an accuracy of 84.63%, a BS of 0.119, and an MCC of 69.31%. CONCLUSION We demonstrate the effectiveness of the proposed method in identifying potential drugs for COVID-19 as a case study. This study also shows the role of dipeptidyl peptidase 4 (DPP-4) as a potential receptor for SARS-CoV-2 and the effectiveness of DPP-4 inhibitors against COVID-19, highlighting the practical application of the model to real-world drug repurposing challenges. The code and data for DrugRep-HeSiaGraph are publicly available at https://github.com/CBRC-lab/DrugRep-HeSiaGraph.
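Two of the reported metrics can be computed directly from predicted drug-disease link probabilities, as sketched below. This assumes that "BS" denotes the Brier score (the abstract does not expand the abbreviation); the functions are a generic illustration, not the authors' evaluation code, and the AUC uses the simple O(n_pos × n_neg) pairwise definition.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a randomly chosen positive pair is
    scored above a randomly chosen negative pair (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 label
    (lower is better; 0 is perfect)."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and equally long")
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)
```

AUC-ROC only measures ranking, whereas the Brier score also penalises poorly calibrated probabilities, which is why the two can disagree for the same model.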
Affiliation(s)
- Zahra Ghorbanali
- Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
- Fatemeh Zare-Mirakabad
- Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
- Najmeh Salehi
- School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Mohammad Akbari
- Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
- Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran