1
Zulfat M, Hakami MA, Hazazi A, Mahmood A, Khalid A, Alqurashi RS, Abdalla AN, Hu J, Wadood A, Huang X. Identification of novel NLRP3 inhibitors as therapeutic options for epilepsy by machine learning-based virtual screening, molecular docking and biomolecular simulation studies. Heliyon 2024; 10:e34410. [PMID: 39170440 PMCID: PMC11336274 DOI: 10.1016/j.heliyon.2024.e34410] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/06/2023] [Revised: 07/06/2024] [Accepted: 07/09/2024] [Indexed: 08/23/2024] Open
Abstract
The NOD-like receptor protein 3 (NLRP3) inflammasome is a key therapeutic target for the treatment of epilepsy and has been reported to regulate inflammation in several neurological diseases. In this study, a machine learning-based virtual screening strategy was used to identify candidate active compounds that inhibit the NLRP3 inflammasome, because machine learning-based virtual screening has the potential to predict protein-ligand binding accurately and to reduce false-positive outcomes compared with traditional virtual screening. Briefly, classification models were created using the Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbor (KNN) machine learning methods. Feature selection was carried out to determine the features most crucial to a molecule's activity, and the resulting models were evaluated by 10-fold cross-validation. Among the generated models, the RF model obtained the best results and was therefore used as a screening tool against large chemical databases. Molecular Operating Environment (MOE) and PyRx software were applied for molecular docking, and molecular dynamics (MD) simulations of potent inhibitors were carried out using the AmberTools program. The accuracies of the KNN, SVM, and RF models were 0.911, 0.906, and 0.946, respectively; the corresponding sensitivities were 0.82, 0.78, and 0.86, and the specificities were 0.95, 0.96, and 0.98. By applying the model to the ZINC and South African databases, we identified 98 and 39 compounds, respectively, potentially possessing anti-NLRP3 activity. Molecular docking analysis then yielded ten ZINC and seven South African compounds that have binding affinities comparable to that of the reference drug. Moreover, MD analysis revealed that two compounds (ZINC000009601348 and SANC00225) form stable complexes with varying binding energies. The in silico studies indicate that both compounds most likely exert their effect by inhibiting the NLRP3 protein.
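The accuracy, sensitivity, and specificity figures quoted in this abstract are standard confusion-matrix ratios. The following is a minimal sketch of how such values are computed; the label vectors are toy data invented for illustration, not the study's data, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = active, 0 = inactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity (recall on actives), and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # fraction of true actives recovered
        "specificity": tn / (tn + fp),  # fraction of true inactives rejected
    }


# Toy labels (assumption): 4 actives and 6 inactives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a cross-validated setting such as the 10-fold scheme described above, these ratios would typically be computed per fold and then averaged.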
Affiliation(s)
- Maryam Zulfat
  - Department of Biochemistry, Computational Medicinal Chemistry Laboratory, Abdul Wali Khan University, Mardan, Pakistan
- Mohammed Ageeli Hakami
  - Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, Shaqra University, Al-Quwayiyah-19257, Riyadh, Saudi Arabia
- Ali Hazazi
  - Department of Pathology and Laboratory Medicine, Security Forces Hospital Program, Riyadh, Saudi Arabia
  - College of Medicine, Alfaisal University, Riyadh, Saudi Arabia
- Arif Mahmood
  - Department of Biochemistry, Quaid-i-Azam University Islamabad, Pakistan
- Asaad Khalid
  - Substance Abuse and Toxicology Research Center, Jazan University, P.O. Box: 114, Jazan 45142, Saudi Arabia
- Roaya S. Alqurashi
  - Department of Pharmacology and Toxicology, College of Pharmacy, Umm Al-Qura University, Makkah 21955, Saudi Arabia
- Ashraf N. Abdalla
  - Department of Pharmacology and Toxicology, College of Pharmacy, Umm Al-Qura University, Makkah 21955, Saudi Arabia
- Junjian Hu
  - Department of Central Laboratory, SSL, Central Hospital of Dongguan City, Affiliated Dongguan Shilong People's Hospital of Guangdong Medical University, Dongguan, China
- Abdul Wadood
  - Department of Biochemistry, Computational Medicinal Chemistry Laboratory, Abdul Wali Khan University, Mardan, Pakistan
- Xiaoyun Huang
  - Department of Neurology, Houjie Hospital and Clinical College of Guangdong Medical University, Dongguan, China
2
Valdés JJ, Tchagang AB. Novel machine learning insights into the QM7b and QM9 quantum mechanics datasets. J Comput Chem 2024; 45:1193-1214. [PMID: 38329198 DOI: 10.1002/jcc.27295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 12/06/2023] [Accepted: 12/12/2023] [Indexed: 02/09/2024]
Abstract
This paper (i) explores the internal structure of two quantum mechanics datasets (QM7b, QM9), composed of several thousands of organic molecules and described in terms of electronic properties, and (ii) further explores an inverse design approach to molecular design that uses machine learning methods to approximate the atomic composition of molecules, using QM9 data. Understanding the structure and characteristics of this kind of data is important when predicting the atomic composition from physical-chemical properties in inverse molecular designs. Intrinsic dimension analysis, clustering, and outlier detection methods were used in the study. They revealed that for both datasets the intrinsic dimensionality is several times smaller than the descriptive dimensions. The QM7b data is composed of well-defined clusters related to atomic composition. The QM9 data consists of an outer region predominantly composed of outliers, and an inner, core region that concentrates clustered inlier objects. A significant relationship exists between the number of atoms in the molecule and its outlier/inlier nature. The spatial structure exhibits a relationship with molecular weight. Despite the structural differences between the two datasets, the predictability of variables of interest for inverse molecular design is high. This is exemplified by models estimating the number of atoms of the molecule from both the original properties and from lower-dimensional embedding spaces. In the generative approach, the input is a set of desired properties of the molecule and the output is an approximation of the atomic composition in terms of its constituent chemical elements. This could serve as the starting region for further search in the huge space determined by the set of possible chemical compounds. The QM9 quantum mechanics dataset used in the study is composed of 133,885 small organic molecules and 19 electronic properties. Different multi-target regression approaches were considered for predicting the atomic composition from the properties, including feature engineering techniques in an auto-machine learning framework. High-quality models were found that predict the atomic composition of the molecules from their electronic properties, as well as from a subset of only 52.6% of the size. Feature selection worked better than feature generation. The results validate the generative approach to inverse molecular design.
Affiliation(s)
- Julio J Valdés
  - National Research Council Canada, Digital Technologies Research Centre, Ottawa, Canada
- Alain B Tchagang
  - National Research Council Canada, Digital Technologies Research Centre, Ottawa, Canada
3
Balwani A, Cho S, Choi H. Exploring the Architectural Biases of the Canonical Cortical Microcircuit. bioRxiv: the preprint server for biology 2024:2024.05.23.595629. [PMID: 38826320 PMCID: PMC11142214 DOI: 10.1101/2024.05.23.595629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
The cortex plays a crucial role in various perceptual and cognitive functions, driven by its basic unit, the canonical cortical microcircuit. Yet, we remain short of a framework that definitively explains the structure-function relationships of this fundamental neuroanatomical motif. To better understand how physical substrates of cortical circuitry facilitate their neuronal dynamics, we employ a computational approach using recurrent neural networks and representational analyses. We examine the differences manifested by the inclusion and exclusion of biologically-motivated inter-areal laminar connections on the computational roles of different neuronal populations in the microcircuit of two hierarchically-related areas, throughout learning. Our findings show that the presence of feedback connections correlates with the functional modularization of cortical populations in different layers, and provides the microcircuit with a natural inductive bias to differentiate expected and unexpected inputs at initialization. Furthermore, when testing the effects of training the microcircuit and its variants with a predictive-coding inspired strategy, we find that doing so helps better encode noisy stimuli in areas of the cortex that receive feedback, all of which combine to suggest evidence for a predictive-coding mechanism serving as an intrinsic operative logic in the cortex.
Affiliation(s)
- Aishwarya Balwani
  - School of Electrical & Computer Engineering, Georgia Institute of Technology
- Suhee Cho
  - Department of Brain and Cognitive Sciences, Korea Advanced Institute of Science and Technology
- Hannah Choi
  - School of Mathematics, Georgia Institute of Technology
4
Abhishek K, Brown CJ, Hamarneh G. Multi-sample ζ-mixup: richer, more realistic synthetic samples from a p-series interpolant. Journal of Big Data 2024; 11:43. [PMID: 38528850 PMCID: PMC10960781 DOI: 10.1186/s40537-024-00898-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Accepted: 02/28/2024] [Indexed: 03/27/2024]
Abstract
Modern deep learning training procedures rely on model regularization techniques such as data augmentation methods, which generate training samples that increase the diversity of data and richness of label information. A popular recent method, mixup, uses convex combinations of pairs of original samples to generate new samples. However, as we show in our experiments, mixup can produce undesirable synthetic samples, where the data is sampled off the manifold and can contain incorrect labels. We propose ζ-mixup, a generalization of mixup with provably and demonstrably desirable properties that allows convex combinations of T ≥ 2 samples, leading to more realistic and diverse outputs that incorporate information from T original samples by using a p-series interpolant. We show that, compared to mixup, ζ-mixup better preserves the intrinsic dimensionality of the original datasets, which is a desirable property for training generalizable models. Furthermore, we show that our implementation of ζ-mixup is faster than mixup, and extensive evaluation on controlled synthetic and 26 diverse real-world natural and medical image classification datasets shows that ζ-mixup outperforms mixup, CutMix, and traditional data augmentation techniques. The code will be released at https://github.com/kakumarabhishek/zeta-mixup.
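The idea of a convex combination of T samples weighted by a p-series can be sketched as follows. This is an illustrative reading of the abstract, not the authors' released implementation: the weights are assumed to be proportional to i^(-γ) for i = 1..T and assigned to a random ordering of the samples, and the value of `gamma`, the function names, and the toy data are all hypothetical.

```python
import random


def p_series_weights(T, gamma):
    """Normalized p-series weights: w_i proportional to i**(-gamma), i = 1..T."""
    raw = [i ** (-gamma) for i in range(1, T + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def zeta_mix(samples, labels, gamma, rng):
    """Convex combination of all T samples (and their one-hot labels).

    A random ordering decides which sample receives the largest weight.
    """
    T = len(samples)
    weights = p_series_weights(T, gamma)
    order = list(range(T))
    rng.shuffle(order)
    dim, n_cls = len(samples[0]), len(labels[0])
    x = [sum(w * samples[i][d] for w, i in zip(weights, order)) for d in range(dim)]
    y = [sum(w * labels[i][c] for w, i in zip(weights, order)) for c in range(n_cls)]
    return x, y


# Toy data (assumption): three 2-D points with one-hot labels.
samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights = p_series_weights(3, gamma=2.8)  # gamma chosen arbitrarily here
mixed_x, mixed_y = zeta_mix(samples, labels, gamma=2.8, rng=random.Random(0))
print(weights, mixed_x, mixed_y)
```

Because the weights are positive and sum to one, the mixed label remains a valid probability vector, and a larger `gamma` concentrates more weight on a single original sample.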
Affiliation(s)
- Kumar Abhishek
  - School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, V5A 1S6 Canada
- Colin J Brown
  - Engineering, Hinge Health, 455 Market Street, Suite 700, San Francisco, 94105 USA
- Ghassan Hamarneh
  - School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, V5A 1S6 Canada
5
Jin Y, Yin H, Zhang H, Wang Y, Liu S, Yang L, Song B. Predicting tumor deposits in rectal cancer: a combined deep learning model using T2-MR imaging and clinical features. Insights Imaging 2023; 14:221. [PMID: 38117396 PMCID: PMC10733230 DOI: 10.1186/s13244-023-01564-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 11/05/2023] [Indexed: 12/21/2023] Open
Abstract
BACKGROUND Tumor deposits (TDs) are associated with poor prognosis in rectal cancer (RC). This study aims to develop and validate a deep learning (DL) model incorporating T2-MR images and clinical factors for the preoperative prediction of TDs in RC patients. METHODS A total of 327 RC patients with pathologically confirmed TD status from January 2016 to December 2019 were retrospectively recruited, and the T2-MR images and clinical variables were collected. Patients were randomly split into a development dataset (n = 246) and an independent testing dataset (n = 81). A single-channel DL model, a multi-channel DL model, a hybrid DL model, and a clinical model were constructed. The performance of these predictive models was assessed using receiver operating characteristic (ROC) analysis and decision curve analysis (DCA). RESULTS The areas under the curves (AUCs) of the clinical, single-DL, multi-DL, and hybrid-DL models were 0.734 (95% CI, 0.674-0.788), 0.710 (95% CI, 0.649-0.766), 0.767 (95% CI, 0.710-0.819), and 0.857 (95% CI, 0.807-0.898) in the development dataset. The AUC of the hybrid-DL model was significantly higher than those of the single-DL and multi-DL models (both p < 0.001) in the development dataset, and that of the single-DL model (p = 0.028) in the testing dataset. Decision curve analysis demonstrated that the hybrid-DL model had a higher net benefit than the other models across most of the range of threshold probabilities. CONCLUSIONS The proposed hybrid-DL model achieved good predictive efficacy and could be used to predict tumor deposits in rectal cancer. CRITICAL RELEVANCE STATEMENT The proposed hybrid-DL model achieved good predictive efficacy and could be used to predict tumor deposits in rectal cancer. KEY POINTS
• Preoperative non-invasive identification of TDs is of great clinical significance.
• The combined hybrid-DL model achieved good predictive efficacy and could be used to predict tumor deposits in rectal cancer.
• A preoperative nomogram provides gastroenterologists with an accurate and effective tool.
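The AUC values reported above summarize how well each model ranks TD-positive patients above TD-negative ones. A minimal sketch of that interpretation, using the pairwise (Mann-Whitney) formulation, is given below; the labels and scores are toy values invented for illustration, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy predictions (assumption): 3 positive and 3 negative cases.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The quadratic loop is fine for small cohorts; a rank-based computation would be the usual choice for larger datasets.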
Affiliation(s)
- Yumei Jin
  - Department of Medical Imaging Center, Qujing First People's Hospital, Qujing, 655000, Yunnan Province, China
  - Department of Radiology, West China Hospital of Sichuan University, Chengdu, 610041, Sichuan Province, China
- Hongkun Yin
  - Beijing Infervision Technology Co. Ltd, Beijing, China
- Huiling Zhang
  - Beijing Infervision Technology Co. Ltd, Beijing, China
- Yewu Wang
  - Department of Joint and Sports Medicine, Qujing First People's Hospital, Qujing, 655000, Yunnan Province, China
- Shengmei Liu
  - Department of Radiology, West China Hospital of Sichuan University, Chengdu, 610041, Sichuan Province, China
- Ling Yang
  - Department of Radiology, West China Hospital of Sichuan University, Chengdu, 610041, Sichuan Province, China
- Bin Song
  - Department of Radiology, West China Hospital of Sichuan University, Chengdu, 610041, Sichuan Province, China
  - Functional and Molecular Imaging Key Laboratory of Sichuan Province, West China Hospital of Sichuan University, Chengdu, 610041, Sichuan Province, China
  - Department of Radiology, Sanya People's Hospital, Sanya, Hainan Province, 572000, China
6
Swinburne TD. Coarse-Graining and Forecasting Atomic Material Simulations with Descriptors. Physical Review Letters 2023; 131:236101. [PMID: 38134806 DOI: 10.1103/physrevlett.131.236101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Revised: 07/21/2023] [Accepted: 11/13/2023] [Indexed: 12/24/2023]
Abstract
Atomic simulations of materials require significant resources to generate, store, and analyze. Here, descriptor functions are proposed as a general, metric latent space for atomic structures, ideal for use in large-scale simulations. Descriptors can regress a broad range of properties, including character-dependent dislocation densities, stress states, or radial distribution functions. A vector autoregressive model can generate trajectories over yield points, resample from new initial conditions and forecast trajectory futures. A forecast confidence, essential for practical application, is derived by propagating forecasts through the Mahalanobis outlier distance, providing a powerful tool to assess coarse-grained models. Application to nanoparticles and yielding of nanoscale dislocation networks confirms low uncertainty forecasts are accurate and resampling allows for the propagation of smooth property distributions. Yielding is associated with a collapse in the intrinsic dimension of the descriptor manifold, which is discussed in relation to the yield surface.
Affiliation(s)
- Thomas D Swinburne
  - Aix-Marseille Université, CNRS, CINaM UMR 7325, Campus de Luminy, 13288 Marseille, France
7
Flahaut M, Leprohon P, Pham NP, Gingras H, Bourbeau J, Papadopoulou B, Maltais F, Ouellette M. Distinctive features of the oropharyngeal microbiome in Inuit of Nunavik and correlations of mild to moderate bronchial obstruction with dysbiosis. Sci Rep 2023; 13:16622. [PMID: 37789055 PMCID: PMC10547696 DOI: 10.1038/s41598-023-43821-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 09/28/2023] [Indexed: 10/05/2023] Open
Abstract
Inuit of Nunavik are coping with living conditions that can influence respiratory health. Our objective was to investigate associations between respiratory health in Inuit communities and their airway microbiome. Oropharyngeal samples were collected during the Qanuilirpitaa? 2017 Inuit Health Survey and subjected to metagenomic analyses. Participants were assigned to a bronchial obstruction group or a control group based on their clinical history and their pulmonary function, as monitored by spirometry. The Inuit microbiota composition was found to be distinct from other studied populations. Within the Inuit microbiota, differences in diversity measures tend to distinguish the two groups. Bacterial taxa found to be more abundant in the control group included candidate probiotic strains, while those enriched in the bronchial obstruction group included opportunistic pathogens. Crossing taxa affiliation method and machine learning consolidated our finding of distinct core microbiomes between the two groups. More microbial metabolic pathways were enriched in the control participants and these were often involved in vitamin and anti-inflammatory metabolism, while a link could be established between the enriched pathways in the disease group and inflammation. Overall, our results suggest a link between microbial abundance, interactions and metabolic activities and respiratory health in the Inuit population.
Affiliation(s)
- Mathilde Flahaut
  - Centre de Recherche en Infectiologie and Département de Microbiologie, Infectiologie et Immunologie, Faculté de Médecine, Université Laval, Québec City, QC, Canada
- Philippe Leprohon
  - Centre de Recherche en Infectiologie and Département de Microbiologie, Infectiologie et Immunologie, Faculté de Médecine, Université Laval, Québec City, QC, Canada
- Nguyen Phuong Pham
  - Centre de Recherche en Infectiologie and Département de Microbiologie, Infectiologie et Immunologie, Faculté de Médecine, Université Laval, Québec City, QC, Canada
- Hélène Gingras
  - Centre de Recherche en Infectiologie and Département de Microbiologie, Infectiologie et Immunologie, Faculté de Médecine, Université Laval, Québec City, QC, Canada
- Jean Bourbeau
  - Department of Medicine, Division of Respiratory Medicine, McGill University Health Center, Montréal, QC, Canada
- Barbara Papadopoulou
  - Centre de Recherche en Infectiologie and Département de Microbiologie, Infectiologie et Immunologie, Faculté de Médecine, Université Laval, Québec City, QC, Canada
- François Maltais
  - Groupe de Recherche en Santé Respiratoire, Centre de Recherche de L'Institut Universitaire de Cardiologie et de Pneumologie de Québec, Faculté de Médecine, Université Laval, Québec City, QC, Canada
- Marc Ouellette
  - Centre de Recherche en Infectiologie and Département de Microbiologie, Infectiologie et Immunologie, Faculté de Médecine, Université Laval, Québec City, QC, Canada
8
Wickersham M, Bartelo N, Kulm S, Liu Y, Zhang Y, Elemento O. Using machine learning methods to assess the risk of alcohol misuse in older adults. Research Square 2023:rs.3.rs-3154584. [PMID: 37886491 PMCID: PMC10602059 DOI: 10.21203/rs.3.rs-3154584/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2023]
Abstract
The population of older adults, defined in this study as those 50 years of age or older, continues to increase every year. Substance misuse, particularly alcohol misuse, is often neglected in these individuals. To better identify older adults who might not be properly assessed for alcohol misuse, we derived a risk assessment tool using patients from the United Kingdom Biobank (UKB) and validated it on patients in the Weill Cornell Medicine (WCM) electronic health record (EHR). The resulting model and tooling stratify the risk of alcohol misuse in older adults using 10 features that are commonly found in most EHR systems. The area under the receiver operating characteristic curve (AUROC) for predicting alcohol misuse in older adults was 0.84 for the UKB model and 0.78 for the WCM model. We further show that, of those who self-identified as having ongoing alcohol misuse in the UKB cohort, only 12.5% had any alcohol-related F.10 ICD-10 code. Extending this to the WCM cohort, we forecast that 7,838 out of 12,360 older adults with no F.10 ICD-10 code (63.4%) may be missed as having alcohol misuse in the EHR. Overall, this study prioritizes the health of older adults by predicting alcohol misuse in an understudied population.
Affiliation(s)
- Matthew Wickersham
  - Weill-Cornell/Rockefeller/Sloan-Kettering Tri-Institutional MD-PhD Program, New York, New York, United States
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York, United States
- Nicholas Bartelo
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York, United States
- Scott Kulm
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York, United States
- Yifan Liu
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, United States
- Yiye Zhang
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, United States
  - Department of Emergency Medicine, Weill Cornell Medicine, New York, New York, United States
- Olivier Elemento
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York, United States
  - Caryl and Israel Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, New York, United States
9
Wu J, Li C, Gao P, Zhang C, Zhang P, Zhang L, Dai C, Zhang K, Shi B, Liu M, Zheng J, Pan B, Chen Z, Zhang C, Liao W, Pan W, Fang W, Chen C. Intestinal microbiota links to allograft stability after lung transplantation: a prospective cohort study. Signal Transduct Target Ther 2023; 8:326. [PMID: 37652953 PMCID: PMC10471611 DOI: 10.1038/s41392-023-01515-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/31/2022] [Revised: 05/17/2023] [Accepted: 05/28/2023] [Indexed: 09/02/2023] Open
Abstract
Whether altered gut microbiota contribute to the risk of allograft rejection (AR) and pulmonary infection (PI) in lung transplant recipients (LTRs) remains unexplored. A prospective multicenter cohort of LTRs was identified in four lung transplant centers. Paired fecal and serum specimens were collected and divided into AR, PI, and event-free (EF) groups according to the diagnosis at sampling. Fecal samples were analyzed by metagenomic sequencing, and metabolites and cytokines were measured in the paired serum to assess the potential effect of the altered microbial community. In total, we analyzed 146 paired samples (AR = 25, PI = 43, and EF = 78). Notably, we found that the gut microbiome of AR followed a major depletion pattern, with 487 decreased species and reduced compositional diversity. Further multi-omics analysis showed depleted serum metabolites and increased inflammatory cytokines in AR and PI. Bacteroides uniformis, which declined in AR (2.4% vs 0.6%) and was negatively associated with serum IL-1β and IL-12, was identified as a driver species in the gut microbiome network of EF. Functionally, the EF specimens were abundant in probiotics related to mannose and cationic antimicrobial peptide metabolism. Furthermore, a support-vector machine classifier based on microbiome, metabolome, and clinical parameters predicted AR (AUPRC = 0.801) and PI (AUPRC = 0.855) well, with the microbiome dataset showing particularly high diagnostic power. In conclusion, a disrupted gut microbiota showed a significant association with allograft rejection and infection and with systemic cytokines and metabolites in LTRs.
Affiliation(s)
- Junqi Wu
  - Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai, China
  - Shanghai Engineering Research Center of Lung Transplantation, Shanghai, China
- Chongwu Li
  - Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai, China
  - Shanghai Engineering Research Center of Lung Transplantation, Shanghai, China
- Peigen Gao
  - Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai, China
  - Shanghai Engineering Research Center of Lung Transplantation, Shanghai, China
- Chenhong Zhang
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Pei Zhang
  - Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai, China
  - Shanghai Engineering Research Center of Lung Transplantation, Shanghai, China
- Lei Zhang
  - Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai, China
  - Shanghai Engineering Research Center of Lung Transplantation, Shanghai, China
- Chenyang Dai
  - Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai, China
  - Shanghai Engineering Research Center of Lung Transplantation, Shanghai, China
- Kunpeng Zhang
  - Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai, China
  - Shanghai Engineering Research Center of Lung Transplantation, Shanghai, China
- Bowen Shi
  - Department of Thoracic Surgery, Changhai Hospital, Naval Medical University, Shanghai, China
- Mengyang Liu
  - Department of Thoracic Surgery, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Junmeng Zheng
  - Department of Cardiovascular Surgery, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- Bo Pan
  - Department of Dermatology, Shanghai Key Laboratory of Molecular Medical Mycology, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China
- Zhan Chen
  - Adfontes (Shanghai) Bio-technology Co., Ltd, Shanghai, China
- Chao Zhang
  - Department of Dermatology, Shanghai Key Laboratory of Molecular Medical Mycology, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China
- Wanqing Liao
  - Department of Dermatology, Shanghai Key Laboratory of Molecular Medical Mycology, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China
- Weihua Pan
  - Department of Dermatology, Shanghai Key Laboratory of Molecular Medical Mycology, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China
- Wenjie Fang
  - Department of Dermatology, Shanghai Key Laboratory of Molecular Medical Mycology, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China
- Chang Chen
  - Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai, China
  - Shanghai Engineering Research Center of Lung Transplantation, Shanghai, China
10
Wang Z, Sun L, Xu Y, Liang P, Xu K, Huang J. Discovery of novel JAK1 inhibitors through combining machine learning, structure-based pharmacophore modeling and bio-evaluation. J Transl Med 2023; 21:579. [PMID: 37641144 PMCID: PMC10464202 DOI: 10.1186/s12967-023-04443-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 08/16/2023] [Indexed: 08/31/2023] Open
Abstract
BACKGROUND Janus kinase 1 (JAK1) plays a critical role in most cytokine-mediated inflammatory and autoimmune responses and in various cancers via the JAK/STAT signaling pathway. Inhibition of JAK1 is therefore an attractive therapeutic strategy for several diseases. Recently, high-performance machine learning techniques have been increasingly applied in virtual screening to develop new kinase inhibitors. Our study aimed to develop a novel layered virtual screening method based on machine learning (ML) and pharmacophore models to identify potential JAK1 inhibitors. METHODS First, we constructed a high-quality dataset comprising 3834 JAK1 inhibitors and 12,230 decoys, and then established a series of classification models based on combinations of three molecular descriptors and six ML algorithms. To further screen potential compounds, we constructed several pharmacophore models based on Hiphop and receptor-ligand algorithms. We then used molecular docking to filter the recognized compounds. Finally, the binding stability and enzyme inhibition activity of the identified compounds were assessed by molecular dynamics (MD) simulations and in vitro enzyme activity tests. RESULTS The best-performing ML model, DNN-ECFP4, and two pharmacophore models, Hiphop3 and 6TPF 08, were used to screen the ZINC database. A total of 13 potentially active compounds were identified, and the MD results demonstrated that all of these molecules could bind stably to JAK1 under dynamic conditions. Among the shortlisted compounds, the four purchasable compounds demonstrated significant kinase inhibition activity, with Z-10 being the most active (IC50 = 194.9 nM). CONCLUSION The current study provides an efficient and accurate integrated model. The hit compounds are promising candidates for the further development of novel JAK1 inhibitors.
Collapse
Affiliation(s)
- Zixiao Wang
- Department of Pharmacy, Honghui Hospital, Xi' an Jiaotong University, Xi' an, 710054, China.
| | - Lili Sun
- Department of Pharmacy, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
| | - Yu Xu
- State Key Laboratory of Natural Medicines,Jiangsu Key Laboratory of Drug Discovery for Metabolic Diseases, Center of Drug Discovery,China Pharmaceutical University, Nanjing, 210009, China
| | - Peida Liang
- Department of Pharmacy, Honghui Hospital, Xi'an Jiaotong University, Xi'an, 710054, China
| | - Kaiyan Xu
- School of Pharmacy, Lanzhou University, Lanzhou, 730000, China
| | - Jing Huang
- Department of Pharmacy, Honghui Hospital, Xi'an Jiaotong University, Xi'an, 710054, China.
| |
|
11
|
Wang Y, Shi Y, Zhang C, Su K, Hu Y, Chen L, Wu Y, Huang H. Fetal weight estimation based on deep neural network: a retrospective observational study. BMC Pregnancy Childbirth 2023; 23:560. [PMID: 37533038 PMCID: PMC10394792 DOI: 10.1186/s12884-023-05819-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2022] [Accepted: 06/27/2023] [Indexed: 08/04/2023] Open
Abstract
BACKGROUND Improving the accuracy of estimated fetal weight (EFW) calculation can contribute to decision-making for obstetricians and decrease perinatal complications. This study aimed to develop a deep neural network (DNN) model for EFW based on obstetric electronic health records. METHODS This study retrospectively analyzed the electronic health records of pregnant women who had live births at the obstetrics department of International Peace Maternity & Child Health Hospital between January 2016 and December 2018. The DNN model was evaluated against Hadlock's formula and multiple linear regression. RESULTS A total of 34824 live births (23922 primiparas) from 49896 pregnant women were analyzed. The root-mean-square error of the DNN model was 189.64 g (95% CI: 187.95 g-191.16 g), and the mean absolute percentage error was 5.79% (95% CI: 5.70%-5.81%), significantly lower than those of Hadlock's formula (240.36 g and 6.46%, respectively). By incorporating previously unreported factors, such as birth weight of prior pregnancies, a concise and effective DNN model was built based on only 10 parameters. The accuracy rate of the new model increased from 76.08% to 83.87%, with a root-mean-square error of only 243.80 g. CONCLUSIONS The proposed DNN model for EFW calculation is more accurate than previous approaches in this area and can be adopted for better decision-making related to fetal monitoring.
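The two error metrics reported in this abstract, root-mean-square error (RMSE) and mean absolute percentage error (MAPE), can be sketched as follows. This is a minimal illustration using the standard textbook definitions; the function names and the sample weights are hypothetical and are not taken from the study.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error, in the same unit as the inputs (grams here)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical birth weights (g) and model estimates (g), for illustration only.
actual_g = [3000.0, 3500.0, 4000.0]
estimated_g = [3100.0, 3400.0, 4200.0]

print(f"RMSE: {rmse(actual_g, estimated_g):.2f} g")
print(f"MAPE: {mape(actual_g, estimated_g):.2f} %")
```

RMSE penalizes large individual errors more heavily, while MAPE expresses error relative to each true weight, which is why the abstract reports both.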
Affiliation(s)
- Yifei Wang
- International Peace Maternity and Child Health Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, China
- Shanghai Key Laboratory of Embryo Original Diseases, Shanghai, 200030, China
| | - Yi Shi
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, Shanghai, 200030, China
| | - Chenjie Zhang
- International Peace Maternity and Child Health Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, China
- Shanghai Key Laboratory of Embryo Original Diseases, Shanghai, 200030, China
| | - Kaizhen Su
- International Peace Maternity and Child Health Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, China
- Shanghai Key Laboratory of Embryo Original Diseases, Shanghai, 200030, China
| | - Yixiao Hu
- Department of Mathematical Sciences, Tsinghua University, Beijing, 100084, China
| | - Lei Chen
- International Peace Maternity and Child Health Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, China
| | - Yanting Wu
- Obstetrics and Gynecology Hospital, Institute of Reproduction and Development, Fudan University, Shanghai, 200011, China.
- Research Units of Embryo Original Diseases, Chinese Academy of Medical Sciences, Shanghai, China.
| | - Hefeng Huang
- International Peace Maternity and Child Health Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, China.
- Shanghai Key Laboratory of Embryo Original Diseases, Shanghai, 200030, China.
- Obstetrics and Gynecology Hospital, Institute of Reproduction and Development, Fudan University, Shanghai, 200011, China.
- Research Units of Embryo Original Diseases, Chinese Academy of Medical Sciences, Shanghai, China.
- Research Units of Embryo Original Diseases (No. 2019RU056), Chinese Academy of Medical Sciences, Shanghai, China.
| |
|
12
|
Gonzalez-Castillo J, Fernandez IS, Lam KC, Handwerker DA, Pereira F, Bandettini PA. Manifold learning for fMRI time-varying functional connectivity. Front Hum Neurosci 2023; 17:1134012. [PMID: 37497043 PMCID: PMC10366614 DOI: 10.3389/fnhum.2023.1134012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2022] [Accepted: 06/21/2023] [Indexed: 07/28/2023] Open
Abstract
Whole-brain functional connectivity (FC) measured with functional MRI (fMRI) evolves over time in meaningful ways at temporal scales going from years (e.g., development) to seconds [e.g., within-scan time-varying FC (tvFC)]. Yet, our ability to explore tvFC is severely constrained by its large dimensionality (several thousands). To overcome this difficulty, researchers often seek to generate low dimensional representations (e.g., 2D and 3D scatter plots) hoping those will retain important aspects of the data (e.g., relationships to behavior and disease progression). Limited prior empirical work suggests that manifold learning techniques (MLTs)-namely those seeking to infer a low dimensional non-linear surface (i.e., the manifold) where most of the data lies-are good candidates for accomplishing this task. Here we explore this possibility in detail. First, we discuss why one should expect tvFC data to lie on a low dimensional manifold. Second, we estimate what is the intrinsic dimension (ID; i.e., minimum number of latent dimensions) of tvFC data manifolds. Third, we describe the inner workings of three state-of-the-art MLTs: Laplacian Eigenmaps (LEs), T-distributed Stochastic Neighbor Embedding (T-SNE), and Uniform Manifold Approximation and Projection (UMAP). For each method, we empirically evaluate its ability to generate neuro-biologically meaningful representations of tvFC data, as well as their robustness against hyper-parameter selection. Our results show that tvFC data has an ID that ranges between 4 and 26, and that ID varies significantly between rest and task states. We also show how all three methods can effectively capture subject identity and task being performed: UMAP and T-SNE can capture these two levels of detail concurrently, but LE could only capture one at a time. We observed substantial variability in embedding quality across MLTs, and within-MLT as a function of hyper-parameter selection. 
To help alleviate this issue, we provide heuristics that can inform future studies. Finally, we also demonstrate the importance of feature normalization when combining data across subjects and the role that temporal autocorrelation plays in the application of MLTs to tvFC data. Overall, we conclude that while MLTs can be useful to generate summary views of labeled tvFC data, their application to unlabeled data such as resting-state remains challenging.
Affiliation(s)
- Javier Gonzalez-Castillo
- Section on Functional Imaging Methods, National Institute of Mental Health, Bethesda, MD, United States
| | - Isabel S. Fernandez
- Section on Functional Imaging Methods, National Institute of Mental Health, Bethesda, MD, United States
| | - Ka Chun Lam
- Machine Learning Group, National Institute of Mental Health, Bethesda, MD, United States
| | - Daniel A. Handwerker
- Section on Functional Imaging Methods, National Institute of Mental Health, Bethesda, MD, United States
| | - Francisco Pereira
- Machine Learning Group, National Institute of Mental Health, Bethesda, MD, United States
| | - Peter A. Bandettini
- Section on Functional Imaging Methods, National Institute of Mental Health, Bethesda, MD, United States
- Functional Magnetic Resonance Imaging (FMRI) Core, National Institute of Mental Health, Bethesda, MD, United States
| |
|
13
|
Lysov M, Pukhkiy K, Vasiliev E, Getmanskaya A, Turlapov V. Ensuring Explainability and Dimensionality Reduction in a Multidimensional HSI World for Early XAI-Diagnostics of Plant Stress. ENTROPY (BASEL, SWITZERLAND) 2023; 25:e25050801. [PMID: 37238556 DOI: 10.3390/e25050801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/10/2023] [Revised: 05/08/2023] [Accepted: 05/08/2023] [Indexed: 05/28/2023]
Abstract
This work is mostly devoted to the search for effective solutions to the problem of early diagnosis of plant stress (given an example of wheat and its drought stress), which would be based on explainable artificial intelligence (XAI). The main idea is to combine the benefits of two of the most popular agricultural data sources, hyperspectral images (HSI) and thermal infrared images (TIR), in a single XAI model. Our own dataset of a 25-day experiment was used, which was created via both (1) an HSI camera Specim IQ (400-1000 nm, 204, 512 × 512) and (2) a TIR camera Testo 885-2 (320 × 240, res. 0.1 °C). The HSI were a source of the k-dimensional high-level features of plants (k ≤ K, where K is the number of HSI channels) for the learning process. Such combination was implemented as a single-layer perceptron (SLP) regressor, which is the main feature of the XAI model and receives as input an HSI pixel-signature belonging to the plant mask, which then automatically through the mask receives a mark from the TIR. The correlation of HSI channels with the TIR image on the plant's mask on the days of the experiment was studied. It was established that HSI channel 143 (820 nm) was the most correlated with TIR. The problem of training the HSI signatures of plants with their corresponding temperature value via the XAI model was solved. The RMSE of plant temperature prediction is 0.2-0.3 °C, which is acceptable for early diagnostics. Each HSI pixel was represented in training by a number (k) of channels (k ≤ K = 204 in our case). The number of channels used for training was minimized by a factor of 25-30, from 204 to eight or seven, while maintaining the RMSE value. The model is computationally efficient in training; the average training time was much less than one minute (Intel Core i3-8130U, 2.2 GHz, 4 cores, 4 GB). 
This XAI model can be considered a research-aimed model (R-XAI), which allows the transfer of knowledge about plants from the TIR domain to the HSI domain, with their contrasting onto only a few from hundreds of HSI channels.
Affiliation(s)
- Maxim Lysov
- Department of Mathematical Software and Supercomputing Technologies, Lobachevsky University, 603950 Nizhny Novgorod, Russia
| | - Konstantin Pukhkiy
- Department of Mathematical Software and Supercomputing Technologies, Lobachevsky University, 603950 Nizhny Novgorod, Russia
| | - Evgeny Vasiliev
- Department of Mathematical Software and Supercomputing Technologies, Lobachevsky University, 603950 Nizhny Novgorod, Russia
| | - Alexandra Getmanskaya
- Department of Mathematical Software and Supercomputing Technologies, Lobachevsky University, 603950 Nizhny Novgorod, Russia
| | - Vadim Turlapov
- Department of Mathematical Software and Supercomputing Technologies, Lobachevsky University, 603950 Nizhny Novgorod, Russia
| |
|
14
|
Koch V, Weitzer N, Dos Santos DP, Gruenewald LD, Mahmoudi S, Martin SS, Eichler K, Bernatz S, Gruber-Rouh T, Booz C, Hammerstingl RM, Biciusca T, Rosbach N, Gökduman A, D'Angelo T, Finkelmeier F, Yel I, Alizadeh LS, Sommer CM, Cengiz D, Vogl TJ, Albrecht MH. Multiparametric detection and outcome prediction of pancreatic cancer involving dual-energy CT, diffusion-weighted MRI, and radiomics. Cancer Imaging 2023; 23:38. [PMID: 37072856 PMCID: PMC10114410 DOI: 10.1186/s40644-023-00549-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2022] [Accepted: 03/17/2023] [Indexed: 04/20/2023] Open
Abstract
BACKGROUND The advent of next-generation computed tomography (CT)- and magnetic resonance imaging (MRI) opened many new perspectives in the evaluation of tumor characteristics. An increasing body of evidence suggests the incorporation of quantitative imaging biomarkers into clinical decision-making to provide mineable tissue information. The present study sought to evaluate the diagnostic and predictive value of a multiparametric approach involving radiomics texture analysis, dual-energy CT-derived iodine concentration (DECT-IC), and diffusion-weighted MRI (DWI) in participants with histologically proven pancreatic cancer. METHODS In this study, a total of 143 participants (63 years ± 13, 48 females) who underwent third-generation dual-source DECT and DWI between November 2014 and October 2022 were included. Among these, 83 received a final diagnosis of pancreatic cancer, 20 had pancreatitis, and 40 had no evidence of pancreatic pathologies. Data comparisons were performed using chi-square statistic tests, one-way ANOVA, or two-tailed Student's t-test. For the assessment of the association of texture features with overall survival, receiver operating characteristics analysis and Cox regression tests were used. RESULTS Malignant pancreatic tissue differed significantly from normal or inflamed tissue regarding radiomics features (overall P < .001, respectively) and iodine uptake (overall P < .001, respectively). The performance for the distinction of malignant from normal or inflamed pancreatic tissue ranged between an AUC of ≥ 0.995 (95% CI, 0.955-1.0; P < .001) for radiomics features, ≥ 0.852 (95% CI, 0.767-0.914; P < .001) for DECT-IC, and ≥ 0.690 (95% CI, 0.587-0.780; P = .01) for DWI, respectively. During a follow-up of 14 ± 12 months (range, 10-44 months), the multiparametric approach showed a moderate prognostic power to predict all-cause mortality (c-index = 0.778 [95% CI, 0.697-0.864], P = .01). 
CONCLUSIONS Our reported multiparametric approach allowed for accurate discrimination of pancreatic cancer and revealed great potential to provide independent prognostic information on all-cause mortality.
Affiliation(s)
- Vitali Koch
- Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany.
| | - Nils Weitzer
- Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
| | - Daniel Pinto Dos Santos
- Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
| | - Leon D Gruenewald
- Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
| | - Scherwin Mahmoudi
- Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
| | - Simon S Martin
- Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
| | - Katrin Eichler
- Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
| | - Simon Bernatz
- Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
| | - Tatjana Gruber-Rouh
- Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
| | - Christian Booz
- Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
| | - Renate M Hammerstingl
- Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
| | - Teodora Biciusca
- Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
| | - Nicolas Rosbach
- Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
| | - Aynur Gökduman
- Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
| | - Tommaso D'Angelo
- Department of Biomedical Sciences and Morphological and Functional Imaging, University Hospital Messina, Messina, Italy
| | - Fabian Finkelmeier
- Department of Internal Medicine, University Hospital Frankfurt, Frankfurt Am Main, Germany
| | - Ibrahim Yel
- Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
| | - Leona S Alizadeh
- Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
| | - Christof M Sommer
- Clinic of Diagnostic and Interventional Radiology, Heidelberg University Hospital, Heidelberg, Germany
| | - Duygu Cengiz
- Department of Radiology, University of Koc School of Medicine, Istanbul, Turkey
| | - Thomas J Vogl
- Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
| | - Moritz H Albrecht
- Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
| |
|
15
|
Śliwowski M, Martin M, Souloumiac A, Blanchart P, Aksenova T. Impact of dataset size and long-term ECoG-based BCI usage on deep learning decoders performance. Front Hum Neurosci 2023; 17:1111645. [PMID: 37007675 PMCID: PMC10061076 DOI: 10.3389/fnhum.2023.1111645] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2022] [Accepted: 02/27/2023] [Indexed: 03/18/2023] Open
Abstract
INTRODUCTION In brain-computer interfaces (BCI) research, recording data is time-consuming and expensive, which limits access to big datasets. This may influence the BCI system performance as machine learning methods depend strongly on the training dataset size. Important questions arise: taking into account neuronal signal characteristics (e.g., non-stationarity), can we achieve higher decoding performance with more data to train decoders? What is the perspective for further improvement with time in the case of long-term BCI studies? In this study, we investigated the impact of long-term recordings on motor imagery decoding from two main perspectives: model requirements regarding dataset size and potential for patient adaptation. METHODS We evaluated the multilinear model and two deep learning (DL) models on a long-term BCI & Tetraplegia (ClinicalTrials.gov identifier: NCT02550522) clinical trial dataset containing 43 sessions of ECoG recordings performed with a tetraplegic patient. In the experiment, a participant executed 3D virtual hand translation using motor imagery patterns. We designed multiple computational experiments in which training datasets were increased or translated to investigate the relationship between models' performance and different factors influencing recordings. RESULTS Our results showed that DL decoders showed similar requirements regarding the dataset size compared to the multilinear model while demonstrating higher decoding performance. Moreover, high decoding performance was obtained with relatively small datasets recorded later in the experiment, suggesting motor imagery patterns improvement and patient adaptation during the long-term experiment. Finally, we proposed UMAP embeddings and local intrinsic dimensionality as a way to visualize the data and potentially evaluate data quality. DISCUSSION DL-based decoding is a prospective approach in BCI which may be efficiently applied with real-life dataset size. 
Patient-decoder co-adaptation is an important factor to consider in long-term clinical BCI.
Affiliation(s)
- Maciej Śliwowski
- Université Grenoble Alpes, CEA, LETI, Clinatec, Grenoble, France
- Université Paris-Saclay, CEA, List, Palaiseau, France
| | - Matthieu Martin
- Université Grenoble Alpes, CEA, LETI, Clinatec, Grenoble, France
| | | | | | - Tetiana Aksenova
- Université Grenoble Alpes, CEA, LETI, Clinatec, Grenoble, France
- *Correspondence: Tetiana Aksenova
| |
|
16
|
Gonzalez-Castillo J, Fernandez I, Lam KC, Handwerker DA, Pereira F, Bandettini PA. Manifold Learning for fMRI time-varying FC. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.14.523992. [PMID: 36789436 PMCID: PMC9928030 DOI: 10.1101/2023.01.14.523992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/18/2023]
Abstract
Whole-brain functional connectivity (FC) measured with functional MRI (fMRI) evolves over time in meaningful ways at temporal scales going from years (e.g., development) to seconds (e.g., within-scan time-varying FC (tvFC)). Yet, our ability to explore tvFC is severely constrained by its large dimensionality (several thousands). To overcome this difficulty, researchers seek to generate low dimensional representations (e.g., 2D and 3D scatter plots) expected to retain its most informative aspects (e.g., relationships to behavior, disease progression). Limited prior empirical work suggests that manifold learning techniques (MLTs), namely those seeking to infer a low dimensional non-linear surface (i.e., the manifold) where most of the data lies, are good candidates for accomplishing this task. Here we explore this possibility in detail. First, we discuss why one should expect tvFC data to lie on a low dimensional manifold. Second, we estimate what is the intrinsic dimension (i.e., minimum number of latent dimensions; ID) of tvFC data manifolds. Third, we describe the inner workings of three state-of-the-art MLTs: Laplacian Eigenmaps (LE), T-distributed Stochastic Neighbor Embedding (T-SNE), and Uniform Manifold Approximation and Projection (UMAP). For each method, we empirically evaluate its ability to generate neuro-biologically meaningful representations of tvFC data, as well as their robustness against hyper-parameter selection. Our results show that tvFC data has an ID that ranges between 4 and 26, and that ID varies significantly between rest and task states. We also show how all three methods can effectively capture subject identity and task being performed: UMAP and T-SNE can capture these two levels of detail concurrently, but LE could only capture one at a time. We observed substantial variability in embedding quality across MLTs, and within-MLT as a function of hyper-parameter selection. 
To help alleviate this issue, we provide heuristics that can inform future studies. Finally, we also demonstrate the importance of feature normalization when combining data across subjects and the role that temporal autocorrelation plays in the application of MLTs to tvFC data. Overall, we conclude that while MLTs can be useful to generate summary views of labeled tvFC data, their application to unlabeled data such as resting-state remains challenging.
Affiliation(s)
| | - Isabel Fernandez
- Section on Functional Imaging Methods, National Institute of Mental Health, Bethesda, MD
| | - Ka Chun Lam
- Machine Learning Group, National Institute of Mental Health, Bethesda, MD
| | - Daniel A Handwerker
- Section on Functional Imaging Methods, National Institute of Mental Health, Bethesda, MD
| | - Francisco Pereira
- Machine Learning Group, National Institute of Mental Health, Bethesda, MD
| | - Peter A Bandettini
- Section on Functional Imaging Methods, National Institute of Mental Health, Bethesda, MD
- Machine Learning Group, National Institute of Mental Health, Bethesda, MD
- FMRI Core, National Institute of Mental Health, Bethesda, MD
| |
|
17
|
Dunin-Barkowski W, Gorban A. Editorial: Toward and beyond human-level AI, volume II. Front Neurorobot 2023; 16:1120167. [PMID: 36687208 PMCID: PMC9853958 DOI: 10.3389/fnbot.2022.1120167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2022] [Accepted: 12/13/2022] [Indexed: 01/07/2023] Open
Affiliation(s)
- Witali Dunin-Barkowski
- Department of Neuroinformatics, Center for Optical Neural Technologies, Scientific Research Institute for System Analysis, Russian Academy of Sciences, Moscow, Russia
- *Correspondence: Witali Dunin-Barkowski
| | - Alexander Gorban
- Department of Mathematics, University of Leicester, Leicester, United Kingdom
- Scientific and Educational Mathematical Center “Mathematics of Future Technology,” Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod, Russia
| |
|
18
|
Mirkes EM, Bac J, Fouché A, Stasenko SV, Zinovyev A, Gorban AN. Domain Adaptation Principal Component Analysis: Base Linear Method for Learning with Out-of-Distribution Data. ENTROPY (BASEL, SWITZERLAND) 2022; 25:33. [PMID: 36673174 PMCID: PMC9858254 DOI: 10.3390/e25010033] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/26/2022] [Revised: 12/18/2022] [Accepted: 12/21/2022] [Indexed: 06/17/2023]
Abstract
Domain adaptation is a popular paradigm in modern machine learning which aims at tackling the problem of divergence (or shift) between the labeled training and validation datasets (source domain) and a potentially large unlabeled dataset (target domain). The task is to embed both datasets into a common space in which the source dataset is informative for training while the divergence between source and target is minimized. The most popular domain adaptation solutions are based on training neural networks that combine classification and adversarial learning modules, frequently making them both data-hungry and difficult to train. We present a method called Domain Adaptation Principal Component Analysis (DAPCA) that identifies a linear reduced data representation useful for solving the domain adaptation task. DAPCA algorithm introduces positive and negative weights between pairs of data points, and generalizes the supervised extension of principal component analysis. DAPCA is an iterative algorithm that solves a simple quadratic optimization problem at each iteration. The convergence of the algorithm is guaranteed, and the number of iterations is small in practice. We validate the suggested algorithm on previously proposed benchmarks for solving the domain adaptation task. We also show the benefit of using DAPCA in analyzing single-cell omics datasets in biomedical applications. Overall, DAPCA can serve as a practical preprocessing step in many machine learning applications leading to reduced dataset representations, taking into account possible divergence between source and target domains.
Affiliation(s)
- Evgeny M. Mirkes
- School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK
| | - Jonathan Bac
- Institut Curie, PSL Research University, 75005 Paris, France
- Institut National de la Santé et de la Recherche Médicale (INSERM), U900, 75012 Paris, France
- CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75005 Paris, France
| | - Aziz Fouché
- Institut Curie, PSL Research University, 75005 Paris, France
- Institut National de la Santé et de la Recherche Médicale (INSERM), U900, 75012 Paris, France
- CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75005 Paris, France
| | - Sergey V. Stasenko
- Laboratory of Advanced Methods for High-Dimensional Data Analysis, Lobachevsky University, 603000 Nizhniy Novgorod, Russia
| | - Andrei Zinovyev
- Institut Curie, PSL Research University, 75005 Paris, France
- Institut National de la Santé et de la Recherche Médicale (INSERM), U900, 75012 Paris, France
- CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75005 Paris, France
| | - Alexander N. Gorban
- School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK
| |
|
19
|
Lysov M, Maximova I, Vasiliev E, Getmanskaya A, Turlapov V. Entropy as a High-Level Feature for XAI-Based Early Plant Stress Detection. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1597. [PMID: 36359687 PMCID: PMC9689005 DOI: 10.3390/e24111597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/19/2022] [Revised: 10/17/2022] [Accepted: 10/26/2022] [Indexed: 06/16/2023]
Abstract
This article is devoted to searching for high-level explainable features that can remain explainable for a wide class of objects or phenomena and become an integral part of explainable AI (XAI). The present study involved a 25-day experiment on early diagnosis of wheat stress using drought stress as an example. The state of the plants was periodically monitored via thermal infrared (TIR) and hyperspectral image (HSI) cameras. A single-layer perceptron (SLP)-based classifier was used as the main instrument in the XAI study. To provide explainability of the SLP input, the direct HSI was replaced by images of six popular vegetation indices and three HSI channels (R630, G550, and B480; referred to as indices), along with the TIR image. Furthermore, in the explainability analysis, each of the 10 images was replaced by its 6 statistical features: min, max, mean, std, max-min, and the entropy. For the SLP output explainability, seven output neurons corresponding to the key states of the plants were chosen. The inner layer of the SLP was constructed using 15 neurons, including 10 corresponding to the indices and 5 reserved neurons. The classification possibilities of all 60 features and 10 indices of the SLP classifier were studied. Study result: Entropy is the earliest high-level stress feature for all indices; entropy and an entropy-like feature (max-min) paired with one of the other statistical features can provide, for most indices, 100% accuracy (or near 100%), serving as an integral part of XAI.
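The six per-image statistical features named in this abstract (min, max, mean, std, max-min, and entropy) can be sketched as below. The abstract does not say how entropy or std were computed, so this sketch assumes Shannon entropy over a fixed-bin histogram and the population standard deviation; the function name, the `bins` parameter, and the sample values are hypothetical.

```python
import math
from collections import Counter


def six_features(pixels, bins=16):
    """Return min, max, mean, std, max-min and histogram entropy of a flat pixel list."""
    if not pixels:
        raise ValueError("pixels must be non-empty")
    n = len(pixels)
    lo, hi = min(pixels), max(pixels)
    span = hi - lo
    mean = sum(pixels) / n
    # Population standard deviation (an assumption; the source does not specify).
    std = math.sqrt(sum((x - mean) ** 2 for x in pixels) / n)
    if span == 0:
        # A constant image falls into a single histogram bin.
        counts = Counter({0: n})
    else:
        # Map each value to a bin index in [0, bins - 1]; the maximum is clipped.
        counts = Counter(min(int((x - lo) / span * bins), bins - 1) for x in pixels)
    # Shannon entropy in bits: sum of p * log2(1 / p) over occupied bins.
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return {"min": lo, "max": hi, "mean": mean, "std": std,
            "max-min": span, "entropy": entropy}


# Hypothetical index-image values, flattened to one list.
print(six_features([0.0, 0.0, 1.0, 1.0], bins=2))
```

With 10 images and 6 features each, a routine like this would produce the 60 features the abstract mentions; a uniform image yields zero entropy, and entropy grows as values spread across bins.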
|
20
|
Roy T, Sharma K, Dhall A, Patiyal S, Raghava GPS. In silico method for predicting infectious strains of influenza A virus from its genome and protein sequences. J Gen Virol 2022; 103. [PMID: 36318663 DOI: 10.1099/jgv.0.001802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/16/2023] Open
Abstract
Influenza A is a contagious viral disease responsible for four pandemics in the past and a major public health concern. Being zoonotic in nature, the virus can cross the species barrier and transmit from wild aquatic bird reservoirs to humans via intermediate hosts. In this study, we have developed a computational method for the prediction of human-associated and non-human-associated influenza A virus sequences. The models were trained and validated on proteins and genome sequences of influenza A virus. Firstly, we have developed prediction models for 15 types of influenza A proteins using composition-based and one-hot-encoding features. We have achieved a highest AUC of 0.98 for HA protein on a validation dataset using dipeptide composition-based features. Of note, we obtained a maximum AUC of 0.99 using one-hot-encoding features for protein-based models on a validation dataset. Secondly, we built models using whole genome sequences which achieved an AUC of 0.98 on a validation dataset. In addition, we showed that our method outperforms a similarity-based approach (i.e., blast) on the same validation dataset. Finally, we integrated our best models into a user-friendly web server 'FluSPred' (https://webs.iiitd.edu.in/raghava/fluspred/index.html) and a standalone version (https://github.com/raghavagps/FluSPred) for the prediction of human-associated/non-human-associated influenza A virus strains.
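As an illustration of the dipeptide composition features mentioned in this abstract, the sketch below uses the commonly cited definition: the fraction of each of the 400 ordered amino-acid pairs among adjacent residue pairs in a protein sequence. It is not confirmed to match the FluSPred implementation; the function name, the choice to skip pairs containing non-standard residues, and the toy sequence are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def dipeptide_composition(sequence):
    """Return a dict mapping each of the 400 dipeptides to its fraction in the sequence."""
    seq = sequence.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # Assumption: pairs containing non-standard symbols (e.g. X) are skipped.
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no valid dipeptide")
    return {pair: c / total for pair, c in counts.items()}


# Toy sequence for illustration only, not a real influenza protein.
features = dipeptide_composition("ACAC")
print(len(features), features["AC"], features["CA"])
```

The result is a fixed-length vector of 400 values regardless of protein length, which is what makes composition-based features convenient inputs for standard classifiers.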
Affiliation(s)
- Trinita Roy
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Khushal Sharma
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Anjali Dhall
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Sumeet Patiyal
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Gajendra Pal Singh Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
|
21
|
Khan MI, Park T, Imran MA, Gowda Saralamma VV, Lee DC, Choi J, Baig MH, Dong JJ. Development of machine learning models for the screening of potential HSP90 inhibitors. Front Mol Biosci 2022; 9:967510. [PMID: 36339714 PMCID: PMC9626531 DOI: 10.3389/fmolb.2022.967510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2022] [Accepted: 08/15/2022] [Indexed: 11/18/2022] Open
Abstract
Heat shock protein 90 (HSP90) is a molecular chaperone that plays a significant role in the folding of client proteins. This cellular protein is linked to the progression of several cancer types, including breast cancer, lung cancer, and gastrointestinal stromal tumors. Several oncogenic kinases are HSP90 clients, and their activity depends on this molecular chaperone. This makes HSP90 a prominent therapeutic target for cancer treatment. Studies have confirmed the inhibition of HSP90 as a striking therapeutic strategy for cancer management. In this study, we used machine learning and different in silico approaches to screen the KCB database to identify potential HSP90 inhibitors. Further evaluation of these inhibitors on various cancer cell lines showed favorable inhibitory activity. These inhibitors could serve as a basis for future development of effective HSP90 inhibitors.
Affiliation(s)
- Mohd Imran Khan
- Department of Family Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, South Korea
- Taehwan Park
- Department of Family Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, South Korea
- Mohammad Azhar Imran
- Department of Family Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, South Korea
- Duk Chul Lee
- Department of Family Medicine, Severance Hospital, Yonsei University College of Medicine, Seoul, South Korea
- Jaehyuk Choi
- BNJBiopharma, Yonsei University International Campus, Incheon, South Korea
- Mohammad Hassan Baig
- Department of Family Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, South Korea
- *Correspondence: Jae-June Dong; Mohammad Hassan Baig
- Jae-June Dong
- Department of Family Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, South Korea
- *Correspondence: Jae-June Dong; Mohammad Hassan Baig
|
22
|
Sharma T, Saralamma VVG, Lee DC, Imran MA, Choi J, Baig MH, Dong JJ. Combining structure-based pharmacophore modeling and machine learning for the identification of novel BTK inhibitors. Int J Biol Macromol 2022; 222:239-250. [PMID: 36130643 DOI: 10.1016/j.ijbiomac.2022.09.151] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/20/2022] [Revised: 09/13/2022] [Accepted: 09/16/2022] [Indexed: 11/05/2022]
Abstract
Bruton's tyrosine kinase (BTK) is a critical enzyme involved in multiple signaling pathways that regulate cellular survival, activation, and proliferation, making it a major cancer therapeutic target. We applied a novel integrated approach combining structure-based pharmacophore modeling, machine learning, and other in silico studies to screen the Korean chemical database (KCB) and identify potential BTK inhibitors (BTKi). Further evaluation of these inhibitors on three different human cancer cell lines showed significant cell growth inhibitory activity. Among the 13 compounds shortlisted, four demonstrated consistent cell inhibition activity across breast, gastric, and lung cancer cells (IC50 below 3 μM). The selected compounds also showed significant kinase inhibition activity (IC50 below 5 μM). The current study suggests the potential of these inhibitors for targeting BTK in malignant tumors.
Affiliation(s)
- Tanuj Sharma
- Department of Family Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Gangnam-gu, Seoul 120-752, Republic of Korea
- Venu Venkatarame Gowda Saralamma
- Department of Family Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Gangnam-gu, Seoul 120-752, Republic of Korea
- Duk Chul Lee
- Department of Family Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Gangnam-gu, Seoul 120-752, Republic of Korea
- Mohammad Azhar Imran
- Department of Family Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Gangnam-gu, Seoul 120-752, Republic of Korea
- Jaehyuk Choi
- BNJBiopharma, 2nd floor Memorial Hall, 85, Songdogwahak-ro, Yeonsu-gu, Incheon 21983, Republic of Korea
- Mohammad Hassan Baig
- Department of Family Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Gangnam-gu, Seoul 120-752, Republic of Korea
- Jae-June Dong
- Department of Family Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Gangnam-gu, Seoul 120-752, Republic of Korea
|
23
|
He Y, Liu K, Han L, Han W. Clustering Analysis, Structure Fingerprint Analysis, and Quantum Chemical Calculations of Compounds from Essential Oils of Sunflower (Helianthus annuus L.) Receptacles. Int J Mol Sci 2022; 23:10169. [PMID: 36077567 PMCID: PMC9456235 DOI: 10.3390/ijms231710169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 08/25/2022] [Accepted: 08/30/2022] [Indexed: 11/25/2022] Open
Abstract
Sunflower (Helianthus annuus L.) is an appropriate crop for current new patterns of green agriculture, so it is important to turn sunflower receptacles from waste into a useful resource. However, there is limited knowledge on the functions of compounds from the essential oils of sunflower receptacles. In this study, a new method was created for chemical space network analysis and classification of small samples, and applied to 104 compounds. Here, t-SNE (t-Distributed Stochastic Neighbor Embedding) dimensions were used to reduce coordinates as node locations and edge connections of chemical space networks, respectively, and molecules were grouped according to whether the edges were connected and the proximity of the node coordinates. Through detailed analysis of the structural characteristics and fingerprints of each classified group, our classification method attained good accuracy. Targets were then identified using reverse docking methods, and the active centers of the same types of compounds were determined by quantum chemical calculation. The results indicated that these compounds can be divided into nine groups, according to their mean within-group similarity (MWGS) values. The three families with the most members, i.e., the d-limonene group (18), α-pinene group (10), and γ-maaliene group (9), were used to determine the protein targets with PharmMapper. Structure fingerprint analysis was employed to predict the binding mode of the ligands of four families of the protein targets. Then, quantum chemical calculations were applied to the active group of the representative compounds of the four families. This study provides further scientific information to support the use of sunflower receptacles.
Affiliation(s)
- Lu Han
- Correspondence: (L.H.); (W.H.)
|
24
|
Liu X, Shu Y, Yu P, Li H, Duan W, Wei Z, Li K, Xie W, Zeng Y, Peng D. Classification of severe obstructive sleep apnea with cognitive impairment using degree centrality: A machine learning analysis. Front Neurol 2022; 13:1005650. [PMID: 36090863 PMCID: PMC9453022 DOI: 10.3389/fneur.2022.1005650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Accepted: 08/11/2022] [Indexed: 11/24/2022] Open
Abstract
In this study, we aimed to use voxel-level degree centrality (DC) features in combination with machine learning methods to distinguish obstructive sleep apnea (OSA) patients with and without mild cognitive impairment (MCI). Ninety-nine OSA patients were recruited for rs-MRI scanning, including 51 MCI patients and 48 participants with no mild cognitive impairment. Based on the Automated Anatomical Labeling (AAL) brain atlas, the DC features of all participants were calculated and extracted. Ten DC features were selected by removing highly correlated variables and then applying least absolute shrinkage and selection operator (LASSO) regression. Finally, three machine learning methods were used to establish classification models. The support vector machine method had the best classification efficiency (AUC = 0.78), followed by logistic regression (AUC = 0.77) and random forest (AUC = 0.71). These findings demonstrate an effective machine learning approach for differentiating OSA patients with and without MCI and provide potential neuroimaging evidence for cognitive impairment caused by OSA.
Affiliation(s)
- Xiang Liu
- Department of Radiology, the First Affiliated Hospital of Nanchang University, Jiangxi, China
- Yongqiang Shu
- Department of Radiology, the First Affiliated Hospital of Nanchang University, Jiangxi, China
- Pengfei Yu
- Big Data Center, the Second Affiliated Hospital of Nanchang University, Jiangxi, China
- Haijun Li
- Department of PET Center, the First Affiliated Hospital of Nanchang University, Jiangxi, China
- Wenfeng Duan
- Department of Radiology, the First Affiliated Hospital of Nanchang University, Jiangxi, China
- Zhipeng Wei
- Department of Radiology, the First Affiliated Hospital of Nanchang University, Jiangxi, China
- Kunyao Li
- Department of Radiology, the First Affiliated Hospital of Nanchang University, Jiangxi, China
- Wei Xie
- Department of Radiology, the First Affiliated Hospital of Nanchang University, Jiangxi, China
- Yaping Zeng
- Department of Radiology, the First Affiliated Hospital of Nanchang University, Jiangxi, China
- Dechang Peng
- Department of Radiology, the First Affiliated Hospital of Nanchang University, Jiangxi, China
- *Correspondence: Dechang Peng
|
25
|
Liu Z, Bhattacharya S, Maiti T. Variational Bayes Ensemble Learning Neural Networks With Compressed Feature Space. IEEE Trans Neural Netw Learn Syst 2022; PP:1379-1385. [PMID: 35584070 DOI: 10.1109/tnnls.2022.3172276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
We consider the problem of nonparametric classification from a high-dimensional input vector (small n large p problem). To handle the high-dimensional feature space, we propose a random projection (RP) of the feature space followed by training of a neural network (NN) on the compressed feature space. Unlike regularization techniques (lasso, ridge, etc.), which train on the full data, NNs based on compressed feature space have significantly lower computation complexity and memory storage requirements. Nonetheless, a random compression-based method is often sensitive to the choice of compression. To address this issue, we adopt a Bayesian model averaging (BMA) approach and leverage the posterior model weights to determine: 1) uncertainty under each compression and 2) intrinsic dimensionality of the feature space (the effective dimension of feature space useful for prediction). The final prediction is improved by averaging models with projected dimensions close to the intrinsic dimensionality. Furthermore, we propose a variational approach to the afore-mentioned BMA to allow for simultaneous estimation of both model weights and model-specific parameters. Since the proposed variational solution is parallelizable across compressions, it preserves the computational gain of frequentist ensemble techniques while providing the full uncertainty quantification of a Bayesian approach. We establish the asymptotic consistency of the proposed algorithm under the suitable characterization of the RPs and the prior parameters. Finally, we provide extensive numerical examples for empirical validation of the proposed method.
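The two ingredients described above, random projection of the feature space and averaging over compressions with model weights, can be sketched in a few lines. This is a simplified stand-in and not the paper's variational algorithm: the Gaussian projection with variance 1/m, the softmax of per-model log evidences as weights, and all function names are assumptions made for illustration.

```python
import math
import random


def random_projection_matrix(p, m, rng):
    """Build an m x p Gaussian matrix with N(0, 1/m) entries (assumed scaling)."""
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, scale) for _ in range(p)] for _ in range(m)]


def project(matrix, x):
    """Compress a p-dimensional feature vector x to m dimensions."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]


def model_weights(log_evidences):
    """Turn per-compression log evidences into normalised weights.

    Equivalent to posterior model probabilities under a uniform prior
    over compressions; the maximum is subtracted for numerical stability.
    """
    top = max(log_evidences)
    raw = [math.exp(v - top) for v in log_evidences]
    total = sum(raw)
    return [v / total for v in raw]


def averaged_prediction(probabilities, weights):
    """Weighted average of the class probabilities from each compressed model."""
    return sum(p * w for p, w in zip(probabilities, weights))
```

In use, one network would be trained on `project(R_k, x)` for each random matrix `R_k`, and the per-network predictions combined with `averaged_prediction`.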
|
26
|
Pinar-Sanchez J, Bermejo López P, Solís García Del Pozo J, Redondo-Ruiz J, Navarro Casado L, Andres-Pretel F, Celorrio Bustillo ML, Esparcia Moreno M, García Ruiz S, Solera Santos JJ, Navarro Bravo B. Common Laboratory Parameters Are Useful for Screening for Alcohol Use Disorder: Designing a Predictive Model Using Machine Learning. J Clin Med 2022; 11:2061. [PMID: 35407669 PMCID: PMC8999878 DOI: 10.3390/jcm11072061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2022] [Revised: 04/01/2022] [Accepted: 04/03/2022] [Indexed: 11/16/2022] Open
Abstract
The diagnosis of alcohol use disorder (AUD) remains a difficult challenge, and some patients may not be adequately diagnosed. This study aims to identify an optimum combination of laboratory markers to detect alcohol consumption, using data science. An analytical observational study was conducted with 337 subjects (253 men and 83 women; mean age 44 years, standard deviation (SD) 10.61). The first group included 204 participants being treated in the Addictive Behaviors Unit (ABU) in Albacete (Spain). They met the diagnostic criteria for AUD specified in the Diagnostic and Statistical Manual of Mental Disorders, fifth edition (DSM-5). The second group included 133 blood donors (people with no risk of AUD), recruited by cross-section. All participants were also divided into two groups according to the WHO classification for risk of alcohol consumption in Spain, that is, men drinking more than 28 standard drink units (SDUs) or women drinking more than 17 SDUs. Medical history and laboratory markers were selected from our hospital's database. A correlation between alterations in laboratory markers and the amount of alcohol consumed was established. We then created three predictive models (with logistic regression, classification tree, and Bayesian network) to detect risk of alcohol consumption by using laboratory markers as predictive features. For the selection of variables and the creation and validation of the predictive models, two tools were used: the scikit-learn library for Python and the Weka application. The logistic regression model provided a maximum AUD prediction accuracy of 85.07%. Secondly, the classification tree provided a lower accuracy of 79.4%, but easier interpretation. Finally, the Naive Bayes network had an accuracy of 87.46%. The combination of several common biochemical markers and the use of data science can enhance detection of AUD, helping to prevent future medical complications derived from AUD.
Affiliation(s)
- Juana Pinar-Sanchez
- Department of Internal Medicine, Jose Maria Morales Meseguer University General Hospital, 30008 Murcia, Spain
- Pablo Bermejo López
- Computer Science Department, Universidad de Castilla-La Mancha, 02071 Albacete, Spain
- Julián Solís García Del Pozo
- Unit of Infectious Diseases, Department of Internal Medicine, University General Hospital of Albacete, 02006 Albacete, Spain
- Jose Redondo-Ruiz
- Unit and Gerodontology, Department of Dermatology, Stomatology, Radiology and Physical Medicine, Special Care Dentistry, Jose Maria Morales Meseguer University General Hospital, Faculty of Medicine, University of Murcia, 30008 Murcia, Spain
- Laura Navarro Casado
- Department of Biochemistry, University General Hospital of Albacete, 02006 Albacete, Spain
- Fernando Andres-Pretel
- Clinical Research Support Unit, National Paraplegics Hospital of Toledo Foundation, 45004 Toledo, Spain
- Mercedes Esparcia Moreno
- Department of Mental Health, Addictive Conducts Unit Care in Albacete, 02005 Albacete, Spain; (M.L.C.B.); (M.E.M.)
- Santiago García Ruiz
- Blood Donation Center from Albacete and Cuenca, Department of Hematology, University General Hospital of Albacete, 02006 Albacete, Spain
- Beatriz Navarro Bravo
- Department of Psychology, Faculty of Medicine, Universidad de Castilla-La Mancha, 02008 Albacete, Spain
|
27
|
Ding W, Wu L, Li X, Chang L, Liu G, Du H. Comprehensive analysis of competitive endogenous RNAs network: Identification and validation of prediction model composed of mRNA signature and miRNA signature in gastric cancer. Oncol Lett 2022; 23:150. [PMID: 35350591 PMCID: PMC8941526 DOI: 10.3892/ol.2022.13270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Accepted: 02/22/2022] [Indexed: 11/18/2022] Open
Abstract
Gastric cancer (GC), one of the most lethal malignant tumors, is highly aggressive with a poor prognosis, while the molecular mechanisms underlying it remain largely unknown. Although advanced imaging techniques and comprehensive treatment facilitate the diagnosis and survival of some GC patients, precise diagnosis and prognosis are still a challenge. The present study used publicly available gene expression profiles from The Cancer Genome Atlas and Gene Expression Omnibus datasets, including mRNA, micro (mi)RNA and circular (circ)RNA of GC, to establish a competing endogenous RNA (ceRNA) network. Further, the present study performed least absolute shrinkage and selection operator regression analysis on the hub RNAs to establish a prediction model with mRNA and miRNA. The ceRNA network contained 109 edges and 56 nodes, and the visible network contained 13 miRNAs, 9 circRNAs and 34 mRNAs. The five-mRNA signature comprised CTF1, FKBP5, RNF128, GSTM2 and ADAMTS1. The area under curve (AUC) value of the diagnosis training cohort was 0.9975. The prognosis of the high-risk group (RiskScore >4.664) was worse compared with that of the low-risk group (RiskScore ≤4.664; P<0.05) in the training cohort. The five-miRNA signature comprised miR-145-5p, miR-615-3p, miR-6507-5p, miR-937-3p and miR-99a-3p. The AUC value of the diagnosis training cohort was 0.9975. The prognosis of the high-risk group (RiskScore >1.621) was worse compared with that of the low-risk group (RiskScore ≤1.621; P<0.05) in the training cohort. The validation cohorts indicated that both the five-mRNA and five-miRNA signatures had strong predictive power in diagnosis and prognosis for GC. In conclusion, a ceRNA network was established for GC, and a five-mRNA signature and a five-miRNA signature were identified that enabled diagnosis and prognosis of GC by assigning patients to a high-risk or low-risk group.
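Risk stratification with a signature of this kind is usually a weighted sum of expression values compared against a cutoff. The sketch below assumes that form; the abstract gives the gene names and the cutoffs (4.664 for mRNA, 1.621 for miRNA) but not the regression coefficients, so the coefficients here are hypothetical placeholders and the function names are illustrative.

```python
MRNA_CUTOFF = 4.664   # high-risk threshold for the mRNA signature (from the abstract)
MIRNA_CUTOFF = 1.621  # high-risk threshold for the miRNA signature (from the abstract)

# Hypothetical placeholder coefficients; the real LASSO values are not in the abstract.
EXAMPLE_MRNA_COEFFICIENTS = {
    "CTF1": 1.0,
    "FKBP5": 1.0,
    "RNF128": 1.0,
    "GSTM2": 1.0,
    "ADAMTS1": 1.0,
}


def risk_score(expression, coefficients):
    """Assumed form: sum of coefficient * expression over the signature genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def risk_group(score, cutoff):
    """Scores strictly above the cutoff are high risk, otherwise low risk."""
    return "high" if score > cutoff else "low"
```

A patient whose score equals the cutoff falls in the low-risk group, matching the "≤" in the abstract.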
Affiliation(s)
- Wenshuang Ding
- Department of Pathology, Guangzhou First People's Hospital, School of Medicine, South China University of Technology, Guangzhou, Guangdong 510030, P.R. China
- Liqiong Wu
- Department of Pathology, Guangzhou First People's Hospital, School of Medicine, South China University of Technology, Guangzhou, Guangdong 510030, P.R. China
- Xiubo Li
- Department of Pathology, Guangzhou First People's Hospital, School of Medicine, South China University of Technology, Guangzhou, Guangdong 510030, P.R. China
- Lijun Chang
- Department of Pathology, Guangzhou First People's Hospital, School of Medicine, South China University of Technology, Guangzhou, Guangdong 510030, P.R. China
- Guorong Liu
- Department of Pathology, Guangzhou First People's Hospital, School of Medicine, South China University of Technology, Guangzhou, Guangdong 510030, P.R. China
- Hong Du
- Department of Pathology, Guangzhou First People's Hospital, School of Medicine, South China University of Technology, Guangzhou, Guangdong 510030, P.R. China
|
28
|
Zinovyev A, Sadovsky M, Calzone L, Fouché A, Groeneveld CS, Chervov A, Barillot E, Gorban AN. Modeling Progression of Single Cell Populations Through the Cell Cycle as a Sequence of Switches. Front Mol Biosci 2022; 8:793912. [PMID: 35178429 PMCID: PMC8846220 DOI: 10.3389/fmolb.2021.793912] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/12/2021] [Accepted: 12/15/2021] [Indexed: 11/13/2022] Open
Abstract
The cell cycle is a biological process underlying the existence and propagation of life in time and space. It has long been an object of mathematical modeling, with several alternative mechanistic modeling principles suggested, describing the known molecular mechanisms in more or less detail. Recently, the cell cycle has been investigated at the single-cell level in snapshots of unsynchronized cell populations, exploiting new methods for transcriptomic and proteomic molecular profiling. This raises a need for simplified semi-phenomenological cell cycle models, in order to formalize the processes underlying the cell cycle at a higher level of abstraction. Here we suggest a modeling framework recapitulating the most important properties of the cell cycle as a limit trajectory of a dynamical process characterized by several internal states with switches between them. In the simplest form, this leads to a limit cycle trajectory composed of linear segments in logarithmic coordinates describing some extensive (depending on system size) cell properties. We prove a theorem connecting the effective embedding dimensionality of the cell cycle trajectory with the number of its linear segments. We also develop a simplified kinetic model with piecewise-constant kinetic rates describing the dynamics of lumps of genes involved in the S and G2/M phases. We show how the developed cell cycle models can be applied to analyze the available single-cell datasets and simulate certain properties of the observed cell cycle trajectories. Based on our model, we can predict with good accuracy the cell line doubling time from the length of the cell cycle trajectory.
Affiliation(s)
- Andrei Zinovyev
- Institut Curie, PSL Research University, Paris, France
- INSERM, Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
- *Correspondence: Andrei Zinovyev
- Michail Sadovsky
- Institute of Computational Modeling (RAS), Krasnoyarsk, Russia
- Laboratory of Medical Cybernetics, V.F.Voino-Yasenetsky Krasnoyarsk State Medical University, Krasnoyarsk, Russia
- Federal Research and Clinic Center of FMBA of Russia, Krasnoyarsk, Russia
- Laboratory of Advanced Methods for High-Dimensional Data Analysis, Lobachevsky University, Nizhniy Novgorod, Russia
- Laurence Calzone
- Institut Curie, PSL Research University, Paris, France
- INSERM, Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
- Aziz Fouché
- Institut Curie, PSL Research University, Paris, France
- INSERM, Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
- Clarice S. Groeneveld
- Cartes d’Identité des Tumeurs (CIT) Program, Ligue Nationale Contre le Cancer, Paris, France
- Oncologie Moleculaire, UMR144, Institut Curie, Paris, France
- Alexander Chervov
- Institut Curie, PSL Research University, Paris, France
- INSERM, Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
- Emmanuel Barillot
- Institut Curie, PSL Research University, Paris, France
- INSERM, Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
- Alexander N. Gorban
- Laboratory of Advanced Methods for High-Dimensional Data Analysis, Lobachevsky University, Nizhniy Novgorod, Russia
- Department of Mathematics, University of Leicester, Leicester, United Kingdom
|
29
|
Amblard E, Bac J, Chervov A, Soumelis V, Zinovyev A. Hubness reduction improves clustering and trajectory inference in single-cell transcriptomic data. Bioinformatics 2022; 38:1045-1051. [PMID: 34871374 DOI: 10.1093/bioinformatics/btab795] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/10/2021] [Revised: 11/05/2021] [Accepted: 11/17/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Single-cell RNA-seq (scRNAseq) datasets are characterized by large ambient dimensionality, and their analyses can be affected by various manifestations of the dimensionality curse. One of these manifestations is the hubness phenomenon, i.e. the existence of data points with a surprisingly large incoming connectivity degree in the data point neighbourhood graph. The conventional approach to dampening the unwanted effects of high dimension consists of applying drastic dimensionality reduction. It remains unexplored whether this step can be avoided, thus retaining more information than is contained in the low-dimensional projections, by directly correcting hubness. RESULTS We investigated hubness in scRNAseq data. We show that hub cells do not represent any visible technical or biological bias. The effect of various hubness reduction methods is investigated with respect to the clustering, trajectory inference and visualization tasks in scRNAseq datasets. We show that hubness reduction generates neighbourhood graphs with properties more suitable for applying machine learning methods, and that it outperforms other state-of-the-art methods for improving neighbourhood graphs. As a consequence, clustering, trajectory inference and visualization perform better, especially for datasets characterized by large intrinsic dimensionality. Hubness is an important phenomenon characterizing data point neighbourhood graphs computed for various types of sequencing datasets. Reducing hubness can be beneficial for the analysis of scRNAseq data with large intrinsic dimensionality, in which case it can be an alternative to drastic dimensionality reduction. AVAILABILITY AND IMPLEMENTATION The code used to analyze the datasets and produce the figures of this article is available from https://github.com/sysbio-curie/schubness. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
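Hubness as defined above (an unusually large incoming degree in the k-nearest-neighbour graph) can be measured with a short sketch. It is illustrative only and unrelated to the schubness repository: the brute-force Euclidean neighbour search, the use of skewness of the in-degree distribution as the summary statistic, and the function names are assumptions.

```python
import math


def k_occurrence(points, k):
    """Count how often each point appears among the k nearest neighbours of the others.

    Brute-force Euclidean search; distance ties are broken by the lower index.
    """
    counts = [0] * len(points)
    for i, p in enumerate(points):
        neighbours = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for _, j in neighbours[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Third standardised moment; large positive values suggest a few strong hubs."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0
```

The counts always sum to `n * k`, so a heavy right tail means a few points absorb most of the incoming edges.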
Affiliation(s)
- Elise Amblard
- Université de Paris, INSERM, HIPI, F-75010 Paris, France
- Jonathan Bac
- Institut Curie, PSL Research University, F-75005 Paris, France; INSERM, U900, F-75005 Paris, France; CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
- Alexander Chervov
- Institut Curie, PSL Research University, F-75005 Paris, France; INSERM, U900, F-75005 Paris, France; CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
- Andrei Zinovyev
- Institut Curie, PSL Research University, F-75005 Paris, France; INSERM, U900, F-75005 Paris, France; CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France; Laboratory of Advanced Methods for High-Dimensional Data Analysis, Lobachevsky University, 603000 Nizhny Novgorod, Russia
|