Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Bac J, Mirkes EM, Gorban AN, Tyukin I, Zinovyev A. Scikit-Dimension: A Python Package for Intrinsic Dimension Estimation. Entropy (Basel) 2021;23:1368. [PMID: 34682092 PMCID: PMC8534554 DOI: 10.3390/e23101368] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 09/17/2021] [Revised: 10/10/2021] [Accepted: 10/16/2021] [Indexed: 02/07/2023]

Cited by Other Article(s)

Zulfat M, Hakami MA, Hazazi A, Mahmood A, Khalid A, Alqurashi RS, Abdalla AN, Hu J, Wadood A, Huang X. Identification of novel NLRP3 inhibitors as therapeutic options for epilepsy by machine learning-based virtual screening, molecular docking and biomolecular simulation studies. Heliyon 2024;10:e34410. [PMID: 39170440 PMCID: PMC11336274 DOI: 10.1016/j.heliyon.2024.e34410] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/06/2023] [Revised: 07/06/2024] [Accepted: 07/09/2024] [Indexed: 08/23/2024] Open

Abstract

The NOD-Like Receptor Protein-3 (NLRP3) inflammasome is a key therapeutic target for the treatment of epilepsy and has been reported to regulate inflammation in several neurological diseases. In this study, a machine learning-based virtual screening strategy was used to investigate candidate active compounds that inhibit the NLRP3 inflammasome, as machine learning-based virtual screening has the potential to accurately predict protein-ligand binding and reduce false-positive outcomes compared to traditional virtual screening. Briefly, classification models were created using Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbor (KNN) machine learning methods. To determine the most crucial features of a molecule's activity, feature selection was carried out. The created models were analyzed using 10-fold cross-validation. Among the generated models, the RF model obtained the best results. Therefore, the RF model was used as a screening tool against large chemical databases. Molecular Operating Environment (MOE) and PyRx software were applied for molecular docking. Also, using the Amber Tools program, molecular dynamics (MD) simulation of potent inhibitors was carried out. The results showed that the KNN, SVM, and RF accuracy was 0.911 %, 0.906 %, and 0.946 %, respectively. Moreover, the models showed sensitivity of 0.82 %, 0.78 %, and 0.86 % and specificity of 0.95 %, 0.96 %, and 0.98 %, respectively. By applying the model to the ZINC and South African databases, we identified 98 and 39 compounds, respectively, potentially possessing anti-NLRP3 activity. Also, a molecular docking analysis produced ten ZINC and seven South African compounds that have comparable binding affinities to the reference drug. Moreover, MD analysis of the two complexes revealed that the two compounds (ZINC000009601348 and SANC00225) form stable complexes with varying amounts of binding energy. The in-silico studies indicate that both compounds most likely display their inhibitory effect by inhibiting the NLRP3 protein.

Valdés JJ, Tchagang AB. Novel machine learning insights into the QM7b and QM9 quantum mechanics datasets. J Comput Chem 2024;45:1193-1214. [PMID: 38329198 DOI: 10.1002/jcc.27295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 12/06/2023] [Accepted: 12/12/2023] [Indexed: 02/09/2024]

Abstract

This paper (i) explores the internal structure of two quantum mechanics datasets (QM7b, QM9), composed of several thousands of organic molecules and described in terms of electronic properties, and (ii) further explores an inverse design approach to molecular design consisting of using machine learning methods to approximate the atomic composition of molecules, using QM9 data. Understanding the structure and characteristics of this kind of data is important when predicting the atomic composition from physical-chemical properties in inverse molecular designs. Intrinsic dimension analysis, clustering, and outlier detection methods were used in the study. They revealed that for both datasets the intrinsic dimensionality is several times smaller than the descriptive dimensions. The QM7b data is composed of well-defined clusters related to atomic composition. The QM9 data consists of an outer region predominantly composed of outliers, and an inner, core region that concentrates clustered inlier objects. A significant relationship exists between the number of atoms in the molecule and its outlier/inlier nature. The spatial structure exhibits a relationship with molecular weight. Despite the structural differences between the two datasets, the predictability of variables of interest for inverse molecular design is high. This is exemplified by models estimating the number of atoms of the molecule from both the original properties and from lower dimensional embedding spaces. In the generative approach the input is given by a set of desired properties of the molecule and the output is an approximation of the atomic composition in terms of its constituent chemical elements. This could serve as the starting region for further search in the huge space determined by the set of possible chemical compounds. The quantum mechanics dataset QM9 is used in the study, composed of 133,885 small organic molecules and 19 electronic properties. Different multi-target regression approaches were considered for predicting the atomic composition from the properties, including feature engineering techniques in an auto-machine learning framework. High-quality models were found that predict the atomic composition of the molecules from their electronic properties, as well as from a subset of only 52.6% size. Feature selection worked better than feature generation. The results validate the generative approach to inverse molecular design.

Balwani A, Cho S, Choi H. Exploring the Architectural Biases of the Canonical Cortical Microcircuit. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.23.595629. [PMID: 38826320 PMCID: PMC11142214 DOI: 10.1101/2024.05.23.595629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]

Abhishek K, Brown CJ, Hamarneh G. Multi-sample ζ-mixup: richer, more realistic synthetic samples from a p-series interpolant. JOURNAL OF BIG DATA 2024;11:43. [PMID: 38528850 PMCID: PMC10960781 DOI: 10.1186/s40537-024-00898-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Accepted: 02/28/2024] [Indexed: 03/27/2024]

Jin Y, Yin H, Zhang H, Wang Y, Liu S, Yang L, Song B. Predicting tumor deposits in rectal cancer: a combined deep learning model using T2-MR imaging and clinical features. Insights Imaging 2023;14:221. [PMID: 38117396 PMCID: PMC10733230 DOI: 10.1186/s13244-023-01564-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 11/05/2023] [Indexed: 12/21/2023] Open

Abstract

BACKGROUND

Tumor deposits (TDs) are associated with poor prognosis in rectal cancer (RC). This study aims to develop and validate a deep learning (DL) model incorporating T2-MR image and clinical factors for the preoperative prediction of TDs in RC patients.

MATERIALS AND METHODS

A total of 327 RC patients with pathologically confirmed TDs status from January 2016 to December 2019 were retrospectively recruited, and the T2-MR images and clinical variables were collected. Patients were randomly split into a development dataset (n = 246) and an independent testing dataset (n = 81). A single-channel DL model, a multi-channel DL model, a hybrid DL model, and a clinical model were constructed. The performance of these predictive models was assessed by using receiver operating characteristics (ROC) analysis and decision curve analysis (DCA).

RESULTS

The areas under the curves (AUCs) of the clinical, single-DL, multi-DL, and hybrid-DL models were 0.734 (95% CI, 0.674-0.788), 0.710 (95% CI, 0.649-0.766), 0.767 (95% CI, 0.710-0.819), and 0.857 (95% CI, 0.807-0.898) in the development dataset. The AUC of the hybrid-DL model was significantly higher than the single-DL and multi-DL models (both p < 0.001) in the development dataset, and the single-DL model (p = 0.028) in the testing dataset. Decision curve analysis demonstrated the hybrid-DL model had higher net benefit than other models across the majority range of threshold probabilities.

CONCLUSIONS

The proposed hybrid-DL model achieved good predictive efficacy and could be used to predict tumor deposits in rectal cancer.

CRITICAL RELEVANCE STATEMENT

The proposed hybrid-DL model achieved good predictive efficacy and could be used to predict tumor deposits in rectal cancer.

KEY POINTS

• Preoperative non-invasive identification of TDs is of great clinical significance.
• The combined hybrid-DL model achieved good predictive efficacy and could be used to predict tumor deposits in rectal cancer.
• A preoperative nomogram provides gastroenterologists with an accurate and effective tool.

Swinburne TD. Coarse-Graining and Forecasting Atomic Material Simulations with Descriptors. PHYSICAL REVIEW LETTERS 2023;131:236101. [PMID: 38134806 DOI: 10.1103/physrevlett.131.236101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Revised: 07/21/2023] [Accepted: 11/13/2023] [Indexed: 12/24/2023]

Flahaut M, Leprohon P, Pham NP, Gingras H, Bourbeau J, Papadopoulou B, Maltais F, Ouellette M. Distinctive features of the oropharyngeal microbiome in Inuit of Nunavik and correlations of mild to moderate bronchial obstruction with dysbiosis. Sci Rep 2023;13:16622. [PMID: 37789055 PMCID: PMC10547696 DOI: 10.1038/s41598-023-43821-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 09/28/2023] [Indexed: 10/05/2023] Open

Wickersham M, Bartelo N, Kulm S, Liu Y, Zhang Y, Elemento O. USING MACHINE LEARNING METHODS TO ASSESS THE RISK OF ALCOHOL MISUSE IN OLDER ADULTS. RESEARCH SQUARE 2023:rs.3.rs-3154584. [PMID: 37886491 PMCID: PMC10602059 DOI: 10.21203/rs.3.rs-3154584/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2023]

Wu J, Li C, Gao P, Zhang C, Zhang P, Zhang L, Dai C, Zhang K, Shi B, Liu M, Zheng J, Pan B, Chen Z, Zhang C, Liao W, Pan W, Fang W, Chen C. Intestinal microbiota links to allograft stability after lung transplantation: a prospective cohort study. Signal Transduct Target Ther 2023;8:326. [PMID: 37652953 PMCID: PMC10471611 DOI: 10.1038/s41392-023-01515-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/31/2022] [Revised: 05/17/2023] [Accepted: 05/28/2023] [Indexed: 09/02/2023] Open

Affiliation(s)

Junqi Wu Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai, China Shanghai Engineering Research Center of Lung Transplantation, Shanghai, China
Chongwu Li Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai, China Shanghai Engineering Research Center of Lung Transplantation, Shanghai, China
Peigen Gao Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai, China Shanghai Engineering Research Center of Lung Transplantation, Shanghai, China
Chenhong Zhang State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
Pei Zhang Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai, China Shanghai Engineering Research Center of Lung Transplantation, Shanghai, China
Lei Zhang Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai, China Shanghai Engineering Research Center of Lung Transplantation, Shanghai, China
Chenyang Dai Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai, China Shanghai Engineering Research Center of Lung Transplantation, Shanghai, China
Kunpeng Zhang Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai, China Shanghai Engineering Research Center of Lung Transplantation, Shanghai, China
Bowen Shi Department of Thoracic Surgery, Changhai Hospital, Naval Medical University, Shanghai, China
Mengyang Liu Department of Thoracic Surgery, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
Junmeng Zheng Department of Cardiovascular Surgery, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
Bo Pan Department of Dermatology, Shanghai Key Laboratory of Molecular Medical Mycology, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China
Zhan Chen Adfontes (Shanghai) Bio-technology Co., Ltd, Shanghai, China
Chao Zhang Department of Dermatology, Shanghai Key Laboratory of Molecular Medical Mycology, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China
Wanqing Liao Department of Dermatology, Shanghai Key Laboratory of Molecular Medical Mycology, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China
Weihua Pan Department of Dermatology, Shanghai Key Laboratory of Molecular Medical Mycology, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China.
Wenjie Fang Department of Dermatology, Shanghai Key Laboratory of Molecular Medical Mycology, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China.
Chang Chen Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai, China. Shanghai Engineering Research Center of Lung Transplantation, Shanghai, China.

Wang Z, Sun L, Xu Y, Liang P, Xu K, Huang J. Discovery of novel JAK1 inhibitors through combining machine learning, structure-based pharmacophore modeling and bio-evaluation. J Transl Med 2023;21:579. [PMID: 37641144 PMCID: PMC10464202 DOI: 10.1186/s12967-023-04443-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 08/16/2023] [Indexed: 08/31/2023] Open

Abstract

BACKGROUND

Janus kinase 1 (JAK1) plays a critical role in most cytokine-mediated inflammatory, autoimmune responses and various cancers via the JAK/STAT signaling pathway. Inhibition of JAK1 is therefore an attractive therapeutic strategy for several diseases. Recently, high-performance machine learning techniques have been increasingly applied in virtual screening to develop new kinase inhibitors. Our study aimed to develop a novel layered virtual screening method based on machine learning (ML) and pharmacophore models to identify the potential JAK1 inhibitors.

METHODS

Firstly, we constructed a high-quality dataset comprising 3834 JAK1 inhibitors and 12,230 decoys, followed by establishing a series of classification models based on a combination of three molecular descriptors and six ML algorithms. To further screen potential compounds, we constructed several pharmacophore models based on Hiphop and receptor-ligand algorithms. We then used molecular docking to filter the recognized compounds. Finally, the binding stability and enzyme inhibition activity of the identified compounds were assessed by molecular dynamics (MD) simulations and in vitro enzyme activity tests.

RESULTS

The best performance ML model DNN-ECFP4 and two pharmacophore models Hiphop3 and 6TPF 08 were utilized to screen the ZINC database. A total of 13 potentially active compounds were screened and the MD results demonstrated that all of the above molecules could bind with JAK1 stably in dynamic conditions. Among the shortlisted compounds, the four purchasable compounds demonstrated significant kinase inhibition activity, with Z-10 being the most active (IC50 = 194.9 nM).

CONCLUSION

The current study provides an efficient and accurate integrated model. The hit compounds were promising candidates for the further development of novel JAK1 inhibitors.

Wang Y, Shi Y, Zhang C, Su K, Hu Y, Chen L, Wu Y, Huang H. Fetal weight estimation based on deep neural network: a retrospective observational study. BMC Pregnancy Childbirth 2023;23:560. [PMID: 37533038 PMCID: PMC10394792 DOI: 10.1186/s12884-023-05819-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Accepted: 06/27/2023] [Indexed: 08/04/2023] Open

Affiliation(s)

Yifei Wang International Peace Maternity and Child Health Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, China Shanghai Key Laboratory of Embryo Original Diseases, Shanghai, 200030, China
Yi Shi Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, Shanghai, 200030, China
Chenjie Zhang International Peace Maternity and Child Health Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, China Shanghai Key Laboratory of Embryo Original Diseases, Shanghai, 200030, China
Kaizhen Su International Peace Maternity and Child Health Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, China Shanghai Key Laboratory of Embryo Original Diseases, Shanghai, 200030, China
Yixiao Hu Department of Mathematical Sciences, Tsinghua University, Beijing, 100084, China
Lei Chen International Peace Maternity and Child Health Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, China
Yanting Wu Obstetrics and Gynecology Hospital, Institute of Reproduction and Development, Fudan University, Shanghai, 200011, China. Research Units of Embryo Original Diseases, Chinese Academy of Medical Sciences, Shanghai, China.
Hefeng Huang International Peace Maternity and Child Health Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, China. Shanghai Key Laboratory of Embryo Original Diseases, Shanghai, 200030, China. Obstetrics and Gynecology Hospital, Institute of Reproduction and Development, Fudan University, Shanghai, 200011, China. Research Units of Embryo Original Diseases, Chinese Academy of Medical Sciences, Shanghai, China. Research Units of Embryo Original Diseases (No. 2019RU056), Chinese Academy of Medical Sciences, Shanghai, China.

Gonzalez-Castillo J, Fernandez IS, Lam KC, Handwerker DA, Pereira F, Bandettini PA. Manifold learning for fMRI time-varying functional connectivity. Front Hum Neurosci 2023;17:1134012. [PMID: 37497043 PMCID: PMC10366614 DOI: 10.3389/fnhum.2023.1134012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Accepted: 06/21/2023] [Indexed: 07/28/2023] Open

Abstract

Whole-brain functional connectivity (FC) measured with functional MRI (fMRI) evolves over time in meaningful ways at temporal scales going from years (e.g., development) to seconds [e.g., within-scan time-varying FC (tvFC)]. Yet, our ability to explore tvFC is severely constrained by its large dimensionality (several thousands). To overcome this difficulty, researchers often seek to generate low dimensional representations (e.g., 2D and 3D scatter plots) hoping those will retain important aspects of the data (e.g., relationships to behavior and disease progression). Limited prior empirical work suggests that manifold learning techniques (MLTs), namely those seeking to infer a low dimensional non-linear surface (i.e., the manifold) where most of the data lies, are good candidates for accomplishing this task. Here we explore this possibility in detail. First, we discuss why one should expect tvFC data to lie on a low dimensional manifold. Second, we estimate the intrinsic dimension (ID; i.e., minimum number of latent dimensions) of tvFC data manifolds. Third, we describe the inner workings of three state-of-the-art MLTs: Laplacian Eigenmaps (LEs), T-distributed Stochastic Neighbor Embedding (T-SNE), and Uniform Manifold Approximation and Projection (UMAP). For each method, we empirically evaluate its ability to generate neuro-biologically meaningful representations of tvFC data, as well as its robustness against hyper-parameter selection. Our results show that tvFC data has an ID that ranges between 4 and 26, and that ID varies significantly between rest and task states. We also show how all three methods can effectively capture subject identity and task being performed: UMAP and T-SNE can capture these two levels of detail concurrently, but LE could only capture one at a time. We observed substantial variability in embedding quality across MLTs, and within-MLT as a function of hyper-parameter selection. To help alleviate this issue, we provide heuristics that can inform future studies. Finally, we also demonstrate the importance of feature normalization when combining data across subjects and the role that temporal autocorrelation plays in the application of MLTs to tvFC data. Overall, we conclude that while MLTs can be useful to generate summary views of labeled tvFC data, their application to unlabeled data such as resting-state remains challenging.

Lysov M, Pukhkiy K, Vasiliev E, Getmanskaya A, Turlapov V. Ensuring Explainability and Dimensionality Reduction in a Multidimensional HSI World for Early XAI-Diagnostics of Plant Stress. ENTROPY (BASEL, SWITZERLAND) 2023;25:e25050801. [PMID: 37238556 DOI: 10.3390/e25050801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 05/08/2023] [Accepted: 05/08/2023] [Indexed: 05/28/2023]

Abstract

This work is mostly devoted to the search for effective solutions to the problem of early diagnosis of plant stress (given an example of wheat and its drought stress), which would be based on explainable artificial intelligence (XAI). The main idea is to combine the benefits of two of the most popular agricultural data sources, hyperspectral images (HSI) and thermal infrared images (TIR), in a single XAI model. Our own dataset of a 25-day experiment was used, which was created via both (1) an HSI camera Specim IQ (400-1000 nm, 204, 512 × 512) and (2) a TIR camera Testo 885-2 (320 × 240, res. 0.1 °C). The HSI were a source of the k-dimensional high-level features of plants (k ≤ K, where K is the number of HSI channels) for the learning process. Such combination was implemented as a single-layer perceptron (SLP) regressor, which is the main feature of the XAI model and receives as input an HSI pixel-signature belonging to the plant mask, which then automatically through the mask receives a mark from the TIR. The correlation of HSI channels with the TIR image on the plant's mask on the days of the experiment was studied. It was established that HSI channel 143 (820 nm) was the most correlated with TIR. The problem of training the HSI signatures of plants with their corresponding temperature value via the XAI model was solved. The RMSE of plant temperature prediction is 0.2-0.3 °C, which is acceptable for early diagnostics. Each HSI pixel was represented in training by a number (k) of channels (k ≤ K = 204 in our case). The number of channels used for training was minimized by a factor of 25-30, from 204 to eight or seven, while maintaining the RMSE value. The model is computationally efficient in training; the average training time was much less than one minute (Intel Core i3-8130U, 2.2 GHz, 4 cores, 4 GB). This XAI model can be considered a research-aimed model (R-XAI), which allows the transfer of knowledge about plants from the TIR domain to the HSI domain, with their contrasting onto only a few from hundreds of HSI channels.

Koch V, Weitzer N, Dos Santos DP, Gruenewald LD, Mahmoudi S, Martin SS, Eichler K, Bernatz S, Gruber-Rouh T, Booz C, Hammerstingl RM, Biciusca T, Rosbach N, Gökduman A, D'Angelo T, Finkelmeier F, Yel I, Alizadeh LS, Sommer CM, Cengiz D, Vogl TJ, Albrecht MH. Multiparametric detection and outcome prediction of pancreatic cancer involving dual-energy CT, diffusion-weighted MRI, and radiomics. Cancer Imaging 2023;23:38. [PMID: 37072856 PMCID: PMC10114410 DOI: 10.1186/s40644-023-00549-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/15/2022] [Accepted: 03/17/2023] [Indexed: 04/20/2023] Open

Abstract

BACKGROUND

The advent of next-generation computed tomography (CT)- and magnetic resonance imaging (MRI) opened many new perspectives in the evaluation of tumor characteristics. An increasing body of evidence suggests the incorporation of quantitative imaging biomarkers into clinical decision-making to provide mineable tissue information. The present study sought to evaluate the diagnostic and predictive value of a multiparametric approach involving radiomics texture analysis, dual-energy CT-derived iodine concentration (DECT-IC), and diffusion-weighted MRI (DWI) in participants with histologically proven pancreatic cancer.

METHODS

In this study, a total of 143 participants (63 years ± 13, 48 females) who underwent third-generation dual-source DECT and DWI between November 2014 and October 2022 were included. Among these, 83 received a final diagnosis of pancreatic cancer, 20 had pancreatitis, and 40 had no evidence of pancreatic pathologies. Data comparisons were performed using chi-square statistic tests, one-way ANOVA, or two-tailed Student's t-test. For the assessment of the association of texture features with overall survival, receiver operating characteristics analysis and Cox regression tests were used.

RESULTS

Malignant pancreatic tissue differed significantly from normal or inflamed tissue regarding radiomics features (overall P < .001, respectively) and iodine uptake (overall P < .001, respectively). The performance for the distinction of malignant from normal or inflamed pancreatic tissue ranged between an AUC of ≥ 0.995 (95% CI, 0.955-1.0; P < .001) for radiomics features, ≥ 0.852 (95% CI, 0.767-0.914; P < .001) for DECT-IC, and ≥ 0.690 (95% CI, 0.587-0.780; P = .01) for DWI, respectively. During a follow-up of 14 ± 12 months (range, 10-44 months), the multiparametric approach showed a moderate prognostic power to predict all-cause mortality (c-index = 0.778 [95% CI, 0.697-0.864], P = .01).

CONCLUSIONS

Our reported multiparametric approach allowed for accurate discrimination of pancreatic cancer and revealed great potential to provide independent prognostic information on all-cause mortality.

Affiliation(s)

Vitali Koch Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany.
Nils Weitzer Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
Daniel Pinto Dos Santos Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
Leon D Gruenewald Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
Scherwin Mahmoudi Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
Simon S Martin Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
Katrin Eichler Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
Simon Bernatz Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
Tatjana Gruber-Rouh Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
Christian Booz Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
Renate M Hammerstingl Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
Teodora Biciusca Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
Nicolas Rosbach Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
Aynur Gökduman Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
Tommaso D'Angelo Department of Biomedical Sciences and Morphological and Functional Imaging, University Hospital Messina, Messina, Italy
Fabian Finkelmeier Department of Internal Medicine, University Hospital Frankfurt, Frankfurt Am Main, Germany
Ibrahim Yel Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
Leona S Alizadeh Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
Christof M Sommer Clinic of Diagnostic and Interventional Radiology, Heidelberg University Hospital, Heidelberg, Germany
Duygu Cengiz Department of Radiology, University of Koc School of Medicine, Istanbul, Turkey
Thomas J Vogl Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany
Moritz H Albrecht Department of Diagnostic and Interventional Radiology, University Hospital Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, 60590, Germany

Śliwowski M, Martin M, Souloumiac A, Blanchart P, Aksenova T. Impact of dataset size and long-term ECoG-based BCI usage on deep learning decoders performance. Front Hum Neurosci 2023;17:1111645. [PMID: 37007675 PMCID: PMC10061076 DOI: 10.3389/fnhum.2023.1111645] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/29/2022] [Accepted: 02/27/2023] [Indexed: 03/18/2023]

Abstract

Introduction

In brain-computer interfaces (BCI) research, recording data is time-consuming and expensive, which limits access to big datasets. This may influence the BCI system performance as machine learning methods depend strongly on the training dataset size. Important questions arise: taking into account neuronal signal characteristics (e.g., non-stationarity), can we achieve higher decoding performance with more data to train decoders? What is the perspective for further improvement with time in the case of long-term BCI studies? In this study, we investigated the impact of long-term recordings on motor imagery decoding from two main perspectives: model requirements regarding dataset size and potential for patient adaptation.

Methods

We evaluated the multilinear model and two deep learning (DL) models on a long-term BCI & Tetraplegia (ClinicalTrials.gov identifier: NCT02550522) clinical trial dataset containing 43 sessions of ECoG recordings performed with a tetraplegic patient. In the experiment, a participant executed 3D virtual hand translation using motor imagery patterns. We designed multiple computational experiments in which training datasets were increased or translated to investigate the relationship between models' performance and different factors influencing recordings.

Results

Our results showed that DL decoders showed similar requirements regarding the dataset size compared to the multilinear model while demonstrating higher decoding performance. Moreover, high decoding performance was obtained with relatively small datasets recorded later in the experiment, suggesting motor imagery patterns improvement and patient adaptation during the long-term experiment. Finally, we proposed UMAP embeddings and local intrinsic dimensionality as a way to visualize the data and potentially evaluate data quality.

Discussion

DL-based decoding is a prospective approach in BCI which may be efficiently applied with real-life dataset size. Patient-decoder co-adaptation is an important factor to consider in long-term clinical BCI.

Gonzalez-Castillo J, Fernandez I, Lam KC, Handwerker DA, Pereira F, Bandettini PA. Manifold Learning for fMRI time-varying FC. bioRxiv 2023:2023.01.14.523992. [PMID: 36789436 PMCID: PMC9928030 DOI: 10.1101/2023.01.14.523992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]

Abstract

Whole-brain functional connectivity (FC) measured with functional MRI (fMRI) evolves over time in meaningful ways at temporal scales going from years (e.g., development) to seconds (e.g., within-scan time-varying FC (tvFC)). Yet, our ability to explore tvFC is severely constrained by its large dimensionality (several thousands). To overcome this difficulty, researchers seek to generate low dimensional representations (e.g., 2D and 3D scatter plots) expected to retain its most informative aspects (e.g., relationships to behavior, disease progression). Limited prior empirical work suggests that manifold learning techniques (MLTs), namely those seeking to infer a low dimensional non-linear surface (i.e., the manifold) where most of the data lies, are good candidates for accomplishing this task. Here we explore this possibility in detail. First, we discuss why one should expect tvFC data to lie on a low dimensional manifold. Second, we estimate what is the intrinsic dimension (i.e., minimum number of latent dimensions; ID) of tvFC data manifolds. Third, we describe the inner workings of three state-of-the-art MLTs: Laplacian Eigenmaps (LE), T-distributed Stochastic Neighbor Embedding (T-SNE), and Uniform Manifold Approximation and Projection (UMAP). For each method, we empirically evaluate its ability to generate neuro-biologically meaningful representations of tvFC data, as well as their robustness against hyper-parameter selection. Our results show that tvFC data has an ID that ranges between 4 and 26, and that ID varies significantly between rest and task states. We also show how all three methods can effectively capture subject identity and task being performed: UMAP and T-SNE can capture these two levels of detail concurrently, but LE could only capture one at a time. We observed substantial variability in embedding quality across MLTs, and within-MLT as a function of hyper-parameter selection. To help alleviate this issue, we provide heuristics that can inform future studies. Finally, we also demonstrate the importance of feature normalization when combining data across subjects and the role that temporal autocorrelation plays in the application of MLTs to tvFC data. Overall, we conclude that while MLTs can be useful to generate summary views of labeled tvFC data, their application to unlabeled data such as resting-state remains challenging.

Dunin-Barkowski W, Gorban A. Editorial: Toward and beyond human-level AI, volume II. Front Neurorobot 2023;16:1120167. [PMID: 36687208 PMCID: PMC9853958 DOI: 10.3389/fnbot.2022.1120167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Accepted: 12/13/2022] [Indexed: 01/07/2023]

Mirkes EM, Bac J, Fouché A, Stasenko SV, Zinovyev A, Gorban AN. Domain Adaptation Principal Component Analysis: Base Linear Method for Learning with Out-of-Distribution Data. Entropy (Basel) 2022;25:33. [PMID: 36673174 PMCID: PMC9858254 DOI: 10.3390/e25010033] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/26/2022] [Revised: 12/18/2022] [Accepted: 12/21/2022] [Indexed: 06/17/2023]

Lysov M, Maximova I, Vasiliev E, Getmanskaya A, Turlapov V. Entropy as a High-Level Feature for XAI-Based Early Plant Stress Detection. Entropy (Basel) 2022;24:1597. [PMID: 36359687 PMCID: PMC9689005 DOI: 10.3390/e24111597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2022] [Revised: 10/17/2022] [Accepted: 10/26/2022] [Indexed: 06/16/2023]

Roy T, Sharma K, Dhall A, Patiyal S, Raghava GPS. In silico method for predicting infectious strains of influenza A virus from its genome and protein sequences. J Gen Virol 2022;103. [PMID: 36318663 DOI: 10.1099/jgv.0.001802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]

Khan MI, Park T, Imran MA, Gowda Saralamma VV, Lee DC, Choi J, Baig MH, Dong JJ. Development of machine learning models for the screening of potential HSP90 inhibitors. Front Mol Biosci 2022;9:967510. [PMID: 36339714 PMCID: PMC9626531 DOI: 10.3389/fmolb.2022.967510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2022] [Accepted: 08/15/2022] [Indexed: 11/18/2022]

Sharma T, Saralamma VVG, Lee DC, Imran MA, Choi J, Baig MH, Dong JJ. Combining structure-based pharmacophore modeling and machine learning for the identification of novel BTK inhibitors. Int J Biol Macromol 2022;222:239-250. [PMID: 36130643 DOI: 10.1016/j.ijbiomac.2022.09.151] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/20/2022] [Revised: 09/13/2022] [Accepted: 09/16/2022] [Indexed: 11/05/2022]

He Y, Liu K, Han L, Han W. Clustering Analysis, Structure Fingerprint Analysis, and Quantum Chemical Calculations of Compounds from Essential Oils of Sunflower (Helianthus annuus L.) Receptacles. Int J Mol Sci 2022;23:10169. [PMID: 36077567 PMCID: PMC9456235 DOI: 10.3390/ijms231710169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 08/25/2022] [Accepted: 08/30/2022] [Indexed: 11/25/2022]

Liu X, Shu Y, Yu P, Li H, Duan W, Wei Z, Li K, Xie W, Zeng Y, Peng D. Classification of severe obstructive sleep apnea with cognitive impairment using degree centrality: A machine learning analysis. Front Neurol 2022;13:1005650. [PMID: 36090863 PMCID: PMC9453022 DOI: 10.3389/fneur.2022.1005650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Accepted: 08/11/2022] [Indexed: 11/24/2022]

Liu Z, Bhattacharya S, Maiti T. Variational Bayes Ensemble Learning Neural Networks With Compressed Feature Space. IEEE Trans Neural Netw Learn Syst 2022;PP:1379-1385. [PMID: 35584070 DOI: 10.1109/tnnls.2022.3172276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]

Pinar-Sanchez J, Bermejo López P, Solís García Del Pozo J, Redondo-Ruiz J, Navarro Casado L, Andres-Pretel F, Celorrio Bustillo ML, Esparcia Moreno M, García Ruiz S, Solera Santos JJ, Navarro Bravo B. Common Laboratory Parameters Are Useful for Screening for Alcohol Use Disorder: Designing a Predictive Model Using Machine Learning. J Clin Med 2022;11:2061. [PMID: 35407669 PMCID: PMC8999878 DOI: 10.3390/jcm11072061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2022] [Revised: 04/01/2022] [Accepted: 04/03/2022] [Indexed: 11/16/2022]

Abstract

The diagnosis of alcohol use disorder (AUD) remains a difficult challenge, and some patients may not be adequately diagnosed. This study aims to identify an optimum combination of laboratory markers to detect alcohol consumption, using data science. An analytical observational study was conducted with 337 subjects (253 men and 83 women, with a mean age of 44 years (10.61 Standard Deviation (SD))). The first group included 204 participants being treated in the Addictive Behaviors Unit (ABU) from Albacete (Spain). They met the diagnostic criteria for AUD specified in the Diagnostic and Statistical Manual of Mental Disorders, fifth edition (DSM-5). The second group included 133 blood donors (people with no risk of AUD), recruited by cross-section. All participants were also divided into two groups according to the WHO classification for risk of alcohol consumption in Spain, that is, males drinking more than 28 standard drink units (SDUs) or women drinking more than 17 SDUs. Medical history and laboratory markers were selected from our hospital's database. A correlation between alterations in laboratory markers and the amount of alcohol consumed was established. We then created three predictive models (with logistic regression, classification tree, and Bayesian network) to detect risk of alcohol consumption by using laboratory markers as predictive features. For the selection of variables and the creation and validation of predictive models, two tools were used: the scikit-learn library for Python, and the Weka application. The logistic regression model provided a maximum AUD prediction accuracy of 85.07%. Secondly, the classification tree provided a lower accuracy of 79.4%, but easier interpretation. Finally, the Naive Bayes network had an accuracy of 87.46%. The combination of several common biochemical markers and the use of data science can enhance detection of AUD, helping to prevent future medical complications derived from AUD.

Ding W, Wu L, Li X, Chang L, Liu G, Du H. Comprehensive analysis of competitive endogenous RNAs network: Identification and validation of prediction model composed of mRNA signature and miRNA signature in gastric cancer. Oncol Lett 2022;23:150. [PMID: 35350591 PMCID: PMC8941526 DOI: 10.3892/ol.2022.13270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Accepted: 02/22/2022] [Indexed: 11/18/2022]

Abstract

Gastric cancer (GC), one of the most lethal malignant tumors, is highly aggressive with a poor prognosis, while the molecular mechanisms underlying it remain largely unknown. Although advanced imaging techniques and comprehensive treatment facilitate the diagnosis and survival of some GC patients, precise diagnosis and prognosis are still a challenge. The present study used publicly available gene expression profiles from The Cancer Genome Atlas and Gene Expression Omnibus datasets including mRNA, micro (mi)RNA and circular (circ)RNA of GC to establish a competing endogenous RNA (ceRNA) network. Further, the present study performed least absolute shrinkage and selection operator regression analysis on the hub RNAs to establish a prediction model with mRNA and miRNA. The ceRNA network contained 109 edges and 56 nodes and the visible network contained 13 miRNAs, 9 circRNAs and 34 mRNAs. The five-mRNA signature comprised CTF1, FKBP5, RNF128, GSTM2 and ADAMTS1. The area under curve (AUC) value of the diagnosis training cohort was 0.9975. The prognosis of the high-risk group (RiskScore >4.664) was worse compared with that of the low-risk group (RiskScore ≤4.664; P<0.05) in the training cohort. The five-miRNA signature comprised miR-145-5p, miR-615-3p, miR-6507-5p, miR-937-3p and miR-99a-3p. The AUC value of the diagnosis training cohort was 0.9975. The prognosis of the high-risk group (RiskScore >1.621) was worse compared with that of the low-risk group (RiskScore ≤1.621; P<0.05) in the training cohort. The validation cohorts indicated that both the five-mRNA and five-miRNA signatures had strong predictive power in diagnosis and prognosis for GC. In conclusion, a ceRNA network was established for GC, and a five-mRNA signature and a five-miRNA signature were identified that enabled diagnosis and prognosis of GC by assigning patients to a high-risk group or low-risk group.

Zinovyev A, Sadovsky M, Calzone L, Fouché A, Groeneveld CS, Chervov A, Barillot E, Gorban AN. Modeling Progression of Single Cell Populations Through the Cell Cycle as a Sequence of Switches. Front Mol Biosci 2022;8:793912. [PMID: 35178429 PMCID: PMC8846220 DOI: 10.3389/fmolb.2021.793912] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/12/2021] [Accepted: 12/15/2021] [Indexed: 11/13/2022]

Amblard E, Bac J, Chervov A, Soumelis V, Zinovyev A. Hubness reduction improves clustering and trajectory inference in single-cell transcriptomic data. Bioinformatics 2022;38:1045-1051. [PMID: 34871374 DOI: 10.1093/bioinformatics/btab795] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/10/2021] [Revised: 11/05/2021] [Accepted: 11/17/2021] [Indexed: 02/03/2023]

Abstract

MOTIVATION

Single-cell RNA-seq (scRNAseq) datasets are characterized by large ambient dimensionality, and their analyses can be affected by various manifestations of the dimensionality curse. One of these manifestations is the hubness phenomenon, i.e. the existence of data points with surprisingly large incoming connectivity degree in the datapoint neighbourhood graph. The conventional approach to dampening the unwanted effects of high dimension consists in applying drastic dimensionality reduction. It remains unexplored whether this step can be avoided, thus retaining more information than is contained in the low-dimensional projections, by directly correcting hubness.

RESULTS

We investigated hubness in scRNAseq data. We show that hub cells do not represent any visible technical or biological bias. The effect of various hubness reduction methods is investigated with respect to the clustering, trajectory inference and visualization tasks in scRNAseq datasets. We show that hubness reduction generates neighbourhood graphs with properties more suitable for applying machine learning methods; and that it outperforms other state-of-the-art methods for improving neighbourhood graphs. As a consequence, clustering, trajectory inference and visualization perform better, especially for datasets characterized by large intrinsic dimensionality. Hubness is an important phenomenon characterizing data point neighbourhood graphs computed for various types of sequencing datasets. Reducing hubness can be beneficial for the analysis of scRNAseq data with large intrinsic dimensionality in which case it can be an alternative to drastic dimensionality reduction.

AVAILABILITY AND IMPLEMENTATION

The code used to analyze the datasets and produce the figures of this article is available from https://github.com/sysbio-curie/schubness.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
