1
Wang Z, Su J, Feng Y, Xu Q, Wang H, Jiang H. Conversion of hazardous waste into thermal conductive polymer: A prediction and guidance from machine learning. Journal of Environmental Management 2024; 370:122864. [PMID: 39405875 DOI: 10.1016/j.jenvman.2024.122864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Revised: 09/12/2024] [Accepted: 10/07/2024] [Indexed: 11/17/2024]
Abstract
The preparation methods and thermal conductivity (TC) of the reported thermal conductive polymers vary significantly. A method is needed to clarify the relationship between TC and its influencing factors and to reach consistent conclusions. In this study, we compiled 403 sets of data from the literature. Six typical features and three machine learning (ML) algorithms were selected and optimized. The XGBoost algorithm achieved the best prediction of the TC of thermal conductive polymers (correlation coefficient of 0.855). To further investigate the influence of the six features on the TC, we conducted a SHapley Additive exPlanations (SHAP) analysis. Based on these results, pyrrhotite tailings were chosen as the filler and the corresponding process parameters were determined. However, the model built on literature data alone was still unsatisfactory. We therefore further optimized XGBoost and built XGBoost-Exp using data from real experiments. A small percentage (23%) of real experimental data significantly improved the predictive power of XGBoost-Exp for unseen data (correlation coefficient of 0.815). In summary, XGBoost-Exp exhibits strong predictive performance for the TC of unseen data and offers valuable insight into the influence of the various features. This method also provides a new perspective on the utilization of hazardous sulfide minerals.
Affiliation(s)
- Zhiyi Wang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, Hunan, PR China
- Jiming Su
  - College of Minerals Processing & Bioengineering, Central South University, Changsha, 410083, Hunan, PR China
- Yijin Feng
  - College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, Hunan, PR China
- Qianqian Xu
  - College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, Hunan, PR China
- Hui Wang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, Hunan, PR China
- Hongru Jiang
  - Key Laboratory of Ministry of Education for Advanced Materials in Tropical Island Resources, School of Chemistry and Chemical Engineering, Hainan University, Haikou, 570228, Hainan, PR China
2
Bassetti D, Pospíšil L, Horenko I. On Entropic Learning from Noisy Time Series in the Small Data Regime. Entropy (Basel, Switzerland) 2024; 26:553. [PMID: 39056915 PMCID: PMC11276242 DOI: 10.3390/e26070553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Revised: 06/24/2024] [Accepted: 06/25/2024] [Indexed: 07/28/2024]
Abstract
In this work, we present a novel methodology for performing the supervised classification of time-ordered noisy data; we call this methodology Entropic Sparse Probabilistic Approximation with Markov regularization (eSPA-Markov). It is an extension of entropic learning methodologies, allowing the simultaneous learning of segmentation patterns, entropy-optimal feature space discretizations, and Bayesian classification rules. We prove the conditions for the existence and uniqueness of the learning problem solution and propose a one-shot numerical learning algorithm that, in the leading order, scales linearly in dimension. We show how this technique can be used for the computationally scalable identification of persistent (metastable) regime affiliations and regime switches from high-dimensional non-stationary and noisy time series, i.e., when the size of the data statistics is small compared to their dimensionality and when the noise variance is larger than the variance in the signal. We demonstrate its performance on a set of toy learning problems, comparing eSPA-Markov to state-of-the-art techniques, including deep learning and random forests. We show how this technique can be used for the analysis of noisy time series from DNA and RNA Nanopore sequencing.
Affiliation(s)
- Davide Bassetti
  - Faculty of Mathematics, RPTU Kaiserslautern-Landau, Gottlieb-Daimler-Str. 48, 67663 Kaiserslautern, Germany
- Lukáš Pospíšil
  - Department of Mathematics, Faculty of Civil Engineering, VŠB-TUO, Ludvika Podeste 1875/17, 708 33 Ostrava, Czech Republic
- Illia Horenko
  - Faculty of Mathematics, RPTU Kaiserslautern-Landau, Gottlieb-Daimler-Str. 48, 67663 Kaiserslautern, Germany
3
Vecchi E, Bassetti D, Graziato F, Pospíšil L, Horenko I. Gauge-Optimal Approximate Learning for Small Data Classification. Neural Comput 2024; 36:1198-1227. [PMID: 38669692 DOI: 10.1162/neco_a_01664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 01/16/2024] [Indexed: 04/28/2024]
Abstract
Small data learning problems are characterized by a significant discrepancy between the limited number of response variable observations and the large feature space dimension. In this setting, the common learning tools struggle to separate the features that are important for the classification task from those that bear no relevant information, and they cannot derive an appropriate learning rule for discriminating among different classes. As a potential solution to this problem, here we exploit the idea of reducing and rotating the feature space in a lower-dimensional gauge and propose the gauge-optimal approximate learning (GOAL) algorithm, which provides an analytically tractable joint solution to the dimension reduction, feature segmentation, and classification problems for small data learning problems. We prove that the optimal solution of the GOAL algorithm consists of piecewise-linear functions in the Euclidean space and that it can be approximated through a monotonically convergent algorithm that, under the assumption of a discrete segmentation of the feature space, has a closed-form solution for each optimization substep and an overall linear iteration cost scaling. The GOAL algorithm has been compared to other state-of-the-art machine learning tools on both synthetic data and challenging real-world applications from climate science and bioinformatics (i.e., prediction of the El Niño Southern Oscillation and inference of epigenetically induced gene-activity networks from limited experimental data). The experimental results show that the proposed algorithm outperforms the reported best competitors for these problems in both learning performance and computational cost.
Affiliation(s)
- Edoardo Vecchi
  - Università della Svizzera Italiana, Faculty of Informatics, Institute of Computing, 6962 Lugano, Switzerland
- Davide Bassetti
  - Technical University of Kaiserslautern, Faculty of Mathematics, Group of Mathematics of AI, 67663 Kaiserslautern, Germany
- Lukáš Pospíšil
  - VSB Ostrava, Department of Mathematics, Ludvika Podeste 1875/17, 708 33 Ostrava, Czech Republic
- Illia Horenko
  - Technical University of Kaiserslautern, Faculty of Mathematics, Group of Mathematics of AI, 67663 Kaiserslautern, Germany
4
Rahimi M, Akbari A, Asadi F, Emami H. Cervical cancer survival prediction by machine learning algorithms: a systematic review. BMC Cancer 2023; 23:341. [PMID: 37055741 PMCID: PMC10103471 DOI: 10.1186/s12885-023-10808-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/06/2023] [Accepted: 04/05/2023] [Indexed: 04/15/2023] Open
Abstract
BACKGROUND Cervical cancer is a common malignant tumor of the female reproductive system and is considered a leading cause of mortality in women worldwide. Time-to-event analysis, which is crucial in clinical research, is well served by survival prediction methods. This study aims to systematically investigate the use of machine learning to predict survival in patients with cervical cancer.
METHOD An electronic search of the PubMed, Scopus, and Web of Science databases was performed on October 1, 2022. All articles extracted from the databases were collected in an Excel file and duplicate articles were removed. The articles were screened twice based on the title and the abstract and checked again against the inclusion and exclusion criteria. The main inclusion criterion was the use of machine learning algorithms for predicting cervical cancer survival. The information extracted from the articles included authors, publication year, dataset details, survival type, evaluation criteria, machine learning models, and the algorithm execution method.
RESULTS A total of 13 articles were included in this study, most of which were published from 2018 onwards. The most common machine learning models were random forest (6 articles, 46%), logistic regression (4 articles, 30%), support vector machines (3 articles, 23%), ensemble and hybrid learning (3 articles, 23%), and deep learning (3 articles, 23%). Sample sizes in the included studies ranged from 85 to 14,946 patients, and the models were internally validated in all but two articles. The reported area under the curve (AUC) ranged from 0.40 to 0.99 for overall survival, from 0.56 to 0.88 for disease-free survival, and from 0.67 to 0.81 for progression-free survival. Finally, 15 variables with an effective role in predicting cervical cancer survival were identified.
CONCLUSION Combining heterogeneous multidimensional data with machine learning techniques can play a very influential role in predicting cervical cancer survival. Despite the benefits of machine learning, interpretability, explainability, and imbalanced datasets remain among the biggest challenges. Establishing machine learning algorithms as a standard for survival prediction requires further studies.
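To put the AUC ranges reported above in context, the following minimal sketch computes AUC as the fraction of (positive, negative) pairs that a model ranks correctly, with ties counted as half. The function name `pairwise_auc` and the toy scores are illustrative assumptions and are not taken from the review or from any of the included studies.

```python
def pairwise_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    scores: predicted risk scores (higher means more likely positive).
    labels: 1 for positive cases, 0 for negative cases.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count correctly ordered pairs; a tie contributes half a pair.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four patients, two events (label 1).
labels = [1, 1, 0, 0]
perfect = pairwise_auc([0.9, 0.8, 0.2, 0.1], labels)   # every pair ordered correctly
inverted = pairwise_auc([0.1, 0.2, 0.8, 0.9], labels)  # every pair ordered wrongly
constant = pairwise_auc([0.5, 0.5, 0.5, 0.5], labels)  # all ties, no discrimination
```

Under this definition a value of 0.5 corresponds to no discrimination, so an AUC of 0.40 at the low end of the overall survival range means that model ranked patients worse than chance on its evaluation data.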
Affiliation(s)
- Milad Rahimi
  - Department of Health Information Technology and Management, Medical Informatics, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Atieh Akbari
  - Obstetrics and Gynecology, Cancer Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Farkhondeh Asadi
  - Department of Health Information Technology and Management, Health Information Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Hassan Emami
  - Department of Health Information Technology and Management, Information Technology, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
5
Scarce Data in Intelligent Technical Systems: Causes, Characteristics, and Implications. SCI 2022. [DOI: 10.3390/sci4040049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022] Open
Abstract
Technical systems generate an increasing amount of data as integrated sensors become more available. Even so, data are still often scarce because of technical limitations of sensors, an expensive labelling process, or rare concepts, such as machine faults, which are hard to capture. Data scarcity leads to incomplete information about a concept of interest. This contribution details causes and effects of scarce data in technical systems. To this end, a typology is introduced which defines different types of incompleteness. Based on this, machine learning and information fusion methods are presented and discussed that are specifically designed to deal with scarce data. The paper closes with a motivation and a call for further research efforts into a combination of machine learning and information fusion.
6
Horenko I, Pospíšil L, Vecchi E, Albrecht S, Gerber A, Rehbock B, Stroh A, Gerber S. Low-Cost Probabilistic 3D Denoising with Applications for Ultra-Low-Radiation Computed Tomography. J Imaging 2022; 8:156. [PMID: 35735955 PMCID: PMC9224620 DOI: 10.3390/jimaging8060156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Revised: 05/18/2022] [Accepted: 05/19/2022] [Indexed: 12/04/2022] Open
Abstract
We propose a pipeline for synthetic generation of personalized Computed Tomography (CT) images, with a radiation exposure evaluation and a lifetime attributable risk (LAR) assessment. We perform a patient-specific performance evaluation for a broad range of denoising algorithms (including the most popular deep learning denoising approaches, wavelet-based methods, methods based on Mumford-Shah denoising, etc.), focusing both on assessing the capability to reduce the patient-specific CT-induced LAR and on computational cost scalability. We introduce a parallel Probabilistic Mumford-Shah denoising model (PMS) and show that it markedly outperforms the compared common denoising methods in denoising quality and cost scaling. In particular, we show that it allows an approximately 22-fold robust patient-specific LAR reduction for infants and a 10-fold LAR reduction for adults. Using a normal laptop, the proposed algorithm for PMS allows cheap and robust (with a multiscale structural similarity index >90%) denoising of very large 2D videos and 3D images (with over 10^7 voxels) that are subject to ultra-strong noise (Gaussian and non-Gaussian) for signal-to-noise ratios far below 1.0. The code is provided for open access.
Affiliation(s)
- Illia Horenko
  - Faculty of Mathematics, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
  - Correspondence: (I.H.); (S.G.)
- Lukáš Pospíšil
  - Department of Mathematics, VSB Ostrava, Ludvika Podeste 1875/17, 708 33 Ostrava, Czech Republic
- Edoardo Vecchi
  - Institute of Computing, Faculty of Informatics, Università della Svizzera Italiana (USI), 6962 Viganello, Switzerland
- Steffen Albrecht
  - Institute of Physiology, University Medical Center of the Johannes Gutenberg-University Mainz, 55128 Mainz, Germany
- Alexander Gerber
  - Institute of Occupational Medicine, Faculty of Medicine, GU Frankfurt, 60590 Frankfurt am Main, Germany
- Beate Rehbock
  - Lung Radiology Center Berlin, 10627 Berlin, Germany
- Albrecht Stroh
  - Institute of Pathophysiology, University Medical Center of the Johannes Gutenberg-University Mainz, 55128 Mainz, Germany
- Susanne Gerber
  - Institute for Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz, 55128 Mainz, Germany
  - Correspondence: (I.H.); (S.G.)
7
Vecchi E, Pospíšil L, Albrecht S, O'Kane TJ, Horenko I. eSPA+: Scalable Entropy-Optimal Machine Learning Classification for Small Data Problems. Neural Comput 2022; 34:1220-1255. [PMID: 35344997 DOI: 10.1162/neco_a_01490] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/26/2021] [Accepted: 12/20/2021] [Indexed: 11/04/2022]
Abstract
Classification problems in the small data regime (with small data statistic T and relatively large feature space dimension D) impose challenges for the common machine learning (ML) and deep learning (DL) tools. The standard learning methods from these areas tend to show a lack of robustness when applied to data sets with significantly fewer data points than dimensions and quickly reach the overfitting bound, thus leading to poor performance beyond the training set. To tackle this issue, we propose eSPA+, a significant extension of the recently formulated entropy-optimal scalable probabilistic approximation algorithm (eSPA). Specifically, we propose to change the order of the optimization steps and replace the most computationally expensive subproblem of eSPA with its closed-form solution. We prove that with these two enhancements, eSPA+ moves from the polynomial to the linear class of complexity scaling algorithms. On several small data learning benchmarks, we show that the eSPA+ algorithm achieves a many-fold speed-up with respect to eSPA and even better performance results when compared to a wide array of ML and DL tools. In particular, we benchmark eSPA+ against the standard eSPA and the main classes of common learning algorithms in the small data regime: various forms of support vector machines, random forests, and long short-term memory algorithms. In all the considered applications, the common learning methods and eSPA are markedly outperformed by eSPA+, which achieves significantly higher prediction accuracy with an orders-of-magnitude lower computational cost.
Affiliation(s)
- Edoardo Vecchi
  - Università della Svizzera Italiana, Faculty of Informatics, TI-6900 Lugano, Switzerland
- Lukáš Pospíšil
  - VSB Ostrava, Department of Mathematics, Ludvika Podeste 1875/17, 708 33 Ostrava, Czech Republic
- Steffen Albrecht
  - University Medical Center of the Johannes Gutenberg-Universität, Institute of Physiology, 55128 Mainz, Germany
- Illia Horenko
  - Università della Svizzera Italiana, Faculty of Informatics, TI-6900 Lugano, Switzerland
8
Cheap robust learning of data anomalies with analytically solvable entropic outlier sparsification. Proc Natl Acad Sci U S A 2022; 119:2119659119. [PMID: 35197293 PMCID: PMC8917346 DOI: 10.1073/pnas.2119659119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/30/2022] [Indexed: 11/21/2022] Open
Abstract
Entropic outlier sparsification (EOS) is proposed as a cheap and robust computational strategy for learning in the presence of data anomalies and outliers. EOS builds on the derived analytic solution of the (weighted) expected loss minimization problem subject to Shannon entropy regularization. The identified closed-form solution is proven to impose additional costs that depend linearly on statistics size and are independent of data dimension. The analytic results also explain why mixtures of spherically symmetric Gaussians (used heuristically in many popular data analysis algorithms) represent an optimal and least-biased choice for nonparametric probability distributions when working with squared Euclidean distances. The performance of EOS is compared to a range of commonly used tools on synthetic problems and on partially mislabeled supervised classification problems from biomedicine. Applying EOS for coinference of data anomalies during learning is shown to reach an accuracy of 97% ± 2% when predicting patient mortality after heart failure, statistically significantly outperforming the predictive performance of common learning tools on the same data.
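As a rough illustration of what a closed-form solution to an entropy-regularized weighted loss problem can look like, the sketch below uses the textbook result that minimizing sum_i w_i * loss_i + eps * sum_i w_i * log(w_i) over the probability simplex gives weights proportional to exp(-loss_i / eps). This is an assumed, simplified formulation for illustration only; the function name `entropic_weights`, the parameter `eps`, and the toy losses are hypothetical and are not taken from the paper.

```python
import math


def entropic_weights(losses, eps):
    """Closed-form weights w_i proportional to exp(-loss_i / eps).

    Minimizes sum_i w_i * loss_i + eps * sum_i w_i * log(w_i)
    subject to w_i >= 0 and sum_i w_i == 1.
    The cost is linear in the number of data points and, once the
    per-point losses are given, does not depend on the data dimension.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    # Shift by the smallest loss for numerical stability; the shift
    # cancels in the normalization and leaves the weights unchanged.
    base = min(losses)
    unnormalized = [math.exp(-(loss - base) / eps) for loss in losses]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical per-point losses: the last point behaves like an outlier.
losses = [0.1, 0.2, 0.15, 5.0]
weights = entropic_weights(losses, eps=0.5)
```

With these toy values the high-loss point receives a weight close to zero, which is the sparsifying effect on outliers that the abstract describes. A larger `eps` pushes the weights toward uniform, and a smaller `eps` concentrates them on the lowest-loss points.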
9
Isbister JB, Reyes-Puerta V, Sun JJ, Horenko I, Luhmann HJ. Clustering and control for adaptation uncovers time-warped spike time patterns in cortical networks in vivo. Sci Rep 2021; 11:15066. [PMID: 34326363 PMCID: PMC8322153 DOI: 10.1038/s41598-021-94002-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/07/2020] [Accepted: 06/29/2021] [Indexed: 12/04/2022] Open
Abstract
How information in the nervous system is encoded by patterns of action potentials (i.e. spikes) remains an open question. Multi-neuron patterns of single spikes are a prime candidate for spike time encoding but their temporal variability requires further characterisation. Here we show how known sources of spike count variability affect stimulus-evoked spike time patterns between neurons separated over multiple layers and columns of adult rat somatosensory cortex in vivo. On subsets of trials (clusters) and after controlling for stimulus-response adaptation, spike time differences between pairs of neurons are “time-warped” (compressed/stretched) by trial-to-trial changes in shared excitability, explaining why fixed spike time patterns and noise correlations are seldom reported. We show that predicted cortical state is correlated between groups of 4 neurons, introducing the possibility of spike time pattern modulation by population-wide trial-to-trial changes in excitability (i.e. cortical state). Under the assumption of state-dependent coding, we propose an improved potential encoding capacity.
Affiliation(s)
- James B Isbister
  - Oxford Centre for Theoretical Neuroscience and Artificial Intelligence, Department of Experimental Psychology, University of Oxford, Oxford, UK
  - The Blue Brain Project, École Polytechnique Fédérale de Lausanne, 1202, Geneva, Switzerland
- Vicente Reyes-Puerta
  - Institute of Physiology, University Medical Center, Johannes Gutenberg University, Mainz, Germany
- Jyh-Jang Sun
  - Institute of Physiology, University Medical Center, Johannes Gutenberg University, Mainz, Germany
  - NERF, Kapeldreef 75, 3001, Leuven, Belgium
  - imec, Remisebosweg 1, 3001, Leuven, Belgium
- Illia Horenko
  - Faculty of Informatics, Università della Svizzera Italiana, Via G. Buffi 13, 6900, Lugano, Switzerland
- Heiko J Luhmann
  - Institute of Physiology, University Medical Center, Johannes Gutenberg University, Mainz, Germany
10
Pfenninger M, Reuss F, Kiebler A, Schönnenbeck P, Caliendo C, Gerber S, Cocchiararo B, Reuter S, Blüthgen N, Mody K, Mishra B, Bálint M, Thines M, Feldmeyer B. Genomic basis for drought resistance in European beech forests threatened by climate change. eLife 2021; 10:e65532. [PMID: 34132196 PMCID: PMC8266386 DOI: 10.7554/elife.65532] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 12/07/2020] [Accepted: 06/07/2021] [Indexed: 12/30/2022] Open
Abstract
In the course of global climate change, Central Europe is experiencing more frequent and prolonged periods of drought. The drought years 2018 and 2019 affected European beeches (Fagus sylvatica L.) differently: even in the same stand, drought-damaged trees neighboured healthy trees, suggesting that the genotype rather than the environment was responsible for this conspicuous pattern. We used this natural experiment to study the genomic basis of drought resistance with Pool-GWAS. Contrasting the extreme phenotypes identified 106 significantly associated single-nucleotide polymorphisms (SNPs) throughout the genome. Most annotated genes with associated SNPs (>70%) were previously implicated in the drought reaction of plants. Non-synonymous substitutions led either to a functional amino acid exchange or premature termination. An SNP assay with 70 loci allowed predicting drought phenotype in 98.6% of a validation sample of 92 trees. Drought resistance in European beech is a moderately polygenic trait that should respond well to natural selection, selective management, and breeding.
Affiliation(s)
- Markus Pfenninger
  - Molecular Ecology, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
  - Institute for Organismic and Molecular Evolution, Johannes Gutenberg University, Mainz, Germany
  - LOEWE Centre for Translational Biodiversity Genomics, Frankfurt am Main, Germany
- Friederike Reuss
  - Molecular Ecology, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
- Angelika Kiebler
  - Molecular Ecology, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
- Philipp Schönnenbeck
  - Molecular Ecology, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
  - Institute of Human Genetics, University Medical Center, Johannes Gutenberg University, Mainz, Germany
- Cosima Caliendo
  - Molecular Ecology, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
  - Institute of Human Genetics, University Medical Center, Johannes Gutenberg University, Mainz, Germany
- Susanne Gerber
  - Institute of Human Genetics, University Medical Center, Johannes Gutenberg University, Mainz, Germany
- Berardino Cocchiararo
  - LOEWE Centre for Translational Biodiversity Genomics, Frankfurt am Main, Germany
  - Conservation Genetics Section, Senckenberg Research Institute and Natural History Museum Frankfurt, Gelnhausen, Germany
- Sabrina Reuter
  - Ecological Networks lab, Department of Biology, Technische Universität Darmstadt, Darmstadt, Germany
- Nico Blüthgen
  - Ecological Networks lab, Department of Biology, Technische Universität Darmstadt, Darmstadt, Germany
- Karsten Mody
  - Ecological Networks lab, Department of Biology, Technische Universität Darmstadt, Darmstadt, Germany
  - Department of Applied Ecology, Hochschule Geisenheim University, Geisenheim, Germany
- Bagdevi Mishra
  - Biological Archives, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
- Miklós Bálint
  - LOEWE Centre for Translational Biodiversity Genomics, Frankfurt am Main, Germany
  - Functional Environmental Genomics, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
  - Agricultural Sciences, Nutritional Sciences, and Environmental Management, Universität Giessen, Giessen, Germany
- Marco Thines
  - LOEWE Centre for Translational Biodiversity Genomics, Frankfurt am Main, Germany
  - Biological Archives, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
  - Institute for Ecology, Evolution and Diversity, Johann Wolfgang Goethe-University, Frankfurt am Main, Germany
- Barbara Feldmeyer
  - Molecular Ecology, Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
11
Rodrigues DR, Everschor-Sitte K, Gerber S, Horenko I. A deeper look into natural sciences with physics-based and data-driven measures. iScience 2021; 24:102171. [PMID: 33665584 PMCID: PMC7907479 DOI: 10.1016/j.isci.2021.102171] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022] Open
Abstract
With the development of machine learning in recent years, it is possible to glean much more information from an experimental data set to study matter. In this perspective, we discuss some state-of-the-art data-driven tools to analyze latent effects in data and explain their applicability in natural science, focusing on two recently introduced, physics-motivated, computationally cheap tools: latent entropy and latent dimension. We exemplify their capabilities by applying them to several examples in the natural sciences and show that they reveal so far unobserved features such as a gradient in a magnetic measurement and a latent network of glymphatic channels from mouse brain microscopy data. What sets these techniques apart is that they relax the restrictive assumptions typical of many machine learning models and instead incorporate aspects that best fit the dynamical systems at hand.
Affiliation(s)
- Davi Röhe Rodrigues
  - Institute of Physics, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
- Susanne Gerber
  - Institute of Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, 55131 Mainz, Germany
- Illia Horenko
  - Università della Svizzera Italiana, Faculty of Informatics, Via G. Buffi 13, 6900 Lugano, Switzerland