101
|
Zhou Y, Leung SW, Mizutani S, Takagi T, Tian YS. MEPHAS: an interactive graphical user interface for medical and pharmaceutical statistical analysis with R and Shiny. BMC Bioinformatics 2020; 21:183. [PMID: 32393166 PMCID: PMC7216538 DOI: 10.1186/s12859-020-3494-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/27/2019] [Accepted: 04/15/2020] [Indexed: 11/20/2022] Open
Abstract
Background Even though R is one of the most commonly used statistical computing environments, it lacks a graphical user interface (GUI) that appeals to students, researchers, lecturers, and practitioners in medicine and pharmacy for conducting standard data analytics. Current GUIs built on top of R, such as EZR and R-Commander, aim to facilitate R coding and visualization, but most of the functionalities are still accessed through a command-line interface (CLI). To help practitioners and researchers in medicine and pharmacy run most routines in fundamental statistical analysis, we developed an interactive GUI, MEPHAS, that supports various web-based systems and is accessible from laptops, workstations, or tablets under Windows, macOS (and iOS), or Linux. In addition to fundamental statistical analysis, advanced statistics such as extended Cox regression and dimensional analyses, including partial least squares regression (PLS-R) and sparse partial least squares regression (SPLS-R), are also available in MEPHAS. Results MEPHAS is a web-based GUI (https://alain003.phs.osaka-u.ac.jp/mephas/) built on the Shiny framework. We also created the corresponding R package mephas (https://mephas.github.io/). Thus far, MEPHAS supports four categories of statistics: probability, hypothesis testing, regression models, and dimensional analyses. Instructions and help menus are accessible during the entire analytical process via the web-based GUI, particularly for advanced dimensional data analysis, which requires much explanation. The GUI was designed to be intuitive so that non-technical users can perform various statistical functions, e.g., managing data, customizing plots, setting parameters, and monitoring real-time results, without any R coding. All generated graphs can be saved to local machines, and tables can be downloaded as CSV files.
Conclusion MEPHAS is a free and open-source web-interactive GUI that was designed to support statistical data analyses and prediction for medical and pharmaceutical practitioners and researchers. It enables various medical and pharmaceutical statistical analyses through interactive parameter settings and dynamic visualization of the results.
|
102
|
Merino A, Garcia-Alvarez D, Sainz-Palmero GI, Acebes LF, Fuente MJ. Knowledge based recursive non-linear partial least squares (RNPLS). ISA TRANSACTIONS 2020; 100:481-494. [PMID: 31952793 DOI: 10.1016/j.isatra.2020.01.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/08/2019] [Revised: 11/21/2019] [Accepted: 01/03/2020] [Indexed: 06/10/2023]
Abstract
Data-driven soft sensors are very common in industrial plants, where they provide indirect measurements of critical variables that are difficult to measure by using other variables that are easier to obtain. The use of soft sensors poses some challenges, such as the collinearity of the predictor variables and the time-varying and possibly non-linear nature of the industrial process. To deal with the first challenge, partial least squares (PLS) regression has been employed in many applications to model the linear relations between process variables with noisy and highly correlated data. However, the PLS model also needs to deal with the other two issues: the non-linear and time-varying characteristics of the processes. In this work, a new knowledge-based methodology for a recursive non-linear PLS algorithm (RNPLS) is systematized to deal with these issues. Here, the non-linear PLS algorithm is set up by carrying out the PLS regression over the augmented input matrix, which includes knowledge-based non-linear transformations of some of the variables. This transformation depends on the system's nature and takes into account the available knowledge about the process, which is provided by expert knowledge or emulated using software tools. Then, the recursive exponentially weighted PLS is used to modify and adapt the model according to the process changes. This RNPLS algorithm has been tested using two case studies that differ in the available knowledge: a real industrial evaporation station of the sugar industry, where the expert knowledge about the process permits the formulation of the relationships, and a simulated wastewater treatment plant, where the necessary knowledge about the process is obtained by a software tool. The results show that the methodology involving knowledge regarding the process is able to adjust to the process changes, providing highly accurate predictions.
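The augmented-input idea in this abstract can be illustrated with a minimal sketch. This is not the authors' RNPLS implementation: it is a plain single-response PLS (PLS1, NIPALS-style) written from scratch, the function names (`fit_pls1`, `predict_pls1`, `rmse`) and the toy data are hypothetical, and the recursive exponentially weighted update is not shown. The squared term stands in for a "knowledge-based" transformation that an expert would supply.

```python
import math

def fit_pls1(rows, y, n_components):
    """Fit a single-response PLS model by sequential deflation (NIPALS-style)."""
    n, n_cols = len(rows), len(rows[0])
    x_mean = [sum(row[j] for row in rows) / n for j in range(n_cols)]
    y_mean = sum(y) / n
    X = [[v - m for v, m in zip(row, x_mean)] for row in rows]  # centred inputs
    resid = [v - y_mean for v in y]                             # centred response
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X space with maximal covariance with y.
        w = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(n_cols)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(a * b for a, b in zip(row, w)) for row in X]   # scores
        tt = sum(v * v for v in t)
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(n_cols)]
        q = sum(a * b for a, b in zip(resid, t)) / tt
        # Deflate X and y before extracting the next component.
        X = [[X[i][j] - t[i] * p[j] for j in range(n_cols)] for i in range(n)]
        resid = [resid[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components

def predict_pls1(model, row):
    """Predict one sample by repeating the deflation steps on the new row."""
    x_mean, y_mean, components = model
    x = [v - m for v, m in zip(row, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(a * b for a, b in zip(x, w))
        y_hat += q * t
        x = [a - t * b for a, b in zip(x, p)]
    return y_hat

def rmse(model, rows, y):
    errors = [(predict_pls1(model, row) - v) ** 2 for row, v in zip(rows, y)]
    return math.sqrt(sum(errors) / len(errors))

# Toy process (hypothetical): the response depends on x1 squared and on x2.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
x2 = [1.0, 0.5, 2.0, -1.0, 0.0, 1.5]
y = [a * a + 0.5 * b for a, b in zip(x1, x2)]

raw_rows = [[a, b] for a, b in zip(x1, x2)]
augmented_rows = [[a, b, a * a] for a, b in zip(x1, x2)]  # adds the known non-linearity

linear_rmse = rmse(fit_pls1(raw_rows, y, 2), raw_rows, y)
augmented_rmse = rmse(fit_pls1(augmented_rows, y, 3), augmented_rows, y)
print(linear_rmse, augmented_rmse)
```

On this noise-free toy data the augmented model fits the response almost exactly, while the purely linear model cannot, which is the motivation for adding process knowledge as extra columns.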
|
103
|
Faul L, St Jacques PL, DeRosa JT, Parikh N, De Brigard F. Differential contribution of anterior and posterior midline regions during mental simulation of counterfactual and perspective shifts in autobiographical memories. Neuroimage 2020; 215:116843. [PMID: 32289455 DOI: 10.1016/j.neuroimage.2020.116843] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/03/2019] [Revised: 04/04/2020] [Accepted: 04/06/2020] [Indexed: 12/16/2022] Open
Abstract
Retrieving autobiographical memories induces a natural tendency to mentally simulate alternate versions of past events, either by reconstructing the perceptual details of the originally experienced perspective or the conceptual information of what actually occurred. Here we examined whether the episodic system recruited during imaginative experiences functionally dissociates depending on the nature of this reconstruction. Using fMRI, we evaluated differential patterns of neural activity and hippocampal connectivity when twenty-nine participants naturally recalled past negative events, shifted visual perspective, or imagined better or worse outcomes than what actually occurred. We found that counterfactual thoughts were distinguished by neural recruitment in dorsomedial prefrontal cortex, whereas shifts in visual perspective were uniquely supported by the precuneus. Additionally, connectivity with the anterior hippocampus changed depending upon the mental simulation that was performed - with enhanced hippocampal connectivity with medial prefrontal cortex for counterfactual simulations and precuneus for shifted visual perspectives. Together, our findings provide a novel assessment of differences between these common methods of mental simulation and a more detailed account for the neural network underlying episodic retrieval and reconstruction.
|
104
|
Liang X, Gong Q, Zheng H, Xu J. Examining the impact factors of the water environment using the extended STIRPAT model: A Case Study in Sichuan. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2020; 27:12942-12952. [PMID: 31916174 DOI: 10.1007/s11356-019-06745-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2019] [Accepted: 10/10/2019] [Indexed: 06/10/2023]
Abstract
China's rapid social and economic development has led to a significant deterioration in the water environment, which has limited sustainable regional development. Therefore, understanding the specific factors that affect the water environment is vital for future water conservation efforts. From a social economy perspective, this paper used population, the economy, urbanization, technological level, water consumption, and other factors to expand the STIRPAT model, after which partial least squares was applied to solve the model parameters and comprehensively analyze the impact of regional development on the water environment in Sichuan Province from 2007 to 2017. It was found that the main factors affecting the water environment were resident population, urbanization, service industry development, and industrialization, with the industrialization factor being found to have a reverse waste-sewage water discharge inhibition. In addition, it was found that during the study period, there was no environmental Kuznets curve between water resource environmental pollution and economic growth in Sichuan Province. Finally, some policy recommendations for improving the water environment were given based on the results.
|
105
|
Xia Z, Yi T, Liu Y. Rapid and nondestructive determination of sesamin and sesamolin in Chinese sesames by near-infrared spectroscopy coupling with chemometric method. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2020; 228:117777. [PMID: 31727518 DOI: 10.1016/j.saa.2019.117777] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/14/2019] [Revised: 11/06/2019] [Accepted: 11/06/2019] [Indexed: 06/10/2023]
Abstract
Sesame is one of the most important crops in Africa and East Asia. The sesamin and sesamolin in sesame have shown various pharmacological, biological and physiological activities. In this study, a rapid and nondestructive method for the determination of sesamin and sesamolin in Chinese sesame by near-infrared spectroscopy coupled with a chemometric method was proposed. The near-infrared spectra of sesame samples from three different Chinese areas were collected, and partial least squares (PLS) was used to construct the quantitative models. Spectral preprocessing and variable selection methods were adopted to improve the predictability and stability of the model. Reasonable quantitative results can be obtained when the samples used for model construction and prediction were harvested in the same year. For sesamin and sesamolin, the correlation coefficients (R) were 0.9754 and 0.9636, and the root mean square errors of prediction (RMSEP) were 151.2951 and 39.7720, respectively. The optimized models seem less effective when they are used to predict samples harvested in other years or countries. However, acceptable results can still be obtained.
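For readers unfamiliar with the two figures of merit quoted in this abstract, the following minimal sketch computes a Pearson correlation coefficient (R) and a root mean square error of prediction (RMSEP) from reference and predicted values. The helper names and the numbers are hypothetical and are not taken from the paper; the abstract does not state units or the exact variant of each statistic that the authors used.

```python
import math

def pearson_r(reference, predicted):
    """Pearson correlation between reference and predicted values."""
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_pred = sum(predicted) / n
    cov = sum((a - mean_ref) * (b - mean_pred) for a, b in zip(reference, predicted))
    var_ref = sum((a - mean_ref) ** 2 for a in reference)
    var_pred = sum((b - mean_pred) ** 2 for b in predicted)
    return cov / math.sqrt(var_ref * var_pred)

def rmsep(reference, predicted):
    """Root mean square error over an independent prediction set."""
    n = len(reference)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(reference, predicted)) / n)

# Hypothetical reference (laboratory) and model-predicted concentrations.
reference = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

r_value = pearson_r(reference, predicted)
error = rmsep(reference, predicted)
print(r_value, error)
```

RMSEP is expressed in the units of the measured quantity, so values for two analytes with different concentration ranges are not directly comparable, whereas R is unitless.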
|
106
|
Silalahi DD, Midi H, Arasan J, Mustafa MS, Caliman JP. Kernel partial diagnostic robust potential to handle high-dimensional and irregular data space on near infrared spectral data. Heliyon 2020; 6:e03176. [PMID: 32042959 PMCID: PMC7002778 DOI: 10.1016/j.heliyon.2020.e03176] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/06/2018] [Revised: 05/28/2019] [Accepted: 01/02/2020] [Indexed: 11/24/2022] Open
Abstract
In practice, the collected spectra are very often composed of complex overtones and many overlapping peaks, which may lead to misinterpretation because of their significant nonlinear characteristics. Using a linear solution might not be appropriate. In addition, with a high-dimensional dataset, due to the large number of observations and data points, classical multiple regression will fail to fit. These complexities commonly lead to multicollinearity problems; furthermore, the risk of contamination by multiple outliers and high leverage points also increases. To address these problems, a new method called Kernel Partial Diagnostic Robust Potential (KPDRGP) is introduced. The method allows a nonlinear solution that maps the original input matrix X nonlinearly into a higher-dimensional feature space corresponding to a Reproducing Kernel Hilbert Space (RKHS). For dimension reduction, the method replaces the dot products of elements in the mapped data with a nonlinear function in the original input space. To prevent contamination by multiple outliers and high leverage points, a robust procedure using the Diagnostic Robust Generalized Potentials (DRGP) algorithm was used. Results on simulated and real data verified that the proposed KPDRGP method was superior to non-kernel methods and to some other robust methods with a kernel solution.
|
107
|
Bolton TAW, Freitas LGA, Jochaut D, Giraud AL, Van De Ville D. Neural responses in autism during movie watching: Inter-individual response variability co-varies with symptomatology. Neuroimage 2020; 216:116571. [PMID: 31987996 DOI: 10.1016/j.neuroimage.2020.116571] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 07/04/2019] [Revised: 12/13/2019] [Accepted: 01/17/2020] [Indexed: 12/22/2022] Open
Abstract
Naturalistic movie paradigms are exquisitely dynamic by nature, yet dedicated analytical methods typically remain static. Here, we deployed a dynamic inter-subject functional correlation (ISFC) analysis to study movie-driven functional brain changes in a population of male young adults diagnosed with autism spectrum disorder (ASD). We took inspiration from the resting-state research field in generating a set of whole-brain ISFC states expressed by the analysed ASD and typically developing (TD) subjects along time. Change points of state expression often involved transitions between different scenes of the movie, resulting in the reorganisation of whole-brain ISFC patterns to recruit different functional networks. Both subject populations showed idiosyncratic state expression at dedicated time points, but only TD subjects were also characterised by episodes of homogeneous recruitment. The temporal fluctuations in both quantities, as well as in cross-population dissimilarity, were tied to contextual movie cues. The prominent idiosyncrasy seen in ASD subjects was linked to individual symptomatology by partial least squares analysis, as different temporal sequences of ISFC states were expressed by subjects suffering from social and verbal communication impairments, as opposed to nonverbal communication deficits and stereotypic behaviours. Furthermore, the temporal expression of several of these states was correlated with the movie context, the presence of faces on screen, or overall luminosity. Overall, our results support the use of dynamic analytical frameworks to fully exploit the information obtained by naturalistic stimulation paradigms. They also show that autism should be understood as a multi-faceted disorder, in which the functional brain alterations seen in a given subject will vary as a function of the extent and balance of expressed symptoms.
|
108
|
Mendez KM, Broadhurst DI, Reinke SN. Migrating from partial least squares discriminant analysis to artificial neural networks: a comparison of functionally equivalent visualisation and feature contribution tools using jupyter notebooks. Metabolomics 2020; 16:17. [PMID: 31965332 PMCID: PMC6974504 DOI: 10.1007/s11306-020-1640-0] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/30/2019] [Accepted: 01/13/2020] [Indexed: 01/25/2023]
Abstract
INTRODUCTION Metabolomics data is commonly modelled multivariately using partial least squares discriminant analysis (PLS-DA). Its success is primarily due to ease of interpretation, through projection to latent structures, and transparent assessment of feature importance using regression coefficients and Variable Importance in Projection scores. In recent years several non-linear machine learning (ML) methods have grown in popularity but with limited uptake essentially due to convoluted optimisation and interpretation. Artificial neural networks (ANNs) are a non-linear projection-based ML method that share a structural equivalence with PLS, and as such should be amenable to equivalent optimisation and interpretation methods. OBJECTIVES We hypothesise that standardised optimisation, visualisation, evaluation and statistical inference techniques commonly used by metabolomics researchers for PLS-DA can be migrated to a non-linear, single hidden layer, ANN. METHODS We compared a standardised optimisation, visualisation, evaluation and statistical inference techniques workflow for PLS with the proposed ANN workflow. Both workflows were implemented in the Python programming language. All code and results have been made publicly available as Jupyter notebooks on GitHub. RESULTS The migration of the PLS workflow to a non-linear, single hidden layer, ANN was successful. There was a similarity in significant metabolites determined using PLS model coefficients and ANN Connection Weight Approach. CONCLUSION We have shown that it is possible to migrate the standardised PLS-DA workflow to simple non-linear ANNs. This result opens the door for more widespread use and to the investigation of transparent interpretation of more complex ANN architectures.
|
109
|
Mäkelä M, Geladi P, Rissanen M, Rautkari L, Dahl O. Hyperspectral near infrared image calibration and regression. Anal Chim Acta 2020; 1105:56-63. [PMID: 32138926 DOI: 10.1016/j.aca.2020.01.019] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/26/2019] [Revised: 12/02/2019] [Accepted: 01/08/2020] [Indexed: 10/25/2022]
Abstract
Reference materials are used in diffuse reflectance imaging for transforming the digitized camera signal into reflectance and absorbance units for subsequent interpretation. Traditional white and dark reference signals are generally used for calculating reflectance or absorbance, but these can be supplemented with additional reflectance targets to improve the accuracy of reflectance transformations. In this work we provide an overview of hyperspectral image regression and assess the effects of reflectance calibration on image interpretation using partial least squares regression. Linear and quadratic reflectance transformations based on additional reflectance targets decrease average measurement errors and make it easier to estimate model pseudorank during image regression. The lowest measurement and prediction errors were obtained with the column and wavelength specific quadratic transformations which retained the spatial information provided by the line-scanning instrument and reduced errors in the predicted concentration maps.
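The traditional white and dark reference transformation mentioned in this abstract can be sketched as follows. This is the common two-point correction, reflectance = (raw - dark) / (white - dark) and absorbance = -log10(reflectance), applied per wavelength; it is not the linear or quadratic multi-target calibration the authors evaluate, and the function names and signal counts are hypothetical.

```python
import math

def to_reflectance(raw, dark, white):
    """Two-point correction of a raw camera signal, one value per wavelength."""
    return [(s - d) / (w - d) for s, d, w in zip(raw, dark, white)]

def to_absorbance(reflectance):
    """Pseudo-absorbance computed from reflectance."""
    return [-math.log10(r) for r in reflectance]

# Hypothetical digitized counts at two wavelengths for a single pixel.
raw = [600.0, 1100.0]
dark = [100.0, 100.0]      # signal with the shutter closed
white = [1100.0, 2100.0]   # signal from the white reference target

reflectance = to_reflectance(raw, dark, white)
absorbance = to_absorbance(reflectance)
print(reflectance, absorbance)
```

In a line-scanning instrument the same correction would typically be applied per detector column as well as per wavelength, which is why column-specific transformations can preserve spatial information.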
|
110
|
Csala A, Zwinderman AH, Hof MH. Multiset sparse partial least squares path modeling for high dimensional omics data analysis. BMC Bioinformatics 2020; 21:9. [PMID: 31918677 PMCID: PMC6953292 DOI: 10.1186/s12859-019-3286-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/05/2019] [Accepted: 11/20/2019] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Recent technological developments have enabled the measurement of a plethora of biomolecular data from various omics domains, and research is ongoing on statistical methods to leverage these omics data to better model and understand biological pathways and genetic architectures of complex phenotypes. Current reviews report that the simultaneous analysis of multiple (i.e. three or more) high dimensional omics data sources is still challenging and suitable statistical methods are unavailable. Often mentioned challenges are the lack of accounting for the hierarchical structure between omics domains and the difficulty of interpretation of genomewide results. This study is motivated to address these challenges. We propose multiset sparse Partial Least Squares path modeling (msPLS), a generalized penalized form of Partial Least Squares path modeling, for the simultaneous modeling of biological pathways across multiple omics domains. msPLS simultaneously models the effect of multiple molecular markers, from multiple omics domains, on the variation of multiple phenotypic variables, while accounting for the relationships between data sources, and provides sparse results. The sparsity in the model helps to provide interpretable results from analyses of hundreds of thousands of biomolecular variables. RESULTS With simulation studies, we quantified the ability of msPLS to discover associated variables among high dimensional data sources. Furthermore, we analysed high dimensional omics datasets to explore biological pathways associated with Marfan syndrome and with Chronic Lymphocytic Leukaemia. Additionally, we compared the results of msPLS to the results of Multi-Omics Factor Analysis (MOFA), which is an alternative method to analyse this type of data. CONCLUSIONS msPLS is a multiset multivariate method for the integrative analysis of multiple high dimensional omics data sources. It accounts for the relationship between multiple high dimensional data sources while it provides interpretable results through its sparse solutions. The biomarkers found by msPLS in the omics datasets can be interpreted in terms of biological pathways associated with the pathophysiology of Marfan syndrome and of Chronic Lymphocytic Leukaemia. Additionally, msPLS outperforms MOFA in terms of variation explained in the Chronic Lymphocytic Leukaemia dataset while it identifies the two most important clinical markers for Chronic Lymphocytic Leukaemia. AVAILABILITY: http://uva.csala.me/mspls. https://github.com/acsala/2018_msPLS.
|
111
|
Villa JEL, Afonso MAS, Dos Santos DP, Mercadal PA, Coronado EA, Poppi RJ. Colloidal gold clusters formation and chemometrics for direct SERS determination of bioanalytes in complex media. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2020; 224:117380. [PMID: 31344581 DOI: 10.1016/j.saa.2019.117380] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/26/2019] [Revised: 06/22/2019] [Accepted: 07/08/2019] [Indexed: 05/27/2023]
Abstract
In this work, we report the sensitive and selective sensing of the purine bases adenine and guanine in a urine matrix by using surface-enhanced Raman spectroscopy (SERS) and a colloidal SERS substrate. To identify suitable conditions for quantitative analysis, the pH dependence of the spectra of adenine, guanine, urine simulant and their mixtures was studied on a gold nanoparticle suspension. Interestingly, although the urine matrix promotes suppression of the analyte signals and overlapping bands, it can also improve the repeatability of the SERS measurements. This effect was associated with the relatively controlled formation of small-sized gold clusters, and it was investigated both experimentally and theoretically. Furthermore, a correlation-constrained multivariate curve resolution-alternating least squares (MCR-ALS) method was developed to resolve overlapping SERS bands and to quantify physiologically relevant (micromolar) concentrations of the bioanalytes. The performance of the proposed MCR-ALS approach (assessed in terms of figures of merit) was similar to that obtained by using partial least squares regression, but with the additional advantage of retrieving valuable spectral information. Therefore, this method can be used for improving the selectivity of colloidal clusters in qualitative and quantitative SERS analysis of complex media, avoiding the need for tedious nanoparticle-surface modification or preliminary chromatographic separation.
|
112
|
Mendez KM, Reinke SN, Broadhurst DI. A comparative evaluation of the generalised predictive ability of eight machine learning algorithms across ten clinical metabolomics data sets for binary classification. Metabolomics 2019; 15:150. [PMID: 31728648 PMCID: PMC6856029 DOI: 10.1007/s11306-019-1612-4] [Citation(s) in RCA: 89] [Impact Index Per Article: 17.8] [Received: 09/22/2019] [Accepted: 11/05/2019] [Indexed: 12/18/2022]
Abstract
INTRODUCTION Metabolomics is increasingly being used in the clinical setting for disease diagnosis, prognosis and risk prediction. Machine learning algorithms are particularly important in the construction of multivariate metabolite prediction. Historically, partial least squares (PLS) regression has been the gold standard for binary classification. Nonlinear machine learning methods such as random forests (RF), kernel support vector machines (SVM) and artificial neural networks (ANN) may be more suited to modelling possible nonlinear metabolite covariance, and thus provide better predictive models. OBJECTIVES We hypothesise that for binary classification using metabolomics data, non-linear machine learning methods will provide superior generalised predictive ability when compared to linear alternatives, in particular when compared with the current gold standard PLS discriminant analysis. METHODS We compared the general predictive performance of eight archetypal machine learning algorithms across ten publicly available clinical metabolomics data sets. The algorithms were implemented in the Python programming language. All code and results have been made publicly available as Jupyter notebooks. RESULTS There was only marginal improvement in predictive ability for SVM and ANN over PLS across all data sets. RF performance was comparatively poor. The use of out-of-bag bootstrap confidence intervals provided a measure of uncertainty of model prediction such that the quality of metabolomics data was observed to be a bigger influence on generalised performance than model choice. CONCLUSION The size of the data set, and choice of performance metric, had a greater influence on generalised predictive performance than the choice of machine learning algorithm.
|
113
|
Grümpel A, Krieter J, Dippel S. Reducing estimated tail biting risk in German weaner pigs using a management tool. Vet J 2019; 254:105406. [PMID: 31836167 DOI: 10.1016/j.tvjl.2019.105406] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/30/2019] [Revised: 10/28/2019] [Accepted: 11/11/2019] [Indexed: 11/30/2022]
Abstract
The tail biting management tool 'SchwIP' was developed to analyse estimated farm individual risk for tail biting and to support farmers to reduce risk. The risk factors included in SchwIP had been weighted by 61 experts regarding their strength of influence on tail biting. SchwIP was applied on 21 conventional farms throughout Germany that kept weaner pigs in closed barns. All farms were assessed with the SchwIP questionnaire and received farm-individual feedback and advice on how to reduce tail biting risk. There were no control farms with assessment only, because asking questions could raise awareness thus triggering improvements. Each farm was visited three times at 6 monthly intervals. Risk factor data collected on farms were replaced with the corresponding expert weighting, and weightings were then standardised to a range of 0 - 1 across all farms and visits. All standardised risks were summarised per farm and visit. From this, within-farm differences in farm risk sums between visit 1 and 2 (ΔRS12), 2 and 3 (ΔRS23) and 1 and 3 (ΔRS13), and the association between changes in single risk factors with ΔRS, were calculated. Farm risk sums significantly decreased from visit 1 to visit 2 and 3, respectively, but not from visit 2 to visit 3. Change in farm risk sums between visit 1 and 2 was significantly correlated with 59 factors; ΔRS23 with 54 factors; and ΔRS13 with 57 factors. Eighteen factors were significantly associated with all three ΔRS. The management tool SchwIP contributed to a reduction in estimated risk for tail biting in weaners after the first visit. There was no apparent pattern of changes in risk factors on the farms, which underlines the multifactorial nature of tail biting. Further on-farm research on tail biting risk factors and tail lesions is needed to better understand the complex relationship.
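The risk-sum arithmetic described in this abstract can be sketched as follows. Several details are assumptions, since the abstract does not specify them: min-max scaling is applied per risk factor across all farms and visits, the farm risk sum is the plain sum of scaled values, and each difference is taken as later visit minus earlier visit. The farm names, factor count and weightings are invented for illustration.

```python
def min_max_scale(values):
    """Scale a list of values to the range 0 to 1 (all zeros if constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical expert weightings: one list per (farm, visit), one entry per risk factor.
weights = {
    ("farm_a", 1): [4.0, 2.0],
    ("farm_a", 2): [2.0, 2.0],
    ("farm_a", 3): [0.0, 1.0],
    ("farm_b", 1): [3.0, 3.0],
    ("farm_b", 2): [1.0, 1.0],
    ("farm_b", 3): [1.0, 1.0],
}

keys = sorted(weights)
n_factors = len(weights[keys[0]])

# Standardise each factor across all farms and visits, then sum per farm and visit.
scaled_columns = [min_max_scale([weights[k][j] for k in keys]) for j in range(n_factors)]
risk_sum = {k: sum(col[i] for col in scaled_columns) for i, k in enumerate(keys)}

def delta_rs(farm, first_visit, second_visit):
    """Within-farm change in risk sum (assumed sign: later minus earlier)."""
    return risk_sum[(farm, second_visit)] - risk_sum[(farm, first_visit)]

print(delta_rs("farm_a", 1, 2), delta_rs("farm_a", 2, 3), delta_rs("farm_a", 1, 3))
```

Under this sign convention a negative difference corresponds to a reduction in estimated risk between the two visits.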
|
114
|
Mendez KM, Broadhurst DI, Reinke SN. The application of artificial neural networks in metabolomics: a historical perspective. Metabolomics 2019; 15:142. [PMID: 31628551 DOI: 10.1007/s11306-019-1608-0] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 06/18/2019] [Accepted: 10/11/2019] [Indexed: 02/08/2023]
Abstract
BACKGROUND Metabolomics data, with its complex covariance structure, is typically modelled by projection-based machine learning (ML) methods such as partial least squares (PLS) regression, which project data into a latent structure. Biological data are often non-linear, so it is reasonable to hypothesize that metabolomics data may also have a non-linear latent structure, which in turn would be best modelled using non-linear equations. A non-linear ML method with a similar projection equation structure to PLS is artificial neural networks (ANNs). While ANNs were first applied to metabolic profiling data in the 1990s, the lack of community acceptance combined with limitations in computational capacity and the lack of volume of data for robust non-linear model optimisation inhibited their widespread use. Due to recent advances in computational power, modelling improvements, community acceptance, and the more demanding needs for data science, ANNs have made a recent resurgence in interest across research communities, including a small yet growing usage in metabolomics. As metabolomics experiments become more complex and start to be integrated with other omics data, there is potential for ANNs to become a viable alternative to linear projection methods. AIM OF REVIEW We aim to first describe ANNs and their structural equivalence to linear projection-based methods, including PLS regression. We then review the historical, current, and future uses of ANNs in the field of metabolomics. KEY SCIENTIFIC CONCEPT OF REVIEW Is metabolomics ready for the return of artificial neural networks?
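The structural equivalence referred to in this abstract can be written out as a short sketch. The notation is an assumption chosen for illustration and is not taken from the review: X is the centred data matrix, W* the PLS projection weights, q the PLS inner regression coefficients, and W^(1), w^(2), b^(1), b^(2), f, g the weights, biases and activation functions of a single-hidden-layer network.

```latex
% PLS: project onto latent scores, then regress the response on the scores
T = X W^{*}, \qquad \hat{y}_{\mathrm{PLS}} = T q = X W^{*} q

% ANN with one hidden layer: the same two-step form with non-linear activations
H = f\!\left(X W^{(1)} + b^{(1)}\right), \qquad
\hat{y}_{\mathrm{ANN}} = g\!\left(H w^{(2)} + b^{(2)}\right)

% With identity activations and zero biases the two expressions share one form
f = g = \mathrm{id},\; b^{(1)} = 0,\; b^{(2)} = 0
\;\Longrightarrow\;
\hat{y}_{\mathrm{ANN}} = X W^{(1)} w^{(2)}
```

The shared form does not mean the fitted weights coincide, since PLS and gradient-trained networks optimise different criteria; it only shows that both models predict through a projection onto a small number of latent variables.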
|
115
|
Huang L, Zhang X, Zhang Z. UV-vis sensor array combining with chemometric methods for quantitative analysis of binary dipeptide mixture (Gly-Gly/Ala-Gln). SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2019; 221:117205. [PMID: 31158767 DOI: 10.1016/j.saa.2019.117205] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/09/2019] [Revised: 05/09/2019] [Accepted: 05/26/2019] [Indexed: 06/09/2023]
Abstract
Many endogenous peptides circulate in bodily fluids at the micromole level, and accurate analysis of endogenous peptides at such low levels is important. In this study, we present an extensible, facile and sensitive sensor array based on UV-Vis spectroscopy of gold nanoparticles (AuNPs), combined with chemometric methods, for quantitative analysis of a binary peptide mixture (Gly-Gly/Ala-Gln). High concentrations of arginine (Arg) and Cr3+ can induce aggregation of the AuNPs and DNA-AuNPs. However, glycylglycine (Gly-Gly) and alanyl-glutamine (Ala-Gln) can prevent the AuNPs from aggregating. We investigated the prevention of AuNP aggregation by Gly-Gly and Ala-Gln mixtures and constructed sensor arrays for quantitative analysis of these mixtures. The color change of the solution depends on the dose of the target, and it can be seen by the naked eye or monitored by UV-Vis spectrometry. Results showed that the concentrations of Arg and Cr3+ are the key factors affecting the sensitivity of the sensor array. However, when Gly-Gly and Ala-Gln have to be analyzed simultaneously, it is difficult to optimize the concentrations of Arg and Cr3+ for both peptides. Taking advantage of multivariate analysis and data fusion, PLS models and backward interval PLS (BiPLS) models were built for a fused dataset constructed from UV-Vis data obtained at different concentrations of Arg and Cr3+. The best results were obtained from the PLS models. The proposed method can be extended to the analysis of other peptides in more complex mixture systems.
|
116
|
Zhang G, Peng S, Cao S, Zhao J, Xie Q, Han Q, Wu Y, Huang Q. A fast progressive spectrum denoising combined with partial least squares algorithm and its application in online Fourier transform infrared quantitative analysis. Anal Chim Acta 2019; 1074:62-68. [PMID: 31159940 DOI: 10.1016/j.aca.2019.04.055] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/18/2019] [Revised: 03/19/2019] [Accepted: 04/24/2019] [Indexed: 11/26/2022]
Abstract
Fourier transform infrared (FTIR) spectroscopy is an important method in analytical chemistry. A material can be qualitatively and quantitatively analyzed from its FTIR spectrum. Spectrum denoising is commonly performed before online FTIR quantitative analysis. The averaging method requires a long time to collect spectra, which weakens real-time online analysis. The Savitzky-Golay smoothing method flattens peaks as the window width increases, causing useful information to be lost. The sparse representation method is a common denoising method that is used to reconstruct the spectrum. However, because noise is random, a sparse representation of the noise cannot be achieved. Traditional sparse representation algorithms perform denoising only once, and the noise cannot be removed completely. FTIR spectrum denoising should therefore be performed in a progressive way. However, it is difficult to determine what degree of denoising is required. Here, a fast progressive spectrum denoising method combined with partial least squares was developed for online FTIR quantitative analysis. Two real sample data sets were used to test the performance of the proposed method. The experimental results indicated that the progressive spectrum denoising method combined with the partial least squares method performed markedly better than other methods in terms of root mean squared error of prediction and coefficient of determination in FTIR quantitative analysis.
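The remark about Savitzky-Golay smoothing flattening peaks as the window widens can be illustrated with a small sketch. This is not the authors' denoising algorithm. It applies the standard tabulated Savitzky-Golay weights (quadratic/cubic fit) for 5-, 7-, and 9-point windows to a synthetic Gaussian band. The names `SG_WEIGHTS`, `sg_smooth`, and the synthetic `spectrum` are illustrative assumptions, not taken from the paper.

```python
import math

# Tabulated Savitzky-Golay smoothing weights (quadratic/cubic polynomial fit),
# stored as (integer weights, normalisation constant) per window width.
SG_WEIGHTS = {
    5: ([-3, 12, 17, 12, -3], 35),
    7: ([-2, 3, 6, 7, 6, 3, -2], 21),
    9: ([-21, 14, 39, 54, 59, 54, 39, 14, -21], 231),
}


def sg_smooth(signal, window):
    """Smooth `signal` with a Savitzky-Golay filter of the given window width.

    Points closer to the edge than half a window are left unchanged.
    """
    weights, norm = SG_WEIGHTS[window]
    half = window // 2
    smoothed = list(signal)
    for i in range(half, len(signal) - half):
        acc = sum(w * signal[i - half + j] for j, w in enumerate(weights))
        smoothed[i] = acc / norm
    return smoothed


# Synthetic narrow Gaussian band with unit height (illustrative data only).
spectrum = [math.exp(-(x ** 2) / 2.0) for x in range(-20, 21)]

# Peak height remaining after smoothing, keyed by window width.
peak_heights = {w: max(sg_smooth(spectrum, w)) for w in SG_WEIGHTS}
```

For a band this narrow, the retained peak height drops as the window grows, which is the information loss the abstract refers to.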
|
117
|
Yu TK, Chang YJ, Chang IC, Yu TY. A pro-environmental behavior model for investigating the roles of social norm, risk perception, and place attachment on adaptation strategies of climate change. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2019; 26:25178-25189. [PMID: 31256407 DOI: 10.1007/s11356-019-05806-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/31/2018] [Accepted: 06/20/2019] [Indexed: 06/09/2023]
Abstract
Today's climate change is a major problem and challenge for the global environment and human civilization, and it can lead to dramatic floods in specific regions. As climate change intensifies, climate change adaptation strategies, such as flood insurance, energy taxes, and other risky financial strategies, have drawn worldwide attention and discussion. Risk control methods have been widely used to mitigate the impact of climate change on past flood losses, but past risk control strategies on climate change have not focused on exploring the relationship between environment, society, and humans. Based on the theoretical model of pro-environmental behavior, this study compares and analyzes four theoretical models and proposes a modified competitiveness model to effectively predict the pro-environmental behavior of college students using partial least squares (PLS). Social norm could play a dominant role as mediator between risk perception, place attachment, and pro-environmental behavior. Although risk perception and local attachment are positively related to risk financial strategy, the promotion of social norms will increase the intention of risk financial strategy. For intention of risk financial strategies within pro-environmental behavior, the efficiency of enhancing local attachment was higher than that of risk perception.
|
118
|
Zmnako SSF, Chalabi YI. Cross-cultural adaptation, reliability, and validity of the Vertigo symptom scale-short form in the central Kurdish dialect. Health Qual Life Outcomes 2019; 17:125. [PMID: 31315639 PMCID: PMC6637568 DOI: 10.1186/s12955-019-1168-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/13/2018] [Accepted: 05/31/2019] [Indexed: 12/30/2022] Open
Abstract
Background Core vestibular symptoms are vague, hard for patients to describe, and difficult for examiners to quantify. Reliable and validated patient-reported outcome measures (PROMs) have obtained acceptance and popularity in the specialty of vestibular disorders. In Kurdish, there is a critical shortage of such measures. The aim of this survey was to assess the psychometric properties of a central Kurdish version (VSS-SF-CK) of the Vertigo Symptom Scale-Short Form (VSS-SF). Methods The study utilized a regulated process of cross-cultural adaptation to produce the VSS-SF-CK. We examined its psychometric properties by using a cross-sectional survey. Owing to a non-normal distribution, both principal axis factoring and polychoric correlation were used to examine the structure. The internal consistency of the scales was evaluated using Cronbach’s alpha coefficient (α) and composite reliability. The discriminant validity was evaluated using the heterotrait–monotrait ratio of correlations (HTMT.85) and the Fornell-Larcker criterion. To assess convergent validity, the instrument was correlated with two comparators. Results The participants (n = 195) were composed of 165 patients with vestibular symptoms (mean age 45 ± 15.8, range 61 years; 56.4% women) and 30 healthy participants (mean age 35 ± 18.6; range 52 years; 60% women). Based on the scree plot, along with other criteria such as Horn’s parallel analysis and minimum average partial, two factors were extracted: vestibular (VSS-V) and autonomic-anxiety (VSS-AA). Both constructs showed a robust structure in terms of adequate loadings and weak cross-loadings. The scales’ αs were 0.81, 0.81, and 0.87 for VSS-V, VSS-AA, and the total scale (VSS-T), respectively. Discriminant validity was established with a value of 0.71 for HTMT (< 0.85). Spearman’s correlation supported the study’s hypotheses and confirmed the convergent validity. Intraclass correlation coefficients revealed high external reliability: test-retest results were 0.93, 0.94, and 0.97 for VSS-V, VSS-AA, and VSS-T, respectively. Conclusion Given a critical shortage in PROMs for the vestibular field, the psychometric properties of VSS-SF-CK were evaluated. The results were promising, as they revealed external consistency and construct validity. The goodness of fit indices showed that the VSS-SF-CK is a reliable and validated PROM that can be used by clinicians and researchers in the Kurdish-speaking population. Electronic supplementary material The online version of this article (10.1186/s12955-019-1168-z) contains supplementary material, which is available to authorized users.
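The internal-consistency statistic reported above, Cronbach's alpha, follows the textbook formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The sketch below is a minimal illustration of that formula only. The function name `cronbach_alpha` and the toy `responses` matrix are assumptions made for the example and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator) for items and for total scores.
    """
    k = len(item_scores[0])
    columns = list(zip(*item_scores))  # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Toy questionnaire: 5 respondents x 3 items (illustrative values only).
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
    [5, 4, 5],
]
alpha = cronbach_alpha(responses)
```

Items that rise and fall together across respondents push alpha toward 1; unrelated items push it toward 0.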
|
119
|
Peña-Bautista C, Durand T, Oger C, Baquero M, Vento M, Cháfer-Pericás C. Assessment of lipid peroxidation and artificial neural network models in early Alzheimer Disease diagnosis. Clin Biochem 2019; 72:64-70. [PMID: 31319065 DOI: 10.1016/j.clinbiochem.2019.07.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/21/2018] [Revised: 07/11/2019] [Accepted: 07/13/2019] [Indexed: 02/07/2023]
Abstract
OBJECTIVE Lipid peroxidation constitutes a molecular mechanism involved in early Alzheimer Disease (AD) stages, and artificial neural network (ANN) analysis is a promising non-linear regression model, characterized by its high flexibility and utility in clinical diagnosis. ANN simulates neuron learning procedures and it could provide good diagnostic performances in this complex and heterogeneous disease compared with linear regression analysis. DESIGN AND METHODS In our study, a new set of lipid peroxidation compounds were determined in urine and plasma samples from patients diagnosed with early Alzheimer Disease (n = 70) and healthy controls (n = 26) by means of ultra-performance liquid chromatography coupled with tandem mass-spectrometry. Then, a model based on ANN was developed to classify groups of participants. RESULTS The diagnostic performances obtained using an ANN model for each biological matrix were compared with the corresponding linear regression model based on partial least squares (PLS), and with the non-linear (radial and polynomial) support vector machine (SVM) models. Better accuracy, in terms of receiver operating characteristic-area under curve (ROC-AUC), was obtained for the ANN models (ROC-AUC 0.882 in plasma and 0.839 in urine) than for PLS and SVM models. CONCLUSION Lipid peroxidation and ANN constitute a useful approach to establish a reliable diagnosis when the prognosis is complex, multidimensional and non-linear.
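The ROC-AUC figures quoted above can be read as the probability that a randomly chosen patient receives a higher model score than a randomly chosen control. A minimal sketch of that pairwise definition follows. The function name `roc_auc` and the scores are illustrative assumptions, not outputs of the ANN, PLS, or SVM models in the study.

```python
def roc_auc(scores_positive, scores_negative):
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition.

    Counts the fraction of positive/negative pairs in which the positive
    case scores higher; ties count as half a win.
    """
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical classifier scores (higher = more likely patient).
patients = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.3, 0.2]
auc = roc_auc(patients, controls)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.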
|
120
|
Zhao X, Yao LI, Chen K, Li KE, Zhang J, Guo X. Changes in the Functional and Structural Default Mode Network across the Adult Lifespan Based on Partial Least Squares. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2019; 7:82256-82265. [PMID: 33224696 PMCID: PMC7677917 DOI: 10.1109/access.2019.2923274] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
The default mode network (DMN) has been extensively investigated in the literature. However, previous studies have mainly focused on age-related changes in the DMN between old and young participants. Age-dependent changes in specific regions within the DMN have not been adequately explored across the entire adult lifespan. Thus, in the present study, we performed a seed partial least squares (PLS) analysis to investigate lifespan-wide changes in the regions of the functional and structural DMNs using resting-state functional magnetic resonance imaging (fMRI) and structural magnetic resonance imaging (MRI) data from healthy subjects aged 16-85 years. The posterior cingulate area was selected as the seed region based on prior fMRI studies. The single-group functional connectivity analysis showed a stable connection between the seed and the posterior cingulate cortex (PCC), middle temporal gyrus (MTG) and inferior temporal gyrus (ITG); a decreased connection between the seed and the medial prefrontal cortex (MPFC), anterior cingulate cortex (ACC) and superior frontal gyrus (SFG); and an increased connection between the seed and the precuneus (PreC), inferior parietal lobule (IPL) and middle frontal gyrus (MFG) across the entire lifespan. In contrast, in the single-group structural covariance analysis, the covariance connections of the seed to the DMN regions demonstrated a stable covariance trend to the PCC, MTG, superior temporal gyrus (STG) and ITG; an inverted U-shaped covariance trend to the MPFC, ACC, SFG, MFG and inferior frontal gyrus (IFG); and a U-shaped covariance trend to the PreC with age. Full-group analyses found significant linear decreases in functional and structural DMN integrity. Our findings provide crucial information regarding the influence of age on the function and structure of the DMN and may contribute to the understanding of the underlying mechanism of age-related changes in the DMN over the lifespan.
|
121
|
Quantitative non-destructive analysis of paper fillers using ATR-FT-IR spectroscopy with PLS method. Anal Bioanal Chem 2019; 411:5127-5138. [PMID: 31147759 DOI: 10.1007/s00216-019-01888-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/19/2019] [Revised: 04/18/2019] [Accepted: 04/30/2019] [Indexed: 10/26/2022]
Abstract
A quantitative non-destructive express method of determining fillers (kaolin and chalk) in paper was created using attenuated total reflectance Fourier transform infrared (ATR-FT-IR) spectroscopy in the mid-IR and far-IR region (3800-245 cm⁻¹) combined with partial least squares (PLS) data analysis. Altogether, 30 two-component (cellulose pulp + kaolin and cellulose pulp + chalk) reference paper samples with known different filler concentrations and one reference paper sample without any fillers were prepared for calibration and validation. The reference values of filler concentrations in the prepared papers were determined by gravimetric analysis via dry ashing (for establishing accurate concentrations of fillers in paper) and ATR-FT-IR microspectroscopy (for evaluating homogeneity of the papers). Two-component (cellulose pulp + kaolin or cellulose pulp + chalk) PLS models were created with papers of different cellulose types and containing different amounts of fillers. The best model had root mean square errors of prediction (RMSEP) for determining the kaolin or chalk content in the two-component papers of 2.0 and 2.1 g/100 g, respectively. The performance indices were 90.4% and 92.9%, respectively. As a demonstration of practical applicability of the method, different papers from books, journals, etc. were analysed. It was concluded that the developed quantitative method is suitable for non-destructive express analysis of kaolin or chalk in paper.
|
122
|
Li Q, Huang Y, Song X, Zhang J, Min S. Moving window smoothing on the ensemble of competitive adaptive reweighted sampling algorithm. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2019; 214:129-138. [PMID: 30776713 DOI: 10.1016/j.saa.2019.02.023] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/23/2018] [Revised: 01/20/2019] [Accepted: 02/10/2019] [Indexed: 05/14/2023]
Abstract
A novel chemometric method, named MWS-ECARS, which applies moving window smoothing to an ensemble of competitive adaptive reweighted sampling runs, is proposed in this study as a spectral variable selection approach for multivariate calibration. To eliminate uninformative variables, an ensemble of CARS is carried out first, and MWS is then performed to search for effective variables around the high-frequency variables. The variable subset with the lowest standard error of cross-validation (SECV) is treated as the optimal threshold, and the corresponding moving window width is regarded as the optimal window width. The method was applied to mid-infrared (MIR) spectra of an active ingredient in pesticide, near-infrared (NIR) spectra of soil organic matter, and NIR spectra of total nitrogen in Solanaceae plants for variable selection. Overall results show that MWS-ECARS is a promising selection method with improved prediction performance over three variable selection methods: variable importance projection (VIP), uninformative variables elimination (UVE), and genetic algorithms (GA).
|
123
|
Li Q, Huang Y, Song X, Zhang J, Min S. Spectral interval combination optimization (ICO) on rapid quality assessment of Solanaceae plant: a validation study. JOURNAL OF FOOD SCIENCE AND TECHNOLOGY 2019; 56:2158-2166. [PMID: 30996449 PMCID: PMC6443740 DOI: 10.1007/s13197-019-03697-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Revised: 02/26/2019] [Accepted: 03/06/2019] [Indexed: 05/13/2023]
Abstract
A novel spectral variable selection method, named interval combination optimization (ICO), was proposed in our previous study. In the present study, ICO coupled with near infrared (NIR) spectroscopy was applied to the rapid determination of four primary constituents, namely total sugar, reducing sugar, total nitrogen, and nicotine, in Nicotiana plant. Partial least squares regression was performed after the ICO algorithm. The full spectrum was divided into forty equal-width intervals, and the intervals with lower root mean squared error of cross-validation were selected for further analysis. As a result, only 155 of the 1555 variables were retained for each constituent. In particular, as a variable selection method, ICO improved the prediction accuracy of the calibration model and obtained a satisfactory result compared with full-spectrum data. The results revealed that NIR combined with ICO could be used efficiently for rapid analysis of quality-associated constituents of Nicotiana plant. Moreover, this study provided a supplementary verification of the proposed variable selection method for further applications.
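The interval-partitioning step described above can be sketched as follows. This covers only the split of 1555 spectral variables into forty contiguous intervals and does not attempt the ICO optimization itself. Because 1555 is not divisible by 40, the sketch assumes the remainder is spread over the leading intervals, so widths differ by at most one. That handling and the helper name `split_into_intervals` are assumptions, not details given in the abstract.

```python
def split_into_intervals(n_variables, n_intervals):
    """Split variable indices 0..n_variables-1 into contiguous intervals.

    The first (n_variables % n_intervals) intervals get one extra variable,
    so interval widths differ by at most one.
    """
    base, extra = divmod(n_variables, n_intervals)
    intervals = []
    start = 0
    for i in range(n_intervals):
        width = base + (1 if i < extra else 0)
        intervals.append(range(start, start + width))
        start += width
    return intervals


# Figures from the abstract: 1555 variables, forty intervals.
intervals = split_into_intervals(1555, 40)
widths = [len(interval) for interval in intervals]
```

Each interval can then be scored, for example by cross-validated error of a model built on it, and the better-scoring intervals kept for the final calibration.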
|
124
|
Akbarzadeh N, Mireei SA, Askari G, Mahdavi AH. Microwave spectroscopy based on the waveguide technique for the nondestructive freshness evaluation of egg. Food Chem 2019; 277:558-565. [PMID: 30502185 DOI: 10.1016/j.foodchem.2018.10.143] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/04/2018] [Revised: 10/17/2018] [Accepted: 10/31/2018] [Indexed: 10/27/2022]
Abstract
A rectangular waveguide equipped with a network analyzer was used to assess the quality indices of shell eggs. The scattering parameters of the eggs were acquired in the range of 0.9-1.7 GHz and were then used to calculate the microwave spectra of the samples. PLS and ANN regression methods were implemented to predict the egg quality indices, and SIMCA and ANN classification methods were applied to classify the eggs based on their storage time. The best predictive models, however, were obtained from ANN analysis, in which the yolk coefficient, air cell height, thick albumen height, Haugh unit, and albumen pH could be predicted with residual predictive deviation (RPD) values of 3.500, 3.000, 2.411, 2.033, and 1.829, respectively. To classify the eggs according to their storage time, both SIMCA and ANN analyses resulted in a total accuracy of 100% when return loss spectra were used as the input.
|
125
|
Liu Y, Wang Y, Xia Z, Wang Y, Wu Y, Gong Z. Rapid determination of phytosterols by NIRS and chemometric methods. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2019; 211:336-341. [PMID: 30583164 DOI: 10.1016/j.saa.2018.12.030] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 08/15/2018] [Revised: 12/10/2018] [Accepted: 12/16/2018] [Indexed: 06/09/2023]
Abstract
Phytosterols have been extensively studied because they play essential roles in the physiology of plants and can be used as nutritional supplements to promote human health. We use a rapid method that couples near-infrared spectroscopy (NIRS) with chemometric techniques to quickly and efficiently determine three essential phytosterols (β-sitosterol, campesterol, and stigmasterol) in vegetable oils. The continuous wavelet transform (CWT) method was adopted to remove the baseline shift in the spectra. The quantitative analysis models were constructed by partial least squares (PLS) regression, and the randomization test (RT) method was used to further improve the models. The optimized models were used to calculate the phytosterol contents in the prediction set in order to evaluate their predictability. We found that the phytosterol contents obtained by the optimized models and by Gas Chromatography/Mass Spectrometry (GC/MS) analysis are almost consistent. For the three phytosterols, the root mean square errors of prediction (RMSEP) are 525.7590, 212.2245, and 65.1611, and the ratios of prediction to deviation (RPD) are 4.0060, 4.7195, and 3.5441, respectively. The results demonstrate the feasibility of the proposed method for rapid and non-destructive analysis of phytosterols in edible oils.
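The two figures of merit quoted above can be computed as in the sketch below. It assumes the commonly used definitions: RMSEP as the root of the mean squared prediction error over the prediction set, and RPD as the sample standard deviation of the reference values divided by RMSEP. The abstract does not state its exact formulas, so these definitions, the function names, and the numbers are illustrative assumptions.

```python
import math
from statistics import stdev


def rmsep(reference, predicted):
    """Root mean square error of prediction over a prediction set."""
    squared_errors = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared_errors) / len(reference))


def rpd(reference, predicted):
    """Standard deviation of reference values divided by RMSEP (assumed definition)."""
    return stdev(reference) / rmsep(reference, predicted)


# Hypothetical reference (e.g. GC/MS) and model-predicted values.
reference = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 11.0, 15.0, 15.0, 19.0]

error = rmsep(reference, predicted)
ratio = rpd(reference, predicted)
```

Under this definition, a larger RPD means the prediction error is small relative to the natural spread of the reference values.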
|