1
Piercy T, Herrmann G, Cangelosi A, Zoulias ID, Lopez E. Using skeletal position to estimate human error rates in telemanipulator operators. Front Robot AI 2024; 10:1287417. [PMID: 38263958 PMCID: PMC10803571 DOI: 10.3389/frobt.2023.1287417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 12/15/2023] [Indexed: 01/25/2024] Open
Abstract
In current telerobotics and telemanipulator applications, operators must perform a wide variety of tasks, often with a high risk associated with failure. A system designed to generate data-based behavioural estimations using observed operator features could be used to reduce risks in industrial teleoperation. This paper describes a non-invasive bio-mechanical feature capture method for teleoperators used to trial novel human-error rate estimators which, in future work, are intended to improve operational safety by providing behavioural and postural feedback to the operator. Operator monitoring studies were conducted in situ using the MASCOT teleoperation system at UKAEA RACE; the operators were given controlled tasks to complete during observation. Building upon existing works for vehicle-driver intention estimation and robotic surgery operator analysis, we used 3D point-cloud data capture using a commercially available depth camera to estimate an operator's skeletal pose. A total of 14 operators were observed and recorded for a total of approximately 8 h, each completing a baseline task and a task designed to induce detectable but safe collisions. Skeletal pose was estimated, collision statistics were recorded, and questionnaire-based psychological assessments were made, providing a database of qualitative and quantitative data. We then trialled data-driven analysis by using statistical and machine learning regression techniques (SVR) to estimate collision rates. We further perform and present an input variable sensitivity analysis for our selected features.
Affiliation(s)
- Thomas Piercy
  - Faculty of Science and Engineering, The University of Manchester, Manchester, United Kingdom
- Guido Herrmann
  - Faculty of Science and Engineering, The University of Manchester, Manchester, United Kingdom
- Angelo Cangelosi
  - Faculty of Science and Engineering, The University of Manchester, Manchester, United Kingdom
- Ioannis Dimitrios Zoulias
  - Remote Applications in Challenging Environments, United Kingdom Atomic Energy Authority, Culham Science Centre, Oxford, United Kingdom
- Erwin Lopez
  - Faculty of Science and Engineering, The University of Manchester, Manchester, United Kingdom
2
El-Sappagh S, Alonso-Moral JM, Abuhmed T, Ali F, Bugarín-Diz A. Trustworthy artificial intelligence in Alzheimer’s disease: state of the art, opportunities, and challenges. Artif Intell Rev 2023. [DOI: 10.1007/s10462-023-10415-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2023]
3
Li JQ, Dukes PV, Lee W, Sarkis M, Vo‐Dinh T. Machine learning using convolutional neural networks for SERS analysis of biomarkers in medical diagnostics. JOURNAL OF RAMAN SPECTROSCOPY : JRS 2022; 53:2044-2057. [PMID: 37067872 PMCID: PMC10087982 DOI: 10.1002/jrs.6447] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 04/29/2022] [Revised: 07/14/2022] [Accepted: 08/09/2022] [Indexed: 05/30/2023]
Abstract
Surface-enhanced Raman spectroscopy (SERS) has wide diagnostic applications because of narrow spectral features that allow multiplexed analysis. Machine learning (ML) has been used for non-dye-labeled SERS spectra but has not been applied to SERS dye-labeled materials with known spectral shapes. Here, we compare the performances of spectral decomposition, support vector regression, random forest regression, partial least squares regression, and convolutional neural network (CNN) for SERS "spectral unmixing" from a multiplexed mixture of 7 SERS-active "nanorattles" loaded with different dyes for mRNA biomarker detection. We showed that CNN most accurately determined relative contributions of each distinct dye-loaded nanorattle. CNN and comparative models were then used to analyze SERS spectra from a singleplexed, point-of-care assay detecting an mRNA biomarker for head and neck cancer in 20 samples. The CNN, trained on simulated multiplexed data, determined the correct dye contributions from the singleplex assay with RMSE_label = 6.42 × 10⁻². These results demonstrate the potential of CNN-based ML to advance SERS-based diagnostics.
Affiliation(s)
- Joy Qiaoyi Li
  - Fitzpatrick Institute for Photonics, Duke University, Durham, North Carolina, USA
  - Biomedical Engineering Department, Duke University, Durham, North Carolina, USA
- Priya Vohra Dukes
  - Department of Head and Neck Surgery and Communication Sciences, Duke University School of Medicine, Durham, North Carolina, USA
- Walter Lee
  - Department of Head and Neck Surgery and Communication Sciences, Duke University School of Medicine, Durham, North Carolina, USA
  - Global Health Institute, Duke University, Durham, North Carolina, USA
- Michael Sarkis
  - Department of Statistical Science, Duke University, Durham, North Carolina, USA
- Tuan Vo‐Dinh
  - Fitzpatrick Institute for Photonics, Duke University, Durham, North Carolina, USA
  - Biomedical Engineering Department, Duke University, Durham, North Carolina, USA
  - Chemistry Department, Duke University, Durham, North Carolina, USA
4
Feldmann C, Bajorath J. Calculation of Exact Shapley Values for Support Vector Machines with Tanimoto Kernel Enables Model Interpretation. iScience 2022; 25:105023. [PMID: 36105596 PMCID: PMC9464958 DOI: 10.1016/j.isci.2022.105023] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/03/2022] [Revised: 08/09/2022] [Accepted: 08/20/2022] [Indexed: 11/24/2022] Open
Abstract
The support vector machine (SVM) algorithm is popular in chemistry and drug discovery. SVM models have black box character. Their predictions can be interpreted through feature weighting or the model-agnostic Shapley additive explanations (SHAP) formalism that locally approximates Shapley values (SVs) originating from game theory. We introduce an algorithm termed SV-expressed Tanimoto similarity (SVETA) for the exact calculation of SVs to explain SVM models employing the Tanimoto kernel, the gold standard for the assessment of molecular similarity. For a model system, the exact calculation of SVs is demonstrated. In an SVM-based compound classification task from drug discovery, only a limited correlation between exact SV and SHAP values is observed, prohibiting the use of approximate values for rationalizing predictions. For exemplary test compounds, atom-based mapping of prioritized features delineates coherent substructures that closely resemble those obtained by analyzing independently derived random forest models, thus providing consistent explanations.
Highlights
- SVETA: new methodology for explaining support vector machine (SVM) predictions
- Tanimoto similarity-based SVM models are popular in chemistry
- SVETA enables the calculation of exact Shapley values for rationalizing SVM models
- SVETA-based feature mapping provides intuitive explanations of SVM decisions
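For readers unfamiliar with the Tanimoto kernel named in the abstract above, the following minimal Python sketch shows how the similarity is commonly computed for binary fingerprints. It is an illustration only, not the SVETA algorithm: the function names `tanimoto_kernel` and `gram_matrix` are hypothetical, fingerprints are assumed to be given as sets of "on" feature indices, and the value returned for two empty fingerprints is an assumed convention.

```python
def tanimoto_kernel(a, b):
    """Tanimoto similarity |A ∩ B| / |A ∪ B| for two binary fingerprints.

    Fingerprints are given as iterables of the indices of their 'on' bits.
    """
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        # Assumption: two empty fingerprints are treated as identical.
        return 1.0
    return len(a & b) / union


def gram_matrix(fingerprints):
    """Pairwise kernel matrix, as a list of rows, for a list of fingerprints."""
    return [[tanimoto_kernel(x, y) for y in fingerprints] for x in fingerprints]


if __name__ == "__main__":
    fps = [{1, 2, 3}, {2, 3, 4}, {7}]
    for row in gram_matrix(fps):
        print(row)
```

A matrix of this kind is the form in which a precomputed kernel is typically handed to an SVM implementation.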
5
A New Alternative Tool to Analyse Glycosylation in Monoclonal Antibodies Based on Drop-Coating Deposition Raman imaging: A Proof of Concept. Molecules 2022; 27:molecules27144405. [PMID: 35889277 PMCID: PMC9317070 DOI: 10.3390/molecules27144405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Revised: 07/04/2022] [Accepted: 07/07/2022] [Indexed: 11/17/2022] Open
Abstract
Glycosylation is considered a critical quality attribute of therapeutic proteins as it affects their stability, bioactivity, and safety. Hence, the development of analytical methods able to characterize the composition and structure of glycoproteins is crucial. Existing methods are time consuming, expensive, and require significant sample preparation, which can alter the robustness of the analyses. In this context, we developed a fast, direct, and simple drop-coating deposition Raman imaging (DCDR) method combined with multivariate curve resolution alternating least squares (MCR-ALS) to analyze glycosylation in monoclonal antibodies (mAbs). A database of hyperspectral Raman imaging data of glycoproteins was built, and the glycoproteins were characterized by LC-FLR-MS as a reference method to determine the composition in glycans and monosaccharides. The DCDR method was used and allowed the separation of excipient and protein by forming a "coffee ring". MCR-ALS analysis was performed to visualize the distribution of the compounds in the drop and to extract the pure spectral components. Further, the strategy of SVD-truncation was used to select the number of components to resolve by MCR-ALS. Raman spectra were processed by support vector regression (SVR). SVR models showed good predictive performance in terms of RMSECV and R²CV.
6
Swain S, Bhushan B, Dhiman G, Viriyasitavat W. Appositeness of Optimized and Reliable Machine Learning for Healthcare: A Survey. ARCHIVES OF COMPUTATIONAL METHODS IN ENGINEERING : STATE OF THE ART REVIEWS 2022; 29:3981-4003. [PMID: 35342282 PMCID: PMC8939887 DOI: 10.1007/s11831-022-09733-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2021] [Accepted: 02/09/2022] [Indexed: 05/04/2023]
Abstract
Machine Learning (ML) has been categorized as a branch of Artificial Intelligence (AI) under the Computer Science domain wherein programmable machines imitate human learning behavior with the help of statistical methods and data. The Healthcare industry is one of the largest and busiest sectors in the world, functioning with an extensive amount of manual moderation at every stage. Most of the clinical documents concerning patient care are hand-written by experts; only selected reports are machine-generated. This process elevates the chances of misdiagnosis, thereby imposing a risk to a patient's life. Recent technological adoptions for automating manual operations have witnessed extensive use of ML in their applications. The paper surveys the applicability of ML approaches in automating medical systems. The paper discusses most of the optimized statistical ML frameworks that encourage better service delivery in clinical aspects. The universal adoption of various Deep Learning (DL) and ML techniques as the underlying systems for a variety of wellness applications is delineated by challenges and elevated by myriads of security. This work tries to recognize a variety of vulnerabilities occurring in medical procurement, admitting the concerns over its predictive performance from a privacy point of view. Finally, it provides possible risk-delimiting facts and directions for active challenges in the future.
Affiliation(s)
- Subhasmita Swain
  - Department of Computer Science and Engineering, School of Engineering and Technology, Sharda University, Greater Noida, India
- Bharat Bhushan
  - Department of Computer Science and Engineering, School of Engineering and Technology, Sharda University, Greater Noida, India
- Gaurav Dhiman
  - Department of Computer Science, Government Bikram College of Commerce, Patiala, India
  - University Centre for Research and Development, Department of Computer Science and Engineering, Chandigarh University, Gharuan, Mohali, India
  - Department of Computer Science and Engineering, Graphic Era Deemed to be University, Dehradun, India
- Wattana Viriyasitavat
  - Department of Statistics, Faculty of Commerce and Accountancy, Chulalongkorn Business School, Bangkok, Thailand
7
Bahani K, Moujabbir M, Ramdani M. An accurate fuzzy rule-based classification systems for heart disease diagnosis. SCIENTIFIC AFRICAN 2021. [DOI: 10.1016/j.sciaf.2021.e01019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/26/2022] Open
8
Haber EA, Santos MJ, Leitão PJ, Schwieder M, Ketner P, Ernst J, Rietkerk M, Wassen MJ, Eppinga MB. High spatial resolution mapping identifies habitat characteristics of the invasive vine Antigonon leptopus on St. Eustatius (Lesser Antilles). Biotropica 2021. [DOI: 10.1111/btp.12939] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/26/2022]
Affiliation(s)
- Elizabeth A. Haber
  - Copernicus Institute of Sustainable Development, Faculty of Geosciences, Utrecht University, Utrecht, The Netherlands
- Maria J. Santos
  - Department of Geography, University of Zürich, Zürich, Switzerland
  - University Research Priority Program in Global Change and Biodiversity, University of Zurich, Zürich, Switzerland
- Pedro J. Leitão
  - Department Landscape Ecology and Environmental System Analysis, Institute of Geoecology, Technische Universität Braunschweig, Braunschweig, Germany
  - Geography Department, Humboldt‐Universität zu Berlin, Berlin, Germany
- Marcel Schwieder
  - Geography Department, Humboldt‐Universität zu Berlin, Berlin, Germany
- Pieter Ketner
  - Emeritus, Tropical Nature Conservation and Vertebrate Ecology Group, Department of Environmental Sciences, Wageningen University, The Netherlands
- Max Rietkerk
  - Copernicus Institute of Sustainable Development, Faculty of Geosciences, Utrecht University, Utrecht, The Netherlands
- Martin J. Wassen
  - Copernicus Institute of Sustainable Development, Faculty of Geosciences, Utrecht University, Utrecht, The Netherlands
- Maarten B. Eppinga
  - Department of Geography, University of Zürich, Zürich, Switzerland
  - University Research Priority Program in Global Change and Biodiversity, University of Zurich, Zürich, Switzerland
9
Improved Estimation of Winter Wheat Aboveground Biomass Using Multiscale Textures Extracted from UAV-Based Digital Images and Hyperspectral Feature Analysis. REMOTE SENSING 2021. [DOI: 10.3390/rs13040581] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 11/16/2022]
Abstract
Rapid and accurate crop aboveground biomass estimation is beneficial for high-throughput phenotyping and site-specific field management. This study explored the utility of high-definition digital images acquired by a low-flying unmanned aerial vehicle (UAV) and ground-based hyperspectral data for improved estimates of winter wheat biomass. To extract fine textures for characterizing the variations in winter wheat canopy structure during growing seasons, we proposed a multiscale texture extraction method (Multiscale_Gabor_GLCM) that took advantage of multiscale Gabor transformation and gray-level co-occurrence matrix (GLCM) analysis. Narrowband normalized difference vegetation indices (NDVIs) involving all possible two-band combinations and continuum removal of red-edge spectra (SpeCR) were also extracted for biomass estimation. Subsequently, non-parametric linear (i.e., partial least squares regression, PLSR) and nonlinear regression (i.e., least squares support vector machine, LSSVM) analyses were conducted using the extracted spectral features, multiscale textural features and combinations thereof. The visualization technique of LSSVM was utilized to select the multiscale textures that contributed most to the biomass estimation for the first time. Compared with the best-performing NDVI (1193, 1222 nm), the SpeCR yielded higher coefficient of determination (R²), lower root mean square error (RMSE), and lower mean absolute error (MAE) for winter wheat biomass estimation and significantly alleviated the saturation problem after biomass exceeded 800 g/m². The predictive performance of the PLSR and LSSVM regression models based on SpeCR decreased with increasing bandwidths, especially at bandwidths larger than 11 nm. Both the PLSR and LSSVM regression models based on the multiscale textures produced higher accuracies than those based on the single-scale GLCM-based textures. According to the evaluation of variable importance, the texture metrics “Mean” from different scales were determined as the most influential to winter wheat biomass. Using just 10 multiscale textures largely improved predictive performance over using all textures and achieved an accuracy comparable with using SpeCR. The LSSVM regression model based on the combination of the selected multiscale textures and SpeCR with a bandwidth of 9 nm produced the highest estimation accuracy with R²_val = 0.87, RMSE_val = 119.76 g/m², and MAE_val = 91.61 g/m². However, the combination did not significantly improve the estimation accuracy, compared to the use of SpeCR or multiscale textures only. The accuracy of the biomass predicted by the LSSVM regression models was higher than the results of the PLSR models, which demonstrated LSSVM was a potential candidate to characterize winter wheat biomass during multiple growth stages. The study suggests that multiscale textures derived from high-definition UAV-based digital images are competitive with hyperspectral features in predicting winter wheat biomass.
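The "all possible two-band combinations" of narrowband normalized difference indices mentioned in this abstract can be sketched as follows. This is a hedged illustration, not the authors' pipeline: the function names, the dictionary input format, and the choice of which band comes first in each pair are assumptions, and the reflectance values in the usage example are made up.

```python
from itertools import combinations


def normalized_difference(r_a, r_b):
    """Normalized difference of two band reflectances: (r_a - r_b) / (r_a + r_b)."""
    total = r_a + r_b
    if total == 0:
        raise ValueError("band reflectances sum to zero")
    return (r_a - r_b) / total


def all_two_band_indices(spectrum):
    """Compute the index for every unordered pair of bands.

    spectrum: dict mapping wavelength (nm) to reflectance.
    Returns a dict keyed by (shorter_wavelength, longer_wavelength).
    Assumption: the shorter wavelength is used as r_a in each pair.
    """
    return {
        (wl_a, wl_b): normalized_difference(spectrum[wl_a], spectrum[wl_b])
        for wl_a, wl_b in combinations(sorted(spectrum), 2)
    }


if __name__ == "__main__":
    # Illustrative reflectances only, not data from the study.
    spectrum = {670: 0.1, 800: 0.5, 1193: 0.4, 1222: 0.35}
    for pair, value in all_two_band_indices(spectrum).items():
        print(pair, round(value, 4))
```

With n bands this produces n(n-1)/2 candidate indices, which is why a selection step (such as ranking by correlation with measured biomass) normally follows.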
10
Guo HN, Wu SB, Tian YJ, Zhang J, Liu HT. Application of machine learning methods for the prediction of organic solid waste treatment and recycling processes: A review. BIORESOURCE TECHNOLOGY 2021; 319:124114. [PMID: 32942236 DOI: 10.1016/j.biortech.2020.124114] [Citation(s) in RCA: 81] [Impact Index Per Article: 27.0] [Received: 07/22/2020] [Revised: 09/04/2020] [Accepted: 09/07/2020] [Indexed: 05/23/2023]
Abstract
Conventional treatment and recycling methods of organic solid waste contain inherent flaws, such as low efficiency, low accuracy, high cost, and potential environmental risks. In the past decade, machine learning has gradually attracted increasing attention in solving the complex problems of organic solid waste treatment. Although significant research has been carried out, there is a lack of a systematic review of the research findings in this field. This study sorts the research studies published between 2003 and 2020, summarizes the specific application fields, characteristics, and suitability of different machine learning models, and discusses the relevant application limitations and future prospects. It can be concluded that studies mostly focused on municipal solid waste management, followed by anaerobic digestion, thermal treatment, composting, and landfill. The most widely used model is the artificial neural network, which has been successfully applied to various complicated non-linear organic solid waste related problems.
Affiliation(s)
- Hao-Nan Guo
  - Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China
- Shu-Biao Wu
  - Aarhus Institute of Advanced Studies, Aarhus University, DK-8000 Aarhus C, Denmark
- Ying-Jie Tian
  - CAS Research Center on Fictitious Economy & Data Science, Beijing 100190, China
- Jun Zhang
  - Guangxi Key Laboratory of Environmental Pollution Control Theory and Technology, Guilin University of Technology, Guilin 541004, China
- Hong-Tao Liu
  - Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China; Engineering Laboratory for Yellow River Delta Modern Agriculture, Chinese Academy of Sciences, Beijing 100101, China
11
de Carvalho Rocha WF, Sheen DA. Determination of physicochemical properties of petroleum derivatives and biodiesel using GC/MS and chemometric methods with uncertainty estimation. FUEL (LONDON, ENGLAND) 2019; 243:413-422. [PMID: 38516536 PMCID: PMC10956500 DOI: 10.1016/j.fuel.2018.12.126] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 03/23/2024]
Abstract
The physicochemical properties of a substance, such as a fuel, can vary significantly with composition. Determining these properties with ASTM standard methods is both expensive and time-consuming, which has led to a desire to use chemometric modeling as an alternative. In this study, we compare the accuracy and robustness of two chemometric models, partial least squares (PLS) regression and support vector machine (SVM) with uncertainty estimation to determine how the physicochemical properties depend on the composition. A set of hydrocarbon mixtures, including crude oil, oil, gasoline, and biofuel/biodiesel, were collected. GC-MS data were taken, and physicochemical properties were measured for these mixtures using ASTM standard methods. PLS and SVM were used to develop predictive models of the physicochemical properties. Uncertainty in the estimated property values was estimated using a bootstrapping technique. With this uncertainty estimate, it is possible to assess the trustworthiness of any prediction, which ensures that the chemometric models can be applied for general purposes. SVM was found to be generally better for predicting the physicochemical properties, although we expect that with a more comprehensive data set the performance of the PLS models can be improved. We show in this work that PLS and SVM can be used to generate a predictive model of physicochemical properties based on GC-MS data. Combined with uncertainty analysis, these models provide robust predictions that can be used for regulatory, economic, and safety purposes.
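The bootstrapping idea described in this abstract, refitting a model on resampled data and reading the spread of its predictions as an uncertainty estimate, can be sketched in a few lines. This is a minimal sketch under stated assumptions: an ordinary least-squares line stands in for the PLS and SVM models of the paper, resampling is done over whole (x, y) cases, and the names `fit_line` and `bootstrap_prediction` are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (stand-in model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def bootstrap_prediction(xs, ys, x_new, n_boot=500, seed=0):
    """Return (mean, standard deviation) of predictions at x_new over bootstrap refits."""
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct x values")
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    while len(preds) < n_boot:
        # Resample cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample, draw again
        slope, intercept = fit_line(bx, by)
        preds.append(slope * x_new + intercept)
    return statistics.fmean(preds), statistics.stdev(preds)


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4, 5, 6, 7]
    ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.2, 12.8, 15.1]  # illustrative values
    mean, spread = bootstrap_prediction(xs, ys, x_new=10)
    print(f"prediction {mean:.2f} +/- {spread:.2f}")
```

A prediction whose bootstrap spread is large relative to the tolerance of the application is one that should not be trusted, which is the use the abstract describes.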
Affiliation(s)
- David A Sheen
  - Chemical Sciences Division, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA
12
Moisture Content Quantization of Masson Pine Seedling Leaf Based on Stacked Autoencoder with Near-Infrared Spectroscopy. JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING 2018. [DOI: 10.1155/2018/8696202] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022]
Abstract
Masson pine is widely planted in southern China, and moisture content of the pine seedling leaves is an important index for evaluating the vigor of seedlings. For precisely predicting leaf moisture content, near-infrared spectroscopy analysis is applied in the experiment, which is a cost-effective, high-speed, and noninvasive material content prediction tool. To further improve the spectroscopy analysis accuracy, in this study, a new analysis model is proposed which integrates a stacked autoencoder for extracting hierarchical output-related features layer by layer and a support vector regression model to leverage these features for precisely predicting moisture contents. Compared with traditional spectroscopy analysis methods such as partial least squares regression and basic support vector regression, the proposed model shows great superiority for leaf moisture content prediction, with an R² value of 0.9946 and a root-mean-squared error (RMSE) of 0.1636 in the calibration set and an R² value of 0.9621 and an RMSE of 0.4249 in the prediction set.
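The two figures of merit quoted in this abstract, R² and RMSE, can be computed as in the sketch below. The function names are hypothetical, and the coefficient-of-determination form 1 - SS_res / SS_tot is an assumption: the abstract does not say whether this definition or the squared correlation coefficient was used.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between reference and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("reference values have zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    reference = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print(round(rmse(reference, predicted), 4), round(r_squared(reference, predicted), 4))
```

Reporting both metrics separately for the calibration and prediction sets, as the abstract does, is what exposes overfitting: a large gap between the two sets is the warning sign.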
13
Paz-Kagan T, Brodrick PG, Vaughn NR, Das AJ, Stephenson NL, Nydick KR, Asner GP. What mediates tree mortality during drought in the southern Sierra Nevada? ECOLOGICAL APPLICATIONS : A PUBLICATION OF THE ECOLOGICAL SOCIETY OF AMERICA 2017; 27:2443-2457. [PMID: 28871610 DOI: 10.1002/eap.1620] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 03/24/2017] [Revised: 07/13/2017] [Accepted: 07/18/2017] [Indexed: 06/07/2023]
Abstract
Severe drought has the potential to cause selective mortality within a forest, thereby inducing shifts in forest species composition. The southern Sierra Nevada foothills and mountains of California have experienced extensive forest dieback due to drought stress and insect outbreak. We used high-fidelity imaging spectroscopy (HiFIS) and light detection and ranging (LiDAR) from the Carnegie Airborne Observatory (CAO) to estimate the effect of forest dieback on species composition in response to drought stress in Sequoia National Park. Our aims were (1) to quantify site-specific conditions that mediate tree mortality along an elevation gradient in the southern Sierra Nevada Mountains, (2) to assess where mortality events have a greater probability of occurring, and (3) to estimate which tree species have a greater likelihood of mortality along the elevation gradient. A series of statistical models were generated to classify species composition and identify tree mortality, and the influences of different environmental factors were spatially quantified and analyzed to assess where mortality events have a greater likelihood of occurring. A higher probability of mortality was observed in the lower portion of the elevation gradient, on southwest- and west-facing slopes, in areas with shallow soils, on shallower slopes, and at greater distances from water. All of these factors are related to site water balance throughout the landscape. Our results also suggest that mortality is species-specific along the elevation gradient, mainly affecting Pinus ponderosa and Pinus lambertiana at lower elevations. Selective mortality within the forest may drive long-term shifts in community composition along the elevation gradient.
Affiliation(s)
- Tarin Paz-Kagan
  - Department of Global Ecology, Carnegie Institution for Science, Stanford, California, 94305, USA
- Philip G Brodrick
  - Department of Global Ecology, Carnegie Institution for Science, Stanford, California, 94305, USA
- Nicholas R Vaughn
  - Department of Global Ecology, Carnegie Institution for Science, Stanford, California, 94305, USA
- Adrian J Das
  - U.S. Geological Survey, Western Ecological Research Center, Three Rivers, California, 93271, USA
- Nathan L Stephenson
  - U.S. Geological Survey, Western Ecological Research Center, Three Rivers, California, 93271, USA
- Koren R Nydick
  - Sequoia and Kings Canyon National Parks, Three Rivers, California, 93271, USA
- Gregory P Asner
  - Department of Global Ecology, Carnegie Institution for Science, Stanford, California, 94305, USA
14
Exploring the Potential of WorldView-2 Red-Edge Band-Based Vegetation Indices for Estimation of Mangrove Leaf Area Index with Machine Learning Algorithms. REMOTE SENSING 2017. [DOI: 10.3390/rs9101060] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 11/16/2022]
15
Paz-Kagan T, Asner GP. Drivers of woody canopy water content responses to drought in a Mediterranean-type ecosystem. ECOLOGICAL APPLICATIONS : A PUBLICATION OF THE ECOLOGICAL SOCIETY OF AMERICA 2017; 27:2220-2233. [PMID: 28727205 DOI: 10.1002/eap.1603] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 09/26/2016] [Revised: 05/01/2017] [Accepted: 06/20/2017] [Indexed: 06/07/2023]
Abstract
Severe droughts increase physiological stress in woody plant species, which can lead to mortality, fundamentally altering the composition, structure, and biogeography of forests in many regions. Little is known, however, about the factors determining the physiological response of woody plants to drought at landscape scales. Our objective was to understand woody plant species responses to ongoing changes in climate, using remotely sensed canopy water content (CWC) as an indicator of plant physiological and phenological status. We used fused imaging spectroscopy and light detection and ranging from the Carnegie Airborne Observatory to quantify the factors affecting species compositional changes in CWC in a diverse Mediterranean-type ecosystem (Jasper Ridge Biological Preserve, California, USA) between 2013 and 2015. Mapped CWC was spatially variable in both of the observation years, and proved to be most closely tied to species composition and distribution across the landscape. The secondary predictors of CWC were elevation and soil substrate. In contrast, we found that CWC change was much more related to environmental factors than to the species composition. We suggest that the effect of environment on CWC change is mediated through species resistance and resilience to drought. Monitoring CWC change with imaging spectroscopy is a powerful approach to identifying species-level responses to climatic events and long-term change, which may provide support for policy decisions and conservation at large spatial scales.
Affiliation(s)
- Tarin Paz-Kagan
  - Department of Global Ecology, Carnegie Institution for Science, Stanford, California, 94305, USA
- Gregory P Asner
  - Department of Global Ecology, Carnegie Institution for Science, Stanford, California, 94305, USA
16
Malenovský Z, Lucieer A, King DH, Turnbull JD, Robinson SA. Unmanned aircraft system advances health mapping of fragile polar vegetation. Methods Ecol Evol 2017. [DOI: 10.1111/2041-210x.12833] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Indexed: 11/30/2022]
Affiliation(s)
- Zbyněk Malenovský
  - Surveying and Spatial Sciences Group, School of Land and Food, University of Tasmania, Hobart, Tas., Australia
  - Centre for Sustainable Ecosystem Solutions, School of Biological Sciences, University of Wollongong, Wollongong, NSW, Australia
  - Biospheric Sciences Laboratory, USRA/GESTAR, NASA Goddard Space Flight Center, Greenbelt, MD, USA
- Arko Lucieer
  - Surveying and Spatial Sciences Group, School of Land and Food, University of Tasmania, Hobart, Tas., Australia
- Diana H. King
  - Centre for Sustainable Ecosystem Solutions, School of Biological Sciences, University of Wollongong, Wollongong, NSW, Australia
- Johanna D. Turnbull
  - Centre for Sustainable Ecosystem Solutions, School of Biological Sciences, University of Wollongong, Wollongong, NSW, Australia
- Sharon A. Robinson
  - Centre for Sustainable Ecosystem Solutions, School of Biological Sciences, University of Wollongong, Wollongong, NSW, Australia
17
Skowronek S, Asner GP, Feilhauer H. Performance of one-class classifiers for invasive species mapping using airborne imaging spectroscopy. ECOL INFORM 2017. [DOI: 10.1016/j.ecoinf.2016.11.005] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 10/20/2022]
18
Sun H, Nguyen K, Kerns E, Yan Z, Yu KR, Shah P, Jadhav A, Xu X. Highly predictive and interpretable models for PAMPA permeability. Bioorg Med Chem 2016; 25:1266-1276. [PMID: 28082071 DOI: 10.1016/j.bmc.2016.12.049] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 09/27/2016] [Revised: 12/22/2016] [Accepted: 12/27/2016] [Indexed: 11/28/2022]
Abstract
Cell membrane permeability is an important determinant for oral absorption and bioavailability of a drug molecule. An in silico model predicting drug permeability is described, which is built based on a large permeability dataset of 7488 compound entries or 5435 structurally unique molecules measured by the same lab using parallel artificial membrane permeability assay (PAMPA). On the basis of customized molecular descriptors, the support vector regression (SVR) model trained with 4071 compounds with quantitative data is able to predict the remaining 1364 compounds with the qualitative data with an area under the curve of receiver operating characteristic (AUC-ROC) of 0.90. The support vector classification (SVC) model trained with half of the whole dataset, comprising both the quantitative and the qualitative data, produced accurate predictions for the remaining data with an AUC-ROC of 0.88. The results suggest that the developed SVR model is highly predictive and provides medicinal chemists with a useful in silico tool to facilitate design and synthesis of novel compounds with optimal drug-like properties, and thus accelerate lead optimization in drug discovery.
Affiliation(s)
- Hongmao Sun
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Kimloan Nguyen
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Edward Kerns
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Zhengyin Yan
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Kyeong Ri Yu
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Pranav Shah
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Ajit Jadhav
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Xin Xu
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
|
19
|
Cruz-Correa OF, León-Cachón RBR, Barrera-Saldaña HA, Soberón X. Prediction of atorvastatin plasmatic concentrations in healthy volunteers using integrated pharmacogenetics sequencing. Pharmacogenomics 2016; 18:121-131. [PMID: 27976987 DOI: 10.2217/pgs-2016-0072] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/21/2022] Open
Abstract
AIM: To use variants found by next-generation sequencing to predict atorvastatin plasmatic concentration profiles (AUC) in healthy volunteers. SUBJECTS & METHODS: A total of 60 healthy Mexican volunteers were enrolled in this study. We used variants with a predicted functional effect across 20 genes involved in atorvastatin metabolism to construct a regression model using a support vector approach with a radial basis function kernel to predict AUC, refining it afterwards in order to explain a greater extent of the variance. RESULTS: The final support vector regression model using 60 variants (including six novel variants) explained 94.52% of the variance in atorvastatin AUC. CONCLUSION: An integrated analysis of several genes known to intervene in the different steps of metabolism is required to predict atorvastatin's AUC.
Affiliation(s)
- Omar Fernando Cruz-Correa
- Instituto Nacional de Medicina Genómica, Periférico Sur No. 4809, Col. Arenal Tepepan, Delegación Tlalpan, México, D.F. C.P. 14610, Mexico
- Rafael Baltazar Reyes León-Cachón
- Departamento de Bioquímica y Medicina Molecular, Facultad de Medicina, Universidad Autónoma de Nuevo León, Ave. Madero, Col. Mitras Centro, Monterrey, Nuevo León, C.P. 64640, Mexico; División Ciencias de la Salud, Departamento de Ciencias Básicas, Centro de Diagnóstico Molecular y Medicina Personalizada, Universidad de Monterrey, Ave. Ignacio Morones Prieto Pte. 4500, Col. Jesús M. Garza, San Pedro Garza García, Nuevo León, C.P. 66238, Mexico
- Hugo Alberto Barrera-Saldaña
- Departamento de Bioquímica y Medicina Molecular, Facultad de Medicina, Universidad Autónoma de Nuevo León, Ave. Madero, Col. Mitras Centro, Monterrey, Nuevo León, C.P. 64640, Mexico; Vitagénesis, SA de CV., Col. Colinas de San Jerónimo. Monterrey, Nuevo León, C.P. 64630, Mexico
- Xavier Soberón
- Instituto Nacional de Medicina Genómica, Periférico Sur No. 4809, Col. Arenal Tepepan, Delegación Tlalpan, México, D.F. C.P. 14610, Mexico; Instituto de Biotecnología, Universidad Nacional Autónoma de México, Avenida Universidad 2001, Cuernavaca, Morelos, C.P. 62210, Mexico
|
20
|
Explaining Support Vector Machines: A Color Based Nomogram. PLoS One 2016; 11:e0164568. [PMID: 27723811 PMCID: PMC5056733 DOI: 10.1371/journal.pone.0164568] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 02/01/2016] [Accepted: 09/27/2016] [Indexed: 02/05/2023] Open
Abstract
Problem setting: Support vector machines (SVMs) are very popular tools for classification, regression and other problems. Due to the large choice of kernels they can be applied with, a large variety of data can be analysed using these tools. Machine learning owes its popularity to the good performance of the resulting models. However, interpreting the models is far from obvious, especially when non-linear kernels are used. Hence, the methods are used as black boxes. As a consequence, the use of SVMs is less supported in areas where interpretability is important and where people are held responsible for the decisions made by models. Objective: In this work, we investigate whether SVMs using linear, polynomial and RBF kernels can be explained such that interpretations for model-based decisions can be provided. We further indicate when SVMs can be explained and in which situations interpretation of SVMs is (hitherto) not possible. Here, explainability is defined as the ability to produce the final decision based on a sum of contributions which depend on one single or at most two input variables. Results: Our experiments on simulated and real-life data show that explainability of an SVM depends on the chosen parameter values (degree of polynomial kernel, width of RBF kernel and regularization constant). When several combinations of parameter values yield the same cross-validation performance, combinations with a lower polynomial degree or a larger kernel width have a higher chance of being explainable. Conclusions: This work summarizes SVM classifiers obtained with linear, polynomial and RBF kernels in a single plot. Linear and polynomial kernels up to the second degree are represented exactly. For other kernels an indication of the reliability of the approximation is presented. The complete methodology is available as an R package, and two apps and a movie are provided to illustrate the possibilities offered by the method.
|
21
|
Filgueiras PR, Terra LA, Castro EV, Oliveira LM, Dias JC, Poppi RJ. Prediction of the distillation temperatures of crude oils using 1H NMR and support vector regression with estimated confidence intervals. Talanta 2015; 142:197-205. [DOI: 10.1016/j.talanta.2015.04.046] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 02/02/2015] [Revised: 04/13/2015] [Accepted: 04/16/2015] [Indexed: 10/23/2022]
|
22
|
Igne B, Drennen JK, Anderson CA. Improving near-infrared prediction model robustness with support vector machine regression: a pharmaceutical tablet assay example. Applied Spectroscopy 2014; 68:1348-1356. [PMID: 25358108 DOI: 10.1366/14-07486] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/04/2023]
Abstract
Changes in raw materials and process wear and tear can have significant effects on the prediction error of near-infrared calibration models. When the variability that is present during routine manufacturing is not included in the calibration, test, and validation sets, the long-term performance and robustness of the model will be limited. Nonlinearity is a major source of interference. In near-infrared spectroscopy, nonlinearity can arise from light path-length differences that can come from differences in particle size or density. The usefulness of support vector machine (SVM) regression to handle nonlinearity and improve the robustness of calibration models in scenarios where the calibration set did not include all the variability present in test was evaluated. Compared to partial least squares (PLS) regression, SVM regression was less affected by physical (particle size) and chemical (moisture) differences. The linearity of the SVM predicted values was also improved. Nevertheless, although visualization and interpretation tools have been developed to enhance the usability of SVM-based methods, work is yet to be done to provide chemometricians in the pharmaceutical industry with a regression method that can supplement PLS-based methods.
Affiliation(s)
- Benoît Igne
- Duquesne University Center for Pharmaceutical Technology, School of Pharmacy, 600 Forbes Avenue, Pittsburgh, PA 15282 USA
|
23
|
Alves JCL, Poppi RJ. Simultaneous determination of hydrocarbon renewable diesel, biodiesel and petroleum diesel contents in diesel fuel blends using near infrared (NIR) spectroscopy and chemometrics. Analyst 2013; 138:6477-87. [DOI: 10.1039/c3an00883e] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 11/21/2022]
|
24
|
Platikanov S, Martín J, Tauler R. Linear and non-linear chemometric modeling of THM formation in Barcelona's water treatment plant. The Science of the Total Environment 2012; 432:365-374. [PMID: 22750183 DOI: 10.1016/j.scitotenv.2012.05.097] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/12/2012] [Revised: 05/22/2012] [Accepted: 05/31/2012] [Indexed: 06/01/2023]
Abstract
The complex behavior observed for the dependence of trihalomethane formation on forty-one water treatment plant (WTP) operational variables is investigated by means of linear and non-linear regression methods, including kernel-partial least squares (K-PLS) and support vector machine regression (SVR). Lower prediction errors of total trihalomethane concentrations (lower than 14% for external validation samples) were obtained when these two methods were applied than when linear regression methods were applied. A new visualization technique revealed the complex nonlinear relationships among the operational variables and displayed the existing correlations between input variables and the kernel matrix on one side and the support vectors on the other side. Whereas some water treatment plant variables, such as river water TOC and chloride concentrations and breakpoint chlorination, were not considered to be significant due to the multi-collinear effect in straight linear regression modeling methods, they were now confirmed to be significant using K-PLS and SVR non-linear modeling regression methods, proving the better performance of these methods for the prediction of complex formation of trihalomethanes in water disinfection plants.
Affiliation(s)
- Stefan Platikanov
- Department of Environmental Chemistry, IDAEA-CSIC, Jordi Girona, 18-26, Barcelona 08026, Spain
|
25
|
Stockl A, Oechsner H. Near-infrared spectroscopic online monitoring of process stability in biogas plants. Eng Life Sci 2012. [DOI: 10.1002/elsc.201100065] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 11/08/2022] Open
Affiliation(s)
- Andrea Stockl
- State Institute of Agricultural Engineering and Bioenergy, University of Hohenheim, Stuttgart, Germany
- Hans Oechsner
- State Institute of Agricultural Engineering and Bioenergy, University of Hohenheim, Stuttgart, Germany
|
26
|
Postma G, Krooshof P, Buydens L. Opening the kernel of kernel partial least squares and support vector machines. Anal Chim Acta 2011; 705:123-34. [DOI: 10.1016/j.aca.2011.04.025] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 01/14/2011] [Revised: 03/31/2011] [Accepted: 04/14/2011] [Indexed: 02/08/2023]
|
27
|
Vidal M, Amigo J, Bro R, Ostra M, Ubide C, Zuriarrain J. Flatbed scanners as a source of imaging. Brightness assessment and additives determination in a nickel electroplating bath. Anal Chim Acta 2011; 694:38-45. [DOI: 10.1016/j.aca.2011.03.030] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 12/03/2010] [Revised: 02/28/2011] [Accepted: 03/15/2011] [Indexed: 11/17/2022]
|
28
|
Krooshof PWT, Ustün B, Postma GJ, Buydens LMC. Visualization and recovery of the (bio)chemical interesting variables in data analysis with support vector machine classification. Anal Chem 2010; 82:7000-7. [PMID: 20704390 DOI: 10.1021/ac101338y] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Indexed: 01/04/2023]
Abstract
Support vector machines (SVMs) have become a popular technique in the chemometrics and bioinformatics field, and other fields, for the classification of complex data sets. Especially because SVMs are able to model nonlinear relationships, the usage of this technique has increased substantially. This modeling is obtained by mapping the data in a higher-dimensional feature space. The disadvantage of such a transformation is, however, that information about the contribution of the original variables in the classification is lost. In this paper we introduce an innovative method which can retrieve the information about the variables of complex data sets. We apply the proposed method to several benchmark data sets and a metabolomics data set to illustrate that we can determine the contribution of the original variables in SVM classifications. The corresponding visualization of the contribution of the variables can assist in a better understanding of the underlying chemical or biological process.
Affiliation(s)
- Patrick W T Krooshof
- Radboud University Nijmegen, Institute for Molecules and Materials, Analytical Chemistry, P.O. Box 9010, 6500 GL Nijmegen, The Netherlands
|
29
|
Nath A, Zientek MA, Burke BJ, Jiang Y, Atkins WM. Quantifying and predicting the promiscuity and isoform specificity of small-molecule cytochrome P450 inhibitors. Drug Metab Dispos 2010; 38:2195-203. [PMID: 20841376 DOI: 10.1124/dmd.110.034645] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 01/08/2023] Open
Abstract
Drug promiscuity (i.e., inhibition of multiple enzymes by a single compound) is increasingly recognized as an important pharmacological consideration in the drug development process. However, systematic studies of functional or physicochemical characteristics that correlate with drug promiscuity are handicapped by the lack of a good way of quantifying promiscuity. In this article, we present a new entropy-based index of drug promiscuity. We apply this index to two high-throughput data sets describing inhibition of cytochrome P450 isoforms by small-molecule drugs and drug candidates, and we demonstrate how drug promiscuity or specificity can be quantified. For these drug-metabolizing enzymes, we find that there is essentially no correlation between a drug's potency and specificity. We also present an index to quantify the susceptibilities of different enzymes to inhibition by diverse substrates. Finally, we use partial least-squares regression to successfully predict isoform specificity and promiscuity of small molecules, using a set of fingerprint-based descriptors.
Affiliation(s)
- Abhinav Nath
- Department of Molecular Biophysics & Biochemistry, Yale University, P.O. Box 208114, New Haven, CT 06520-8114, USA.
|
30
|
Lapins M, Wikberg JE. Kinome-wide interaction modelling using alignment-based and alignment-independent approaches for kinase description and linear and non-linear data analysis techniques. BMC Bioinformatics 2010; 11:339. [PMID: 20569422 PMCID: PMC2910025 DOI: 10.1186/1471-2105-11-339] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Received: 11/17/2009] [Accepted: 06/22/2010] [Indexed: 01/03/2023] Open
Abstract
BACKGROUND: Protein kinases play crucial roles in cell growth, differentiation, and apoptosis. Abnormal function of protein kinases can lead to many serious diseases, such as cancer. Kinase inhibitors have potential for treatment of these diseases. However, current inhibitors interact with a broad variety of kinases and interfere with multiple vital cellular processes, which causes toxic effects. Bioinformatics approaches that can predict inhibitor-kinase interactions from the chemical properties of the inhibitors and the kinase macromolecules might aid in the design of more selective therapeutic agents that show better efficacy and lower toxicity. RESULTS: We applied proteochemometric modelling to correlate the properties of 317 wild-type and mutated kinases and 38 inhibitors (12,046 inhibitor-kinase combinations) to the respective combination's interaction dissociation constant (Kd). We compared six approaches for description of protein kinases and several linear and non-linear correlation methods. The best performing models encoded kinase sequences with amino acid physico-chemical z-scale descriptors and used support vector machines or partial least-squares projections to latent structures for the correlations. Modelling performance was estimated by double cross-validation. The best models showed high predictive ability: the squared correlation coefficient for new kinase-inhibitor pairs ranged P² = 0.67-0.73; for new kinases it ranged P²kin = 0.65-0.70. Models could also separate interacting from non-interacting inhibitor-kinase pairs with high sensitivity and specificity, the areas under the ROC curves ranging AUC = 0.92-0.93. We also investigated the relationship between the number of protein kinases in the dataset and the modelling results. Using only 10% of all data, a valid model was still obtained with P² = 0.47, P²kin = 0.42 and AUC = 0.83. CONCLUSIONS: Our results strongly support the applicability of proteochemometrics for kinome-wide interaction modelling. Proteochemometrics might be used to speed up identification and optimization of protein kinase targeted and multi-targeted inhibitors.
Affiliation(s)
- Maris Lapins
- Department of Pharmaceutical Pharmacology, Uppsala University, Sweden
|
31
|
Sugimoto M, Koseki T, Hirayama A, Abe S, Sano T, Tomita M, Soga T. Correlation between sensory evaluation scores of Japanese sake and metabolome profiles. Journal of Agricultural and Food Chemistry 2010; 58:374-383. [PMID: 19961224 DOI: 10.1021/jf903680d] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Indexed: 05/28/2023]
Abstract
The aim of this study was to explore the association between taste and metabolite profiles of Japanese refined sake. Nontarget metabolome analysis was conducted using capillary electrophoresis mass spectrometry. Zatsumi (an unpleasant, unclear flavor), sweetness, bitterness, and sourness were graded by four experienced panelists. Regression models based on support vector regression (SVR) were used to estimate the relationships among sensory evaluation scores and quantified metabolites, which were visualized as a nonlinear relationship between sensory scores and metabolite components. The SVR model was highly accurate and versatile: the correlation coefficients for whole training data, cross-validation, and separated validation data were 0.86, 0.73, and 0.73, respectively, for zatsumi. Other sensory scores were also analyzed and modeled by SVR. The methodology demonstrated here carries great potential for predicting the relevant parameters and quantitative relationships between charged metabolites and sensory evaluation in Japanese refined sake.
Affiliation(s)
- Masahiro Sugimoto
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0052, Japan.
|
32
|
Abstract
The increasing interest in Support Vector Machines (SVMs) over the past 15 years is described. Methods are illustrated using simulated case studies, and 4 experimental case studies, namely mass spectrometry for studying pollution, near infrared analysis of food, thermal analysis of polymers and UV/visible spectroscopy of polyaromatic hydrocarbons. The basis of SVMs as two-class classifiers is shown with extensive visualisation, including learning machines, kernels and penalty functions. The influence of the penalty error and radial basis function radius on the model is illustrated. Multiclass implementations including one vs. all, one vs. one, fuzzy rules and Directed Acyclic Graph (DAG) trees are described. One-class Support Vector Domain Description (SVDD) is described and contrasted to conventional two- or multi-class classifiers. The use of Support Vector Regression (SVR) is illustrated including its application to multivariate calibration, and why it is useful when there are outliers and non-linearities.
Affiliation(s)
- Richard G Brereton
- Centre for Chemometrics, School of Chemistry, University of Bristol, Cantock's Close, Bristol, UK BS8 1TS.
|
33
|
The Feature Importance Ranking Measure. Machine Learning and Knowledge Discovery in Databases 2009. [DOI: 10.1007/978-3-642-04174-7_45] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Indexed: 12/02/2022]
|
34
|
Guha R. On the interpretation and interpretability of quantitative structure–activity relationship models. J Comput Aided Mol Des 2008; 22:857-71. [DOI: 10.1007/s10822-008-9240-5] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.6] [Received: 05/13/2008] [Accepted: 08/14/2008] [Indexed: 01/28/2023]
|
35
|
Sonnenburg S, Zien A, Philips P, Rätsch G. POIMs: positional oligomer importance matrices--understanding support vector machine-based signal detectors. Bioinformatics 2008; 24:i6-14. [PMID: 18586746 PMCID: PMC2718648 DOI: 10.1093/bioinformatics/btn170] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: At the heart of many important bioinformatics problems, such as gene finding and function prediction, is the classification of biological sequences. Frequently the most accurate classifiers are obtained by training support vector machines (SVMs) with complex sequence kernels. However, a cumbersome shortcoming of SVMs is that their learned decision rules are very hard for humans to understand and cannot easily be related to biological facts. RESULTS: To make SVM-based sequence classifiers more accessible and profitable, we introduce the concept of positional oligomer importance matrices (POIMs) and propose an efficient algorithm for their computation. In contrast to the raw SVM feature weighting, POIMs take the underlying correlation structure of k-mer features induced by overlaps of related k-mers into account. POIMs can be seen as a powerful generalization of sequence logos: they make it possible to capture and visualize sequence patterns that are relevant for the investigated biological phenomena. AVAILABILITY: All source code, datasets, tables and figures are available at http://www.fml.tuebingen.mpg.de/raetsch/projects/POIM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sören Sonnenburg
- Fraunhofer Institute FIRST, Department IDA, Kekulèstr. 7, 12489 Berlin, Germany.
|
36
|
Affiliation(s)
- Barry Lavine
- Department of Chemistry, Oklahoma State University, Stillwater, Oklahoma 74078, USA
|