1
Avval TG, Moeini B, Carver V, Fairley N, Smith EF, Baltrusaitis J, Fernandez V, Tyler BJ, Gallagher N, Linford MR. The Often-Overlooked Power of Summary Statistics in Exploratory Data Analysis: Comparison of Pattern Recognition Entropy (PRE) to Other Summary Statistics and Introduction of Divided Spectrum-PRE (DS-PRE). J Chem Inf Model 2021; 61:4173-4189. [PMID: 34499501 DOI: 10.1021/acs.jcim.1c00244] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Abstract
Unsupervised exploratory data analysis (EDA) is often the first step in understanding complex data sets. While summary statistics are among the most efficient and convenient tools for exploring and describing sets of data, they are often overlooked in EDA. In this paper, we show multiple case studies that compare the performance, including clustering, of a series of summary statistics in EDA. The summary statistics considered here are pattern recognition entropy (PRE), the mean, standard deviation (STD), 1-norm, range, sum of squares (SSQ), and X4, which are compared with principal component analysis (PCA), multivariate curve resolution (MCR), and/or cluster analysis. PRE and the other summary statistics are direct methods for analyzing data; they are not factor-based approaches. To quantify the performance of summary statistics, we use the concept of the "critical pair," which is employed in chromatography. The data analyzed here come from different analytical methods. Hyperspectral images, including one of a biological material, are also analyzed. In general, PRE outperforms the other summary statistics, especially in image analysis, although a suite of summary statistics is useful in exploring complex data sets. While PRE results were generally comparable to those from PCA and MCR, PRE is easier to apply. For example, there is no need to determine the number of factors that describe a data set. Finally, we introduce the concept of divided spectrum-PRE (DS-PRE) as a new EDA method. DS-PRE increases the discrimination power of PRE. We also show that DS-PRE can be used to provide the inputs for the k-nearest neighbor (kNN) algorithm. We recommend PRE and DS-PRE as rapid new tools for unsupervised EDA.
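As a rough illustration of what "summary statistics" means in the abstract above, the sketch below reduces one spectrum to a handful of single numbers. It is not taken from the paper: the function name `summary_statistics` is hypothetical, PRE is assumed here to be the base-2 Shannon entropy of the spectrum after its absolute values are normalized to sum to one (the abstract does not give the formula), and X4 is left out because the abstract does not define it.

```python
import math
import statistics


def summary_statistics(spectrum):
    """Reduce one spectrum (a list of floats) to several single-number summaries."""
    abs_vals = [abs(v) for v in spectrum]
    total = sum(abs_vals)
    # ASSUMPTION: PRE is taken to be the Shannon entropy (base 2) of the
    # spectrum once its absolute values are normalized to sum to one.
    # Zero-valued channels contribute nothing and are skipped.
    probs = [v / total for v in abs_vals if v > 0]
    pre = -sum(p * math.log2(p) for p in probs)
    return {
        "mean": statistics.fmean(spectrum),
        "std": statistics.stdev(spectrum),  # sample standard deviation
        "one_norm": total,
        "range": max(spectrum) - min(spectrum),
        "ssq": sum(v * v for v in spectrum),
        "pre": pre,
    }


# A flat spectrum spreads its signal evenly; a single peak concentrates it.
flat = summary_statistics([1.0, 1.0, 1.0, 1.0])
peaked = summary_statistics([0.0, 4.0, 0.0, 0.0])
```

Under this assumed definition the flat spectrum has the higher entropy and the single-peak spectrum has zero entropy, which is the kind of contrast a single summary number can expose without choosing a number of factors.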
Affiliation(s)
- Tahereh G Avval
- Department of Chemistry and Biochemistry, Brigham Young University, C100 BNSN, Provo, Utah 84602, United States
- Behnam Moeini
- Department of Chemistry and Biochemistry, Brigham Young University, C100 BNSN, Provo, Utah 84602, United States
- Victoria Carver
- Department of Chemistry and Biochemistry, Brigham Young University, C100 BNSN, Provo, Utah 84602, United States
- Neal Fairley
- Casa Software Ltd., Bay House, 5 Grosvenor Terrace, Teignmouth, Devon TQ14 8NE, U.K.
- Emily F Smith
- Nanoscale and Microscale Research Centre (NMRC) and School of Chemistry, University of Nottingham, University Park, Nottingham NG7 2RD, U.K.
- Jonas Baltrusaitis
- Department of Chemical and Biomolecular Engineering, Lehigh University, B336 Iacocca Hall, 111 Research Drive, Bethlehem, Pennsylvania 18015, United States
- Vincent Fernandez
- Institut des Matériaux Jean Rouxel, IMN, Université de Nantes, CNRS, F-44000 Nantes, France
- Bonnie J Tyler
- Institut für Physik, Westfälische Wilhelms-Universität, 48149 Münster, Germany
- Neal Gallagher
- Eigenvector Research, Inc., Manson, Washington 98831, United States
- Matthew R Linford
- Department of Chemistry and Biochemistry, Brigham Young University, C100 BNSN, Provo, Utah 84602, United States
2
Alves TM, Marston ZP, MacRae IV, Koch RL. Effects of Foliar Insecticides on Leaf-Level Spectral Reflectance of Soybean. JOURNAL OF ECONOMIC ENTOMOLOGY 2017; 110:2436-2442. [PMID: 29029168 DOI: 10.1093/jee/tox250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2017] [Indexed: 06/07/2023]
Abstract
Pest-induced changes in plant reflectance are crucial for the development of pest management programs using remote sensing. However, it is unknown whether plant reflectance data are also affected by foliar insecticides applied for pest management. Our study assessed the effects of foliar insecticides on leaf reflectance of soybean. A 2-yr field trial and a greenhouse trial were conducted using randomized complete block and completely randomized designs, respectively. Treatments consisted of an untreated check, a new systemic insecticide (sulfoxaflor), and two representatives of the most common insecticide classes used for soybean pest management in the north-central United States (i.e., λ-cyhalothrin and chlorpyrifos). Insecticides were applied at labeled rates recommended for controlling soybean aphid, the primary insect pest in the north-central United States. Leaf-level reflectance was measured using ground-based spectroradiometers. Sulfoxaflor affected leaf reflectance at some red and blue wavelengths but had no effect at near-infrared or green wavelengths. Chlorpyrifos affected leaf reflectance at some green, red, and near-infrared wavelengths but had no effect at blue wavelengths. λ-cyhalothrin had the least effect on spectral reflectance among the insecticides, with changes to only a few near-infrared wavelengths. Our results showing immediate and delayed effects of foliar insecticides on soybean reflectance indicate that application of some insecticides may confound the use of remote sensing for detection of not only insects but also plant diseases, nutritional and water deficiencies, and other crop stressors.
Affiliation(s)
- Ian V MacRae
- Department of Entomology, University of Minnesota
3
Nansen C. The potential and prospects of proximal remote sensing of arthropod pests. PEST MANAGEMENT SCIENCE 2016; 72:653-659. [PMID: 26663253 DOI: 10.1002/ps.4209] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 09/23/2015] [Revised: 12/05/2015] [Accepted: 12/10/2015] [Indexed: 06/05/2023]
Abstract
Bench-top or proximal remote sensing applications are widely used as part of quality control and machine vision systems in commercial operations. In addition, these technologies are becoming increasingly important in insect systematics and studies of insect physiology and pest management. This paper provides a review and discussion of how proximal remote sensing may contribute valuable quantitative information regarding identification of species, assessment of insect responses to insecticides, insect host responses to parasitoids and performance of biological control agents. The future role of proximal remote sensing is discussed as an exciting avenue for multidisciplinary research among entomologists and scientists from a wide range of other disciplines, including image processing engineers, medical engineers, research pharmacists and computer scientists. © 2015 Society of Chemical Industry.
Affiliation(s)
- Christian Nansen
- Department of Entomology and Nematology, University of California, Davis, CA, USA
4
Abstract
Remote sensing describes the characterization of the status of objects and/or the classification of their identity based on a combination of spectral features extracted from reflectance or transmission profiles of radiometric energy. Remote sensing can be benchtop based, and therefore acquired at a high spatial resolution, or airborne at lower spatial resolution to cover large areas. Despite important challenges, airborne remote sensing technologies will undoubtedly be of major importance in optimized management of agricultural systems in the twenty-first century. Benchtop remote sensing applications are becoming important in insect systematics and in phenomics studies of insect behavior and physiology. This review highlights how remote sensing influences entomological research by enabling scientists to nondestructively monitor how individual insects respond to treatments and ambient conditions. Furthermore, novel remote sensing technologies are creating intriguing interdisciplinary bridges between entomology and disciplines such as informatics and electrical engineering.
Affiliation(s)
- Christian Nansen
- Department of Entomology and Nematology, University of California, Davis, California 95616
5
Zhang X, Nansen C, Aryamanesh N, Yan G, Boussaid F. Importance of spatial and spectral data reduction in the detection of internal defects in food products. APPLIED SPECTROSCOPY 2015; 69:473-80. [PMID: 25742260 DOI: 10.1366/14-07672] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 06/04/2023]
Abstract
Despite the importance of data reduction as part of the processing of reflection-based classifications, this study represents one of the first in which the effects of both spatial and spectral data reductions on classification accuracies are quantified. Furthermore, the effects of approaches to data reduction were quantified for two separate classification methods, linear discriminant analysis (LDA) and support vector machine (SVM). As the model dataset, reflection data were acquired using a hyperspectral camera in 230 spectral channels from 401 to 879 nm (spectral resolution of 2.1 nm) from field pea (Pisum sativum) samples with and without internal pea weevil (Bruchus pisorum) infestation. We deployed five levels of spatial data reduction (binning) and eight levels of spectral data reduction (40 datasets). Forward stepwise LDA was used to select and include only spectral channels contributing the most to the separation of pixels from non-infested and infested field peas. Classification accuracies obtained with LDA and SVM were based on the classification of independent validation datasets. Overall, SVMs had significantly higher classification accuracies than LDAs (P < 0.01). There was a negative association between pixel resolution and classification accuracy, while spectral binning equivalent to up to 98% data reduction had negligible effect on classification accuracies. This study supports the potential use of reflection-based technologies in the quality control of food products with internal defects, and it highlights that spatial and spectral data reductions can (1) improve classification accuracies, (2) vastly decrease computer constraints, and (3) reduce analytical concerns associated with classifications of large and high-dimensional datasets.
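To make the idea of spectral data reduction in the abstract above concrete, the following is a minimal sketch that assumes binning means averaging runs of adjacent spectral channels; the abstract does not describe the exact procedure. The function names and the bin factor of 46 are hypothetical, chosen only because 230 channels divide evenly into 5 bands.

```python
def bin_spectrum(spectrum, factor):
    """Average each run of `factor` adjacent channels.

    ASSUMPTION: binning is modeled as a simple mean over adjacent channels.
    A final run shorter than `factor` is averaged over its own length.
    """
    binned = []
    for start in range(0, len(spectrum), factor):
        chunk = spectrum[start:start + factor]
        binned.append(sum(chunk) / len(chunk))
    return binned


def data_reduction(n_original, n_binned):
    """Fraction of the original channels removed by binning."""
    return 1.0 - n_binned / n_original


# Hypothetical example: 230 channels, as in the abstract, with placeholder values.
channels = [float(i) for i in range(230)]
binned = bin_spectrum(channels, 46)  # 230 channels -> 5 bands
reduction = data_reduction(len(channels), len(binned))
```

With these illustrative numbers the reduction is about 0.978, which is in the neighborhood of the "up to 98%" figure quoted in the abstract; the same averaging could be applied along the two spatial axes to sketch pixel binning.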
Affiliation(s)
- Xuechen Zhang
- University of Western Australia, School of Animal Biology, Faculty of Science, 35 Stirling Highway, Crawley, Perth, WA 6009, Australia
7
Nansen C, Coelho A, Vieira JM, Parra JRP. Reflectance-based identification of parasitized host eggs and adult Trichogramma specimens. JOURNAL OF EXPERIMENTAL BIOLOGY 2013; 217:1187-92. [PMID: 24363420 DOI: 10.1242/jeb.095661] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 01/10/2023]
Abstract
A wide range of imaging and spectroscopy technologies is used in medical diagnostics, quality control in production systems, military applications, stress detection in agriculture, and ecological studies of both terrestrial and aquatic organisms. In this study, we hypothesized that reflectance profiling can be used to successfully classify animals that are otherwise very challenging to classify. We acquired hyperspectral images from adult specimens of the egg parasitoid genus Trichogramma (T. galloi, T. pretiosum and T. atopovirilia), which are ~1.0 mm in length. We also acquired hyperspectral images from host eggs containing developing Trichogramma instars and pupae. These obligate egg endoparasitoid species are commercially available as natural enemies of lepidopteran pests in food production systems. Because of their minute size and physical resemblance, classification is time consuming and requires a high level of technical experience. The classification of reflectance profiles was based on a combination of average reflectance and variogram parameters (describing the spatial structure of reflectance data) of reflectance values in individual spectral bands. Although variogram parameters (variogram analysis) are commonly used in large-scale spatial research (e.g. geoscience and landscape ecology), they have only recently been used in classification of high-resolution hyperspectral imaging data. The classification model of parasitized host eggs was equally successful for each of the three species and was successfully validated with independent data sets (>90% classification accuracy). The classification model of adult specimens accurately separated T. atopovirilia from the other two species, but specimens of T. galloi and T. pretiosum could not be accurately separated. Interestingly, molecular-based classification (using the DNA sequence of the internal transcribed spacer ITS2) of Trichogramma species published elsewhere corroborates the classification, as T. galloi and T. pretiosum are closely related and comparatively distant from T. atopovirilia. Our results emphasize the importance of using high-spectral and high-spatial resolution data in the classification of organism relatedness, and hyperspectral imaging may be of relevance to a wide range of commercial (e.g. producers of biocontrol agents), taxonomic and evolutionary research applications.
Affiliation(s)
- Christian Nansen
- The University of Western Australia, School of Animal Biology, The UWA Institute of Agriculture, 35 Stirling Highway, Crawley, Perth, WA 6009, Australia
9
Abstract
Successful classifications of reflectance and vibrational data are to a large extent dependent upon robustness of input data. In this study, a well-known geostatistical approach, variogram analysis, was described and its robustness was assessed through comprehensive evaluation of 3,200 variogram settings. High-resolution hyperspectral imaging data were acquired from greenhouse maize plants, and the robustness (radiometric repeatability) of three variogram parameters (nugget, sill, and range) was examined when generated from imaging data collected from two different sets of plants and with imaging data collected on seven different days in two years. Robustness of variogram parameters was compared with average reflectance values in six spectral bands, three standard vegetation indices (NDVI, SI, and PRI), and scores from principal component analysis (PCA).
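For readers unfamiliar with variogram analysis, the sketch below computes the standard empirical semivariance, gamma(h) = sum of squared differences between values h steps apart, divided by twice the number of pairs, along a one-dimensional transect of reflectance values. It is a generic textbook illustration, not the procedure or settings used in the study; the function names and the toy transect are hypothetical. Nugget, sill, and range would normally be obtained by fitting a model curve to these values, a step not shown here.

```python
def semivariance(values, lag):
    """Empirical semivariance of a 1-D transect at an integer lag (in pixels)."""
    pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))


def empirical_variogram(values, max_lag):
    """Semivariance for every lag from 1 to max_lag, keyed by lag."""
    return {lag: semivariance(values, lag) for lag in range(1, max_lag + 1)}


# Hypothetical transect: reflectance alternating between two levels, so
# neighbouring pixels always differ and pixels two steps apart never do.
transect = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
gamma = empirical_variogram(transect, 2)
```

In this toy case the semivariance is 0.5 at lag 1 and 0.0 at lag 2, showing how the curve encodes spatial structure that an average reflectance value alone would miss.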
Affiliation(s)
- Christian Nansen
- Texas AgriLife Research, 1102 East FM 1294, Lubbock, TX 79403, USA.