1
Si K, Sun Z, Song H, Jiang X, Wang X. Machine learning-assisted design and prediction of materials for batteries based on alkali metals. Phys Chem Chem Phys 2025. [PMID: 40029241 DOI: 10.1039/d4cp04214j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
Since the commercialization of lithium-ion batteries in the 1990s, batteries based on alkali metals have been promising candidates for energy storage. The performance of these batteries, in terms of cost-efficiency, energy density, safety, and cycle life, needs continuous improvement. Battery performance is highly dependent on electrode materials, yet the long experimental period, intensive labor, and high cost remain bottlenecks in the improvement of electrode materials. Machine learning (ML), which is being increasingly integrated into materials science, offers transformative potential by reducing the R&D period and cost. ML also demonstrates significant advantages in the performance prediction of various materials, and it can help reveal the structure-performance relationship of materials. ML-assisted material design and performance prediction thus enable the innovation of advanced materials. Herein, the implementation of ML for exploring alkali metal-based batteries is outlined, highlighting various ML algorithms as well as electrode reaction mechanisms.
Affiliation(s)
- Kexin Si
  - State Key Laboratory of Mechanics and Control of Mechanical Structures, Key Laboratory for Intelligent Nano Materials and Devices of the Ministry of Education, College of Material Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China.
- Zhipeng Sun
  - National Laboratory of Solid State Microstructures (NLSSM), Frontiers Science Center for Critical Earth Material Cycling, College of Engineering and Applied Sciences, Nanjing University, Nanjing 210093, China.
- Huaxin Song
  - State Key Laboratory of Mechanics and Control of Mechanical Structures, Key Laboratory for Intelligent Nano Materials and Devices of the Ministry of Education, College of Material Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China.
- Xiangfen Jiang
  - State Key Laboratory of Mechanics and Control of Mechanical Structures, Key Laboratory for Intelligent Nano Materials and Devices of the Ministry of Education, College of Material Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China.
- Xuebin Wang
  - National Laboratory of Solid State Microstructures (NLSSM), Frontiers Science Center for Critical Earth Material Cycling, College of Engineering and Applied Sciences, Nanjing University, Nanjing 210093, China.
2
Martens RR, Gozdzialski L, Newman E, Gill C, Wallace B, Hore DK. Optimized machine learning approaches to combine surface-enhanced Raman scattering and infrared data for trace detection of xylazine in illicit opioids. Analyst 2025; 150:700-711. [PMID: 39835803 DOI: 10.1039/d4an01496k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2025]
Abstract
Infrared absorption spectroscopy and surface-enhanced Raman spectroscopy were integrated using three data fusion strategies: hybrid (concatenated spectra), mid-level (extracted features from both datasets), and high-level (fusion of predictions from both models). The aim was to enhance the predictive accuracy for xylazine detection in illicit opioid samples. Three chemometric approaches (random forest, support vector machine, and k-nearest neighbor algorithms) were employed and optimized using a 5-fold cross-validation grid search for all fusion strategies. Validation results identified the random forest classifier as the optimal model for all fusion strategies, achieving high sensitivity (88% for hybrid, 92% for mid-level, and 96% for high-level) and specificity (88% for hybrid, mid-level, and high-level). The enhanced performance of the high-level fusion approach (F1 score of 92%) is demonstrated, effectively leveraging the surface-enhanced Raman data with a 90% voting weight, without compromising prediction accuracy (92%) when combined with infrared spectral data. This highlights the viability of a multi-instrument approach using data fusion and random forest classification to improve the detection of various components in complex opioid samples in a point-of-care setting.
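The high-level fusion described in this abstract can be illustrated with a minimal sketch. It assumes the fusion is a weighted soft vote over per-sample probabilities from two separately trained classifiers; the abstract gives only the 90% voting weight, so the function names, the 0.5 decision threshold, and the example values are hypothetical and are not taken from the cited paper.

```python
def fuse_predictions(p_sers, p_ir, w_sers=0.9, threshold=0.5):
    """Weighted soft vote over two lists of positive-class probabilities.

    p_sers and p_ir are hypothetical per-sample probabilities from a
    SERS-based and an infrared-based model. w_sers is the voting weight
    given to the SERS model (0.9 mirrors the 90% quoted in the abstract).
    """
    if not 0.0 <= w_sers <= 1.0:
        raise ValueError("w_sers must lie in [0, 1]")
    if len(p_sers) != len(p_ir):
        raise ValueError("both models must score the same samples")
    fused = [w_sers * a + (1.0 - w_sers) * b for a, b in zip(p_sers, p_ir)]
    return [p >= threshold for p in fused]


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and F1 score from boolean labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))

    def ratio(num, den):
        # Report 0.0 when a class is absent instead of dividing by zero.
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Made-up probabilities for three samples, for illustration only.
    labels = fuse_predictions([0.8, 0.2, 0.6], [0.1, 0.9, 0.3])
    print(labels)
    print(classification_metrics([True, False, True], labels))
```

With a weight this high the infrared model changes the fused label only when the SERS probability is already close to the threshold, which is one way to read the abstract's remark that accuracy was not compromised.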
Affiliation(s)
- Rebecca R Martens
  - Department of Chemistry, University of Victoria, Victoria, British Columbia, V8W 3V6, Canada.
- Lea Gozdzialski
  - Department of Chemistry, University of Victoria, Victoria, British Columbia, V8W 3V6, Canada.
- Ella Newman
  - Department of Chemistry, University of Victoria, Victoria, British Columbia, V8W 3V6, Canada.
- Chris Gill
  - Department of Chemistry, Vancouver Island University, Nanaimo, British Columbia, V9R 5S5, Canada
  - Department of Chemistry, University of Victoria, Victoria, British Columbia, V8W 3V6, Canada.
  - Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, WA 98195, USA
  - Canadian Institute for Substance Use Research, University of Victoria, Victoria, British Columbia, V8W 2Y2, Canada
- Bruce Wallace
  - School of Social Work, University of Victoria, Victoria, British Columbia, V8W 2Y2, Canada
  - Canadian Institute for Substance Use Research, University of Victoria, Victoria, British Columbia, V8W 2Y2, Canada
- Dennis K Hore
  - Department of Chemistry, University of Victoria, Victoria, British Columbia, V8W 3V6, Canada.
  - Canadian Institute for Substance Use Research, University of Victoria, Victoria, British Columbia, V8W 2Y2, Canada
  - Department of Computer Science, University of Victoria, Victoria, British Columbia, V8W 3P6, Canada
3
Tolu‐Akinnawo OZ, Ezekwueme F, Omolayo O, Batheja S, Awoyemi T. Advancements in Artificial Intelligence in Noninvasive Cardiac Imaging: A Comprehensive Review. Clin Cardiol 2025; 48:e70087. [PMID: 39871619 PMCID: PMC11772728 DOI: 10.1002/clc.70087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2024] [Accepted: 01/06/2025] [Indexed: 01/29/2025] [Open Access]
Abstract
BACKGROUND: Technological advancements in artificial intelligence (AI) are redefining cardiac imaging by providing advanced tools for analyzing complex health data. AI is increasingly applied across various imaging modalities, including echocardiography, magnetic resonance imaging (MRI), computed tomography (CT), and nuclear imaging, to enhance diagnostic workflows and improve patient outcomes.
HYPOTHESIS: Integrating AI into cardiac imaging enhances image quality, accelerates processing times, and improves diagnostic accuracy, enabling timely and personalized interventions that lead to better health outcomes.
METHODS: A comprehensive literature review was conducted to examine the impact of machine learning and deep learning algorithms on diagnostic accuracy, the detection of subtle patterns and anomalies, and key challenges such as data quality, patient safety, and regulatory barriers.
RESULTS: Findings indicate that AI integration in cardiac imaging enhances image quality, reduces processing times, and improves diagnostic precision, contributing to better clinical decision-making. Emerging machine learning techniques demonstrate the ability to identify subtle cardiac abnormalities that traditional methods may overlook. However, significant challenges persist, including data standardization, regulatory compliance, and patient safety concerns.
CONCLUSIONS: AI holds transformative potential in cardiac imaging, significantly advancing diagnosis and patient outcomes. Overcoming barriers to implementation will require ongoing collaboration among clinicians, researchers, and regulatory bodies. Further research is essential to ensure the safe, ethical, and effective integration of AI in cardiology, supporting its broader application to improve cardiovascular health.
Affiliation(s)
- Francis Ezekwueme
  - Department of Internal Medicine, University of Pittsburgh Medical Center, McKeesport, Pennsylvania, USA
- Olukunle Omolayo
  - Department of Internal Medicine, Lugansk State Medical University, Lugansk, Ukraine
- Sasha Batheja
  - Department of Internal Medicine, Government Medical College, Patiala, Punjab, India
- Toluwalase Awoyemi
  - Department of Internal Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
4
Bernardes RC, Botina LL, Ribas A, Soares JM, Martins GF. Artificial intelligence-driven tool for spectral analysis: identifying pesticide contamination in bees from reflectance profiling. Journal of Hazardous Materials 2024; 480:136425. [PMID: 39547034 DOI: 10.1016/j.jhazmat.2024.136425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2024] [Revised: 10/21/2024] [Accepted: 11/05/2024] [Indexed: 11/17/2024]
Abstract
Pesticide poisoning constantly threatens bees as they forage for resources in pesticide-treated crops. This poisoning requires thorough investigation to identify its causes, underscoring the importance of reliable pesticide detection methods for bee monitoring. Infrared spectroscopy provides reflectance data across hundreds of spectral bands (hyperspectral reflectance), presumably enabling the efficient classification of pesticide contamination in bee carcasses using artificial intelligence (AI) models, such as machine learning. In this study, bee contamination by commercial formulations of three insecticides (dimethoate, an organophosphate; fipronil, a phenylpyrazole; and imidacloprid, a neonicotinoid), as well as glyphosate, the most widely used herbicide globally, was detected using machine learning models. These models classified the hyperspectral reflectance profiles of the body surfaces of contaminated bees. The best-performing model, linear discriminant analysis, achieved 98% accuracy in discriminating contamination across the species Apis mellifera, Melipona mondury, and Partamona helleri, with prediction speeds of 0.27 s. Our pioneering study introduced an effective method for discerning multiple classes of bees contaminated with pesticides using hyperspectral reflectance. An AI-driven spectral data analysis tool (https://github.com/bernardesrodrigoc/MACSS) was developed for the purpose of identifying and characterizing new samples through their spectral characteristics. This platform aids efforts to monitor and conserve bee populations and holds potential importance in environmental monitoring, agricultural research, and industrial quality control.
Affiliation(s)
- Lorena Lisbetd Botina
  - Departamento de Biologia Geral, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil
- Andreza Ribas
  - Departamento de Entomologia, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil
- Júlia Martins Soares
  - Departamento de Agronomia, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil
5
Daru BH. Predicting undetected native vascular plant diversity at a global scale. Proc Natl Acad Sci U S A 2024; 121:e2319989121. [PMID: 39133854 PMCID: PMC11348117 DOI: 10.1073/pnas.2319989121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 06/28/2024] [Indexed: 08/29/2024] [Open Access]
Abstract
Vascular plants are diverse and a major component of terrestrial ecosystems, yet their geographic distributions remain incomplete. Here, I present a global database of vascular plant distributions by integrating species distribution models calibrated to species' dispersal ability and natural habitats to predict native range maps for 201,681 vascular plant species into unsurveyed areas. Using these maps, I uncover unique patterns of native vascular plant diversity, endemism, and phylogenetic diversity revealing hotspots in underdocumented biodiversity-rich regions. These hotspots, based on detailed species-level maps, show a pronounced latitudinal gradient, strongly supporting the theory of increasing diversity toward the equator. I trained random forest models to extrapolate diversity patterns under unbiased global sampling and identify overlaps with modeled estimations but unveiled cryptic hotspots that were not captured by modeled estimations. Only 29% to 36% of extrapolated plant hotspots are inside protected areas, leaving more than 60% outside and vulnerable. However, the unprotected hotspots harbor species with unique attributes that make them good candidates for conservation prioritization.
6
Shen L, LaRue E, Fei S, Zhang H. Spatial prediction of plant invasion using a hybrid of machine learning and geostatistical method. Ecol Evol 2024; 14:e11605. [PMID: 38932949 PMCID: PMC11199124 DOI: 10.1002/ece3.11605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 05/30/2024] [Accepted: 06/05/2024] [Indexed: 06/28/2024] [Open Access]
Abstract
Modeling ecological patterns and processes often involves large-scale and complex high-dimensional spatial data. Due to the nonlinearity and multicollinearity of ecological data, traditional geostatistical methods have faced great challenges in model accuracy. As machine learning has increased our ability to construct models on big data, the main focus of the study is to propose the use of statistical models that hybridize machine learning and spatial interpolation methods to cope with increasingly large-scale and complex ecological data. Here, two machine learning algorithms, boosted regression tree (BRT) and least absolute shrinkage and selection operator (LASSO), were combined with ordinary kriging (OK) to model plant invasions across the eastern United States. The accuracies of the hybrid models and conventional models were evaluated by 10-fold cross-validation. Based on an invasive plants dataset of 15 ecoregions across the eastern United States, the results showed that the hybrid algorithms were significantly better at predicting plant invasion than commonly used algorithms in terms of RMSE and a paired-samples t-test (p-value < .0001). In addition, the combined algorithms can select influential variables associated with the establishment of invasive cover, which cannot be achieved by conventional geostatistics. Higher accuracy in the prediction of large-scale biological invasions improves our understanding of the ecological conditions that lead to the establishment and spread of plants into novel habitats across spatial scales. The results demonstrate the effectiveness and robustness of the hybrid BRTOK and LASOK models, which can be used to analyze large-scale and high-dimensional spatial datasets and offer an additional source of models for spatial interpolation of ecological properties. They will also provide a better basis for management decisions in early-detection modeling of invasive species.
Affiliation(s)
- Liang Shen
  - Department of Statistics, Qingdao University of Technology, Qingdao, China
- Elizabeth LaRue
  - Department of Biological Sciences, University of Texas at El Paso, El Paso, Texas, USA
- Songlin Fei
  - Department of Forestry and Natural Resources, Purdue University, West Lafayette, Indiana, USA
- Hao Zhang
  - Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, USA
7
Hooda S, Mondal P. Predictive modeling of plastic pyrolysis process for the evaluation of activation energy: Explainable artificial intelligence based comprehensive insights. Journal of Environmental Management 2024; 360:121189. [PMID: 38759553 DOI: 10.1016/j.jenvman.2024.121189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 04/30/2024] [Accepted: 05/13/2024] [Indexed: 05/19/2024]
Abstract
Pyrolysis, a thermochemical conversion approach for transforming plastic waste to energy, has tremendous potential to manage the exponentially increasing plastic waste. However, understanding the process kinetics is fundamental to engineering a sustainable process. Conventional analysis techniques do not provide insights into the influence of feedstock characteristics on the process kinetics. The present study exemplifies the efficacy of using machine learning for predictive modeling of pyrolysis of waste plastics to understand the complexities of the interrelations of predictor variables and their influence on activation energy. The activation energy for pyrolysis of waste plastics was evaluated using four machine learning regression models: Random Forest, XGBoost, CatBoost, and AdaBoost. Feature selection based on the multicollinearity of data and hyperparameter tuning of the models utilizing RandomizedSearchCV was conducted. The Random Forest model outperformed the other models with a coefficient of determination (R²) value of 0.941, a root mean square error (RMSE) value of 14.69, and a mean absolute error (MAE) value of 8.66 for the testing dataset. The explainable artificial intelligence-based feature importance plot and the summary plot of the SHapley Additive exPlanations (SHAP) projected fixed carbon content, ash content, conversion value, and carbon content as significant parameters of the model in the order: fixed carbon > carbon > ash content > degree of conversion. The present study highlighted the potential of machine learning as a powerful tool to understand the influence of the characteristics of plastic waste and the degree of conversion on the activation energy of a process, which is essential for designing large-scale operations and future scale-up of the process.
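The three test-set metrics quoted in this abstract follow standard definitions, which the minimal sketch below computes from scratch. The function name and the example activation-energy values are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from the cited study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for paired observed/predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


if __name__ == "__main__":
    # Made-up observed and predicted activation energies (arbitrary units).
    observed = [100.0, 150.0, 200.0, 250.0]
    predicted = [110.0, 140.0, 210.0, 240.0]
    print(regression_metrics(observed, predicted))
```

RMSE penalizes large errors more heavily than MAE, so a gap between the two (as in the 14.69 versus 8.66 reported above) suggests that a minority of test samples carry comparatively large errors.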
Affiliation(s)
- Sanjeevani Hooda
  - Department of Chemical Engineering, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, 247667, India
- Prasenjit Mondal
  - Department of Chemical Engineering, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, 247667, India.
8
Borteiro C, Laufer G, Gobel N, Arleo M, Kolenc F, Cortizas S, Barrasso DA, de Sá RO, Soutullo A, Ubilla M, Martínez-Debat C. Widespread occurrence of the amphibian chytrid panzootic lineage in Uruguay is constrained by climate. Diseases of Aquatic Organisms 2024; 158:123-132. [PMID: 38813853 DOI: 10.3354/dao03783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2024]
Abstract
The amphibian chytrid fungus Batrachochytrium dendrobatidis (Bd) causes chytridiomycosis, a disease among the main causes of amphibian declines worldwide. However, Bd studies on Neotropical amphibians from temperate areas are scarce. We present a comprehensive survey of Bd in Uruguay, in temperate central eastern South America, carried out between 2006 and 2014. Skin swabs of 535 specimens of 21 native and exotic frogs were tested by PCR. We used individual-level data to examine the relationship between infection, climatic variables, and their effects on body condition and the number of prey items found in stomach contents. Infection was widespread in free-ranging anurans with an overall prevalence of 41.9%, detected in 15 native species, wild American bullfrogs Aquarana catesbeiana, and captive specimens of Ceratophrys ornata and Xenopus laevis. Three haplotypes of the Bd ITS region were identified in native amphibians, all belonging to the global panzootic lineage (BdGPL), of which only one was present in exotic hosts. Despite high infection frequencies in different anurans, we found no evidence of morbidity or mortality attributable to chytridiomycosis, and we observed no discernible impact on body condition or consumed prey. Climatic conditions at the time of our surveys suggested that the chance of infection is associated with monthly mean temperature, mean humidity, and total precipitation. Temperatures below 21°C combined with moderate humidity and pronounced rainfall may increase the likelihood of infection. Multiple haplotypes of BdGPL combined with high frequencies of infection suggest an enzootic pattern in native species, underscoring the need for continued monitoring.
Affiliation(s)
- Claudio Borteiro
  - Sección Herpetología, Museo Nacional de Historia Natural, Montevideo 11800, Uruguay
- Gabriel Laufer
  - Área Biodiversidad y Conservación, Museo Nacional de Historia Natural, Montevideo 11800, Uruguay
  - Vida Silvestre Uruguay, Montevideo 11100, Uruguay
- Noelia Gobel
  - Área Biodiversidad y Conservación, Museo Nacional de Historia Natural, Montevideo 11800, Uruguay
  - Vida Silvestre Uruguay, Montevideo 11100, Uruguay
- Mailén Arleo
  - Sección Bioquímica, Departamento de Biología, Facultad de Ciencias, Universidad de la República, Montevideo 11400, Uruguay
- Francisco Kolenc
  - Sección Herpetología, Museo Nacional de Historia Natural, Montevideo 11800, Uruguay
- Sofía Cortizas
  - Grupo de Agroecología, Sustentabilidad y Medio Ambiente, Universidad Tecnológica del Uruguay, Durazno 97000, Uruguay
- Diego A Barrasso
  - Instituto de Diversidad y Evolución Austral (IDEAus-CONICET), and Facultad de Ciencias Naturales y Ciencias de la Salud, Universidad Nacional de la Patagonia 'San Juan Bosco' (UNPSJB), Puerto Madryn 9120, Chubut, Argentina
- Rafael O de Sá
  - Department of Biology, University of Richmond, Richmond, Virginia 23173, USA
- Alvaro Soutullo
  - Departamento de Ecología y Gestión Ambiental, Centro Universitario Regional del Este, Punta del Este 20100, Universidad de la República, Uruguay
- Martin Ubilla
  - Departamento de Paleontología-ICG, Facultad de Ciencias, Universidad de la República, Montevideo 11400, Uruguay
- Claudio Martínez-Debat
  - Sección Bioquímica, Departamento de Biología, Facultad de Ciencias, Universidad de la República, Montevideo 11400, Uruguay
9
Subbarayan S, Thiyagarajan S, Karuppannan S, Panneerselvam B. Enhancing groundwater vulnerability assessment: Comparative study of three machine learning models and five classification schemes for Cuddalore district. Environmental Research 2024; 242:117769. [PMID: 38029825 DOI: 10.1016/j.envres.2023.117769] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/18/2023] [Revised: 10/25/2023] [Accepted: 11/22/2023] [Indexed: 12/01/2023]
Abstract
Most groundwater vulnerability assessment methods that use machine learning perform binary classification. This study attempts multi-class classification models to map groundwater vulnerability to nitrate contamination. Further, the significance of the number of classes used in the multi-class classification is studied by considering three and five classes. Three machine learning models, namely Random Forest, Extreme Gradient Boosting and CART, with two classification schemes, were developed for the present study. The parameters used in the conventional DRASTIC method, together with an additional parameter, land use, were employed for the study. Evaluation metrics such as Accuracy, Kappa, Positive Predictive Value, Negative Predictive Value, and Area Under the Curve of the Receiver Operating Characteristic (AUC-ROC) were compared among all six models to select the optimal one. Based on the model evaluation metrics and the consistent distribution of area among the classes, the Random Forest model with a three-class classification, with an AUC of 0.95, is considered optimum for the selected objective. This study highlights the importance of the data classification process and the selection of the number of classes for ML model prediction in assessing groundwater vulnerability. Leveraging the effectiveness of the Geographic Information System and advanced machine learning techniques, the proposed approach offers valuable insights for enhanced groundwater management and contamination mitigation strategies.
Affiliation(s)
- Saravanan Subbarayan
  - Department of Civil Engineering, National Institute of Technology, Tiruchirappalli, India.
- Saranya Thiyagarajan
  - Department of Civil Engineering, National Institute of Technology, Tiruchirappalli, India.
- Shankar Karuppannan
  - Department of Applied Geology, School of Applied Natural Sciences, Adama Science and Technology University, Adama, Ethiopia; Department of Research Analytics, Saveetha Institute of Medical and Technical Sciences (SIMATS), Saveetha University, Chennai, India.
- Balamurugan Panneerselvam
  - Center of Excellence in Interdisciplinary Research for Sustainable Development, Faculty of Engineering, Chulalongkorn University, Thailand.
10
Gaida M, Cain CN, Synovec RE, Focant JF, Stefanuto PH. Tile-Based Random Forest Analysis for Analyte Discovery in Balanced and Unbalanced GC × GC-TOFMS Data Sets. Anal Chem 2023; 95:13519-13527. [PMID: 37647642 DOI: 10.1021/acs.analchem.3c01872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
In this study, we introduce a new nontargeted tile-based supervised analysis method that combines the four-grid tiling scheme previously established for the Fisher ratio (F-ratio) analysis (FRA) with the estimation of tile hit importance using the machine learning (ML) algorithm Random Forest (RF). This approach is termed tile-based RF analysis. As opposed to the standard tile-based F-ratio analysis, the RF approach can be extended to the analysis of unbalanced data sets, i.e., different numbers of samples per class. Tile-based RF computes out-of-bag (oob) tile hit importance estimates for every summed chromatographic signal within each tile on a per-mass channel basis (m/z). These estimates are then used to rank tile hits in descending order of importance. In the present investigation, the RF approach was applied for a two-class comparison of stool samples collected from omnivore (O) subjects and stored using two different storage conditions: liquid (Liq) and lyophilized (Lyo). Two final hit lists were generated using balanced (8 vs 8 comparison) and unbalanced (8 vs 9 comparison) data sets and compared to the hit list generated by the standard F-ratio analysis. Similar class-distinguishing analytes (p < 0.01) were discovered by both methods. However, while the FRA discovered a more comprehensive hit list (65 hits), the RF approach strictly discovered hits (31 hits for the balanced data set comparison and 29 hits for the unbalanced data set comparison) with concentration ratios, [OLiq]/[OLyo], greater than 2 (or less than 0.5). This difference is attributed to the more stringent feature selection process used by the RF algorithm. Moreover, our findings suggest that the RF approach is a promising method for identifying class-distinguishing analytes in settings characterized by both high between-class variance and high within-class variance, making it an advantageous method in the study of complex biological matrices.
Affiliation(s)
- Meriem Gaida
  - Organic and Biological Analytical Chemistry Group, Molecular Systems Research Unit, University of Liège, 4000 Liège, Belgium
- Caitlin N Cain
  - Department of Chemistry, University of Washington, Seattle, Washington 98195-1700, United States
- Robert E Synovec
  - Department of Chemistry, University of Washington, Seattle, Washington 98195-1700, United States
- Jean-François Focant
  - Organic and Biological Analytical Chemistry Group, Molecular Systems Research Unit, University of Liège, 4000 Liège, Belgium
- Pierre-Hugues Stefanuto
  - Organic and Biological Analytical Chemistry Group, Molecular Systems Research Unit, University of Liège, 4000 Liège, Belgium
11
Chafai N, Hayah I, Houaga I, Badaoui B. A review of machine learning models applied to genomic prediction in animal breeding. Front Genet 2023; 14:1150596. [PMID: 37745853 PMCID: PMC10516561 DOI: 10.3389/fgene.2023.1150596] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/24/2023] [Accepted: 08/22/2023] [Indexed: 09/26/2023] [Open Access]
Abstract
The advent of modern genotyping technologies has revolutionized genomic selection in animal breeding. Large marker datasets have shown several drawbacks for traditional genomic prediction methods in terms of flexibility, accuracy, and computational power. Recently, the application of machine learning models in animal breeding has gained a lot of interest due to their tremendous flexibility and their ability to capture patterns in large noisy datasets. Here, we present a general overview of a handful of machine learning algorithms and their application in genomic prediction to provide a meta-picture of their performance in genomic estimated breeding values estimation, genotype imputation, and feature selection. Finally, we discuss a potential adoption of machine learning models in genomic prediction in developing countries. The results of the reviewed studies showed that machine learning models have indeed performed well in fitting large noisy data sets and modeling minor nonadditive effects in some of the studies. However, sometimes conventional methods outperformed machine learning models, which confirms that there's no universal method for genomic prediction. In summary, machine learning models have great potential for extracting patterns from single nucleotide polymorphism datasets. Nonetheless, the level of their adoption in animal breeding is still low due to data limitations, complex genetic interactions, a lack of standardization and reproducibility, and the lack of interpretability of machine learning models when trained with biological data. Consequently, there is no remarkable outperformance of machine learning methods compared to traditional methods in genomic prediction. Therefore, more research should be conducted to discover new insights that could enhance livestock breeding programs.
Collapse
Affiliation(s)
- Narjice Chafai
- Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
| | - Ichrak Hayah
- Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
| | - Isidore Houaga
- Centre for Tropical Livestock Genetics and Health, The Roslin Institute, Royal (Dick) School of Veterinary Medicine, The University of Edinburgh, Edinburgh, United Kingdom
- The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
| | - Bouabid Badaoui
- Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
- African Sustainable Agriculture Research Institute (ASARI), Mohammed VI Polytechnic University (UM6P), Laayoune, Morocco
|
12
|
Hekimoglu O, Elverici C, Kuyucu AC. Predicting climate-driven distribution shifts in Hyalomma marginatum (Ixodidae). Parasitology 2023; 150:883-893. [PMID: 37519234 PMCID: PMC10577666 DOI: 10.1017/s0031182023000689] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2023] [Revised: 06/02/2023] [Accepted: 07/14/2023] [Indexed: 08/01/2023]
Abstract
Hyalomma marginatum is an important tick species which is the main vector of Crimean–Congo haemorrhagic fever and spotted fever. The species is predominantly distributed in parts of southern Europe, North Africa and West Asia. However, due to ongoing climate change and increasing reports of H. marginatum in central and northern Europe, the expansion of this range poses a potential future risk. In this study, an ecological niche modelling approach was followed to model the current and future climatic suitability of H. marginatum. Using high-resolution climatic variables from the CHELSA dataset and an updated list of locations for H. marginatum, ecological niche models were constructed using MaxEnt for both current conditions and future projections under the ssp370 and ssp585 scenarios. Models show that the climatically suitable region for H. marginatum matches the current distributional area in the Mediterranean basin and West Asia. When applied to future projections, the models suggest a considerable northward expansion of H. marginatum's range in Europe as a result of rising temperatures. However, a decline in central Anatolia is also predicted, potentially due to the exacerbation of drought conditions in that region.
Affiliation(s)
| | - Can Elverici
- Biology Department, Hacettepe University, Ankara, Turkey
- Biodiversity Institute, University of Kansas, Lawrence, KS, USA
|
13
|
Sanyal S, Sarkar S, Chakrabarty M. Time series analysis of groundwater quality at selected sites of Purba and Paschim Burdwan, West Bengal, India. ENVIRONMENTAL MONITORING AND ASSESSMENT 2023; 195:1039. [PMID: 37572142 DOI: 10.1007/s10661-023-11627-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/21/2023] [Accepted: 07/22/2023] [Indexed: 08/14/2023]
Abstract
The Water Quality Index (WQI) is used to monitor the health and usability of a water body. In this study, we aimed to construct time series prediction models using groundwater WQI (GW-WQI) at four sites: IISCO-Asansol, Durgapur Town, Burdwan University, and Burdwan Station. While statistical spatio-temporal analysis has been reported earlier, no time series analysis of the data or predictive modelling has been done. Pre-monsoon and post-monsoon physico-chemical data from 2010 to 2022 were obtained from the West Bengal Pollution Control Board website to calculate the GW-WQI. Prediction modelling was performed using R 4.1.3 software. Best fit forecast models were selected to predict future trends of GW-WQI with 80% of the data. Subsequently, the models were validated on the remaining 20% of the data using R-squared, root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and Theil's U. Our results show that GW-WQI was good in pre-monsoon but unfit for drinking in post-monsoon in IISCO-Asansol, Durgapur Town, Burdwan University, and Burdwan Station. Arsenic, fluoride, and mercury were the major contaminants resulting in poor GW-WQI. Seasonal ARIMA was the best model for Burdwan University and IISCO-Asansol, ETS for Durgapur Station, and BaggedARIMA for Burdwan Station. The forecast model for Durgapur and Burdwan Station predicted a sharp increase until 2027 but was fluctuating for IISCO-Asansol and Burdwan University. Thus, GW-WQI is a major problem in the industrial belt of West Bengal that is likely to remain high or worsen in the future.
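The validation step described in this abstract (fit on the first 80% of a series, score on the last 20%) can be illustrated with a minimal Python sketch. The study itself used R 4.1.3; the function names `chronological_split` and `forecast_errors` are hypothetical, MAPE is taken here as the mean absolute percentage error, and Theil's U is computed in its U1 form, which is an assumption because the abstract does not say which variant was used.

```python
import math


def chronological_split(series, train_fraction=0.8):
    """Split a time series in order: the first part trains, the rest validates."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def forecast_errors(actual, predicted):
    """Return RMSE, MAE, MAPE (percent) and Theil's U1 for paired values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero; inputs are assumed non-zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    # Theil's U1: RMSE scaled by the root mean squares of both series (0 = perfect).
    scale = math.sqrt(sum(a * a for a in actual) / n) + math.sqrt(
        sum(p * p for p in predicted) / n
    )
    theil_u1 = rmse / scale if scale else 0.0
    return {"rmse": rmse, "mae": mae, "mape": mape, "theil_u1": theil_u1}
```

A forecasting model would be fitted on the first element returned by `chronological_split` and its forecasts compared with the second element via `forecast_errors`.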
Affiliation(s)
- Sanghamitra Sanyal
- Department of Conservation Biology, Durgapur Government College, Kazi Nazrul University, Durgapur, West Bengal, 713214, India
| | - Sanchari Sarkar
- Department of Conservation Biology, Durgapur Government College, Kazi Nazrul University, Durgapur, West Bengal, 713214, India
| | - Moitreyee Chakrabarty
- Department of Conservation Biology, Durgapur Government College, Kazi Nazrul University, Durgapur, West Bengal, 713214, India.
|
14
|
Trzepieciński T, Najm SM, Ibrahim OM, Kowalik M. Analysis of the Frictional Performance of AW-5251 Aluminium Alloy Sheets Using the Random Forest Machine Learning Algorithm and Multilayer Perceptron. MATERIALS (BASEL, SWITZERLAND) 2023; 16:5207. [PMID: 37569911 PMCID: PMC10420024 DOI: 10.3390/ma16155207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/05/2023] [Revised: 07/18/2023] [Accepted: 07/20/2023] [Indexed: 08/13/2023]
Abstract
This paper is devoted to the determination of the coefficient of friction (COF) in the drawbead region in metal forming processes. As the test material, AW-5251 aluminium alloy sheets fabricated under various hardening conditions (AW-5251-O, AW-5251-H14, AW-5251-H16 and AW-5251-H22) were used. The sheets were tested using a drawbead simulator with different countersample roughness and different orientations of the specimens in relation to the sheet rolling direction. A drawbead simulator was designed to model the friction conditions when the sheet metal passed through the drawbead in sheet metal forming. The experimental tests were carried out under conditions of dry friction and lubrication of the sheet metal surfaces with three lubricants: machine oil, hydraulic oil, and engine oil. Based on the results of the experimental tests, the value of the COF was determined. The Random Forest (RF) machine learning algorithm and artificial neural networks (ANNs) were used to identify the parameters affecting the COF. The R statistical package software version 4.1.0 was used for running the RF model and neural network. The relative importance of the inputs was analysed using 12 different activation functions in ANNs and nine different loss functions in the RF. Based on the experimental tests, it was concluded that the COF for samples cut along the sheet rolling direction was greater than for samples cut in the transverse direction. However, the COF's most relevant input was oil viscosity (0.59), followed by the average countersample roughness Ra (0.30) and the yield stress Rp0.2 and strength coefficient K (0.05 and 0.06, respectively). The hard sigmoid activation function had the poorest R2 (0.25) and nRMSE (0.30). The ideal run was found after training and testing the RF model (R2 = 0.90 ± 0.028). Ra values greater than 1.1 and Rp0.2 values between 105 and 190 resulted in a decreased COF.
The COF values dropped to 9-35 for viscosity and 105-190 for Rp0.2, with a gap between 110 and 130 when the oil viscosity was added. The COF was low when the oil viscosity was 9-35, and the Ra was 0.95-1.25. The interaction between K and the other inputs, which produces a relatively limited range of reduced COF values, was the least relevant. The COF was reduced by setting the Rp0.2 between 105 and 190, the Ra between 0.95 and 1.25, and the oil viscosity between 9 and 35.
Affiliation(s)
- Tomasz Trzepieciński
- Department of Manufacturing Processes and Production Engineering, Faculty of Mechanical Engineering and Aeronautics, Rzeszow University of Technology, al. Powst. Warszawy 8, 35-959 Rzeszów, Poland
| | - Sherwan Mohammed Najm
- Kirkuk Technical Institute, Northern Technical University, 36001 Kirkuk, Iraq;
- Department of Manufacturing Science and Engineering, Budapest University of Technology and Economics, Műegyetemrkp 3, H-1111 Budapest, Hungary
| | - Omar Maghawry Ibrahim
- Plant Production Department, Arid Land Cultivation Research Institute, City of Scientific Research and Technological Applications SRTA-City, Borg Al-Arab 21934, Egypt;
| | - Marek Kowalik
- Faculty of Mechanical Engineering, Kazimierz Pulaski University of Technology and Humanities in Radom, 54 Stasieckiego Street, 26-600 Radom, Poland;
|
15
|
Işık YE, Aydın Z. Comparative analysis of machine learning approaches for predicting respiratory virus infection and symptom severity. PeerJ 2023; 11:e15552. [PMID: 37404475 PMCID: PMC10317018 DOI: 10.7717/peerj.15552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2022] [Accepted: 05/23/2023] [Indexed: 07/06/2023] Open
Abstract
Respiratory diseases are among the major health problems causing a burden on hospitals. Diagnosis of infection and rapid prediction of severity without time-consuming clinical tests could be beneficial in preventing the spread and progression of the disease, especially in countries where health systems remain incapable. Personalized medicine studies involving statistics and computer technologies could help to address this need. In addition to individual studies, competitions are also held, such as the Dialogue for Reverse Engineering Assessment and Methods (DREAM) challenge, which is a community-driven organization with a mission to research biology, bioinformatics, and biomedicine. One of these competitions was the Respiratory Viral DREAM Challenge, which aimed to develop early predictive biomarkers for respiratory virus infections. These efforts are promising; however, the prediction performance of the computational methods developed for detecting respiratory diseases still has room for improvement. In this study, we focused on improving the performance of predicting the infection and symptom severity of individuals infected with various respiratory viruses using gene expression data collected before and after exposure. The publicly available gene expression dataset in the Gene Expression Omnibus, named GSE73072, containing samples exposed to four respiratory viruses (H1N1, H3N2, human rhinovirus (HRV), and respiratory syncytial virus (RSV)) was used as input data. Various preprocessing methods and machine learning algorithms were implemented and compared to achieve the best prediction performance.
The experimental results showed that the proposed approaches obtained a prediction performance of 0.9746 area under the precision-recall curve (AUPRC) for infection (i.e., shedding) prediction (SC-1), 0.9182 AUPRC for symptom class prediction (SC-2), and 0.6733 Pearson correlation for symptom score prediction (SC-3) by outperforming the best leaderboard scores of Respiratory Viral DREAM Challenge (a 4.48% improvement for SC-1, a 13.68% improvement for SC-2, and a 13.98% improvement for SC-3). Additionally, over-representation analysis (ORA), which is a statistical method for objectively determining whether certain genes are more prevalent in pre-defined sets such as pathways, was applied using the most significant genes selected by feature selection methods. The results show that pathways associated with the 'adaptive immune system' and 'immune disease' are strongly linked to pre-infection and symptom development. These findings contribute to our knowledge about predicting respiratory infections and are expected to facilitate the development of future studies that concentrate on predicting not only infections but also the associated symptoms.
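The two kinds of score reported in this abstract, area under the precision-recall curve for the classification sub-challenges and Pearson correlation for the symptom-score sub-challenge, can be sketched in plain Python. This is an illustration only and not the challenge's scoring code: AUPRC is approximated here by average precision (one common estimator), tied scores are not treated specially, and the function names are hypothetical.

```python
import math


def average_precision(labels, scores):
    """Approximate AUPRC as the mean precision at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / hits if hits else 0.0


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Undefined when either sequence is constant; inputs are assumed to vary.
    return cov / math.sqrt(var_x * var_y)
```

In this sketch, `labels` holds 0/1 outcomes (for example shedding or not) and `scores` holds the model's predicted probabilities for the same subjects.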
Affiliation(s)
- Yunus Emre Işık
- Department of Management Information Systems, Sivas Cumhuriyet University, Sivas, Turkey
| | - Zafer Aydın
- Department of Computer Engineering, Abdullah Gül University, Kayseri, Turkey
|
16
|
Tang Z, You TT, Li YF, Tang ZX, Bao MQ, Dong G, Xu ZR, Wang P, Zhao FJ. Rapid identification of high and low cadmium (Cd) accumulating rice cultivars using machine learning models with molecular markers and soil Cd levels as input data. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2023; 326:121501. [PMID: 36963454 DOI: 10.1016/j.envpol.2023.121501] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/29/2023] [Revised: 02/28/2023] [Accepted: 03/21/2023] [Indexed: 06/18/2023]
Abstract
Excessive accumulation of cadmium (Cd) in rice grains threatens food safety and human health. Growing low Cd accumulating rice cultivars is an effective approach to produce low-Cd rice. However, field screening of low-Cd rice cultivars is laborious, time-consuming, and subject to the influence of environment × genotype interactions. In the present study, we investigated whether machine learning-based methods incorporating genotype and soil Cd concentration can identify high and low-Cd accumulating rice cultivars. One hundred and sixty-seven locally adapted high-yielding rice cultivars were grown in three fields with different soil Cd levels and genotyped using four molecular markers related to grain Cd accumulation. We identified sixteen cultivars as stable low-Cd accumulators with grain Cd concentrations below the 0.2 mg kg⁻¹ food safety limit in all three paddy fields. In addition, we developed eight machine learning-based models to predict low- and high-Cd accumulating rice cultivars with genotypes and soil Cd levels as input data. The optimized model classifies low- or high-Cd cultivars (i.e., the grain Cd concentration below or above 0.2 mg kg⁻¹) with an overall accuracy of 76%. These results indicate that machine learning-based classification models constructed with molecular markers and soil Cd levels can quickly and accurately identify the high- and low-Cd accumulating rice cultivars.
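The labelling rule and the overall-accuracy figure mentioned in this abstract can be made concrete with a small Python sketch. Only the 0.2 mg/kg limit comes from the abstract; the function names are hypothetical, the sketch does not reproduce any of the study's eight models, and treating a value exactly at the limit as "high" is an assumption.

```python
CD_LIMIT_MG_PER_KG = 0.2  # food safety limit quoted in the abstract


def cd_class(grain_cd):
    """Label a cultivar by its grain Cd concentration (mg/kg)."""
    # Assumption: a value exactly at the limit is counted as "high".
    return "low" if grain_cd < CD_LIMIT_MG_PER_KG else "high"


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of cultivars whose predicted class matches the measured class."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)
```

Measured concentrations would be converted to classes with `cd_class`, and a classifier's predictions for the same cultivars scored with `overall_accuracy`.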
Affiliation(s)
- Zhong Tang
- State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, College of Resources and Environmental Sciences, Nanjing Agricultural University, Nanjing, 210095, China
| | - Ting-Ting You
- State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, College of Resources and Environmental Sciences, Nanjing Agricultural University, Nanjing, 210095, China
| | - Ya-Fang Li
- State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, College of Resources and Environmental Sciences, Nanjing Agricultural University, Nanjing, 210095, China
| | - Zhi-Xian Tang
- State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, College of Resources and Environmental Sciences, Nanjing Agricultural University, Nanjing, 210095, China
| | - Miao-Qing Bao
- State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, College of Resources and Environmental Sciences, Nanjing Agricultural University, Nanjing, 210095, China
| | - Ge Dong
- State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, College of Resources and Environmental Sciences, Nanjing Agricultural University, Nanjing, 210095, China
| | - Zhong-Rui Xu
- State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, College of Resources and Environmental Sciences, Nanjing Agricultural University, Nanjing, 210095, China
| | - Peng Wang
- State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, College of Resources and Environmental Sciences, Nanjing Agricultural University, Nanjing, 210095, China; Centre for Agriculture and Health, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing, 210095, China.
| | - Fang-Jie Zhao
- State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, College of Resources and Environmental Sciences, Nanjing Agricultural University, Nanjing, 210095, China
|
17
|
Robillard AJ, Trizna MG, Ruiz‐Tafur M, Dávila Panduro EL, de Santana CD, White AE, Dikow RB, Deichmann JL. Application of a deep learning image classifier for identification of Amazonian fishes. Ecol Evol 2023; 13:e9987. [PMID: 37143991 PMCID: PMC10151603 DOI: 10.1002/ece3.9987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2022] [Revised: 03/10/2023] [Accepted: 03/24/2023] [Indexed: 05/06/2023] Open
Abstract
Given the sharp increase in agricultural and infrastructure development and the paucity of widespread data available to support conservation management decisions, a more rapid and accurate tool for identifying fish fauna in the world's largest freshwater ecosystem, the Amazon, is needed. Current strategies for identification of freshwater fishes require high levels of training and taxonomic expertise for morphological identification or genetic testing for species recognition at a molecular level. To overcome these challenges, we built an image masking model (U-Net) and a convolutional neural net (CNN) to classify Amazonian fish in photographs. Fish used to generate training data were collected and photographed in tributaries in seasonally flooded forests of the upper Morona River valley in Loreto, Peru in 2018 and 2019. Species identifications in the training images (n = 3068) were verified by expert ichthyologists. These images were supplemented with photographs taken of additional Amazonian fish specimens housed in the ichthyological collection of the Smithsonian's National Museum of Natural History. We generated a CNN model that identified 33 genera of fishes with a mean accuracy of 97.9%. Wider availability of accurate freshwater fish image recognition tools, such as the one described here, will enable fishermen, local communities, and citizen scientists to more effectively participate in collecting and sharing data from their territories to inform policy and management decisions that impact them directly.
Affiliation(s)
- Alexander J. Robillard
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, District of Columbia, USA
- Center for Conservation and Sustainability, Smithsonian National Zoo and Conservation Biology Institute, Washington, District of Columbia, USA
- Chesapeake Biological Laboratory, University of Maryland Center for Environmental Science, Solomons, Maryland, USA
| | - Michael G. Trizna
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, District of Columbia, USA
| | - Morgan Ruiz‐Tafur
- Center for Conservation and Sustainability, Smithsonian National Zoo and Conservation Biology Institute, Washington, District of Columbia, USA
- Laboratorio de Taxonomía de Peces, Instituto de Investigaciones de la Amazonía Peruana (IIAP), San Juan Bautista, Peru
| | - Edgard Leonardo Dávila Panduro
- Center for Conservation and Sustainability, Smithsonian National Zoo and Conservation Biology Institute, Washington, District of Columbia, USA
| | - C. David de Santana
- Division of Fishes, Department of Vertebrate Zoology, MRC 159, National Museum of Natural History, Smithsonian Institution, Washington, District of Columbia, USA
| | - Alexander E. White
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, District of Columbia, USA
| | - Rebecca B. Dikow
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, District of Columbia, USA
| | - Jessica L. Deichmann
- Center for Conservation and Sustainability, Smithsonian National Zoo and Conservation Biology Institute, Washington, District of Columbia, USA
- Working Land and Seascapes, Conservation Commons, Smithsonian Institution, Washington, District of Columbia, USA
|
18
|
Volf G, Žutinić P, Gligora Udovič M, Kulaš A, Mustafić P. Describing and simulating phytoplankton of a small and shallow reservoir using decision trees and rule-based models. ENVIRONMENTAL MONITORING AND ASSESSMENT 2023; 195:508. [PMID: 36964248 DOI: 10.1007/s10661-023-11060-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/06/2022] [Accepted: 02/25/2023] [Indexed: 06/18/2023]
Abstract
Phytoplankton represents one of the most important biological components of primary production, trophic interactions, and circulation of organic matter in lakes and reservoirs. To contribute to the understanding of eutrophication processes and ecological status of the small, shallow Butoniga reservoir, a machine learning tool for induction of models in the form of decision trees and rule-based models was applied to a dataset comprising physical, chemical, and biological variables measured at four stations. Two types of models were successfully elaborated, i.e., (1) a model describing phytoplankton Phylum, which connects phytoplankton Phylum with phytoplankton abundance and biomass, and (2) a model simulating phytoplankton biomass according to environmental variables, which could be used for management purposes. Such models and their presentation contribute to a better understanding of the Butoniga reservoir ecosystem functioning.
Affiliation(s)
- Goran Volf
- Department of Hydraulic Engineering, Faculty of Civil Engineering, University of Rijeka, Radmile Matejčić 3, 51000, Rijeka, Croatia.
| | - Petar Žutinić
- Department of Biology, Faculty of Science, University of Zagreb, Rooseveltov trg 6, 10000, Zagreb, Croatia
| | - Marija Gligora Udovič
- Department of Biology, Faculty of Science, University of Zagreb, Rooseveltov trg 6, 10000, Zagreb, Croatia
| | - Antonija Kulaš
- Department of Biology, Faculty of Science, University of Zagreb, Rooseveltov trg 6, 10000, Zagreb, Croatia
| | - Perica Mustafić
- Department of Biology, Faculty of Science, University of Zagreb, Rooseveltov trg 6, 10000, Zagreb, Croatia
|
19
|
CNN_FunBar: Advanced Learning Technique for Fungi ITS Region Classification. Genes (Basel) 2023; 14:genes14030634. [PMID: 36980906 PMCID: PMC10048311 DOI: 10.3390/genes14030634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2022] [Revised: 12/28/2022] [Accepted: 01/09/2023] [Indexed: 03/06/2023] Open
Abstract
Fungal species identification from metagenomic data is a highly challenging task. The Internal Transcribed Spacer (ITS) region is a potential DNA marker for fungi taxonomy prediction. Computational approaches, especially deep learning algorithms, are highly efficient for better pattern recognition and classification of large datasets compared to in silico techniques such as BLAST and machine learning methods. In this study, we present CNN_FunBar, a convolutional neural network-based approach for the classification of fungi ITS sequences from UNITE+INSDC reference datasets. Effects of convolution kernel size, filter numbers, k-mer size, degree of diversity and category-wise frequency of ITS sequences on classification performances of CNN models have been assessed at all taxonomic levels (species, genus, family, order, class and phylum). It is observed that CNN models can produce >93% average accuracy for classifying ITS sequences from balanced datasets with 500 sequences per category and 6-mer frequency features at all levels. The comparative study has revealed that CNN_FunBar can outperform machine learning-based algorithms (SVM, KNN, Naïve-Bayes and Random Forest) as well as existing fungal taxonomy prediction software (funbarRF, Mothur, RDP Classifier and SINTAX). The present study will be helpful for fungal taxonomy classification using large metagenomic datasets.
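The 6-mer frequency features mentioned in this abstract can be illustrated with a short Python sketch that turns a DNA sequence into a fixed-length vector of 4^k relative frequencies. The function name `kmer_frequencies` is hypothetical, and skipping windows that contain ambiguous bases such as N is an assumption; the abstract does not describe the authors' exact preprocessing.

```python
from itertools import product


def kmer_frequencies(sequence, k=6):
    """Return relative frequencies of all 4**k DNA k-mers, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    sequence = sequence.upper()
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        # Assumption: windows with non-ACGT symbols (e.g. N) are ignored.
        if window in counts:
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]
```

With `k=6` every sequence maps to a vector of 4096 values regardless of its length, which is what allows sequences of different lengths to be fed to a fixed-input classifier.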
|
20
|
Cai L, Kreft H, Taylor A, Denelle P, Schrader J, Essl F, van Kleunen M, Pergl J, Pyšek P, Stein A, Winter M, Barcelona JF, Fuentes N, Karger DN, Kartesz J, Kuprijanov A, Nishino M, Nickrent D, Nowak A, Patzelt A, Pelser PB, Singh P, Wieringa JJ, Weigelt P. Global models and predictions of plant diversity based on advanced machine learning techniques. THE NEW PHYTOLOGIST 2023; 237:1432-1445. [PMID: 36375492 DOI: 10.1111/nph.18533] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/13/2022] [Accepted: 09/29/2022] [Indexed: 06/16/2023]
Abstract
Despite the paramount role of plant diversity for ecosystem functioning, biogeochemical cycles, and human welfare, knowledge of its global distribution is still incomplete, hampering basic research and biodiversity conservation. Here, we used machine learning (random forests, extreme gradient boosting, and neural networks) and conventional statistical methods (generalized linear models and generalized additive models) to test environment-related hypotheses of broad-scale vascular plant diversity gradients and to model and predict species richness and phylogenetic richness worldwide. To this end, we used 830 regional plant inventories including c. 300 000 species and predictors of past and present environmental conditions. Machine learning showed a superior performance, explaining up to 80.9% of species richness and 83.3% of phylogenetic richness, illustrating the great potential of such techniques for disentangling complex and interacting associations between the environment and plant diversity. Current climate and environmental heterogeneity emerged as the primary drivers, while past environmental conditions left only small but detectable imprints on plant diversity. Finally, we combined predictions from multiple modeling techniques (ensemble predictions) to reveal global patterns and centers of plant diversity at multiple resolutions down to 7774 km². Our predictive maps provide accurate estimates of global plant diversity available at grain sizes relevant for conservation and macroecology.
Affiliation(s)
- Lirong Cai
- Biodiversity, Macroecology and Biogeography, University of Göttingen, 37077, Göttingen, Germany
| | - Holger Kreft
- Biodiversity, Macroecology and Biogeography, University of Göttingen, 37077, Göttingen, Germany
- Centre of Biodiversity and Sustainable Land Use, University of Göttingen, 37077, Göttingen, Germany
| | - Amanda Taylor
- Biodiversity, Macroecology and Biogeography, University of Göttingen, 37077, Göttingen, Germany
| | - Pierre Denelle
- Biodiversity, Macroecology and Biogeography, University of Göttingen, 37077, Göttingen, Germany
| | - Julian Schrader
- Biodiversity, Macroecology and Biogeography, University of Göttingen, 37077, Göttingen, Germany
- School of Natural Sciences, Macquarie University, 2109, Sydney, NSW, Australia
| | - Franz Essl
- Bioinvasions, Global Change, Macroecology-Group, University of Vienna, 1030, Vienna, Austria
| | - Mark van Kleunen
- Ecology, Department of Biology, University of Konstanz, 78464, Konstanz, Germany
- Zhejiang Provincial Key Laboratory of Plant Evolutionary Ecology and Conservation, Taizhou University, 318000, Taizhou, China
| | - Jan Pergl
- Department of Invasion Ecology, Czech Academy of Sciences, Institute of Botany, 25243, Průhonice, Czech Republic
| | - Petr Pyšek
- Department of Invasion Ecology, Czech Academy of Sciences, Institute of Botany, 25243, Průhonice, Czech Republic
- Department of Ecology, Faculty of Science, Charles University, 12844, Prague, Czech Republic
| | - Anke Stein
- Ecology, Department of Biology, University of Konstanz, 78464, Konstanz, Germany
| | - Marten Winter
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, 04103, Leipzig, Germany
| | - Julie F Barcelona
- School of Biological Sciences, University of Canterbury, 8140, Christchurch, New Zealand
| | - Nicol Fuentes
- Departamento de Botánica, Facultad de Ciencias Naturales y Oceanográficas, Universidad de Concepción, 4030000, Concepción, Chile
| | - Dirk Nikolaus Karger
- Swiss Federal Institute for Forest, Snow and Landscape Research WSL, 8903, Birmensdorf, Switzerland
| | - John Kartesz
- Biota of North America Program (BONAP), Chapel Hill, NC, 27516, USA
| | | | - Misako Nishino
- Biota of North America Program (BONAP), Chapel Hill, NC, 27516, USA
| | - Daniel Nickrent
- Plant Biology Section, School of Integrative Plant Science, College of Agriculture and Life Science, Cornell University, Ithaca, NY, 14853, USA
| | - Arkadiusz Nowak
- Department of Botany and Nature Protection, University of Warmia and Mazury in Olsztyn, 10-728, Olsztyn, Poland
- PAS Botanical Garden, 02-973, Warszawa, Poland
| | - Annette Patzelt
- Hochschule Weihenstephan-Triesdorf, University of Applied Sciences, Vegetation Ecology, 85354, Freising, Germany
| | - Pieter B Pelser
- School of Biological Sciences, University of Canterbury, 8140, Christchurch, New Zealand
| | | | - Jan J Wieringa
- Naturalis Biodiversity Center, 2333 CR, Leiden, the Netherlands
| | - Patrick Weigelt
- Biodiversity, Macroecology and Biogeography, University of Göttingen, 37077, Göttingen, Germany
- Centre of Biodiversity and Sustainable Land Use, University of Göttingen, 37077, Göttingen, Germany
- Campus-Institut Data Science, 37077, Göttingen, Germany
|
21
|
Guo Z, Zhang Y, Xu R, Xie H, Xiao X, Peng C. Contamination vertical distribution and key factors identification of metal(loid)s in site soil from an abandoned Pb/Zn smelter using machine learning. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 856:159264. [PMID: 36208763 DOI: 10.1016/j.scitotenv.2022.159264] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/12/2022] [Revised: 09/29/2022] [Accepted: 10/02/2022] [Indexed: 06/16/2023]
Abstract
Soil heterogeneity makes the vertical distribution of metal(loid)s in site soil vary considerably and poses a challenge for identifying the key factors of metal(loid) migration in site soil profiles. In this study, a machine learning (ML) model was developed to study a typical abandoned Pb/Zn smelter using 267 site soils from 46 drilling points. Results showed that a well-trained ML model could be used to identify the key factors determining the vertical distribution of contamination and to predict the metal(loid) contents in subsurface soil. As, Cd, Pb, and Zn were the primary pollutants and their vertical migration depth reached 4-6 m. Based on the predictive performance of different ML algorithms, extreme gradient boosting (XGB) was selected as the best model to produce accurate predictions for most metal(loid) contents. Contents of As, Cd, Pb, and Zn in the heavily contaminated zones declined with increasing soil depth. The metal(loid) contents in surface soil of 0-2 m could be readily used to predict the content of Cd, Cr, Hg, and Zn in subsurface soil from 2 m to 10 m. Based on the metal-specific XGB models, sulfur content, functional area, and soil texture were identified as key factors affecting the vertical distribution of As, Cd, Pb, and Zn in site soil. Results suggested the ML method is helpful for managing the potential environmental risks of metal(loid)s at Pb/Zn smelting sites.
Affiliation(s)
- Zhaohui Guo
- Institute of Environmental Engineering, School of Metallurgy and Environment, Central South University, Changsha 410083, PR China
| | - Yunxia Zhang
- Institute of Environmental Engineering, School of Metallurgy and Environment, Central South University, Changsha 410083, PR China
| | - Rui Xu
- Institute of Environmental Engineering, School of Metallurgy and Environment, Central South University, Changsha 410083, PR China.
| | - Huimin Xie
- Institute of Environmental Engineering, School of Metallurgy and Environment, Central South University, Changsha 410083, PR China
| | - Xiyuan Xiao
- Institute of Environmental Engineering, School of Metallurgy and Environment, Central South University, Changsha 410083, PR China
| | - Chi Peng
- Institute of Environmental Engineering, School of Metallurgy and Environment, Central South University, Changsha 410083, PR China
| |
|
22
|
Ota R, Yamashita F. Application of machine learning techniques to the analysis and prediction of drug pharmacokinetics. J Control Release 2022; 352:961-969. [PMID: 36370876 DOI: 10.1016/j.jconrel.2022.11.014] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2022] [Revised: 10/23/2022] [Accepted: 11/07/2022] [Indexed: 11/17/2022]
Abstract
In this review, we describe the current status and challenges in applying machine-learning techniques to the analysis and prediction of pharmacokinetic data. The theory of pharmacokinetics has been developed over decades on the basis of physiology and reaction kinetics. Mathematical models allow the reduction of pharmacokinetic data to parameter values, giving insight into ADME processes and predicting the outcome of different dosing scenarios. However, much information hidden in the data is lost through conceptual simplification with models. It is difficult to use mechanistic models alone to predict diverse pharmacokinetic time profiles, including inter-drug and inter-individual differences, in a cross-sectional manner. Machine learning is a prediction platform that can handle complex phenomena through data-driven analysis. As a result, machine learning has been successfully adopted in various fields, including image recognition and language processing, and has been used for over two decades in pharmacokinetic research, primarily in the area of quantitative structure-activity relationships for pharmacokinetic parameters. Machine-learning models are generally known to provide better predictive performance than conventional linear models. Owing to the recent success in deep learning, models with new structures are being consistently proposed. These models include transfer learning and generative adversarial networks, which contribute to the effective use of a limited amount of data by diverting existing similar models or generating pseudo-data. How to make such newly emerging machine learning technologies applicable to meet challenges in the pharmacokinetics/pharmacodynamics field is now the key issue.
Affiliation(s)
- Ryosaku Ota
- Department of Drug Delivery Research, Graduate School of Pharmaceutical Sciences, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
| | - Fumiyoshi Yamashita
- Department of Drug Delivery Research, Graduate School of Pharmaceutical Sciences, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan; Department of Applied Pharmacy and Pharmacokinetics, Graduate School of Pharmaceutical Sciences, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan.
| |
|
23
|
Monitoring ecological status of wetlands using linked fuzzy inference system- remote sensing analysis. ECOL INFORM 2022. [DOI: 10.1016/j.ecoinf.2022.101971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
|
24
|
An approach to multi-class imbalanced problem in ecology using machine learning. ECOL INFORM 2022. [DOI: 10.1016/j.ecoinf.2022.101822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
|
25
|
Machine learning approach to predict terrestrial gross primary productivity using topographical and remote sensing data. ECOL INFORM 2022. [DOI: 10.1016/j.ecoinf.2022.101697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
26
|
Cho E, Cho S, Kim M, Ediriweera TK, Seo D, Lee SS, Cha J, Jin D, Kim YK, Lee JH. Single nucleotide polymorphism marker combinations for classifying Yeonsan Ogye chicken using a machine learning approach. JOURNAL OF ANIMAL SCIENCE AND TECHNOLOGY 2022; 64:830-841. [PMID: 36287747 PMCID: PMC9574617 DOI: 10.5187/jast.2022.e64] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/13/2022] [Revised: 07/15/2022] [Accepted: 08/01/2022] [Indexed: 11/27/2022]
Abstract
Genetic analysis has great potential as a tool to differentiate between different species and breeds of livestock. In this study, the optimal combinations of single nucleotide polymorphism (SNP) markers for discriminating the Yeonsan Ogye chicken (Gallus gallus domesticus) breed were identified using high-density 600K SNP array data. In 3,904 individuals from 198 chicken breeds, SNP markers specific to the target population were discovered through a case-control genome-wide association study (GWAS) and filtered out based on the linkage disequilibrium blocks. Significant SNP markers were selected by feature selection applying two machine learning algorithms: Random Forest (RF) and AdaBoost (AB). Using a machine learning approach, the 38 (RF) and 43 (AB) optimal SNP marker combinations for the Yeonsan Ogye chicken population demonstrated 100% accuracy. Hence, the GWAS and machine learning models used in this study can be efficiently utilized to identify the optimal combination of markers for discriminating target populations using multiple SNP markers.
Affiliation(s)
- Eunjin Cho
- Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea
| | - Sunghyun Cho
- Research and Development Center, Insilicogen Inc., Yongin 19654, Korea
| | - Minjun Kim
- Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
| | | | - Dongwon Seo
- Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea
- Research Institute TNT Research Company, Jeonju 54810, Korea
| | | | - Jihye Cha
- Animal Genome & Bioinformatics, National Institute of Animal Science, Rural Development Administration, Wanju 55365, Korea
| | - Daehyeok Jin
- Animal Genetic Resources Research Center, National Institute of Animal Science, Rural Development Administration, Hamyang 50000, Korea
| | - Young-Kuk Kim
- Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea
| | - Jun Heon Lee
- Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea
- Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Corresponding author: Jun Heon Lee, Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea. Tel: +82-42-821-5779, E-mail:
| |
|
27
|
Nour M, Kandaz D, Ucar MK, Polat K, Alhudhaif A. Machine Learning and Electrocardiography Signal-Based Minimum Calculation Time Detection for Blood Pressure Detection. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:5714454. [PMID: 35903432 PMCID: PMC9325348 DOI: 10.1155/2022/5714454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/23/2022] [Revised: 06/21/2022] [Accepted: 06/28/2022] [Indexed: 11/18/2022]
Abstract
Objective. Measurement and monitoring of blood pressure are of great importance for preventing diseases caused by hypertension, such as cardiovascular disease and stroke. Therefore, there is a need for advanced, noninvasive, artificial intelligence-based systems for estimating systolic and diastolic blood pressure that are built on a new technological infrastructure. The study is aimed at determining the minimum Electrocardiography (ECG) signal duration required for calculating systolic and diastolic blood pressure. Methodology. The study includes ECG recordings of five individuals taken from the IEEE database, measured during daily activity. For the study, each signal was divided into epochs of 2, 4, 6, 8, 10, 12, 14, 16, 18, and 20 seconds. Twenty-five features were extracted from each epoched signal. The dimension of the dataset was reduced by using Spearman's feature selection algorithm. Analysis based on metrics was carried out by applying machine learning algorithms to the obtained dataset. The exponential Gaussian process regression (GPR) machine learning algorithm was preferred because it is easy to integrate into embedded systems. Results. The MAPE estimation performance values for diastolic and systolic blood pressure values for 16-second epochs were 2.44 mmHg and 1.92 mmHg, respectively. Conclusion. According to the study results, systolic and diastolic blood pressure values can be calculated with high performance from 16-second ECG signals.
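The epoching step described in the methodology can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `split_into_epochs`, the sampling rate, and the choice to use non-overlapping windows and drop a trailing partial window are all assumptions, since the abstract does not specify them.

```python
def split_into_epochs(signal, fs_hz, epoch_s):
    """Split a 1-D signal into consecutive, non-overlapping epochs.

    signal  -- sequence of samples (hypothetical ECG trace)
    fs_hz   -- sampling rate in Hz (assumed; not given in the abstract)
    epoch_s -- epoch length in seconds (2, 4, ..., 20 in the abstract)

    A trailing partial epoch is dropped (an assumption of this sketch).
    """
    samples_per_epoch = int(fs_hz * epoch_s)
    if samples_per_epoch <= 0:
        raise ValueError("epoch length must cover at least one sample")
    last_start = len(signal) - samples_per_epoch
    return [
        signal[start:start + samples_per_epoch]
        for start in range(0, last_start + 1, samples_per_epoch)
    ]


# Synthetic 10-second "signal" sampled at 10 Hz (placeholder data only).
demo_signal = list(range(100))
epochs_by_length = {
    seconds: split_into_epochs(demo_signal, fs_hz=10, epoch_s=seconds)
    for seconds in range(2, 21, 2)
}
```

In a pipeline like the one summarized above, each epoch would then be passed to a feature-extraction step; epoch lengths longer than the recording simply yield no epochs.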
Affiliation(s)
- Majid Nour
- Department of Electrical and Computer Engineering, Faculty of Engineering, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Derya Kandaz
- Electrical-Electronics Engineering, Faculty of Engineering, Sakarya University, 54187 Sakarya, Turkey
| | - Muhammed Kursad Ucar
- Electrical-Electronics Engineering, Faculty of Engineering, Sakarya University, 54187 Sakarya, Turkey
| | - Kemal Polat
- Department of Electrical and Electronics Engineering, Faculty of Engineering, Bolu Abant Izzet Baysal University, Bolu 14280, Turkey
| | - Adi Alhudhaif
- Department of Computer Science, College of Computer Engineering and Sciences in Al-Kharj, Prince Sattam Bin Abdulaziz University, P.O. Box 151, Al-Kharj 11942, Saudi Arabia
| |
|
28
|
Fuentes-Cortés LF, Flores-Tlacuahuac A, Nigam KDP. Machine Learning Algorithms Used in PSE Environments: A Didactic Approach and Critical Perspective. Ind Eng Chem Res 2022. [DOI: 10.1021/acs.iecr.2c00335] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Affiliation(s)
- Luis Fabián Fuentes-Cortés
- Departamento de Ingeniería Química, Tecnologico Nacional de México - Instituto Tecnológico de Celaya, Celaya, Guanajuato 38010, Mexico
| | - Antonio Flores-Tlacuahuac
- Tecnologico de Monterrey, Escuela de Ingeniería y Ciencias Ave. Eugenio Garza Sada 2501, Monterrey, N.L. 64849, Mexico
| | - Krishna D. P. Nigam
- Tecnologico de Monterrey, Escuela de Ingeniería y Ciencias Ave. Eugenio Garza Sada 2501, Monterrey, N.L. 64849, Mexico
- Department of Chemical Engineering, Indian Institute of Technology Delhi 600036, India
| |
|
29
|
Bhattacharyya SS. Monetization of customer futures through machine learning and artificial intelligence based persuasive technologies. JOURNAL OF SCIENCE AND TECHNOLOGY POLICY MANAGEMENT 2022. [DOI: 10.1108/jstpm-09-2021-0136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Purpose
The purpose of this study was to ascertain how real options investment perspective could be applied towards monetization of customer futures through the deployment of machine learning (ML) and artificial intelligence (AI)-based persuasive technologies.
Design/methodology/approach
The authors embarked on a theoretical treatise as advocated by scholars (Cornelissen, 2019; Barney, 2018; Cornelissen, 2017; Smithey Fulmer, 2012; Bacharach, 1989; Whetten, 1989; Weick,1989). Towards this end, theoretical argumentative logic was incrementally used to build an integrated perspective on the deployment of learning and AI-based persuasive technologies. This was carried out with strategic real options investment perspective to secure customer futures on m-commerce apps and e-commerce sites.
Findings
M-commerce apps and e-commerce sites have been deploying ML and AI-based tools (referred to as persuasive technologies), to nudge customers for increased and quicker purchase. The primary objective was to increase engagement time of customers (at an individual level), grow the number of customers (at market level) and increase firm revenue (at an organizational level). The deployment of any persuasive technology entailed increased investment (cash outflow) but was also expected to increase the level of revenue and margin (cash inflow). Given the dynamics of market and the emergent nature of persuasive technologies, ascertaining favourable cash flow was challenging. Real options strategy provided a robust theoretical perspective to time the persuasive technology-related investment in stages. This helped managers to be on time with loading customer purchase with increased temporal immediacy. A real options investment space involving six spaces has also been developed in this conceptual work. These were Never Invest, Immediately Investment, Present-day Investment Possibility, Possibly Invest Later, Invest Probably Later and Possibly Never Invest.
Research limitations/implications
The foundations of this study domain encompassed work done by an eclectic mix of scholars like from technology management (Siggelkow and Terwiesch, 2019a; Porter and Heppelmann, 2014), real options (Trigeorgis and Reuer, 2017; Luehrman, 1998a, 1998b), marketing intelligence and planning (Appel et al., 2020; Thaichon et al., 2019; Thaichon et al., 2020; Ye et al., 2019) and strategy from a demand positioning school of thought (Adner and Zemsky, 2006).
Practical implications
The findings would help managers to comprehend what level of investments need to be done in a staggered manner. The phased way of investing towards the deployment of ML and AI-based persuasive technologies would enable better monetization of customer futures. This would aid marketing managers for increased customer engagement at the individual level, fast monetization of customer futures and increased number of customers and consumption on m-commerce apps and e-commerce sites.
Originality/value
This was one of the first studies to apply real options investment perspective towards the deployment of ML and AI-based persuasive technologies for monetizing customer futures.
|
30
|
Isik YE, Gormez Y, Aydin Z, Bakir-Gungor B. The Determination of Distinctive Single Nucleotide Polymorphism Sets for the Diagnosis of Behçet's Disease. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1909-1918. [PMID: 33476272 DOI: 10.1109/tcbb.2021.3053429] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Behçet's Disease (BD) is a multi-system inflammatory disorder in which the etiology remains unclear. The most probable hypothesis is that genetic tendency and environmental factors play roles in the development of BD. In order to find the essential reasons, genetic changes on thousands of genes should be analyzed. Besides, there is a need for extra analysis to find out which genetic factor affects the disease. Machine learning approaches have high potential for extracting the knowledge from genomics and selecting the representative Single Nucleotide Polymorphisms (SNPs) as the most effective features for the clinical diagnosis process. In this study, we have attempted to identify representative SNPs using feature selection methods, incorporating biological information and aimed to develop a machine-learning model for diagnosing Behçet's disease. By combining biological information and machine learning classifiers, up to 99.64 percent accuracy of disease prediction is achieved using only 13,611 out of 311,459 SNPs. In addition, we revealed the SNPs that are most distinctive by performing repeated feature selection in cross-validation experiments.
|
31
|
Identifying Key Environmental Factors for Paulownia coreana Habitats: Implementing National On-Site Survey and Machine Learning Algorithms. LAND 2022. [DOI: 10.3390/land11040578] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/10/2022]
Abstract
Monitoring and preserving natural habitats has become an essential activity in many countries today. As a native tree species in Korea, Paulownia coreana has periodically been surveyed in national ecological surveys and was identified as an important target for conservation as well as habitat monitoring and management. This study explores habitat suitability models (HSMs) for Paulownia coreana in conjunction with national ecological survey data and various environmental factors. Together with environmental variables, the national ecological survey data were run through machine learning algorithms such as Artificial Neural Network and Decision Tree & Rules, which were used to identify the impact of individual variables and create HSMs for Paulownia coreana, respectively. Unlike other studies, which used remote sensing data to create HSMs, this study employed periodical on-site survey data for enhanced validity. Moreover, localized environmental resources such as topography, soil, and rainfall were taken into account to project habitat suitability. Among the environment variables used, the study identified critical attributes that affect the habitat conditions of Paulownia coreana. Therefore, the habitat suitability modelling methods employed in this study could play key roles in planning, monitoring, and managing plants species in regional and national levels. Furthermore, it could shed light on existing challenges and future research needs.
|
32
|
Oyewola DO, Dada EG. Exploring machine learning: a scientometrics approach using bibliometrix and VOSviewer. SN APPLIED SCIENCES 2022; 4:143. [PMID: 35434524 PMCID: PMC8996204 DOI: 10.1007/s42452-022-05027-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2021] [Accepted: 11/10/2021] [Indexed: 02/07/2023] Open
Abstract
Machine Learning has found application in solving complex problems in different fields of human endeavor, such as intelligent gaming, automated transportation, cyborg technology, environmental protection, enhanced health care, innovation in banking and home security, and smart homes. This research is motivated by the need to explore the global structure of machine learning to ascertain the level of bibliographic coupling, collaboration among research institutions, co-authorship network of countries, and sources coupling in publications on machine learning techniques. Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) was applied to cluster and predict the dominance ranking of authors in this paper. Publications related to machine learning were retrieved and extracted from the Dimensions database with no language restrictions. Bibliometrix was employed in computation and visualization to extract bibliographic information and perform a descriptive analysis. The VOSviewer (version 1.6.16) tool was used to construct and visualize the structure map of source coupling networks of researchers and co-authorship. About 10,814 research papers on machine learning published from 2010 to 2020 were retrieved for the research. Experimental results showed that, in cluster 3, the highest betweenness centrality was obtained by the University of California (153.86), with Harvard University at 24.70. In cluster 1, the National University of Singapore had the highest betweenness centrality, 91.72. In cluster 5, the University of Cambridge (52.24) and Imperial College London (4.52) had the highest betweenness centrality, indicating that they could control collaborative relationships and that they possessed and controlled a large number of research resources. Findings revealed that this work has the potential to provide valuable guidance for new perspectives and future research work in the rapidly developing field of machine learning.
Affiliation(s)
- David Opeoluwa Oyewola
- Department of Mathematics and Computer Science, Faculty of Science, Federal University of Kashere, P.M.B 0182, Gombe, Nigeria
| | - Emmanuel Gbenga Dada
- Department of Mathematical Sciences, Faculty of Science, University of Maiduguri, Maiduguri, Nigeria
| |
|
33
|
Hsieh HC, Lin PT, Sung KB. Characterization and identification of cell death dynamics by quantitative phase imaging. JOURNAL OF BIOMEDICAL OPTICS 2022; 27:046502. [PMID: 35484694 PMCID: PMC9047449 DOI: 10.1117/1.jbo.27.4.046502] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/01/2022] [Accepted: 04/05/2022] [Indexed: 06/14/2023]
Abstract
SIGNIFICANCE Investigating cell death dynamics at the single-cell level plays an essential role in biological research. Quantitative phase imaging (QPI), a label-free method without adverse effects of exogenous labels, has been widely used to image many types of cells under various conditions. However, the dynamics of QPI features during cell death have not been thoroughly characterized. AIM We aim to develop a label-free technique to quantitatively characterize single-cell dynamics of cellular morphology and intracellular mass distribution of cells undergoing apoptosis and necrosis. APPROACH QPI was used to capture time-lapse phase images of apoptotic, necrotic, and normal cells. The dynamics of morphological and QPI features during cell death were fitted by a sigmoid function to quantify both the extent and rate of changes. RESULTS The two types of cell death mainly differed from normal cells in the lower phase of the central region and differed from each other in the sharp nuclear boundary shown in apoptotic cells. CONCLUSIONS The proposed method characterizes the dynamics of cellular morphology and intracellular mass distributions, which could be applied to studying cells undergoing state transition such as drug response.
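The sigmoid fitting mentioned in the approach can be illustrated with one common four-parameter form. The abstract does not give the authors' exact parameterization, so every symbol below is an assumption introduced only for illustration: $f_0$ is the baseline value of a feature, $A$ the extent of its change, $k$ the rate of change, and $t_{1/2}$ the time at which half of the change has occurred.

```latex
f(t) = f_0 + \frac{A}{1 + e^{-k\,(t - t_{1/2})}}
```

Under this form, $f(t) \to f_0$ well before the transition and $f(t) \to f_0 + A$ well after it (for $k > 0$), so the fitted $A$ and $k$ would correspond to the "extent" and "rate" of change referred to in the abstract.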
Affiliation(s)
- Huai-Ching Hsieh
- National Taiwan University, Department of Life Science, Taipei, Taiwan
- National Taiwan University, Department of Electrical Engineering, Taipei, Taiwan
| | - Po-Ting Lin
- National Taiwan University, Graduate Institute of Biomedical Electronics and Bioinformatics, Taipei, Taiwan
| | - Kung-Bin Sung
- National Taiwan University, Department of Electrical Engineering, Taipei, Taiwan
- National Taiwan University, Graduate Institute of Biomedical Electronics and Bioinformatics, Taipei, Taiwan
- National Taiwan University, Molecular Imaging Center, Taipei, Taiwan
| |
|
34
|
Machine Learning for Pan Evaporation Modeling in Different Agroclimatic Zones of the Slovak Republic (Macro-Regions). SUSTAINABILITY 2022. [DOI: 10.3390/su14063475] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Global climate change is likely to influence evapotranspiration (ET); as a result, many ET calculation methods may not give accurate results under different climatic conditions. The main objective of this study is to verify the suitability of machine learning (ML) models as calculation methods for pan evaporation (PE) modeling on the macro-regional scale. The most significant PE changes in the different agroclimatic zones of the Slovak Republic were compared, and their considerable impacts were analyzed. On the basis of the agroclimatic zones, 35 meteorological stations distributed across Slovakia were classified into six macro-regions. For each of the meteorological stations, 11 variables were applied during the vegetation period in the years from 2010 to 2020 with a daily time step. Eight different ML models were employed to predict PE: the neural network (NN) model, the autoneural network (AN) model, the decision tree (DT) model, the Dmine regression (DR) model, the DM neural network (DM NN) model, the gradient boosting (GB) model, the least angle regression (LARS) model, and the ensemble model (EM). It was found that the different models had diverse prediction accuracies in various geographical locations. In this study, the values predicted by the individual models are compared.
|
35
|
Productivity-Based Land Suitability and Management Sensitivity Analysis: The Eucalyptus E. urophylla × E. grandis Case. FORESTS 2022. [DOI: 10.3390/f13020340] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Eucalyptus plantations are productive and short rotation forests prevalent in tropical areas that experience fast expansion and face controversies in ecological issues. In this study, we perform a systematic analysis of factors influencing eucalyptus growth through plot records from the National Forest Inventories and satellite images. We find primary restricting factors for eucalyptus growth via machine learning algorithms with random forests and accumulated local effects plots, as conventional forest growth models are inadequate to calculate the causal effect with the large number of environmental and socioeconomic factors. As a result, despite common belief that temperature affects eucalyptus growth the most, we find that precipitation is the most evident restricting factor for eucalyptus growth. We then identify and rank key factors that affect timber growth, such as tree density, rotation period, and wood ownership. Finally, we suggest optimal management and planting strategies for local farmers and policymakers to facilitate eucalyptus growth.
|
36
|
Bernardes RC, Botina LL, da Silva FP, Fernandes KM, Lima MAP, Martins GF. Toxicological assessment of agrochemicals on bees using machine learning tools. JOURNAL OF HAZARDOUS MATERIALS 2022; 424:127344. [PMID: 34607030 DOI: 10.1016/j.jhazmat.2021.127344] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/01/2021] [Revised: 09/22/2021] [Accepted: 09/22/2021] [Indexed: 06/13/2023]
Abstract
Machine learning (ML) is a branch of artificial intelligence (AI) that enables the analysis of complex multivariate data. ML has significant potential in risk assessments of non-target insects for modeling the multiple factors affecting insect health, including the adverse effects of agrochemicals. Here, the potential of ML for risk assessments of glyphosate (herbicide; formulation) and imidacloprid (insecticide, neonicotinoid; formulation) on the stingless bee Melipona quadrifasciata was explored. The collective behavior of forager bees was analyzed after in vitro exposure to agrochemicals. ML algorithms were applied to identify the agrochemicals that the bees had been exposed to based on multivariate behavioral features. Changes in the in situ detection of different proteins in the midgut were also studied. Imidacloprid exposure led to the greatest changes in behavior. The ML algorithms achieved high accuracy (up to 91%) in identifying agrochemical contamination. The two agrochemicals altered the detection of cells positive for different proteins, which can be detrimental to midgut physiology. This study provides a holistic assessment of the sublethal effects of glyphosate and imidacloprid on a key pollinator. The procedures used here can be applied in future studies to monitor and predict multiple environmental factors affecting insect health in the field.
Affiliation(s)
| | - Lorena Lisbetd Botina
- Departamento de Entomologia, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
| | | | - Kenner Morais Fernandes
- Departamento de Biologia Geral, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
| | | | | |
|
37
|
Hancock PA, Lynd A, Wiebe A, Devine M, Essandoh J, Wat'senga F, Manzambi EZ, Agossa F, Donnelly MJ, Weetman D, Moyes CL. Modelling spatiotemporal trends in the frequency of genetic mutations conferring insecticide target-site resistance in African mosquito malaria vector species. BMC Biol 2022; 20:46. [PMID: 35164747 PMCID: PMC8845222 DOI: 10.1186/s12915-022-01242-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2021] [Accepted: 01/28/2022] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Resistance in malaria vectors to pyrethroids, the most widely used class of insecticides for malaria vector control, threatens the continued efficacy of vector control tools. Target-site resistance is an important genetic resistance mechanism caused by mutations in the voltage-gated sodium channel (Vgsc) gene that encodes the pyrethroid target-site. Understanding the geographic distribution of target-site resistance, and temporal trends across different vector species, can inform strategic deployment of vector control tools. RESULTS We develop a Bayesian statistical spatiotemporal model to interpret species-specific trends in the frequency of the most common resistance mutations, Vgsc-995S and Vgsc-995F, in three major malaria vector species Anopheles gambiae, An. coluzzii, and An. arabiensis over the period 2005-2017. The models are informed by 2418 observations of the frequency of each mutation in field sampled mosquitoes collected from 27 countries spanning western and eastern regions of Africa. For nine selected countries, we develop annual predictive maps which reveal geographically structured patterns of spread of each mutation at regional and continental scales. The results show associations, as well as stark differences, in spread dynamics of the two mutations across the three vector species. The coverage of ITNs was an influential predictor of Vgsc allele frequencies, with modelled relationships between ITN coverage and allele frequencies varying across species and geographic regions. We found that our mapped Vgsc allele frequencies are a significant partial predictor of phenotypic resistance to the pyrethroid deltamethrin in An. gambiae complex populations. CONCLUSIONS Our predictive maps show how spatiotemporal trends in insecticide target-site resistance mechanisms in African An. gambiae vary across individual vector species and geographic regions. Molecular surveillance of resistance mechanisms will help to predict resistance phenotypes and track their spread.
Affiliation(s)
- Amy Lynd
- Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, L3 5QA, UK
- Maria Devine
- Big Data Institute, University of Oxford, Oxford, OX3 7LF, UK
- John Essandoh
- Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, L3 5QA, UK
- Francis Wat'senga
- Institut National de Recherche Biomédicale, PO Box 1192, Kinshasa, Democratic Republic of Congo
- Emile Z Manzambi
- Institut National de Recherche Biomédicale, PO Box 1192, Kinshasa, Democratic Republic of Congo
- Fiacre Agossa
- USAID President's Malaria Initiative, VectorLink Project, Abt Associates, 6130 Executive Blvd 16, Rockville, MD, 20852, USA
- Martin J Donnelly
- Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, L3 5QA, UK
- David Weetman
- Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, L3 5QA, UK
|
38
|
Ioannou A, Lycett M, Marshan A. The Role of Mindfulness in Mitigating the Negative Consequences of Technostress. INFORMATION SYSTEMS FRONTIERS: A JOURNAL OF RESEARCH AND INNOVATION 2022:1-27. [PMID: 35095332 PMCID: PMC8790950 DOI: 10.1007/s10796-021-10239-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 12/18/2021] [Indexed: 05/07/2023]
Abstract
IT offers significant benefits both to individuals and organisations, such as during the Covid-19 pandemic where technology played a primary role in aiding remote working environments; however, IT use comes with consequences such as 'technostress' - stress arising from extended use of technology. Addressing the paucity of research related to this topic, in this study, we examine the role of mindfulness and IT mindfulness to both mitigate the impact of technostress and alleviate its negative consequences; revealing that mindfulness can reduce technostress and increase job satisfaction, while IT mindfulness can enhance user satisfaction and improve task performance. Moreover, our work sheds light on the under-researched relationship between mindfulness and IT mindfulness; showing that the latter has a stronger influence on IT related outcomes; revealing the valuable role of mindfulness and IT mindfulness in the workplace and offering important implications to theory and practice.
Affiliation(s)
- Athina Ioannou
- Surrey Business School, University of Surrey, Guildford, UK
- Mark Lycett
- School of Management, Royal Holloway, University of London, London, UK
- Alaa Marshan
- Department of Computer Science, Brunel University London, Uxbridge, UK
|
39
|
Bohm C, Albani S, Ofria C, Ackles A. Using the Comparative Hybrid Approach to Disentangle the Role of Substrate Choice on the Evolution of Cognition. ARTIFICIAL LIFE 2022; 28:423-439. [PMID: 35929774 DOI: 10.1162/artl_a_00372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Understanding the structure and evolution of natural cognition is a topic of broad scientific interest, as is the development of an engineering toolkit to construct artificial cognitive systems. One open question is determining which components and techniques to use in such a toolkit. To investigate this question, we employ agent-based AI, using simple computational substrates (i.e., digital brains) undergoing rapid evolution. Such systems are an ideal choice as they are fast to process, easy to manipulate, and transparent for analysis. Even in this limited domain, however, hundreds of different computational substrates are used. While benchmarks exist to compare the quality of different substrates, little work has been done to build broader theory on how substrate features interact. We propose a technique called the Comparative Hybrid Approach and develop a proof-of-concept by systematically analyzing components from three evolvable substrates: recurrent artificial neural networks, Markov brains, and Cartesian genetic programming. We study the role and interaction of individual elements of these substrates by recombining them in a piecewise manner to form new hybrid substrates that can be empirically tested. Here, we focus on network sparsity, memory discretization, and logic operators of each substrate. We test the original substrates and the hybrids across a suite of distinct environments with different logic and memory requirements. While we observe many trends, we see that discreteness of memory and the Markov brain logic gates correlate with high performance across our test conditions. Our results demonstrate that the Comparative Hybrid Approach can identify structural subcomponents that predict task performance across multiple computational substrates.
Affiliation(s)
- Clifford Bohm
- Michigan State University, Department of Integrative Biology, BEACON Center for the Study of Evolution in Action.
- Sarah Albani
- Michigan State University, Department of Neuroscience, Lyman Briggs College BEACON Center for the Study of Evolution in Action
- Charles Ofria
- Michigan State University, Department of Computer Science Program in Ecology, Evolution, and Biology BEACON Center for the Study of Evolution in Action
- Acacia Ackles
- Michigan State University, Department of Integrative Biology Program in Ecology, Evolution, and Biology BEACON Center for the Study of Evolution in Action
|
40
|
Sharma A, Nigam S. Parametric Model for Flora Detection in Middle Himalayas. INTERNATIONAL JOURNAL OF DECISION SUPPORT SYSTEM TECHNOLOGY 2022. [DOI: 10.4018/ijdsst.286698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Plant detection is an integral part of the work of forest guards, researchers, and students of botany, and is also of interest to members of the public who are curious about a plant. However, plant detection suffers from a major drawback: the only true identifier is the flower, and in certain species flowering occurs at long intervals ranging from a few months to over 100 years (in certain types of bamboo). Machine learning-based systems could be used to develop models that incorporate the experience of researchers in the plant sciences. In this paper, we present a machine learning-based approach that uses other quantifiable parameters to detect the plant presented. The system takes plant parameters as inputs and outputs the plant family.
Affiliation(s)
- Aviral Sharma
- Department of Informatics, School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India
- Saumya Nigam
- University of Petroleum and Energy Studies, India
|
41
|
Empirical Modeling of Stream Nutrients for Countries without Robust Water Quality Monitoring Systems. ENVIRONMENTS 2021. [DOI: 10.3390/environments8110129] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
Water quality models are useful tools to understand and mitigate eutrophication processes. However, gaining access to high-resolution data and fitting models to local conditions can interfere with their implementation. This paper analyzes whether it is possible to create a spatial model of nutrient water level at a local scale that is applicable in different geophysical and land-use conditions. The total nitrogen and phosphorus concentrations were modeled by integrating Geographical Information Systems, Remote Sensing, and Generalized Additive and Land-Use Changes Modeling. The research was based on two case studies, which included 204 drainage basins, with nutrient and limnological data collected during two seasons. The models performed well under local conditions, with small errors calculated from the independent samples. The recorded and predicted concentrations of nutrients indicated a significant risk of water eutrophication in both areas, showing the impact of agricultural intensification and population growth on water quality. The models are a contribution to the sustainable land-use planning process, which can help to prevent or promote land-use transformation and new practices in agricultural production and urban design. The ability to implement models using secondary information, which is easily collected at a low cost, is the most remarkable feature of this approach.
|
42
|
Rodrigues TF, Nogueira K, Chiarello AG. Noninvasive Low‐cost Method to Identify Armadillos' Burrows: A Machine Learning Approach. WILDLIFE SOC B 2021. [DOI: 10.1002/wsb.1222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Affiliation(s)
- Thiago F. Rodrigues
- Applied Ecology Program, Luiz de Queiroz College of Agriculture University of São Paulo Av. Pádua Dias 11 Piracicaba SP 13418‐900 Brazil
- Keiller Nogueira
- Data Science Research Group, Computing Science and Mathematics Division University of Stirling Scotland FK9 4LA UK
- Adriano G. Chiarello
- Department of Biology, Faculty of Philosophy, Sciences and Languages of Ribeirão Preto University of São Paulo Av. Bandeirantes 3900 Ribeirão Preto SP 14040‐901 Brazil
|
43
|
Ribeiro AP, da Silva NFF, Mesquita FN, Araújo PDCS, Rosa TC, Mesquita-Neto JN. Machine learning approach for automatic recognition of tomato-pollinating bees based on their buzzing-sounds. PLoS Comput Biol 2021; 17:e1009426. [PMID: 34529654 PMCID: PMC8478199 DOI: 10.1371/journal.pcbi.1009426] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/07/2021] [Revised: 09/28/2021] [Accepted: 09/06/2021] [Indexed: 11/18/2022] Open
Abstract
Bee-mediated pollination greatly increases the size and weight of tomato fruits. Therefore, distinguishing between the local set of bees–those that are efficient pollinators–is essential to improve the economic returns for farmers. To achieve this, it is important to know the identity of the visiting bees. Nevertheless, the traditional taxonomic identification of bees is not an easy task, requiring the participation of experts and the use of specialized equipment. Due to these limitations, the development and implementation of new technologies for the automatic recognition of bees become relevant. Hence, we aim to verify the capacity of Machine Learning (ML) algorithms to recognize the taxonomic identity of bees visiting tomato flowers based on the characteristics of their buzzing sounds. We compared the performance of the ML algorithms combined with the Mel Frequency Cepstral Coefficients (MFCC) and with classifications based solely on the fundamental frequency, leading to a direct comparison between the two approaches. In fact, some classifiers powered by the MFCC–especially the SVM–achieved better performance compared to the randomized and sound frequency-based trials. Moreover, the buzzing sounds produced during sonication were more relevant for the taxonomic recognition of bee species than analysis based on flight sounds alone. On the other hand, the ML classifiers performed better in recognizing bee genera based on flight sounds. Despite that, the maximum accuracy obtained here (73.39% by SVM) is still low compared to ML standards. Further studies analyzing larger recording samples and applying unsupervised learning systems may yield better classification performance. Therefore, ML techniques could be used to automate the taxonomic recognition of flower-visiting bees of the cultivated tomato and other buzz-pollinated crops.
This would be an interesting option for farmers and other professionals who have no experience in bee taxonomy but are interested in improving crop yields by increasing pollination. Bees are the most important pollinators of cultivated tomatoes. We also know that the distinct species of bees have different performances as pollinators, and these performances are directly related to the size and weight of the fruits. Moreover, the characteristics of the buzzing sounds tend to vary between the bee species. However, the buzzing sounds are complex and can widely vary over time, making the analysis of this data difficult using the usual statistical methods in Ecology. In the face of this problem, we proposed to automatically recognize pollinating bees of tomato flowers based on their buzzing sounds using Machine Learning (ML) tools. In fact, we found that the ML algorithms are capable of recognizing bees just based on their buzzing sounds. This could lead to automating the recognition of flower-visiting bees of the cultivated tomato, which would be a nice option for farmers and other professionals who have no experience in bee taxonomy but are interested in improving crop yields. On the other hand, this encourages the farmer to adopt sustainable agricultural practices for the conservation of native tomato pollinators. To achieve this goal, the next step is to develop applications compatible with smartphones capable of recognizing bees by their buzzing sounds.
Affiliation(s)
- Thierson Couto Rosa
- Instituto de Informática, Universidade Federal de Goiás, Goiânia, Goiás, Brazil
- José Neiva Mesquita-Neto
- Centro de Investigación en Estudios Avanzados del Maule, Vicerrectoría de Investigación y Postgrado, Universidad Católica del Maule, Talca, Chile
|
44
|
Mao J, Miao J, Lu Y, Tong Z. Machine learning of materials design and state prediction for lithium ion batteries. Chin J Chem Eng 2021. [DOI: 10.1016/j.cjche.2021.04.009] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 01/01/2023]
|
45
|
Potential distribution of piscivores across the Atlantic Forest: From bats and marsupials to large-bodied mammals under a trophic-guild viewpoint. ECOL INFORM 2021. [DOI: 10.1016/j.ecoinf.2021.101357] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/31/2023]
|
46
|
Bourel M, Segura AM, Crisci C, López G, Sampognaro L, Vidal V, Kruk C, Piccini C, Perera G. Machine learning methods for imbalanced data set for prediction of faecal contamination in beach waters. WATER RESEARCH 2021; 202:117450. [PMID: 34352535 DOI: 10.1016/j.watres.2021.117450] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/11/2020] [Revised: 07/09/2021] [Accepted: 07/15/2021] [Indexed: 06/13/2023]
Abstract
Predicting water contamination by statistical models is a useful tool to manage health risk in recreational beaches. Extreme contamination events, i.e. those exceeding normative limits, are generally rare with respect to bathing conditions, and thus the data are said to be imbalanced. Modeling and predicting those rare events presents unique challenges. Here we introduce and evaluate several machine learning techniques and metrics to model imbalanced data and evaluate model performance. We do so by using a) simulated data sets and b) a real database with records of faecal coliform abundance monitored for 10 years in 21 recreational beaches in Uruguay (N ≈ 19000) using in situ and meteorological variables. We discuss advantages and disadvantages of the methods and provide a simple guide to fitting models for a general audience. We also provide R code to reproduce model fitting and testing. We found that most machine learning techniques are sensitive to imbalance and require specific data pre-treatment (e.g. upsampling) to improve performance. Accuracy (i.e. correctly classified cases over total cases) is not adequate to evaluate model performance on imbalanced data sets. Instead, true positive rates (TPR) and false positive rates (FPR) are recommended. Among the 52 candidate algorithms tested, the stratified random forest performed best, improving TPR by 50% with respect to the baseline (0.4), and outperformed the baseline in the evaluated metrics. Support vector machines combined with upsampling or the synthetic minority oversampling technique (SMOTE) performed well, similar to Adaboost with SMOTE. These results suggest that combining modeling strategies is necessary to improve our capacity to anticipate water contamination and avoid health risk.
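As a rough illustration of why accuracy can mislead on imbalanced data, the Python sketch below computes accuracy, TPR, and FPR for a classifier that never predicts a contamination event, and then applies plain random upsampling of the rare class. This is not the R code the authors provide: the toy labels, the function names `confusion_rates` and `upsample_minority`, and the convention that label 1 marks the rare class are all assumptions made for this example.

```python
import random
from collections import Counter


def confusion_rates(y_true, y_pred):
    """Return accuracy, true positive rate, and false positive rate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "tpr": tp / (tp + fn) if (tp + fn) else 0.0,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
    }


def upsample_minority(X, y, seed=0):
    """Resample class 1 with replacement until both classes are equal in size.

    Assumes class 1 is the minority and has at least one sample.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    extra = rng.choices(minority, k=max(0, len(majority) - len(minority)))
    idx = majority + minority + extra
    return [X[i] for i in idx], [y[i] for i in idx]


# Hypothetical toy data: 95 clean samples (0) and 5 contamination events (1).
y_true = [0] * 95 + [1] * 5
X = [[float(i)] for i in range(100)]

# A classifier that always predicts "clean" looks accurate but finds no events.
always_clean = [0] * 100
rates = confusion_rates(y_true, always_clean)
print(rates)

# After upsampling, both classes have the same number of training rows.
X_up, y_up = upsample_minority(X, y_true)
print(Counter(y_up))
```

In this toy case the always-clean classifier reaches 95% accuracy while detecting none of the rare events, which is the situation TPR and FPR are meant to expose. Upsampling should be applied to training data only, so that evaluation still reflects the original class proportions.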
Affiliation(s)
- Mathias Bourel
- IMERL, Facultad de Ingeniería, Universidad de la República, Montevideo, Uruguay; Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay.
- Angel M Segura
- Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay
- Carolina Crisci
- Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay
- Guzmán López
- Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay
- Lia Sampognaro
- Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay
- Victoria Vidal
- Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay
- Carla Kruk
- Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay; Departamento de Microbiología, Instituto de Investigaciones Biológicas Clemente Estable, Ministerio de Educación y Cultura, Montevideo, Uruguay; Instituto de Ecología y Ciencias Ambientales, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Claudia Piccini
- Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay; Departamento de Microbiología, Instituto de Investigaciones Biológicas Clemente Estable, Ministerio de Educación y Cultura, Montevideo, Uruguay
- Gonzalo Perera
- Departamento de Modelización Estadística de Datos e Inteligencia Artificial (MEDIA), Centro Universitario Regional Este, Universidad de la República, Rocha, Uruguay
|
47
|
Campos TL, Korhonen PK, Hofmann A, Gasser RB, Young ND. Harnessing model organism genomics to underpin the machine learning-based prediction of essential genes in eukaryotes - Biotechnological implications. Biotechnol Adv 2021; 54:107822. [PMID: 34461202 DOI: 10.1016/j.biotechadv.2021.107822] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/09/2021] [Revised: 08/17/2021] [Accepted: 08/24/2021] [Indexed: 12/17/2022]
Abstract
The availability of high-quality genomes and advances in functional genomics have enabled large-scale studies of essential genes in model eukaryotes, including the 'elegant worm' (Caenorhabditis elegans; Nematoda) and the 'vinegar fly' (Drosophila melanogaster; Arthropoda). However, this is not the case for other, much less-studied organisms, such as socioeconomically important parasites, for which functional genomic platforms usually do not exist. Thus, there is a need to develop innovative techniques or approaches for the prediction, identification and investigation of essential genes. A key approach that could enable the prediction of such genes is machine learning (ML). Here, we undertake an historical review of experimental and computational approaches employed for the characterisation of essential genes in eukaryotes, with a particular focus on model ecdysozoans (C. elegans and D. melanogaster), and discuss the possible applicability of ML-approaches to organisms such as socioeconomically important parasites. We highlight some recent results showing that high-performance ML, combined with feature engineering, allows a reliable prediction of essential genes from extensive, publicly available 'omic data sets, with major potential to prioritise such genes (with statistical confidence) for subsequent functional genomic validation. These findings could 'open the door' to fundamental and applied research areas. Evidence of some commonality in the essential gene-complement between these two organisms indicates that an ML-engineering approach could find broader applicability to ecdysozoans such as parasitic nematodes or arthropods, provided that suitably large and informative data sets become/are available for proper feature engineering, and for the robust training and validation of algorithms. 
This area warrants detailed exploration to, for example, facilitate the identification and characterisation of essential molecules as novel targets for drugs and vaccines against parasitic diseases. This focus is particularly important, given the substantial impact that such diseases have worldwide, and the current challenges associated with their prevention and control and with drug resistance in parasite populations.
Affiliation(s)
- Tulio L Campos
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia; Bioinformatics Core Facility, Instituto Aggeu Magalhães, Fundação Oswaldo Cruz (IAM-Fiocruz), Recife, Pernambuco, Brazil
- Pasi K Korhonen
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Andreas Hofmann
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Robin B Gasser
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia.
- Neil D Young
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia.
|
48
|
Urban Heat Island and Its Regional Impacts Using Remotely Sensed Thermal Data—A Review of Recent Developments and Methodology. LAND 2021. [DOI: 10.3390/land10080867] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
Abstract
Many novel research algorithms have been developed to analyze urban heat island (UHI) and UHI regional impacts (UHIRIP) with remotely sensed thermal data tables. We present a comprehensive review of some important aspects of UHI and UHIRIP studies that use remotely sensed thermal data, including concepts, datasets, methodologies, and applications. We focus on reviewing progress on multi-sensor image selection, preprocessing, computing, gap filling, image fusion, deep learning, and developing new metrics. This literature review shows that new satellite sensors and valuable methods have been developed for calculating land surface temperature (LST) and UHI intensity, and for assessing UHIRIP. Additionally, some of the limitations of using remotely sensed data to analyze the LST, UHI, and UHI intensity are discussed. Finally, we review a variety of applications in UHI and UHIRIP analyses. The assimilation of time-series remotely sensed data with the application of data fusion, gap filling models, and deep learning using the Google Cloud platform and Google Earth Engine platform also has the potential to improve the estimation accuracy of change patterns of UHI and UHIRIP over long time periods.
|
49
|
Bellin N, Calzolari M, Callegari E, Bonilauri P, Grisendi A, Dottori M, Rossi V. Geometric morphometrics and machine learning as tools for the identification of sibling mosquito species of the Maculipennis complex (Anopheles). INFECTION GENETICS AND EVOLUTION 2021; 95:105034. [PMID: 34384936 DOI: 10.1016/j.meegid.2021.105034] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/04/2021] [Revised: 07/28/2021] [Accepted: 08/07/2021] [Indexed: 11/29/2022]
Abstract
Geometric morphometrics allows researchers to use specific software to quantify and visualize morphological differences between taxa from insect wings. Our objective was to assess wing geometry to distinguish four Anopheles sibling species of the Maculipennis complex, An. maculipennis s. s., An. daciae sp. inq., An. atroparvus and An. melanoon, found in Northern Italy. We combined the geometric morphometric approach with different machine learning algorithms: support vector machine (SVM), random forest (RF), artificial neural network (ANN) and an ensemble model (EN). Centroid size was smaller in An. atroparvus than in An. maculipennis s. s. and An. daciae sp. inq. Principal component analysis (PCA) explained only 33% of the total variance and appeared not very useful to discriminate among species, and in particular between An. maculipennis s. s. and An. daciae sp. inq. The performance of four different machine learning algorithms using Procrustes coordinates of wing shape as predictors was evaluated. All models showed ROC-AUC and PRC-AUC values that were higher than the random classifier, but the SVM algorithm maximized the most metrics on the test set. The SVM algorithm with radial basis function allowed the correct classification of 83% of An. maculipennis s. s. and 79% of An. daciae sp. inq. ROC-AUC analysis showed that three landmarks, 11, 16 and 15, were the most important Procrustes coordinates in mean wing shape comparison between An. maculipennis s. s. and An. daciae sp. inq. The pattern in the three-dimensional space of the most important Procrustes coordinates showed a clearer differentiation between the two species than the PCA. Our study demonstrated that machine learning algorithms could be a useful tool combined with the wing geometric morphometric approach.
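For readers unfamiliar with centroid size, a minimal Python sketch follows. It uses the standard geometric-morphometric definition, the square root of the summed squared distances of the landmarks from their centroid. The function name `centroid_size` and the toy landmark coordinates are hypothetical and are not taken from the study, which does not describe its software in this abstract.

```python
import math


def centroid_size(landmarks):
    """Centroid size of a 2D landmark configuration given as (x, y) pairs."""
    n = len(landmarks)
    cx = sum(x for x, _ in landmarks) / n
    cy = sum(y for _, y in landmarks) / n
    # Sum of squared distances from each landmark to the centroid.
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in landmarks)
    return math.sqrt(squared)


# Hypothetical configuration: four landmarks at the corners of a square.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]

# The same shape at twice the scale has twice the centroid size.
doubled = [(2 * x, 2 * y) for x, y in square]

print(centroid_size(square), centroid_size(doubled))
```

Because centroid size scales linearly with the configuration, it serves as a size measure that is separate from shape, which is what the Procrustes coordinates describe after scale, position, and orientation are removed.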
Affiliation(s)
- Nicolò Bellin
- University of Parma, Department of Chemistry, Life Sciences and Environmental Sustainability, Parco Area delle Scienze, 11/A, 43124 Parma, Italy.
- Mattia Calzolari
- Istituto Zooprofilattico Sperimentale della Lombardia e dell'Emilia Romagna "B. Ubertini" (IZSLER), Brescia, Italy
- Emanuele Callegari
- Istituto Zooprofilattico Sperimentale della Lombardia e dell'Emilia Romagna "B. Ubertini" (IZSLER), Brescia, Italy
- Paolo Bonilauri
- Istituto Zooprofilattico Sperimentale della Lombardia e dell'Emilia Romagna "B. Ubertini" (IZSLER), Brescia, Italy
- Annalisa Grisendi
- Istituto Zooprofilattico Sperimentale della Lombardia e dell'Emilia Romagna "B. Ubertini" (IZSLER), Brescia, Italy
- Michele Dottori
- Istituto Zooprofilattico Sperimentale della Lombardia e dell'Emilia Romagna "B. Ubertini" (IZSLER), Brescia, Italy
- Valeria Rossi
- University of Parma, Department of Chemistry, Life Sciences and Environmental Sustainability, Parco Area delle Scienze, 11/A, 43124 Parma, Italy
|
50
|
Ahmad W, Wang B, Xu H, Xu M, Zeng Z. Topics, Sentiments, and Emotions Triggered by COVID-19-Related Tweets from IRAN and Turkey Official News Agencies. SN COMPUTER SCIENCE 2021; 2:394. [PMID: 34341778 PMCID: PMC8319903 DOI: 10.1007/s42979-021-00789-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/23/2021] [Accepted: 07/20/2021] [Indexed: 11/28/2022]
Abstract
There is no doubt that the COVID-19 epidemic has posed the most significant challenge to all governments globally since January 2020. People have had to readapt their daily lives to the epidemic, given the long absence of an effective vaccine. The epidemic has led to social division and uncertainty. Faced with such issues, governments have to take effective measures to fight the epidemic. In this paper, we analyze and discuss tweets from two official news agencies of Iran and Turkey using unsupervised learning approaches based on sentiment and semantic analysis. The main topics, sentiments, and emotions that accompanied the agencies' tweets are identified and compared. The results are analyzed from the perspective of psychology, sociology, and communication.
Affiliation(s)
- Waseem Ahmad
- School of Electronic Information and Communications, Huazhong University of Science and Technology (HUST), Wuhan, China
- Bang Wang
- School of Electronic Information and Communications, Huazhong University of Science and Technology (HUST), Wuhan, China
- Han Xu
- School of Journalism and Information Communication, Huazhong University of Science and Technology (HUST), Wuhan, China
- Minghua Xu
- School of Journalism and Information Communication, Huazhong University of Science and Technology (HUST), Wuhan, China
- Zeng Zeng
- Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
|