1
|
Prediction of Klebsiella phage-host specificity at the strain level. Nat Commun 2024; 15:4355. [PMID: 38778023 PMCID: PMC11111740 DOI: 10.1038/s41467-024-48675-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 05/08/2024] [Indexed: 05/25/2024] Open
Abstract
Phages are increasingly considered promising alternatives to target drug-resistant bacterial pathogens. However, their often-narrow host range can make it challenging to find matching phages against bacteria of interest. Current computational tools do not accurately predict interactions at the strain level in a way that is relevant and properly evaluated for practical use. We present PhageHostLearn, a machine learning system that predicts strain-level interactions between receptor-binding proteins and bacterial receptors for Klebsiella phage-bacteria pairs. We evaluate this system both in silico and in the laboratory, in the clinically relevant setting of finding matching phages against bacterial strains. PhageHostLearn reaches a cross-validated ROC AUC of up to 81.8% in silico and maintains this performance in laboratory validation. Our approach provides a framework for developing and evaluating phage-host prediction methods that are useful in practice, which we believe to be a meaningful contribution to the machine-learning-guided development of phage therapeutics and diagnostics.
|
2
|
Temperature-driven dynamics: unraveling the impact of climate change on cryptic species interactions within the Litoditis marina complex. PeerJ 2024; 12:e17324. [PMID: 38784398 PMCID: PMC11114120 DOI: 10.7717/peerj.17324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Accepted: 04/10/2024] [Indexed: 05/25/2024] Open
Abstract
Anthropogenic climate change and the associated increase in sea temperatures are projected to greatly impact marine ecosystems. Temperature variation can influence the interactions between species, leading to cascading effects on the abundance, diversity and composition of communities. Such changes in community structure can have consequences on ecosystem stability, processes and the services it provides. Therefore, it is important to better understand the role of species interactions in the development of communities and how they are influenced by environmental factors like temperature. The coexistence of closely related cryptic species, with significant biological and ecological differences, makes this even more complex. This study investigated the effect of temperature on species growth and both intra- and interspecific interactions of three species within the free-living nematode Litoditis marina complex. To achieve this, closed microcosm experiments were conducted on the L. marina species Pm I, Pm III and Pm IV in monoculture and combined cultures at two temperature treatments of 15 °C and 20 °C. A population model was constructed to elucidate and quantify the effects of intra- and interspecific interactions on nematode populations. The relative competitive abilities of the investigated species were quantified using the Modern Coexistence Theory (MCT) framework. Temperature had strong and disparate effects on the population growth of the distinct L. marina species. This indicates temperature could play an important role in the distribution of these cryptic species. Both competitive and facilitative interactions were observed in the experiments. Temperature affected both the type and the strength of the species interactions, suggesting a change in temperature could impact the coexistence of these closely related species, alter community dynamics and consequently affect ecosystem processes and services.
|
3
|
Cooperative interactions between invader and resident microbial community members weaken the negative diversity-invasion relationship. Ecol Lett 2024; 27:e14433. [PMID: 38712704 DOI: 10.1111/ele.14433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 04/12/2024] [Accepted: 04/15/2024] [Indexed: 05/08/2024]
Abstract
The negative diversity-invasion relationship observed in microbial invasion studies is commonly explained by competition between the invader and resident populations. However, whether this relationship is affected by invader-resident cooperative interactions is unknown. Using ecological and mathematical approaches, we examined the survival and functionality of Aminobacter niigataensis MSH1 to mineralize 2,6-dichlorobenzamide (BAM), a groundwater micropollutant affecting drinking water production, in sand microcosms when inoculated together with synthetic assemblies of resident bacteria. The assemblies varied in richness and in strains that interacted pairwise with MSH1, including cooperative and competitive interactions. While overall, the negative diversity-invasion relationship was retained, residents engaging in cooperative interactions with the invader had a positive impact on MSH1 survival and functionality, highlighting the dependency of invasion success on community composition. No correlation existed between community richness and the delay in BAM mineralization by MSH1. The findings suggest that the presence of cooperative residents can alleviate the negative diversity-invasion relationship.
|
4
|
Rapid and non-destructive microbial quality prediction of fresh pork stored under modified atmospheres by using selected-ion flow-tube mass spectrometry and machine learning. Meat Sci 2024; 213:109505. [PMID: 38579509 DOI: 10.1016/j.meatsci.2024.109505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Revised: 03/08/2024] [Accepted: 03/27/2024] [Indexed: 04/07/2024]
Abstract
Volatile organic compounds (VOCs) indicative of pork microbial spoilage can be quantified rapidly at trace levels using selected-ion flow-tube mass spectrometry (SIFT-MS). Packaging atmosphere is one of the factors influencing VOC production patterns during storage. On this basis, machine learning would help to process complex volatolomic data and predict pork microbial quality efficiently. This study focused on (1) investigating model generalizability based on different nested cross-validation settings, and (2) comparing the predictive power and feature importance of nine algorithms, including Artificial Neural Network (ANN), k-Nearest Neighbors, Support Vector Regression, Decision Tree, Partial Least Squares Regression, and four ensemble learning models. The datasets used contain 37 VOCs' concentrations (input) and total plate counts (TPC, output) of 350 pork samples with different storage times, including 225 pork loin samples stored under three high-O2 and three low-O2 conditions, and 125 commercially packaged products. An appropriate choice of cross-validation strategies resulted in trustworthy and relevant predictions. When trained on all possible selections of two high-O2 and two low-O2 conditions, ANNs produced satisfactory TPC predictions of unseen test scenarios (one high-O2 condition, one low-O2 condition, and the commercial products). ANN-based bagging outperformed other employed models, when TPC exceeded ca. 6 log CFU/g. VOCs including benzaldehyde, 3-methyl-1-butanol, ethanol and methyl mercaptan were identified with high feature importance. This elaborated case study illustrates great prospects of real-time detection techniques and machine learning in meat quality prediction. Further investigations on handling low VOC levels would enhance the model performance and decision making in commercial meat quality control.
|
5
|
A comparison of embedding aggregation strategies in drug-target interaction prediction. BMC Bioinformatics 2024; 25:59. [PMID: 38321386 PMCID: PMC10845509 DOI: 10.1186/s12859-024-05684-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 01/30/2024] [Indexed: 02/08/2024] Open
Abstract
The prediction of interactions between novel drugs and biological targets is a vital step in the early stage of the drug discovery pipeline. Many deep learning approaches have been proposed over the last decade, with a substantial fraction of them sharing the same underlying two-branch architecture. Their distinction is limited to the use of different types of feature representations and branches (multi-layer perceptrons, convolutional neural networks, graph neural networks and transformers). In contrast, the strategy used to combine the outputs (embeddings) of the branches has remained mostly the same. The same general architecture has also been used extensively in the area of recommender systems, where the choice of an aggregation strategy is still an open question. In this work, we investigate the effectiveness of three different embedding aggregation strategies in the area of drug-target interaction (DTI) prediction. We formally define these strategies and prove their universal approximator capabilities. We then present experiments that compare the different strategies on benchmark datasets from the area of DTI prediction, showcasing conditions under which specific strategies could be the obvious choice.
|
6
|
Response of phytoplankton functional types to Hurricane Fabian (2003) in the Sargasso Sea. MARINE ENVIRONMENTAL RESEARCH 2023; 190:106079. [PMID: 37473599 DOI: 10.1016/j.marenvres.2023.106079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 06/16/2023] [Accepted: 07/04/2023] [Indexed: 07/22/2023]
Abstract
Understanding how tropical cyclones affect phytoplankton communities is important for studies on ecological variability. Most studies assessing the post-storm phytoplankton response rely on satellite observations of chlorophyll a concentration, which inform on the ocean surface conditions and the whole phytoplankton community. In this work, we assess the potential of the Massachusetts Institute of Technology marine ecosystem model to account for the response of individual phytoplankton functional types (PFTs: coccolithophores, diatoms, diazotrophs, mixotrophic dinoflagellates, picoeukaryotes, Prochlorococcus and Synechococcus) in the euphotic zone to the passage of Hurricane Fabian (2003) across the tropical and subtropical Sargasso Sea. Fabian induced a significant mean concentration increase (t-test, p < 0.05) of all PFTs in the tropical waters (except for Prochlorococcus), which was driven by the mean nutrient concentration increase and by a limited zooplankton grazing pressure. More specifically, the post-storm nutrient enrichment increased the contribution of fast-growing PFTs (e.g. diatoms and coccolithophores) to the total phytoplankton concentration and decreased the contribution of slow-growing dominant groups (e.g. picoeukaryotes, Prochlorococcus and Synechococcus), which led to a significant increase (t-test, p < 0.05) of the Shannon diversity index values. Overall, the model captured the causal relationship between nutrient and PFT concentration increases in the tropical waters, although it only reproduced the most pronounced PFT responses such as those in the deep euphotic zone. In contrast, the model did not capture the oceanic perturbations induced by Fabian as observed in satellite imagery in the subtropical waters, probably due to its limited performance in this complex oceanographic area.
|
7
|
When driving becomes risky: Micro-scale variants of the lane-changing maneuver in highway traffic. TRAFFIC INJURY PREVENTION 2023; 24:583-591. [PMID: 37565705 DOI: 10.1080/15389588.2023.2242993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Revised: 07/25/2023] [Accepted: 07/26/2023] [Indexed: 08/12/2023]
Abstract
Objective: Vehicular lane-changing is one of the riskiest driving maneuvers. Since vehicular automation is quickly becoming a reality, it is crucial to be able to identify when such a maneuver can turn into a risky situation. Recently, it has been shown that a qualitative approach, the Point Descriptor Precedence (PDP) representation, is able to do so. Therefore, this study aims to investigate whether the PDP representation can detect hazardous micro movements during lane-changing maneuvers in a situation of structural congestion in the morning and/or evening. Method: The approach involves analyzing a large real-world traffic dataset using the PDP representation and adding safety distance points to distinguish subtle movement patterns. Results: Based on these subtleties, we label four out of seven and five out of nine lane-change maneuvers as risky during the selected peak and off-peak traffic hours, respectively. Conclusions: The results show that the approach can identify risky movement patterns in traffic. The PDP representation can be used to check whether certain adjustments (e.g., changing the maximum speed) have a significant impact on the number of dangerous behaviors, which is important for improving road safety. This approach has practical applications in penalizing traffic violations, improving traffic flow, and providing valuable information for policymakers and transport experts. It can also be used to train autonomous vehicles in risky driving situations.
|
8
|
Plant impedance spectroscopy: a review of modeling approaches and applications. FRONTIERS IN PLANT SCIENCE 2023; 14:1187573. [PMID: 37588419 PMCID: PMC10426379 DOI: 10.3389/fpls.2023.1187573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Accepted: 06/20/2023] [Indexed: 08/18/2023]
Abstract
Electrochemical impedance spectroscopy has emerged over the past decade as an efficient, non-destructive method to investigate various (eco-)physiological and morphological properties of plants. This work reviews the state-of-the-art of impedance spectra modeling for plant applications. In addition to covering the traditional, widely-used representations of electrochemical impedance spectra, we also consider the more recent machine-learning-based approaches.
|
9
|
Oracle selection provides insight into how far off practice is from Utopia in plant breeding. FRONTIERS IN PLANT SCIENCE 2023; 14:1218665. [PMID: 37546253 PMCID: PMC10401442 DOI: 10.3389/fpls.2023.1218665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2023] [Accepted: 06/27/2023] [Indexed: 08/08/2023]
Abstract
Since the introduction of genomic selection in plant breeding, high genetic gains have been realized in different plant breeding programs. Various methods based on genomic estimated breeding values (GEBVs) for selecting parental lines that maximize the genetic gain as well as methods for improving the predictive performance of genomic selection have been proposed. Unfortunately, it remains difficult to measure to what extent these methods really maximize long-term genetic values. In this study, we propose oracle selection, a hypothetical frame of mind that uses the ground truth to optimally select parents or optimize the training population in order to maximize the genetic gain in each breeding cycle. Clearly, oracle selection cannot be applied in a true breeding program, but allows for the assessment of existing parental selection and training population update methods and the evaluation of how far these methods are from the optimal utopian solution.
|
10
|
No bacterial-mediated alleviation of thermal stress in a brown seaweed suggests the absence of ecological bacterial rescue effects. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 876:162532. [PMID: 36870499 DOI: 10.1016/j.scitotenv.2023.162532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Revised: 02/24/2023] [Accepted: 02/25/2023] [Indexed: 06/18/2023]
Abstract
While microbiome alterations are increasingly proposed as a rapid mechanism to buffer organisms under changing environmental conditions, studies of these processes in the marine realm are lagging far behind their terrestrial counterparts. Here, we used a controlled laboratory experiment to examine whether the thermal tolerance of the brown seaweed Dictyota dichotoma, a common species in European coastal ecosystems, could be enhanced by the repeated addition of bacteria from its natural environment. Juvenile algae from three genotypes were subjected for two weeks to a temperature gradient, spanning almost the entire thermal range that can be tolerated by the species (11-30 °C). At the start of the experiment and again in the middle of the experiment, the algae were inoculated with bacteria from their natural environment or left untouched as a control. Relative growth rate was measured over the two-week period, and we assessed bacterial community composition prior to and at the end of the experiment. Since the growth of D. dichotoma over the full thermal gradient was not affected by supplementing bacteria, our results indicate no scope for bacterial-mediated stress alleviation. The minimal changes in the bacterial communities linked to bacterial addition, particularly at temperatures above the thermal optimum (22-23 °C), suggest the existence of a barrier to bacterial recruitment. These findings indicate that ecological bacterial rescue is unlikely to play a role in mitigating the effects of ocean warming on this brown seaweed.
|
11
|
The Potential of Surveillance Data for Dengue Risk Mapping: An Evaluation of Different Approaches in Cuba. Trop Med Infect Dis 2023; 8:tropicalmed8040230. [PMID: 37104355 PMCID: PMC10143650 DOI: 10.3390/tropicalmed8040230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 04/03/2023] [Accepted: 04/11/2023] [Indexed: 04/28/2023] Open
Abstract
To better guide dengue prevention and control efforts, the use of routinely collected data to develop risk maps is proposed. For this purpose, dengue experts identified indicators representative of entomological, epidemiological and demographic risks, hereafter called components, by using surveillance data aggregated at the level of Consejos Populares (CPs) in two municipalities of Cuba (Santiago de Cuba and Cienfuegos) in the period of 2010-2015. Two vulnerability models (one with equally weighted components and one with data-derived weights using Principal Component Analysis), and three incidence-based risk models were built to construct risk maps. The correlation between the two vulnerability models was high (tau > 0.89). The single-component and multicomponent incidence-based models were also highly correlated (tau ≥ 0.9). However, the agreement between the vulnerability- and the incidence-based risk maps was below 0.6 in the setting with a prolonged history of dengue transmission. This may suggest that an incidence-based approach does not fully reflect the complexity of vulnerability for future transmission. The small difference between single- and multicomponent incidence maps indicates that in a setting with a narrow availability of data, simpler models can be used. Nevertheless, the generalized linear mixed multicomponent model provides information of covariate-adjusted and spatially smoothed relative risks of disease transmission, which can be important for the prospective evaluation of an intervention strategy. In conclusion, caution is needed when interpreting risk maps, as the results vary depending on the importance given to the components involved in disease transmission. The multicomponent vulnerability mapping needs to be prospectively validated based on an intervention trial targeting high-risk areas.
|
12
|
Seven-state rotation-symmetric number-conserving cellular automaton that is not isomorphic to any septenary one. Phys Rev E 2023; 107:024211. [PMID: 36932560 DOI: 10.1103/physreve.107.024211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2022] [Accepted: 02/01/2023] [Indexed: 06/18/2023]
Abstract
We consider two-dimensional cellular automata with the von Neumann neighborhood that satisfy two properties of interest from a modeling viewpoint: rotation symmetry (i.e., the local rule is invariant under rotation of the neighborhood by 90°) and number conservation (i.e., the sum of all the cell states is conserved upon every update). It is known that if the number of states k is smaller than or equal to six, then each rotation-symmetric number-conserving cellular automaton is isomorphic to some k-ary one, i.e., one with state set {0,1,...,k-1}. In this paper, we exhibit an example of a seven-state rotation-symmetric number-conserving cellular automaton that is not isomorphic to any septenary one. This example strongly supports our plea that research into multistate cellular automata should not only focus on those that have {0,1,...,k-1} as a state set.
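The two properties named in this abstract can be illustrated with a small sketch. It is not the authors' construction: it assumes the k-ary state set {0,1,...,k-1}, a periodic (torus) grid, and three toy rules (`identity`, `shift_east`, `max_rule`) that are hypothetical examples chosen for illustration. The sum check runs on random samples only, so it can refute number conservation but cannot prove it.

```python
import itertools
import random


def is_rotation_symmetric(rule, k):
    """Check rule(c, n, e, s, w) against every 90-degree rotation of the neighborhood."""
    for c, n, e, s, w in itertools.product(range(k), repeat=5):
        # A 90-degree rotation moves west to north, north to east,
        # east to south, and south to west; the center stays put.
        if rule(c, n, e, s, w) != rule(c, w, n, e, s):
            return False
    return True


def step(grid, rule):
    """Apply one synchronous update with periodic boundaries."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            rule(
                grid[i][j],
                grid[(i - 1) % rows][j],  # north
                grid[i][(j + 1) % cols],  # east
                grid[(i + 1) % rows][j],  # south
                grid[i][(j - 1) % cols],  # west
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def conserves_sum_on_samples(rule, k, size=5, trials=200, seed=0):
    """Sampled (necessary, not sufficient) test of number conservation."""
    rng = random.Random(seed)
    for _ in range(trials):
        grid = [[rng.randrange(k) for _ in range(size)] for _ in range(size)]
        if sum(map(sum, grid)) != sum(map(sum, step(grid, rule))):
            return False
    return True


# Hypothetical toy rules, for illustration only.
def identity(c, n, e, s, w):
    return c  # every cell keeps its state


def shift_east(c, n, e, s, w):
    return w  # every cell copies its western neighbor


def max_rule(c, n, e, s, w):
    return max(c, n, e, s, w)  # symmetric, but the total generally grows


for name, rule in [("identity", identity), ("shift_east", shift_east), ("max_rule", max_rule)]:
    print(name, is_rotation_symmetric(rule, 3), conserves_sum_on_samples(rule, 3))
```

Under these assumptions the identity rule passes both checks, the shift rule conserves the sum but is not rotation-symmetric, and the maximum rule is rotation-symmetric but does not conserve the sum.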
|
13
|
Combining natural language processing and multidimensional classifiers to predict and correct CMMS metadata. COMPUT IND 2023. [DOI: 10.1016/j.compind.2022.103830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/25/2022]
|
14
|
Non-uniform number-conserving elementary cellular automata. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.01.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/14/2023]
|
15
|
|
16
|
Large-scale activity-based SCRA screening on patient serum samples: CB1 bioassay supported by machine learning. TOXICOLOGIE ANALYTIQUE ET CLINIQUE 2022. [DOI: 10.1016/j.toxac.2022.06.060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
|
17
|
Moisture Dynamics of Wood-Based Panels and Wood Fibre Insulation Materials. FRONTIERS IN PLANT SCIENCE 2022; 13:951175. [PMID: 35909717 PMCID: PMC9330446 DOI: 10.3389/fpls.2022.951175] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/23/2022] [Accepted: 06/21/2022] [Indexed: 06/15/2023]
Abstract
Moisture performance is an important factor determining the resistance of wood-based building materials against fungal decay. Understanding how material porosity and chemistry affect moisture performance is necessary for their efficient use, as well as for product optimisation. In this study, three complementary techniques (X-ray computed tomography, infrared and low-field NMR spectroscopy) are applied to elucidate the influence of additives, manufacturing process and material structure on the liquid water absorption and desorption behaviour of a selection of wood-based panels, thermally modified wood and wood fibre insulation materials. Hydrophobic properties achieved by thermal treatment or hydrophobic additives such as paraffin and bitumen had a major influence on water absorption and desorption rates. When hydrophobic additives did not play a role, pore distributions and manufacturing process had a decisive influence on the amount and rate of absorption and desorption. In that case, a higher porosity resulted in a higher water absorption rate. Our results show that there is a clear potential for tailoring materials towards specific moisture performance by better understanding the influence of different material characteristics. This is useful both for achieving the desired moisture buffering and for increasing the service life of wood-based materials. From a sustainability perspective, fit-for-purpose moisture performance is often easier to achieve than, and preferable to, wood protection by biocide preservative treatments.
|
18
|
Qualitative Team Formation Analysis in Football: A Case Study of the 2018 FIFA World Cup. Front Psychol 2022; 13:863216. [PMID: 35899012 PMCID: PMC9309202 DOI: 10.3389/fpsyg.2022.863216] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/26/2022] [Accepted: 05/03/2022] [Indexed: 11/18/2022] Open
Abstract
In this paper, we explore the use of the Static Qualitative Trajectory Calculus (QTCS), a qualitative spatiotemporal method based on the QTC, for the analysis of team formations in football. While methods for team formation analysis in sports are predominantly quantitative in nature, QTCS enables the comparison of team formations by describing the relative positions between players in a qualitative manner, which is more related to the way players position themselves on the field. QTCS could make it possible to monitor the extent to which a football team plays according to a coach’s predetermined formation. When applied to multiple matches of one team, the method can contribute to the definition of the playing style of a team. We present an experiment aimed at identifying the team formation played by the Belgian national football team during the 2018 FIFA World Cup held in Russia.
|
19
|
A Nearest Neighbor Open-set Classifier based on Excesses of Distance Ratios. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2096621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
|
20
|
Covering the Combinatorial Design Space of Multiplex CRISPR/Cas Experiments in Plants. FRONTIERS IN PLANT SCIENCE 2022; 13:907095. [PMID: 35795354 PMCID: PMC9251496 DOI: 10.3389/fpls.2022.907095] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/29/2022] [Accepted: 05/16/2022] [Indexed: 06/15/2023]
Abstract
Over the past years, CRISPR/Cas-mediated genome editing has revolutionized plant genetic studies and crop breeding. Specifically, due to its ability to simultaneously target multiple genes, the multiplex CRISPR/Cas system has emerged as a powerful technology for functional analysis of genetic pathways. As such, it holds great potential for application in plant systems to discover genetic interactions and to improve polygenic agronomic traits in crop breeding. However, optimal experimental design regarding coverage of the combinatorial design space in multiplex CRISPR/Cas screens remains largely unexplored. To contribute to well-informed experimental design of such screens in plants, we first establish a representation of the design space at different stages of a multiplex CRISPR/Cas experiment. We provide two independent computational approaches yielding insights into the plant library size guaranteeing full coverage of all relevant multiplex combinations of gene knockouts in a specific multiplex CRISPR/Cas screen. These frameworks take into account several design parameters (e.g., the number of target genes, the number of gRNAs designed per gene, and the number of elements in the combinatorial array) and efficiencies at subsequent stages of a multiplex CRISPR/Cas experiment (e.g., the distribution of gRNA/Cas delivery, gRNA-specific mutation efficiency, and knockout efficiency). With this work, we intend to raise awareness about the limitations regarding the number of target genes and order of genetic interaction that can be realistically analyzed in multiplex CRISPR/Cas experiments with a given number of plants. Finally, we establish guidelines for designing multiplex CRISPR/Cas experiments with an optimal coverage of the combinatorial design space at minimal plant library size.
|
21
|
Improved wood species identification based on multi-view imagery of the three anatomical planes. PLANT METHODS 2022; 18:79. [PMID: 35690828 PMCID: PMC9188236 DOI: 10.1186/s13007-022-00910-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/22/2021] [Accepted: 05/18/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND The identification of tropical African wood species based on microscopic imagery is a challenging problem due to the heterogeneous nature of the composition of wood combined with the vast number of candidate species. Image classification methods that rely on machine learning can facilitate this identification, provided that sufficient training material is available. Despite the fact that the three main anatomical sections contain information that is relevant for species identification, current methods only rely on transverse sections. Additionally, commonly used procedures for evaluating the performance of these methods neglect the fact that multiple images often originate from the same tree, leading to an overly optimistic estimate of the performance. RESULTS We introduce a new image dataset containing microscopic images of the three main anatomical sections of 77 Congolese wood species. A dedicated multi-view image classification method is developed and obtains an accuracy (computed using the naive but common approach) of 95%, outperforming the single-view methods by a large margin. An in-depth analysis shows that naive accuracy estimates can lead to a dramatic over-prediction, of up to 60%, of the accuracy. CONCLUSIONS Additional images from non-transverse sections can boost the performance of machine-learning-based wood species identification methods. Additionally, care should be taken when evaluating the performance of machine-learning-based wood species identification methods to avoid an overestimation of the performance.
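The evaluation pitfall described in this abstract (images from one tree ending up in both training and test data) can be avoided by splitting on tree identity rather than on individual images. The sketch below is a minimal illustration under assumptions, not the authors' procedure: the function name `group_holdout_split`, the tree labels, and the 40% test fraction are all hypothetical.

```python
import random
from collections import defaultdict


def group_holdout_split(groups, test_fraction=0.2, seed=0):
    """Split sample indices so that no group (here: tree) occurs in both sets."""
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)

    # Shuffle the group identifiers, not the individual samples.
    group_ids = sorted(by_group)
    random.Random(seed).shuffle(group_ids)

    n_test = max(1, round(test_fraction * len(group_ids)))
    test = sorted(i for g in group_ids[:n_test] for i in by_group[g])
    train = sorted(i for g in group_ids[n_test:] for i in by_group[g])
    return train, test


# Hypothetical data: one tree identifier per image.
tree_ids = ["t1", "t1", "t1", "t2", "t2", "t3", "t3", "t4", "t5", "t5"]
train_idx, test_idx = group_holdout_split(tree_ids, test_fraction=0.4)

train_trees = {tree_ids[i] for i in train_idx}
test_trees = {tree_ids[i] for i in test_idx}
print(train_trees & test_trees)  # empty set: no tree is shared
```

Accuracy measured on `test_idx` then reflects performance on unseen trees, which is the setting the abstract argues the naive estimate overstates.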
|
22
|
Valid prediction intervals for regression problems. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10178-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
|
23
|
Hybrid differential equations: Integrating mechanistic and data-driven techniques for modelling of water systems. WATER RESEARCH 2022; 213:118166. [PMID: 35158263 DOI: 10.1016/j.watres.2022.118166] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 09/24/2021] [Revised: 01/10/2022] [Accepted: 02/04/2022] [Indexed: 06/14/2023]
Abstract
Mathematical modelling is increasingly used to improve the design, understanding, and operation of water systems. Two modelling paradigms, i.e., mechanistic and data-driven modelling, are dominant in the water sector, both with their advantages and drawbacks. Hybrid modelling aims to combine the strengths of both paradigms. Here, we introduce a novel framework that incorporates a data-driven component into an existing activated sludge model of a water resource recovery facility. In contrast to previous efforts, we tightly integrate both models by incorporating a neural differential equation into an existing mechanistic ODE model. This machine learning component fills in the knowledge gaps of the mechanistic model. We show that this approach improves the predictive capabilities of the mechanistic model and is able to extrapolate to unseen conditions, a problematic task for data-driven models. This approach holds tremendous potential for systems that are difficult to model using the mechanistic paradigm only.
|
24
|
Abstract
Dietary diversity is an established public health principle, and its measurement is essential for studies of diet quality and food security. However, conventional between-food-group scores fail to capture the nutritional variability and ecosystem services delivered by dietary richness and dissimilarity within food groups, or the relative distribution (i.e., evenness or moderation) of, e.g., species or varieties across whole diets. Summarizing food biodiversity in an all-encompassing index is problematic. Therefore, various diversity indices have been proposed in ecology, yet these require methodological adaptation for integration in dietary assessments. In this narrative review, we summarize the key conceptual issues underlying the measurement of food biodiversity at an edible species level, assess the ecological diversity indices previously applied to food consumption and food supply data, and discuss their relative suitability and potential amendments for use in (quantitative) dietary intake studies. Ecological diversity indices are often used without justification through the lens of nutrition. To illustrate: (i) dietary species richness fails to account for the distribution of foods across the diet or their functional traits; (ii) evenness indices, such as the Gini-Simpson index, require widely accepted relative abundance units (e.g., kcal, g, cups) and evidence-based moderation weighting factors; and (iii) functional dissimilarity indices are constructed based on an arbitrary selection of distance measures, cutoff criteria, and number of phylogenetic, nutritional, and morphological traits. Disregard for these limitations can lead to counterintuitive results and ambiguous or incorrect conclusions about the food biodiversity within diets or food systems. To ensure comparability and robustness of future research, we advocate food biodiversity indices that: (i) satisfy key axioms; (ii) can be extended to account for disparity between edible species; and (iii) are used in combination, rather than in isolation. Supplemental data for this article are available online at https://doi.org/10.1080/10408398.2022.2051163.
|
25
|
A characterization of idempotent nullnorms on bounded lattices. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.12.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
|
26
|
BioCCP.jl: collecting coupons in combinatorial biotechnology. Bioinformatics 2022; 38:1144-1145. [PMID: 34788379 DOI: 10.1093/bioinformatics/btab775] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/27/2021] [Revised: 10/15/2021] [Accepted: 11/08/2021] [Indexed: 02/03/2023] Open
Abstract
SUMMARY In combinatorial biotechnology, it is crucial for screening experiments to sufficiently cover the design space. In the BioCCP.jl package (Julia), we provide functions for minimum sample size determination based on the mathematical framework coined the Coupon Collector Problem. AVAILABILITY AND IMPLEMENTATION BioCCP.jl, including source code, documentation and Pluto notebooks, is available at https://github.com/kirstvh/BioCCP.jl.
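To make the Coupon Collector framing concrete, here is a minimal Python sketch of the classical case in which all n design options are equally likely to be sampled. It does not use or reproduce the BioCCP.jl API; the function names `expected_draws`, `prob_all_seen`, and `min_sample_size` and the 95% default are assumptions chosen for illustration.

```python
from math import comb

def expected_draws(n):
    """Expected number of samples needed to observe all n options: n * H_n."""
    return n * sum(1 / k for k in range(1, n + 1))

def prob_all_seen(n, t):
    """Probability that t uniform samples cover all n options (inclusion-exclusion)."""
    return sum((-1) ** k * comb(n, k) * (1 - k / n) ** t for k in range(n + 1))

def min_sample_size(n, p=0.95):
    """Smallest t such that all n options are covered with probability >= p."""
    t = n  # fewer than n samples can never cover n options
    while prob_all_seen(n, t) < p:
        t += 1
    return t

# Hypothetical screening design with 20 equally likely modules.
print(expected_draws(20), min_sample_size(20, p=0.95))
```

The inclusion-exclusion sum is numerically reliable only for moderate n; unequal sampling probabilities would need a different formula.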
|
27
|
Interspecies Interactions of the 2,6-Dichlorobenzamide Degrading Aminobacter sp. MSH1 with Resident Sand Filter Bacteria: Indications for Mutual Cooperative Interactions That Improve BAM Mineralization Activity. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2022; 56:1352-1364. [PMID: 34982540 DOI: 10.1021/acs.est.1c06653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Bioaugmentation often involves an invasion process requiring the establishment and activity of a foreign microbe in the resident community of the target environment. Interactions with resident micro-organisms, either antagonistic or cooperative, are believed to impact invasion. However, few studies have examined the variability of interactions between an invader and resident species of its target environment, and none of them considered a bioremediation context. Aminobacter sp. MSH1, which mineralizes the groundwater micropollutant 2,6-dichlorobenzamide (BAM), is proposed for bioaugmentation of sand filters used in drinking water production to avert BAM contamination. We examined the nature of the interactions between MSH1 and 13 sand filter resident bacteria in dual and triple species assemblies in sand microcosms. The residents affected MSH1-mediated BAM mineralization without always impacting MSH1 cell densities, indicating effects on cell physiology rather than on cell number. Exploitative competition explained most of the effects (70%), but indications of interference competition were also found. Two residents improved BAM mineralization in dual species assemblies, apparently in a mutual cooperation, and overruled negative effects by others in triple species systems. The results suggest that sand filter communities contain species that increase MSH1 fitness. This opens the door to assisting bioaugmentation through co-inoculation with "helper" bacteria originating from and adapted to the target environment.
|
28
|
Multi-target prediction for dummies using two-branch neural networks. Mach Learn 2022. [DOI: 10.1007/s10994-021-06104-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
29
|
Digital phagograms: predicting phage infectivity through a multilayer machine learning approach. Curr Opin Virol 2021; 52:174-181. [PMID: 34952265 DOI: 10.1016/j.coviro.2021.12.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/14/2021] [Revised: 11/26/2021] [Accepted: 12/04/2021] [Indexed: 12/19/2022]
Abstract
Machine learning has been broadly implemented to investigate biological systems. In this regard, the field of phage biology has embraced machine learning to elucidate and predict phage-host interactions, based on receptor-binding proteins, (anti-)defense systems, prophage detection, and life cycle recognition. Here, we highlight the enormous potential of integrating information from omics data with insights from systems biology to better understand phage-host interactions. We conceptualize and discuss the potential of a multilayer model that mirrors the phage infection process, integrating adsorption, bacterial pan-immune components and hijacking of the bacterial metabolism to predict phage infectivity. In the future, this model can offer insights into the underlying mechanisms of the infection process, and digital phagograms can support phage cocktail design and phage engineering.
|
30
|
Deep scoping: a breeding strategy to preserve, reintroduce and exploit genetic variation. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2021; 134:3845-3861. [PMID: 34387711 PMCID: PMC8580937 DOI: 10.1007/s00122-021-03932-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/11/2021] [Accepted: 07/30/2021] [Indexed: 06/13/2023]
Abstract
The deep scoping method incorporates the use of a gene bank together with different population layers to reintroduce genetic variation into the breeding population, thus maximizing the long-term genetic gain without reducing the short-term genetic gain or increasing the total financial cost. Genomic prediction is often combined with truncation selection to identify superior parental individuals that can pass on favorable quantitative trait locus (QTL) alleles to their offspring. However, truncation selection reduces genetic variation within the breeding population, causing a premature convergence to a sub-optimal genetic value. In order to also increase genetic gain in the long term, different methods have been proposed that better preserve genetic variation. However, when the genetic variation of the breeding population has already been reduced as a result of prior intensive selection, even those methods will not be able to avert such premature convergence. Pre-breeding provides a solution for this problem by reintroducing genetic variation into the breeding population. Unfortunately, as pre-breeding often relies on a separate breeding population to increase the genetic value of wild specimens before introducing them in the elite population, it comes with an increased financial cost. In this paper, on the basis of a simulation study, we propose a new method that reintroduces genetic variation in the breeding population on a continuous basis without the need for a separate pre-breeding program or a larger population size. This way, we are able to introduce favorable QTL alleles into an elite population and maximize the genetic gain in the short as well as in the long term without increasing the financial cost.
|
31
|
Two-dimensional rotation-symmetric number-conserving cellular automata. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.06.041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
32
|
A comparative study of machine learning methods for ordinal classification with absolute and relative information. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107358] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
|
33
|
Stability analysis of the coexistence equilibrium of a balanced metapopulation model. Sci Rep 2021; 11:14084. [PMID: 34238954 PMCID: PMC8266877 DOI: 10.1038/s41598-021-93438-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/07/2021] [Accepted: 06/23/2021] [Indexed: 12/04/2022] Open
Abstract
We analyze the stability of a unique coexistence equilibrium point of a system of ordinary differential equations (ODE system) modelling the dynamics of a metapopulation, more specifically, a set of local populations inhabiting discrete habitat patches that are connected to one another through dispersal or migration. We assume that the inter-patch migrations are detailed balanced and that the patches are identical with intra-patch dynamics governed by a mean-field ODE system with a coexistence equilibrium. By making use of an appropriate Lyapunov function coupled with LaSalle's invariance principle, we are able to show that the coexistence equilibrium point within each patch is locally asymptotically stable if the inter-patch dispersal network is heterogeneous, whereas it is neutrally stable in the case of a homogeneous network. These results provide a mathematical proof confirming the existing numerical simulations and broaden the range of networks for which they are valid.
|
34
|
|
35
|
Disentangling the Information in Species Interaction Networks. ENTROPY 2021; 23:e23060703. [PMID: 34199402 PMCID: PMC8227248 DOI: 10.3390/e23060703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2021] [Revised: 05/25/2021] [Accepted: 05/26/2021] [Indexed: 11/16/2022]
Abstract
Shannon's entropy measure is a popular means for quantifying ecological diversity. We explore how one can use information-theoretic measures (that are often called indices in ecology) on joint ensembles to study the diversity of species interaction networks. We leverage the little-known balance equation to decompose the network information into three components describing the species abundance, specificity, and redundancy. This balance reveals that there exists a fundamental trade-off between these components. The decomposition can be straightforwardly extended to analyse networks through time as well as space, leading to the corresponding notions for alpha, beta, and gamma diversity. Our work aims to provide an accessible introduction for ecologists. To this end, we illustrate the interpretation of the components on numerous real networks. The corresponding code is made available to the community in the specialised Julia package EcologicalNetworks.jl.
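The abstract does not spell out its balance equation, so the following Python sketch uses a standard information-theoretic identity as a stand-in: the joint entropy of an interaction matrix splits into two conditional entropies plus the mutual information, H(X,Y) = H(X|Y) + H(Y|X) + I(X;Y). The paper's exact decomposition and its ecological naming of the terms may differ; the function names and the toy count matrix are hypothetical, and this is not the EcologicalNetworks.jl code.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability entries."""
    return -sum(p * log2(p) for p in probs if p > 0)

def decompose(counts):
    """Decompose the joint entropy of an interaction count matrix."""
    total = sum(sum(row) for row in counts)
    joint = [[c / total for c in row] for row in counts]
    px = [sum(row) for row in joint]        # marginal over row species
    py = [sum(col) for col in zip(*joint)]  # marginal over column species
    h_xy = entropy(p for row in joint for p in row)
    h_x, h_y = entropy(px), entropy(py)
    return {
        "H(X,Y)": h_xy,
        "H(X|Y)": h_xy - h_y,
        "H(Y|X)": h_xy - h_x,
        "I(X;Y)": h_x + h_y - h_xy,
    }

# Hypothetical network: two perfectly specialised species pairs.
parts = decompose([[4, 0], [0, 4]])
```

In this toy network every interaction is fully specific, so all of the joint entropy sits in the mutual-information term and both conditional entropies vanish.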
|
36
|
Optimal transportation theory for species interaction networks. Ecol Evol 2021; 11:3841-3855. [PMID: 33976779 PMCID: PMC8093754 DOI: 10.1002/ece3.7254] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/29/2020] [Revised: 12/02/2020] [Accepted: 01/04/2021] [Indexed: 11/08/2022] Open
Abstract
Observed biotic interactions between species, such as in pollination, predation, and competition, are determined by combinations of population densities, matching in functional traits and phenology among the organisms, and stochastic events (neutral effects). We propose optimal transportation theory as a unified view for modeling species interaction networks with different intensities of interactions. We pose the coupling of two distributions as a constrained optimization problem, maximizing both the system's average utility and its global entropy, that is, randomness. Our model follows naturally from applying the MaxEnt principle to this problem setting. This approach allows simulating changes in species' relative densities as well as disentangling the impact of trait matching and neutral forces. We provide a framework for estimating the pairwise species utilities from data. Experimentally, we show how to use this framework to perform trait matching and predict the coupling in pollination and host-parasite networks.
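The constrained problem described here (maximize average utility plus entropy, subject to fixed marginals) is commonly solved with Sinkhorn iterations. The abstract does not say which solver the authors use, so the Python sketch below is an assumption: `sinkhorn`, the trade-off parameter `lam`, the utility matrix, and the marginals are all hypothetical illustrations.

```python
from math import exp

def sinkhorn(utility, row_marg, col_marg, lam=1.0, n_iter=200):
    """Entropy-regularised coupling that favours high-utility pairs.

    The solution has the form P[i][j] = a[i] * exp(lam * utility[i][j]) * b[j],
    with scaling vectors a and b chosen so that P matches both marginals.
    """
    n, m = len(row_marg), len(col_marg)
    K = [[exp(lam * u) for u in row] for row in utility]
    a, b = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to fit the marginals.
        a = [row_marg[i] / sum(K[i][j] * b[j] for j in range(m)) for i in range(n)]
        b = [col_marg[j] / sum(K[i][j] * a[i] for i in range(n)) for j in range(m)]
    return [[a[i] * K[i][j] * b[j] for j in range(m)] for i in range(n)]

# Hypothetical example: two pollinators (rows) and two plants (columns).
utility = [[1.0, 0.0], [0.0, 1.0]]
coupling = sinkhorn(utility, row_marg=[0.7, 0.3], col_marg=[0.4, 0.6])
```

A small `lam` pushes the coupling toward the product of the marginals (neutral mixing), while a large `lam` concentrates mass on the highest-utility pairs (trait matching).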
|
37
|
Untangling the mechanisms of cryptic species coexistence in a nematode community through individual‐based modelling. OIKOS 2021. [DOI: 10.1111/oik.07989] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
|
38
|
Predicting bacteriophage hosts based on sequences of annotated receptor-binding proteins. Sci Rep 2021; 11:1467. [PMID: 33446856 PMCID: PMC7809048 DOI: 10.1038/s41598-021-81063-4] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 05/04/2020] [Accepted: 12/30/2020] [Indexed: 12/04/2022] Open
Abstract
Nowadays, bacteriophages are increasingly considered as an alternative treatment for a variety of bacterial infections in cases where classical antibiotics have become ineffective. However, characterizing the host specificity of phages remains a labor- and time-intensive process. In order to alleviate this burden, we have developed a new machine-learning-based pipeline to predict bacteriophage hosts based on annotated receptor-binding protein (RBP) sequence data. We focus on predicting bacterial hosts from the ESKAPE group, Escherichia coli, Salmonella enterica and Clostridium difficile. We compare the performance of our predictive model with that of the widely used Basic Local Alignment Search Tool (BLAST). Our best-performing predictive model reaches Precision-Recall Area Under the Curve (PR-AUC) scores between 73.6 and 93.8% for different levels of sequence similarity in the collected data. Our model reaches a performance comparable to that of BLASTp when sequence similarity in the data is high and starts outperforming BLASTp when sequence similarity drops below 75%. Therefore, our machine learning methods can be especially useful in settings in which sequence similarity to other known sequences is low. Predicting the hosts of novel metagenomic RBP sequences could extend our toolbox to tune the host spectrum of phages or phage tail-like bacteriocins by swapping RBPs.
|
39
|
Transitive Closures of Ternary Fuzzy Relations. INT J COMPUT INT SYS 2021. [DOI: 10.2991/ijcis.d.210607.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022] Open
|
40
|
Reversibility of non-saturated linear cellular automata on finite triangular grids. CHAOS (WOODBURY, N.Y.) 2021; 31:013136. [PMID: 33754763 DOI: 10.1063/5.0031535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2020] [Accepted: 12/30/2020] [Indexed: 06/12/2023]
Abstract
Discrete dynamical systems such as cellular automata are of increasing interest to scientists in a variety of disciplines since they are simple models of computation capable of simulating complex phenomena. For this reason, the problem of reversibility of such systems is very important and, therefore, recurrently taken up by researchers. Unfortunately, the study of reversibility is remarkably hard, especially in the case of two- or higher-dimensional cellular automata. In this paper, we propose a novel and simple method that allows us to completely resolve the reversibility problem of a wide class of linear cellular automata on finite triangular grids with null boundary conditions.
|
41
|
|
42
|
Improved deep embedding learning based on stochastic symmetric triplet loss and local sampling. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.04.062] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/24/2022]
|
43
|
Relational Galois connections between transitive digraphs: Characterization and construction. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.01.034] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/01/2022]
|
44
|
High-ISO Long-Exposure Image Denoising Based on Quantitative Blob Characterization. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; 29:5993-6005. [PMID: 32305916 DOI: 10.1109/tip.2020.2986687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Blob detection and image denoising are fundamental, sometimes related tasks in computer vision. In this paper, we present a computational method to quantitatively measure blob characteristics using normalized unilateral second-order Gaussian kernels. This method suppresses non-blob structures while yielding a quantitative measurement of the position, prominence and scale of blobs, which can facilitate the tasks of blob reconstruction and blob reduction. Subsequently, we propose a denoising scheme to address high-ISO long-exposure noise, which sometimes spatially shows a blob appearance, employing a blob reduction procedure as a cheap preprocessing for conventional denoising methods. We apply the proposed denoising methods to real-world noisy images as well as standard images that are corrupted by real noise. The experimental results demonstrate the superiority of the proposed methods over state-of-the-art denoising methods.
|
45
|
Development of a land use regression model for black carbon using mobile monitoring data and its application to pollution-avoiding routing. ENVIRONMENTAL RESEARCH 2020; 183:108619. [PMID: 31836206 DOI: 10.1016/j.envres.2019.108619] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/13/2018] [Revised: 05/02/2019] [Accepted: 07/31/2019] [Indexed: 06/10/2023]
Abstract
Black carbon is often used as an indicator for combustion-related air pollution. In urban environments, on-road black carbon concentrations have a large spatial variability, suggesting that the personal exposure of a cyclist to black carbon can heavily depend on the route that is chosen to reach a destination. In this paper, we describe the development of a cyclist routing procedure that minimizes personal exposure to black carbon. Firstly, a land use regression model for predicting black carbon concentrations in an urban environment is developed using mobile monitoring data, collected by cyclists. The optimal model is selected and validated using a spatially stratified cross-validation scheme. The resulting model is integrated in a dedicated routing procedure that minimizes personal exposure to black carbon during cycling. The best model obtains a coefficient of multiple correlation of R=0.520. Simulations with the black carbon exposure minimizing routing procedure indicate that the inhaled amount of black carbon is reduced by 1.58% on average as compared to the shortest-path route, with extreme cases where a reduction of up to 13.35% is obtained. Moreover, we observed that the average exposure to black carbon and the exposure to local peak concentrations on a route are competing objectives, and propose a parametrized cost function for the routing problem that allows for a gradual transition from routes that minimize average exposure to routes that minimize peak exposure.
|
46
|
Scalable Large-Margin Distance Metric Learning Using Stochastic Gradient Descent. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:1072-1083. [PMID: 30507546 DOI: 10.1109/tcyb.2018.2881417] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 06/09/2023]
Abstract
The key to the success of many machine learning and pattern recognition algorithms is the way distances between the input data are computed. In this paper, we propose a large-margin-based approach, called large-margin distance metric learning (LMDML), for learning a Mahalanobis distance metric. LMDML employs the principle of margin maximization to learn the distance metric with the goal of improving k-nearest-neighbor classification. The main challenge of distance metric learning is the positive semidefiniteness constraint on the Mahalanobis matrix. Semidefinite programming is commonly used to enforce this constraint, but it becomes computationally intractable on large-scale data sets. To overcome this limitation, we develop an efficient algorithm based on stochastic gradient descent. Our algorithm avoids computing the full gradient and ensures that the learned matrix remains within the positive semidefinite cone after each iteration. Extensive experiments show that the proposed algorithm is scalable to large data sets and outperforms other state-of-the-art distance metric learning approaches regarding classification accuracy and training time.
|
47
|
Identification of Cellular Automata Based on Incomplete Observations With Bounded Time Gaps. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:971-984. [PMID: 30371399 DOI: 10.1109/tcyb.2018.2875266] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
In this paper, the problem of identifying cellular automata (CAs) is considered. We frame and solve this problem in the context of incomplete observations, i.e., prerecorded, incomplete configurations of the system at certain unknown time stamps. We consider 1-D, deterministic, two-state CAs only. An identification method based on a genetic algorithm with individuals of variable length is proposed. The experimental results show that the proposed method is highly effective. In addition, connections between the dynamical properties of CAs (Lyapunov exponents and behavioral classes) and the performance of the identification algorithm are established and analyzed.
|
48
|
Ternary reversible number-conserving cellular automata are trivial. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2019.10.068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
|
49
|
Spatial movement pattern recognition in soccer based on relative player movements. PLoS One 2020; 15:e0227746. [PMID: 31945108 PMCID: PMC6964894 DOI: 10.1371/journal.pone.0227746] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/12/2019] [Accepted: 12/27/2019] [Indexed: 11/30/2022] Open
Abstract
Knowledge of spatial movement patterns that occur regularly in soccer can give a soccer coach, analyst or reporter insights into the playing style or tactics of a group of players or team. Furthermore, it can help a coach to better prepare for a soccer match by analysing (trained) movement patterns of both their own and opposing players. We explore the use of the Qualitative Trajectory Calculus (QTC), a spatiotemporal qualitative calculus describing the relative movement between objects, for spatial movement pattern recognition of player movements in soccer. The proposed method allows for the recognition of spatial movement patterns that occur on different parts of the field and/or at different spatial scales. Furthermore, the Levenshtein distance metric supports the recognition of similar movements that occur at different speeds and enables the comparison of movements that have different temporal lengths. We first present the basics of the calculus, and subsequently illustrate its applicability with a real soccer case. To that end, we present a situation where a user chooses the movements of two players during 20 seconds of a real soccer match of a 2016–2017 professional soccer competition as a reference fragment. Following a pattern matching procedure, we describe all other fragments with QTC and calculate their distance to the QTC representation of the reference fragment. The top-k most similar fragments of the same match are presented and validated by means of a duo-trio test. The analyses show the potential of QTC for spatial movement pattern recognition in soccer.
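The role of the Levenshtein distance in comparing fragments of different temporal length can be sketched in a few lines of Python. The edit-distance routine is the standard dynamic-programming algorithm; the two-character symbols standing in for QTC states and both example sequences are hypothetical, not taken from the paper.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

# Hypothetical QTC-like state sequences: the second is the same movement
# performed more slowly, so some states are repeated.
reference = ["-+", "-+", "0+", "+-"]
slower = ["-+", "-+", "-+", "0+", "0+", "+-"]
distance = levenshtein(reference, slower)
```

Because repeated states cost only one insertion each, a slower execution of the same pattern stays close to the reference, whereas a genuinely different movement accumulates substitutions.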
|
50
|
Guiding Mineralization Co-Culture Discovery Using Bayesian Optimization. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2019; 53:14459-14469. [PMID: 31682110 DOI: 10.1021/acs.est.9b05942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Many disciplines rely on testing combinations of compounds, materials, proteins, or bacterial species to drive scientific discovery. It is time-consuming and expensive to determine experimentally, via trial-and-error or random selection approaches, which of the many possible combinations will lead to desirable outcomes. Hence, there is a pressing need for more rational and efficient experimental design approaches to reduce experimental effort. In this work, we demonstrate the potential of machine learning methods for the in silico selection of promising co-culture combinations in the application of bioaugmentation. We use the example of pollutant removal in drinking water treatment plants, which can be achieved using co-cultures of a specialized pollutant degrader with combinations of bacterial isolates. To reduce the experimental effort needed to discover high-performing combinations, we propose a data-driven experimental design. Based on a dataset of mineralization performance for all pairs of 13 bacterial species co-cultured with MSH1, we built a Gaussian process regression model to predict the Gompertz mineralization parameters of the co-cultures of two and three species, based on the single-strain parameters. We subsequently used this model in a Bayesian optimization scheme to suggest potentially high-performing combinations of bacteria. We achieved good performance with this approach, both for predicting mineralization parameters and for selecting effective co-cultures, despite the limited dataset. As a novel application of Bayesian optimization in bioremediation, this experimental design approach has promising applications for highlighting co-culture combinations for in vitro testing in various settings, to lessen the experimental burden and perform more targeted screenings.
|