1
Deshpande SV, Harikrishnan R, Sampe J, Patwa A. An algorithm to create model file for Partially Observable Markov Decision Process for mobile robot path planning. MethodsX 2024; 12:102552. PMID: 38299041; PMCID: PMC10828799; DOI: 10.1016/j.mex.2024.102552.
Abstract
The Partially Observable Markov Decision Process (POMDP), a mathematical framework for decision-making in uncertain environments, suffers from the curse of dimensionality. Various methods can handle huge POMDP matrices to produce approximate solutions, but no serious effort has been reported to effectively control the size of the POMDP matrices themselves. Manually creating the high-dimension matrices of a POMDP model is a cumbersome and sometimes even impossible task. The PCMRPP (POMDP file Creator for Mobile Robot Path Planning) software package implements a novel algorithm to programmatically generate these matrices such that:
- The sizes of the matrices can be controlled by configuring the granularity of discretization of the components of the state, and
- The sparseness of the matrices can be controlled by configuring the spread of the observation probability distribution.
This flexibility allows one to achieve a trade-off between time complexity and the level of robustness of the POMDP solution.
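As a rough illustration of the trade-off the abstract describes, the following sketch (hypothetical state components and bin counts, not taken from PCMRPP) shows how discretization granularity drives matrix size:

```python
# Hypothetical sketch, not the PCMRPP implementation: suppose the robot
# state is a tuple (x, y, heading).  Discretizing each component on a
# configurable grid sets the total number of POMDP states, and hence the
# size of the |S| x |S| transition matrix needed for each action.
def pomdp_state_count(x_bins, y_bins, heading_bins):
    """Number of discrete states for an (x, y, heading) robot state."""
    return x_bins * y_bins * heading_bins

coarse = pomdp_state_count(x_bins=5, y_bins=5, heading_bins=4)    # small, fast model
fine = pomdp_state_count(x_bins=20, y_bins=20, heading_bins=8)    # large, robust model

# Each action's transition matrix is |S| x |S|, so memory and solve time
# grow quadratically with the discretization granularity.
coarse_matrix_entries = coarse ** 2
fine_matrix_entries = fine ** 2
```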
Affiliation(s)
- Shripad V. Deshpande
- Symbiosis Institute of Technology, Pune Campus, Symbiosis International (Deemed University), Pune, India
- R. Harikrishnan
- Symbiosis Institute of Technology, Pune Campus, Symbiosis International (Deemed University), Pune, India
- Jahariah Sampe
- Institute of Microengineering and Nanoelectronics, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
- Abhimanyu Patwa
- Symbiosis Institute of Technology, Pune Campus, Symbiosis International (Deemed University), Pune, India
2
Leong KH, Xiu Y, Chen B, Chan WK(V). Neural Causal Information Extractor for Unobserved Causes. Entropy (Basel) 2023; 26:46. PMID: 38248172; PMCID: PMC11154551; DOI: 10.3390/e26010046.
Abstract
Causal inference aims to faithfully depict the causal relationships between given variables. However, in many practical systems, variables are often partially observed, and some unobserved variables could carry significant information and induce causal effects on a target. Identifying these unobserved causes remains a challenge, and existing works have not considered extracting the unobserved causes while retaining the causes that have already been observed and included. In this work, we aim to construct the implicit variables with a generator-discriminator framework named the Neural Causal Information Extractor (NCIE), which can complement the information of unobserved causes and thus provide a complete set of causes with both observed causes and the representations of unobserved causes. By maximizing the mutual information between the targets and the union of observed causes and implicit variables, the implicit variables we generate could complement the information that the unobserved causes should have provided. The synthetic experiments show that the implicit variables preserve the information and dynamics of the unobserved causes. In addition, extensive real-world time series prediction tasks show improved precision after introducing implicit variables, thus indicating their causality to the targets.
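As a minimal illustration of the objective described above, the sketch below scores candidate variables by a simple histogram-based mutual-information estimate; NCIE itself uses a neural generator-discriminator estimator, and the data here are synthetic:

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of I(X; Y) in nats for 1-D samples x, y."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y
    nz = pxy > 0                              # avoid log(0) on empty cells
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
cause = rng.normal(size=5000)
target = cause + 0.1 * rng.normal(size=5000)  # strongly driven by the cause
noise = rng.normal(size=5000)                 # unrelated variable

# A variable carrying causal information shares far more MI with the target.
mi_cause = mutual_information(cause, target)
mi_noise = mutual_information(noise, target)
```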
Affiliation(s)
- Keng-Hou Leong
- Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
- Tsinghua-Berkeley Shenzhen Institute, Tsinghua University, Shenzhen 518055, China
- Yuxuan Xiu
- Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
- Tsinghua-Berkeley Shenzhen Institute, Tsinghua University, Shenzhen 518055, China
- Bokui Chen
- Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
- Peng Cheng Laboratory, Shenzhen 518055, China
- Wai Kin (Victor) Chan
- Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
- Tsinghua-Berkeley Shenzhen Institute, Tsinghua University, Shenzhen 518055, China
- International Science and Technology Information Center, Shenzhen 518055, China
3
Williams BK, Brown ED. Four conservation challenges and a synthesis. Ecol Evol 2023; 13:e10052. PMID: 37153016; PMCID: PMC10154884; DOI: 10.1002/ece3.10052.
Abstract
Conservation and management of biological systems involves decision-making over time, with a generic goal of sustaining systems and their capacity to function in the future. We address four persistent and difficult conservation challenges: (1) prediction of future consequences of management, (2) uncertainty about the system's structure, (3) inability to observe ecological systems fully, and (4) nonstationary system dynamics. We describe these challenges in terms of dynamic systems subject to different sources of uncertainty, and we present a basic Markovian framework that can encompass approaches to all four challenges. Finding optimal conservation strategies for each challenge requires issue-specific structural features, including adaptations of state transition models, uncertainty metrics, valuation of accumulated returns, and solution methods. Strategy valuation exhibits not only some remarkable similarities among approaches but also some important operational differences. Technical linkages among the models highlight synergies in solution approaches, as well as possibilities for combining them in particular conservation problems. As methodology and computing software advance, such an integrated conservation framework offers the potential to improve conservation outcomes with strategies to allocate management resources efficiently and avoid negative consequences.
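The Markovian framework the authors build on can be sketched with textbook value iteration for a fully observable MDP; the conservation states, actions, and all numbers below are illustrative, not from the paper:

```python
import numpy as np

# Hypothetical three-state conservation system ("degraded", "stable",
# "healthy") with two actions ("protect", "do nothing").
T = np.array([  # T[a, s, s']: transition probabilities under action a
    [[0.6, 0.4, 0.0], [0.1, 0.6, 0.3], [0.0, 0.2, 0.8]],  # protect
    [[0.9, 0.1, 0.0], [0.4, 0.5, 0.1], [0.1, 0.4, 0.5]],  # do nothing
])
R = np.array([[0.0, 1.0, 2.0],    # R[a, s]: reward; protecting costs a little,
              [0.5, 1.5, 2.5]])   # but doing nothing degrades the system
gamma = 0.95                      # discount on accumulated returns

V = np.zeros(3)
for _ in range(500):              # Bellman backups until convergence
    Q = R + gamma * T @ V         # Q[a, s] = R[a, s] + gamma * sum_s' T * V[s']
    V = Q.max(axis=0)             # value of acting greedily
policy = Q.argmax(axis=0)         # optimal action in each state
```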
Affiliation(s)
- Eleanor D. Brown
- Science and Decisions Center, U.S. Geological Survey, Reston, Virginia, USA
4
Situation assessment in air combat considering incomplete frame of discernment in the generalized evidence theory. Sci Rep 2022; 12:22639. PMID: 36587044; PMCID: PMC9805455; DOI: 10.1038/s41598-022-27076-z.
Abstract
For situation assessment in air combat, information may be incomplete because of new technologies and unknown or uncertain targets and threats. In this paper, an improved situation-assessment method for the air combat environment, based on the incomplete frame of discernment in evidence theory, is proposed to obtain a more accurate fusion result for decision-making on the battlefield. First, the air combat situation is assessed with knowledge. Then, the incomplete frame of discernment in the generalized evidence theory, an extension of Dempster-Shafer evidence theory, is adopted to model incomplete and unknown situation assessments. After that, the generalized combination rule of the generalized evidence theory is used to fuse situations in intelligent air combat. Finally, real-time decisions on which actions to take can be reached. Experiments on air combat situation assessment with incomplete and uncertain situations show the rationality and effectiveness of the proposed method.
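The generalized combination rule extends classic Dempster combination to incomplete frames (where mass may sit on the empty set). The sketch below implements only the classic rule over a complete frame, as a point of reference; the focal elements and masses are illustrative:

```python
from itertools import product

# Classic Dempster combination over a frame of discernment, with focal
# elements represented as frozensets.  The generalized rule used in the
# paper additionally allows mass on the empty set; this minimal version
# assumes a complete frame.
def dempster_combine(m1, m2):
    """Combine two mass functions {frozenset: mass} by Dempster's rule."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:                          # compatible evidence reinforces
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:                              # contradictory evidence is conflict
            conflict += wa * wb
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

A = frozenset({"attack"})
B = frozenset({"retreat"})
m1 = {A: 0.8, A | B: 0.2}                  # sensor 1: strong evidence of attack
m2 = {A: 0.6, B: 0.3, A | B: 0.1}          # sensor 2: mixed evidence
fused = dempster_combine(m1, m2)
```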
5
Gu Y, Zhu Z, Lv J, Shi L, Hou Z, Xu S. DM-DQN: Dueling Munchausen deep Q network for robot path planning. Complex Intell Syst 2022. DOI: 10.1007/s40747-022-00948-7.
Abstract
In order to achieve collision-free path planning in a complex environment, the Munchausen deep Q-learning network (M-DQN) is applied to a mobile robot to learn the best decisions. On the basis of Soft-DQN, M-DQN adds the scaled log-policy to the immediate reward, which allows the agent to explore more. However, the M-DQN algorithm converges slowly. This paper proposes a new and improved M-DQN algorithm (DM-DQN) to address the problem. First, the network structure of M-DQN is decomposed into a value function and an advantage function, decoupling action selection from action evaluation; this speeds up convergence, gives better generalization performance, and lets the agent learn the best decisions faster. Second, to keep the robot's trajectory from passing too close to the edges of obstacles, a reward function based on an artificial potential field is proposed to drive the trajectory away from the vicinity of obstacles. Simulation results show that the method learns more efficiently and converges faster than DQN, Dueling DQN, and M-DQN in both static and dynamic environments, and plans collision-free paths that stay clear of obstacles.
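The dueling decomposition described above can be sketched in a few lines; the value and advantage figures below are stand-ins for the outputs of a trained network:

```python
import numpy as np

# Dueling head: the network output is split into a scalar state value V(s)
# and per-action advantages A(s, a), recombined with a mean-advantage
# baseline so the decomposition is identifiable.
def dueling_q(value, advantages):
    """Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    return value + advantages - advantages.mean()

v = 1.5                                  # V(s) from the value stream
a = np.array([0.2, -0.1, 0.4, 0.0])      # A(s, a) from the advantage stream
q = dueling_q(v, a)
best_action = int(q.argmax())            # action selection uses only advantages
```

Subtracting the mean advantage pins the average of Q(s, a) to V(s), so the value stream alone carries "how good is this state" while the advantage stream carries "which action is best", which is exactly the decoupling the abstract credits for faster convergence.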
6
Williams BK, Brown ED. Partial observability and management of ecological systems. Ecol Evol 2022; 12:e9197. PMID: 36172296; PMCID: PMC9468910; DOI: 10.1002/ece3.9197.
Abstract
The actual state of ecological systems is rarely known with certainty, but management actions must often be taken regardless of imperfect measurement (partial observability). Because of the difficulties in accounting for partial observability, it is usually treated in an ad hoc fashion, or simply ignored altogether. Yet incorporating partial observability into decision processes lends a realism that has the potential to improve ecological outcomes significantly. We review frameworks for dealing with partial observability, focusing specifically on dynamic ecological systems with Markovian transitions, i.e., transitions among system states that are influenced by the current system state and management action over time. Fully observable states are represented in an observable Markov decision process (MDP), whereas obscure or hidden states are represented in a partially observable process (POMDP). POMDPs can be seen as a natural extension of observable MDPs. Management under partial observability generalizes the situation for complete observability, by recognizing uncertainty about the system's state and incorporating sequential observations associated with, but not the same as, the states themselves. Decisions that otherwise would depend on the actual state must be based instead on state probability distributions ("belief states"). Partial observability requires adaptation of the entire decision process, including the use of belief states and Bayesian updates, valuation that includes expectations over observations, and optimal strategy that identifies actions for belief states over a continuous belief space. We compare MDPs and POMDPs and highlight POMDP applications to some common ecological problems. We clarify the structure and operations, approaches for finding solutions, and analytic challenges of POMDPs for practicing ecologists. Both observable and partially observable MDPs can use an inductive approach to identify optimal strategies and values, with a considerable increase in mathematical complexity with POMDPs. Better understanding of POMDPs can help decision makers manage imperfectly measured ecological systems more effectively.
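The Bayesian belief update at the center of the POMDP machinery described above can be sketched as follows, with an illustrative two-state system:

```python
import numpy as np

# After taking action a and receiving observation o, the belief over hidden
# states becomes  b'(s') ∝ O(o | s', a) * sum_s T(s' | s, a) * b(s).
# The two-state dynamics and observation likelihoods below are illustrative.
def belief_update(b, T_a, O_ao):
    """b: belief vector; T_a[s, s']: transitions under a; O_ao[s']: P(o | s', a)."""
    predicted = b @ T_a                 # predict: push belief through dynamics
    unnormalized = O_ao * predicted     # correct: weight by observation likelihood
    return unnormalized / unnormalized.sum()

b = np.array([0.5, 0.5])                    # uncertain prior over two states
T_a = np.array([[0.9, 0.1], [0.2, 0.8]])    # dynamics under the chosen action
O_ao = np.array([0.7, 0.1])                 # likelihood of the observation seen
b_new = belief_update(b, T_a, O_ao)         # belief shifts toward state 0
```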
Affiliation(s)
- Eleanor D. Brown
- Science and Decisions Center, U.S. Geological Survey, Reston, Virginia, USA
7
Mallela A, Hastings A. Optimal management of stochastic invasion in a metapopulation with Allee effects. J Theor Biol 2022; 549:111221. PMID: 35843441; DOI: 10.1016/j.jtbi.2022.111221.
Abstract
Invasive species account for incalculable damage worldwide, in both ecological and bioeconomic terms. The question of how a network of invasive populations can be optimally managed deserves further exploration. A study accounting for partial observability and imperfect detection, in particular, could yield useful insights into species eradication efforts. Here, we generalized a simple model system that we developed in previous work. The model consists of three interacting populations with underlying strong Allee effects and stochastic dynamics, inhabiting distinct locations connected by dispersal, which can generate bistability. To explore the stochastic dynamics, we formulated an individual-based modeling approach. Next, using the theory of continuous-time Markov chains, we approximated the original high-dimensional model by a Markov chain with eight states, each state corresponding to a combination of population thresholds. We then used the reduced model as the core of a powerful decision-making tool, referred to as a Partially Observable Markov Decision Process (POMDP). Analysis of this POMDP indicates when management of the system yields optimal outcomes.
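The eight-state reduction described above can be sketched as follows; the single-flip transition structure and the rates are illustrative assumptions, not the authors' fitted model:

```python
import numpy as np
from itertools import product

# Three patches, each either below (0) or above (1) its Allee-effect
# threshold, give 2**3 = 8 states for the reduced Markov chain.
states = list(product([0, 1], repeat=3))         # (patch1, patch2, patch3)
index = {s: i for i, s in enumerate(states)}

# Generator matrix Q of a continuous-time Markov chain in which one patch
# crosses its threshold at a time (hypothetical single-flip transitions).
up_rate, down_rate = 0.3, 0.1                    # invasion / die-back rates
Q = np.zeros((8, 8))
for s in states:
    for patch in range(3):
        flipped = list(s)
        flipped[patch] ^= 1                      # toggle one patch's status
        rate = up_rate if s[patch] == 0 else down_rate
        Q[index[s], index[tuple(flipped)]] = rate
np.fill_diagonal(Q, -Q.sum(axis=1))              # generator rows sum to zero
```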
Affiliation(s)
- Abhishek Mallela
- Department of Mathematics, University of California, Davis, CA 95616, USA
- Alan Hastings
- Department of Environmental Science and Policy, University of California, Davis, CA 95616, USA; Santa Fe Institute, Santa Fe, NM 87501, USA
8
Learning Dynamics and Control of a Stochastic System under Limited Sensing Capabilities. Sensors (Basel) 2022; 22:4491. PMID: 35746272; PMCID: PMC9230096; DOI: 10.3390/s22124491.
Abstract
The operation of a variety of natural or man-made systems subject to uncertainty is maintained within a range of safe behavior through run-time sensing of the system state and control actions selected according to some strategy. When the system is observed from an external perspective, the control strategy may not be known; it must instead be reconstructed by jointly observing the applied control actions and the corresponding evolution of the system state. This is largely hindered by limitations in the sensing of the system state and by different levels of noise. We address the problem of optimally selecting control actions for a stochastic system with unknown dynamics operating under a controller with unknown strategy, for which we can observe trajectories made of sequences of control actions and noisy observations of the system state, labeled with the exact values of some reward functions. To this end, we present an approach to train an Input-Output Hidden Markov Model (IO-HMM) as the generative stochastic model that describes the state dynamics of a POMDP, using a novel optimization objective adopted from the literature. The learning task is hindered by two restrictions: the only available data are a limited number of trajectories of applied actions and noisy observations of the system state; and high failure costs prevent interaction with the online environment, ruling out exploratory testing. Traditionally, stochastic generative models have been used to learn the underlying system dynamics and select appropriate actions for the task at hand. However, current state-of-the-art techniques, in which the state dynamics of the POMDP are first learned and strategies are then optimized over them, frequently fail because the model that best fits the data may not be well suited for control. The proposed optimization objective is intended to mitigate this model mis-specification. The methodology is illustrated in a failure-avoidance scenario for a multi-component system. The quality of the decision-making is evaluated using the collected reward on test data and compared against the usual approach in the previous literature.
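The action-conditioned filtering an IO-HMM performs can be sketched as follows; the two-state system, the action-dependent transition matrices, and the emission probabilities are illustrative:

```python
import numpy as np

# In an input-output HMM the hidden-state transition matrix depends on the
# observed control action (the "input").  Two hidden states, two actions,
# two observation symbols; all probabilities are illustrative.
T = {0: np.array([[0.9, 0.1], [0.3, 0.7]]),   # dynamics under action 0
     1: np.array([[0.5, 0.5], [0.1, 0.9]])}   # dynamics under action 1
E = np.array([[0.8, 0.2],                     # E[s, o]: P(observation o
              [0.2, 0.8]])                    #           in hidden state s)

def io_hmm_filter(actions, observations, prior):
    """Forward-filter the belief over hidden states along one trajectory."""
    belief = prior.copy()
    for a, o in zip(actions, observations):
        belief = belief @ T[a]        # transition selected by the input action
        belief = belief * E[:, o]     # weight by the emission likelihood
        belief /= belief.sum()        # renormalize to a distribution
    return belief

belief = io_hmm_filter([0, 1, 1], [0, 0, 1], np.array([0.5, 0.5]))
```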