1
|
Bias and Overtaking Optimality for Continuous-Time Jump Markov Decision Processes in Polish Spaces. J Appl Probab 2016. [DOI: 10.1017/s0021900200004320]
Abstract
In this paper we study the bias and the overtaking optimality criteria for continuous-time jump Markov decision processes in general state and action spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. Under appropriate hypotheses, we prove the existence of solutions to the bias optimality equations, the existence of bias optimal policies, and an equivalence relation between bias and overtaking optimality.
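As context for the result summarized above, the optimality equations at issue take, in the simplest denumerable-state notation, roughly the following nested form (a sketch only; the paper works in general Polish state and action spaces, under conditions allowing unbounded rates):

```latex
% Average (gain) optimality equation: gain g^* and bias function h
g^{*} \;=\; \sup_{a \in A(x)} \Big[ \, r(x,a) \;+\; \int_{E} h(y)\, q(dy \mid x, a) \Big],
% Bias optimality equation: taken over the gain-optimal actions A^{*}(x)
\qquad
h(x) \;=\; \sup_{a \in A^{*}(x)} \int_{E} w(y)\, q(dy \mid x, a),
```

where q(dy | x, a) are the transition rates, A*(x) is the set of actions attaining the supremum in the first equation, and w is a second auxiliary function; a policy solving both equations is bias optimal among gain optimal policies.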
|
2
|
Abstract
The use of bias optimality to distinguish among gain optimal policies was recently studied by Haviv and Puterman [1] and extended in Lewis et al. [2]. In [1], upon arrival to an M/M/1 queue, customers offer the gatekeeper a reward R. If accepted, the gatekeeper immediately receives the reward, but is charged a holding cost, c(s), depending on the number of customers in the system. The gatekeeper, whose objective is to ‘maximize’ rewards, must decide whether to admit the customer. If the customer is accepted, the customer joins the queue and awaits service. Haviv and Puterman [1] showed there can be only two Markovian, stationary, deterministic gain optimal policies and that only the policy which uses the larger control limit is bias optimal. This showed the usefulness of bias optimality to distinguish between gain optimal policies. In the same paper, they conjectured that if the gatekeeper receives the reward upon completion of a job instead of upon entry, the bias optimal policy will be the lower control limit. This note confirms that conjecture.
|
3
|
Lewis ME, Ayhan H, Foley RD. Bias optimal admission control policies for a multiclass nonstationary queueing system. J Appl Probab 2016. [DOI: 10.1239/jap/1019737985]
Abstract
We consider a finite-capacity queueing system where arriving customers offer rewards which are paid upon acceptance into the system. The gatekeeper, whose objective is to ‘maximize’ rewards, decides if the reward offered is sufficient to accept or reject the arriving customer. Suppose the arrival rates, service rates, and system capacity are changing over time in a known manner. We show that all bias optimal (a refinement of long-run average reward optimal) policies are of threshold form. Furthermore, we give sufficient conditions for the bias optimal policy to be monotonic in time. We show, via a counterexample, that if these conditions are violated, the optimal policy may not be monotonic in time or of threshold form.
|
4
|
Guo X, Song X. Discounted continuous-time constrained Markov decision processes in Polish spaces. ANN APPL PROBAB 2011. [DOI: 10.1214/10-aap749]
|
5
|
Guo X, Rieder U. Average optimality for continuous-time Markov decision processes in Polish spaces. ANN APPL PROBAB 2006. [DOI: 10.1214/105051606000000105]
|
6
|
Denumerable-state continuous-time Markov decision processes with unbounded transition and reward rates under the discounted criterion. J Appl Probab 2002. [DOI: 10.1017/s0021900200022476]
Abstract
In this paper, we consider denumerable-state continuous-time Markov decision processes with (possibly unbounded) transition and reward rates and general action space under the discounted criterion. We provide a set of conditions weaker than those previously known and then prove the existence of optimal stationary policies within the class of all possibly randomized Markov policies. Moreover, the results in this paper are illustrated by considering the birth-and-death processes with controlled immigration in which the conditions in this paper are satisfied, whereas the earlier conditions fail to hold.
|
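The admission-control model recurring in these abstracts (an M/M/1 queue, an entry reward R, a holding cost c(s), and control-limit policies) lends itself to a small numerical sketch. The code below is illustrative only: the rates, the linear holding cost c(s) = c·s, and the two control limits are assumptions, not values from the cited papers. It computes the gain of a control-limit policy from the birth-death stationary distribution, picks R so that two limits have exactly equal gain (gain is affine in R), and then solves the Poisson equation for each policy's bias vector, which is the quantity bias optimality uses to break the tie between gain optimal policies.

```python
def stationary(lam, mu, L):
    """Stationary distribution of the M/M/1 chain on states 0..L under
    the control-limit policy 'admit while s < L' (birth rate lam below L,
    death rate mu): pi_s proportional to (lam/mu)^s."""
    rho = lam / mu
    w = [rho ** s for s in range(L + 1)]
    z = sum(w)
    return [x / z for x in w]

def gain(lam, mu, c, R, L):
    """Long-run average reward: R is paid at accepted arrivals (reward
    rate lam*R while s < L), against a holding cost rate c*s."""
    pi = stationary(lam, mu, L)
    r = [lam * R * (s < L) - c * s for s in range(L + 1)]
    return sum(p * x for p, x in zip(pi, r)), pi, r

def bias(lam, mu, c, R, L):
    """Bias vector h: solves the Poisson equation (Q h)(s) = g - r(s)
    with pi . h = 0, via the birth-death recursion for the increments
    d_s = h_{s+1} - h_s  (row s of Q h equals lam*d_s - mu*d_{s-1})."""
    g, pi, r = gain(lam, mu, c, R, L)
    b = [g - x for x in r]
    d = [b[0] / lam]                      # row 0: lam*d_0 = b_0
    for s in range(1, L):
        d.append((b[s] + mu * d[s - 1]) / lam)
    h = [0.0]
    for s in range(L):
        h.append(h[-1] + d[s])
    shift = sum(p * x for p, x in zip(pi, h))
    return g, [x - shift for x in h]      # normalize so pi . h = 0

# Illustrative parameters (assumed, not from the cited papers).
lam, mu, c = 1.0, 2.0, 1.0
L1, L2 = 2, 3

# Choose R so the two control limits have exactly equal gain:
# gain(L; R) = R * acceptance_rate(L) + gain(L; 0).
g1a, pi1, _ = gain(lam, mu, c, 0.0, L1)   # gain at R = 0: pure holding cost
g2a, pi2, _ = gain(lam, mu, c, 0.0, L2)
acc1 = lam * (1 - pi1[-1])                # long-run acceptance rates
acc2 = lam * (1 - pi2[-1])
R = (g1a - g2a) / (acc2 - acc1)

g1, h1 = bias(lam, mu, c, R, L1)
g2, h2 = bias(lam, mu, c, R, L2)          # gains equal by construction of R
print(f"R = {R:.4f}, gains: {g1:.6f} vs {g2:.6f}")
print(f"bias (limit {L1}):", [round(x, 4) for x in h1])
print(f"bias (limit {L2}):", [round(x, 4) for x in h2])
```

With both limits gain optimal among the two, comparing the bias vectors h state by state is how the refinement discussed in the note above selects between them (reward on entry favoring the larger limit, reward on completion the smaller).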