Blog

DISCRETE MARKOV DECISION PROBLEM

IN SERVICE FACILITY SYSTEMS WITH INVENTORY

22nd January, 2019.

A Markov decision process (MDP) is a useful and powerful tool for modelling probabilistic sequential decision problems with an infinite planning horizon. This post looks at a discrete-time MDP model of admission and inventory control in a service facility system. The operating time of the system is divided into periods of length t > 0, and decisions are taken at the beginning of each period (the decision epochs) to control both admission to service and inventory replenishment. There are two queues: an eligible queue and a potential queue. At each decision epoch the admission control either admits customers, moving them from the potential queue to the eligible queue, or rejects them. The demand for service (the number of customers arriving in each period) and the service times are assumed to have time-invariant probability distributions g() and f() respectively. The revenue is assumed to be constant over time. The cost structure has three components: the system operator earns R for every completely served customer, pays h(x) for holding x items in inventory, and pays k(y) when there are y customers in the system. The maximum inventory level is M. The model uses the average cost criterion to find the optimal policy for the system.
Let $X_t$ be the number of customers in the system just before decision epoch t, counting those in the eligible queue and in service, and let $Z_t$ be the number of customers in the potential queue in period t. Let $Y_t$ be the number of “possible service completions”, which depends only on the service time distribution and not on the number of customers in the system. The MDP has five components, $(T,\mathcal{X}, \mathcal{A}, r(), p())$.
Decision epochs: $T = \{ 0, t, 2t, 3t, ...\}$.
State space: $\mathcal{X} = S_1 \times S_2$, where $S_1 = \{ 0,1,2, ..., N \}$ is the set of possible numbers of customers in the system and $S_2 = \{ 0, 1, 2,..., M \}$ is the set of possible numbers of items in stock.
Action: The (M-1, M) policy is adopted for the inventory system. Replenishment is instantaneous: when an order is placed at the beginning of a period, delivery occurs immediately. $\mathcal{A} = \{(l,m) : l,m= 0,1,2\}$, where 0 means no order/reject, 1 means order/accept (how many), and 2 means compulsory order/compulsory accept, depending on the values of $s_1$ and $s_2$.
Cost:
The expected number of service completions in period t is
$E \{ \min (Y_t, s_1 + a) \} = \sum_{i=1}^{s_1+a-1} i \, f(i) + (s_1+a) \sum_{i = s_1+a}^{\infty} f(i)$ .
Transition probability: p() gives the probability of moving from $s = (s_1, s_2)$ to $s'= (s_1', s_2')$ under the action $a \in \mathcal{A}$.
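The expected number of service completions given under Cost above can be evaluated numerically once a service distribution f() is fixed. The sketch below is only an illustration: the function names and the Poisson choice for f() are assumptions, not taken from the references. It writes n for $s_1 + a$ and replaces the infinite tail sum by $1 - \sum_{i=0}^{n-1} f(i)$, which is valid when f() is a probability mass function on the non-negative integers.

```python
import math


def expected_completions(f, n):
    """Return E[min(Y, n)] for a non-negative integer-valued Y with pmf f.

    Same split as in the text: sum_{i=1}^{n-1} i*f(i) + n * P(Y >= n),
    with the tail P(Y >= n) computed as 1 - sum_{i=0}^{n-1} f(i).
    """
    if n <= 0:
        return 0.0
    head = sum(i * f(i) for i in range(1, n))
    tail = 1.0 - sum(f(i) for i in range(0, n))
    # Guard against a tiny negative tail caused by rounding.
    return head + n * max(tail, 0.0)


def poisson_pmf(rate):
    """Hypothetical service distribution, used only for illustration."""
    return lambda i: math.exp(-rate) * rate ** i / math.factorial(i)


f = poisson_pmf(2.0)  # assumed rate, not from the source
for n in range(0, 6):
    # n plays the role of s_1 + a in the formula
    print(n, round(expected_completions(f, n), 4))
```

As n grows, the value approaches the mean of f(), since the cap n stops being binding.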
Let $(X_t, I_t)$ denote the state of the system at decision epoch t, where $I_t$ is the inventory level. Then, under a stationary policy, $\{(X_t,I_t) : t \geq 0\}$ is a Markov chain with discrete state space $\mathcal{X}$.
The average cost function $g_s(R) = \lim_{t \rightarrow \infty} \frac{1}{t} V_t(s,R)$ always exists (see Puterman (1994) and Tijms (2003)), where $V_t(s,R)$ denotes the total expected cost over the first t decision epochs when the initial state is s and policy R is adopted.
The objective is to find an average cost optimal policy $R^*$, that is, a policy such that $g_s(R^*) \leq g_s (R)$ for every state s and every stationary policy R. For details of the algorithm, see the references.
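To make the average cost criterion concrete, here is a minimal sketch of relative value iteration, a standard method for average cost MDPs. It is not necessarily the algorithm used in the references, and it converges only under suitable conditions (for instance, unichain and aperiodic). The two-state toy problem, its costs, and all names in the code are hypothetical and unrelated to the service facility model.

```python
def relative_value_iteration(states, actions, cost, trans, ref,
                             tol=1e-9, max_iter=100_000):
    """Average cost relative value iteration.

    cost(s, a)  -> one-period cost of action a in state s
    trans(s, a) -> dict {next_state: probability}
    ref         -> reference state whose relative value is pinned to 0
    Returns (estimated average cost, policy as dict state -> action).
    """
    h = {s: 0.0 for s in states}
    for _ in range(max_iter):
        q = {s: {a: cost(s, a) + sum(p * h[s2] for s2, p in trans(s, a).items())
                 for a in actions(s)}
             for s in states}
        v = {s: min(q[s].values()) for s in states}
        gain = v[ref]  # current estimate of the average cost
        h_new = {s: v[s] - gain for s in states}
        diff = max(abs(h_new[s] - h[s]) for s in states)
        h = h_new
        if diff < tol:
            break
    policy = {s: min(q[s], key=q[s].get) for s in states}
    return gain, policy


# Hypothetical toy problem: state 0 is cheap, state 1 is expensive.
STATES = [0, 1]


def actions(s):
    return ["stay", "move"]


def cost(s, a):
    if a == "move":
        return 2.0
    return 1.0 if s == 0 else 3.0


def trans(s, a):
    if a == "stay":
        return {s: 1.0}
    return {s: 0.5, 1 - s: 0.5}  # a move succeeds with probability 0.5


gain, policy = relative_value_iteration(STATES, actions, cost, trans, ref=0)
print(gain, policy)
```

In this toy problem the best long-run behaviour is to get to state 0 and stay there, so the average cost equals the per-period cost of staying in state 0.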
Here is an example for illustration. Take N = 5 and M = 5, so that $S_1 = S_2 = \{ 0,1,2,3,4,5\}$. Assume that the holding costs for admission control at level $s_1 \in S_1$ are $c_4 = 3, c_3 = 5, c_2= 7, c_1= 9, c_0 = c_f = 10$ .

Assume that the ordering costs for the inventory at level $s_2 \in S_2$ are $c_{p4} = 5, c_{p3}= 4, c_{p2} = 3.2, c_{p1}= 2, c_{p0} = c_f = 1.5$, the holding cost is $c_h = 0.1$ (per inventory), and the inventory cost is 0.3 (per item).

Based on the algorithm, the optimal policies are $R^* = (0,1,1,1,1,2)$ (admission control) and $R_* = (0,0,0,0,1,2)$ (inventory control). Reading each vector from level 5 down to level 0, the same order in which the costs are listed, this means that at state (0,0) both admission and replenishment are compulsory, customers are admitted only when the number in the system is 1, 2, 3 or 4, and a replenishment order is placed when the inventory level is 1.
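The two policy vectors can be decoded level by level with a few lines of code. This sketch assumes that each vector is listed from level 5 down to level 0, which is the reading that matches the interpretation given in the text; the helper name and the label dictionaries are illustrative only.

```python
# Action codes as described in the Action component of the model.
ADMISSION_LABELS = {0: "reject", 1: "accept", 2: "compulsory accept"}
ORDER_LABELS = {0: "no order", 1: "order", 2: "compulsory order"}


def decode(policy, labels):
    """Map a policy vector to {level: action label}.

    Assumption: entries run from the highest level down to level 0.
    """
    top = len(policy) - 1
    return {top - i: labels[a] for i, a in enumerate(policy)}


admission = decode((0, 1, 1, 1, 1, 2), ADMISSION_LABELS)  # R^* from the example
inventory = decode((0, 0, 0, 0, 1, 2), ORDER_LABELS)      # R_* from the example

for level in sorted(admission):
    print(level, "|", admission[level], "|", inventory[level])
```

Each printed row gives a level, the admission action when the number of customers is at that level, and the ordering action when the inventory is at that level.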
References:
1. C. Selvakumar, P. Maheswari, and C. Elango, Discrete MDP problem with Admission and Inventory Control in Service Facility Systems, Research Department of Mathematics, Cardamom Planters’ Association College, Bodinayakanur - 625 513.
2. C. Selvakumar and C. Elango, Optimal Service Control in a Discrete Time Service Facility System with Inventory, International Journal of Engineering Science Invention (IJESI), ISSN (Online): 2319-6734, ISSN (Print): 2319-6726, www.ijesi.org, Volume 7, Issue 6, Ver. I, June 2018, pp. 21-34.
3. S. Krishnakumar, C. Selvakumar, and C. Elango, Inventory Ordering Control for a Retrial Service Facility System – Semi-MDP, International Journal of Engineering Science Invention (IJESI), ISSN (Online): 2319-6734, ISSN (Print): 2319-6726, www.ijesi.org, Volume 7, Issue 6, Ver. I, June 2018, pp. 14-20.
