[{"@context":"http:\/\/schema.org\/","@type":"BlogPosting","@id":"https:\/\/wiki.edu.vn\/en\/wiki24\/drift-plus-penalty-wikipedia\/#BlogPosting","mainEntityOfPage":"https:\/\/wiki.edu.vn\/en\/wiki24\/drift-plus-penalty-wikipedia\/","headline":"Drift plus penalty – Wikipedia","name":"Drift plus penalty – Wikipedia","description":"In the mathematical theory of probability, the drift-plus-penalty method is used for optimization of queueing networks and other stochastic systems.","datePublished":"2015-08-15","dateModified":"2015-08-15","author":{"@type":"Person","@id":"https:\/\/wiki.edu.vn\/en\/wiki24\/author\/lordneo\/#Person","name":"lordneo","url":"https:\/\/wiki.edu.vn\/en\/wiki24\/author\/lordneo\/","image":{"@type":"ImageObject","@id":"https:\/\/secure.gravatar.com\/avatar\/c9645c498c9701c88b89b8537773dd7c?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/c9645c498c9701c88b89b8537773dd7c?s=96&d=mm&r=g","height":96,"width":96}},"publisher":{"@type":"Organization","name":"Enzyklop\u00e4die","logo":{"@type":"ImageObject","@id":"https:\/\/wiki.edu.vn\/wiki4\/wp-content\/uploads\/2023\/08\/download.jpg","url":"https:\/\/wiki.edu.vn\/wiki4\/wp-content\/uploads\/2023\/08\/download.jpg","width":600,"height":60}},"image":{"@type":"ImageObject","@id":"https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/c45f2d6c1e791b1be653e7855fbe2a6565033e95","url":"https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/c45f2d6c1e791b1be653e7855fbe2a6565033e95","height":"","width":""},"url":"https:\/\/wiki.edu.vn\/en\/wiki24\/drift-plus-penalty-wikipedia\/","about":["Wiki"],"wordCount":31559,"articleBody":"In the mathematical theory of probability, the drift-plus-penalty method is used for optimization of queueing networks and other stochastic systems.The technique is for stabilizing a queueing network while also minimizing the time average of a network penalty function. It can be used to optimize performance objectives such as time average power, throughput, and throughput utility.[1][2]In the special case when there is no penalty to be minimized, and when the goal is to design a stable routing policy in a multi-hop network, the method reduces to backpressure routing.[3][4]The drift-plus-penalty method can also be used to minimize the time average of a stochastic process subject to time average constraints on a collection of other stochastic processes.[5]This is done by defining an appropriate set of virtual queues. It can also be used to produce time averaged solutions to convex optimization problems.[6][7]Table of ContentsMethodology[edit]Origins and applications[edit]How it works[edit]The stochastic optimization problem[edit]Virtual queues[edit]The drift-plus-penalty expression[edit]Drift-plus-penalty algorithm[edit]Approximate scheduling[edit]Performance analysis[edit]Average penalty analysis[edit]Average queue size analysis[edit]Probability 1 convergence[edit]Application to queues with finite capacity[edit]Treatment of queueing systems[edit]Convex functions of time averages[edit]Delay tradeoffs and related work[edit]Extensions to non-i.i.d. event processes[edit]Extensions to variable frame length systems[edit]Application to convex programming[edit]Drift-plus-penalty algorithm for convex programming[edit]Drift-plus-penalty algorithm for linear programming[edit]Related links[edit]References[edit]Primary sources[edit]Methodology[edit]The drift-plus-penalty method applies to queueing systems that operate in discrete time with time slots t in {0, 1, 2, …}. 
First, a non-negative function L(t) is defined as a scalar measure of the state of all queues at time t. The function L(t) is typically defined as the sum of the squares of all queue sizes at time t, and is called a Lyapunov function. The Lyapunov drift is defined:

\[ \Delta L(t) = L(t+1) - L(t) \]

Every slot t, the current queue state is observed and control actions are taken to greedily minimize a bound on the following drift-plus-penalty expression:

\[ \Delta L(t) + V p(t), \]

where p(t) is the penalty function and V is a non-negative weight. The V parameter can be chosen to ensure the time average of p(t) is arbitrarily close to optimal, with a corresponding tradeoff in average queue size. Like backpressure routing, this method typically does not require knowledge of the probability distributions for job arrivals and network mobility.[5]

Origins and applications

When V = 0, the method reduces to greedily minimizing the Lyapunov drift. This results in the backpressure routing algorithm originally developed by Tassiulas and Ephremides (also called the max-weight algorithm).[3][8] The Vp(t) term was added to the drift expression by Neely[9] and by Neely, Modiano, and Li[2] for stabilizing a network while also maximizing a throughput utility function. For this, the penalty p(t) was defined as −1 times a reward earned on slot t. This drift-plus-penalty technique was later used to minimize average power[1] and to optimize other penalty and reward metrics.[4][5]

The theory was developed primarily for optimizing communication networks, including wireless networks, ad-hoc mobile networks, and other computer networks. However, the mathematical techniques can be applied to optimization and control for other stochastic systems, including renewable energy allocation in smart power grids[10][11][12] and inventory control for product assembly systems.[13]

How it works

This section shows how to use the drift-plus-penalty method to minimize the time average of a function p(t) subject to time average constraints on a collection of other functions. The analysis below is based on the material in [5].

The stochastic optimization problem

Consider a discrete time system that evolves over normalized time slots t in {0, 1, 2, ...}. Define p(t) as a function whose time average should be minimized, called a penalty function. Suppose that minimization of the time average of p(t) must be done subject to time-average constraints on a collection of K other functions:

\[ p(t) = \text{penalty function whose time average must be minimized} \]
\[ y_1(t), y_2(t), \ldots, y_K(t) = \text{other functions whose time averages must be non-positive} \]

Every slot t, the network controller observes a new random event. It then makes a control action based on knowledge of this event. The values of p(t) and y_i(t) are determined as functions of the random event and the control action on slot t:
\[ \omega(t) = \text{random event on slot } t \ \text{(assumed i.i.d. over slots)} \]
\[ \alpha(t) = \text{control action on slot } t \ \text{(chosen after observing } \omega(t)\text{)} \]
\[ p(t) = P(\alpha(t), \omega(t)) \ \text{(a deterministic function of } \alpha(t), \omega(t)\text{)} \]
\[ y_i(t) = Y_i(\alpha(t), \omega(t)) \quad \forall i \in \{1, \ldots, K\} \ \text{(deterministic functions of } \alpha(t), \omega(t)\text{)} \]

The lower-case notation p(t), y_i(t) and upper-case notation P(), Y_i() are used to distinguish the penalty values from the functions that determine these values based on the random event and control action for slot t. The random event ω(t) is assumed to take values in some abstract set of events Ω. The control action α(t) is assumed to be chosen within some abstract set A that contains the control options. The sets Ω and A are arbitrary and can be either finite or infinite. For example, A could be a finite list of abstract elements, an uncountably infinite (and possibly non-convex) collection of real-valued vectors, and so on. The functions P(), Y_i() are also arbitrary and do not require continuity or convexity assumptions.

As an example in the context of communication networks, the random event ω(t) can be a vector that contains the slot t arrival information for each node and the slot t channel state information for each link. The control action α(t) can be a vector that contains the routing and transmission decisions for each node. The functions P() and Y_i() can represent power expenditures or throughputs associated with the control action and channel condition for slot t.

For simplicity of exposition, assume the P() and Y_i() functions are bounded. Further assume the random event process ω(t) is independent and identically distributed (i.i.d.) over slots t with some possibly unknown probability distribution. The goal is to design a policy for making control actions over time to solve the following problem:

\[ \text{Minimize: } \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E[p(\tau)] \]
\[ \text{Subject to: } \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E[y_i(\tau)] \leq 0 \quad \forall i \in \{1, \ldots, K\} \]

It is assumed throughout that this problem is feasible. That is, it is assumed that there exists an algorithm that can satisfy all of the K desired constraints.

The above problem poses each constraint in the standard form of a time average expectation of an abstract process y_i(t) being non-positive. There is no loss of generality with this approach. For example, suppose one desires the time average expectation of some process a(t) to be less than or equal to a given constant c. Then a new penalty function y(t) = a(t) − c can be defined, and the desired constraint is equivalent to the time average expectation of y(t) being non-positive. Likewise, suppose there are two processes a(t) and b(t) and one desires the time average expectation of a(t) to be less than or equal to that of b(t). This constraint is written in standard form by defining a new penalty function y(t) = a(t) − b(t). The above problem seeks to minimize the time average of an abstract penalty function p(t). This can be used to maximize the time average of some desirable reward function r(t) by defining p(t) = −r(t).
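As a concrete, hypothetical illustration of this setup, the sketch below encodes a single-link power-allocation problem: the random event ω(t) is a channel gain, the action α(t) is a transmit power chosen from a finite set, the penalty P is the power spent, and a single constraint function Y_1 takes the form c minus the achieved rate, so that a non-positive time average of Y_1 means the time average rate is at least c. The names (ACTIONS, RATE_TARGET, sample_omega, P, Y1), the channel model, and all numerical values are illustrative assumptions, not part of the method itself.

```python
import random

# Hypothetical toy instance of the stochastic optimization problem above.
# omega(t): channel gain, drawn i.i.d. over slots (illustrative model).
# alpha(t): transmit power chosen from a finite action set A.
# P(alpha, omega): penalty = power spent on the slot (minimized on average).
# Y1(alpha, omega): constraint function = c - rate, so a non-positive time
#                   average of Y1 means "time average rate >= c".

ACTIONS = [0.0, 0.5, 1.0, 1.5, 2.0]   # finite action set A (power levels)
RATE_TARGET = 0.8                      # desired minimum time-average rate c

def sample_omega():
    """Draw the random event for one slot (illustrative channel gain)."""
    return random.choice([0.5, 1.0, 2.0])

def P(alpha, omega):
    """Penalty function P(alpha, omega): power expended on the slot."""
    return alpha

def Y1(alpha, omega):
    """Constraint function Y_1(alpha, omega): target rate minus achieved rate."""
    return RATE_TARGET - alpha * omega   # toy linear rate model, illustrative only
```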
Virtual queues

For each constraint i in {1, ..., K}, define a virtual queue with dynamics over slots t in {0, 1, 2, ...} as follows:

\[ (\text{Eq. 1}) \qquad Q_i(t+1) = \max[Q_i(t) + y_i(t), 0] \]

Initialize Q_i(0) = 0 for all i in {1, ..., K}. This update equation is identical to that of a virtual discrete time queue with backlog Q_i(t) and with y_i(t) being the difference between new arrivals and new service opportunities on slot t. Intuitively, stabilizing these virtual queues ensures the time averages of the constraint functions are less than or equal to zero, so the desired constraints are satisfied. To see this precisely, note that (Eq. 1) implies:

\[ Q_i(t+1) \geq Q_i(t) + y_i(t) \]

Therefore:

\[ y_i(t) \leq Q_i(t+1) - Q_i(t) \]

Summing the above over the first t slots and using the law of telescoping sums implies:

\[ \sum_{\tau=0}^{t-1} y_i(\tau) \leq Q_i(t) - Q_i(0) = Q_i(t) \]

Dividing by t and taking expectations implies:

\[ \frac{1}{t} \sum_{\tau=0}^{t-1} E[y_i(\tau)] \leq \frac{E[Q_i(t)]}{t} \]

Therefore, the desired constraints of the problem are satisfied whenever the following holds for all i in {1, ..., K}:

\[ \lim_{t \to \infty} \frac{E[Q_i(t)]}{t} = 0 \]

A queue Q_i(t) that satisfies the above limit equation is said to be mean rate stable.[5]
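The virtual queue update (Eq. 1) is a one-line recursion. A minimal sketch, assuming the constraint values y_i(t) have already been computed for the current slot:

```python
def update_virtual_queues(Q, y):
    """Apply (Eq. 1) to every virtual queue: Q_i(t+1) = max[Q_i(t) + y_i(t), 0].

    Q -- list of current backlogs Q_1(t), ..., Q_K(t)
    y -- list of constraint values y_1(t), ..., y_K(t) observed on the slot
    """
    return [max(q + y_i, 0.0) for q, y_i in zip(Q, y)]
```

If the resulting backlogs grow sublinearly in t, so that E[Q_i(t)]/t tends to 0, the telescoping argument above guarantees that the i-th time average constraint is satisfied.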
The drift-plus-penalty expression

To stabilize the queues, define the Lyapunov function L(t) as a measure of the total queue backlog on slot t:

\[ L(t) = \frac{1}{2} \sum_{i=1}^{K} Q_i(t)^2 \]

Squaring the queueing equation (Eq. 1) results in the following bound for each queue i in {1, ..., K}:

\[ Q_i(t+1)^2 \leq (Q_i(t) + y_i(t))^2 = Q_i(t)^2 + y_i(t)^2 + 2 Q_i(t) y_i(t) \]

Therefore,

\[ \frac{1}{2} \sum_{i=1}^{K} Q_i(t+1)^2 \leq \frac{1}{2} \sum_{i=1}^{K} Q_i(t)^2 + \frac{1}{2} \sum_{i=1}^{K} y_i(t)^2 + \sum_{i=1}^{K} Q_i(t) y_i(t) \]

It follows that

\[ \Delta L(t) = L(t+1) - L(t) \leq \frac{1}{2} \sum_{i=1}^{K} y_i(t)^2 + \sum_{i=1}^{K} Q_i(t) y_i(t) \]

Now define B as a positive constant that upper bounds the first term on the right-hand side of the above inequality. Such a constant exists because the y_i(t) values are bounded. Then:

\[ \Delta L(t) \leq B + \sum_{i=1}^{K} Q_i(t) y_i(t) \]

Adding Vp(t) to both sides results in the following bound on the drift-plus-penalty expression:

\[ (\text{Eq. 2}) \qquad \Delta L(t) + V p(t) \leq B + V p(t) + \sum_{i=1}^{K} Q_i(t) y_i(t) \]

The drift-plus-penalty algorithm (defined below) makes control actions every slot t that greedily minimize the right-hand side of the above inequality. Intuitively, taking an action that minimizes the drift alone would be beneficial in terms of queue stability but would not minimize the time average penalty. Taking an action that minimizes the penalty alone would not necessarily stabilize the queues. Thus, taking an action to minimize the weighted sum incorporates both objectives of queue stability and penalty minimization. The weight V can be tuned to place more or less emphasis on penalty minimization, which results in a performance tradeoff.[5]

Drift-plus-penalty algorithm

Let A be the abstract set of all possible control actions. Every slot t, observe the random event and the current queue values:

\[ \text{Observe: } \omega(t), Q_1(t), \ldots, Q_K(t) \]

Given these observations for slot t, greedily choose a control action α(t) ∈ A to minimize the following expression (breaking ties arbitrarily):

\[ V P(\alpha(t), \omega(t)) + \sum_{i=1}^{K} Q_i(t) Y_i(\alpha(t), \omega(t)) \]

Then update the queues for each i in {1, ..., K} according to (Eq. 1). Repeat this procedure for slot t+1.[5]

Note that the random event and queue backlogs observed on slot t act as given constants when selecting the control action for the slot t minimization. Thus, each slot involves a deterministic search for the minimizing control action over the set A. A key feature of this algorithm is that it does not require knowledge of the probability distribution of the random event process.
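When the action set A is finite, the per-slot minimization can be done by exhaustive search. The sketch below is one way to implement the loop just described; the functions and names it expects (P, Y_list, sample_omega, and, in the commented usage, the toy P, Y1, ACTIONS from the earlier sketch) are hypothetical placeholders, not part of the general method.

```python
def drift_plus_penalty(P, Y_list, actions, sample_omega, V, num_slots):
    """Run the drift-plus-penalty algorithm for num_slots slots.

    Each slot: observe omega(t) and the virtual queues, choose the action in
    `actions` minimizing V*P(alpha, omega) + sum_i Q_i(t)*Y_i(alpha, omega)
    (ties broken by the order of `actions`), then update the queues by (Eq. 1).
    """
    K = len(Y_list)
    Q = [0.0] * K                        # virtual queues, Q_i(0) = 0
    penalties = []
    for _ in range(num_slots):
        omega = sample_omega()           # observe the random event for the slot
        alpha = min(                     # deterministic search over the set A
            actions,
            key=lambda a: V * P(a, omega)
            + sum(Q[i] * Y_list[i](a, omega) for i in range(K)),
        )
        penalties.append(P(alpha, omega))
        y = [Y_list[i](alpha, omega) for i in range(K)]
        Q = [max(q + y_i, 0.0) for q, y_i in zip(Q, y)]   # (Eq. 1)
    return penalties, Q

# Example use with the toy problem sketched earlier; larger V drives the time
# average penalty toward optimal at the cost of larger virtual queue backlogs.
# penalties, Q = drift_plus_penalty(P, [Y1], ACTIONS, sample_omega, V=50.0, num_slots=100000)
```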
Approximate scheduling

The above algorithm involves finding a minimum of a function over an abstract set A. In general cases, the minimum might not exist, or might be difficult to find. Thus, it is useful to assume the algorithm is implemented in an approximate manner as follows: Define C as a non-negative constant, and assume that for all slots t, the control action α(t) is chosen in the set A to satisfy:

\[ V P(\alpha(t), \omega(t)) + \sum_{i=1}^{K} Q_i(t) Y_i(\alpha(t), \omega(t)) \leq C + \inf_{\alpha \in A} \left[ V P(\alpha, \omega(t)) + \sum_{i=1}^{K} Q_i(t) Y_i(\alpha, \omega(t)) \right] \]

Such a control action is called a C-additive approximation.[5] The case C = 0 corresponds to exact minimization of the desired expression on every slot t.
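One way a C-additive approximation can arise in practice (an illustration under additional assumptions, not a requirement of the method): if A is an interval of real-valued actions and the per-slot objective is Lipschitz continuous on that interval, then exhaustive search over a uniform grid comes within an additive constant of the true infimum, with C shrinking as the grid is refined. A minimal sketch, reusing the hypothetical toy functions P and Y1 from above in the commented usage:

```python
def grid_c_additive_minimizer(objective, lo, hi, num_points):
    """Approximately minimize objective(alpha) over the interval [lo, hi]
    by exhaustive search on a uniform grid of num_points candidate actions.

    If the objective is Lipschitz continuous on [lo, hi] with constant L, the
    best grid point comes within roughly L*(hi - lo)/(num_points - 1) of the
    true infimum, i.e. it is a C-additive approximation for that C
    (an illustrative sufficient condition, assumed here).
    """
    step = (hi - lo) / (num_points - 1)
    candidates = [lo + k * step for k in range(num_points)]
    return min(candidates, key=objective)

# Per-slot use with a continuous (interval) action set, given the current
# virtual queue backlog Q[0] and observed event omega:
# alpha_t = grid_c_additive_minimizer(
#     lambda a: V * P(a, omega) + Q[0] * Y1(a, omega), lo=0.0, hi=2.0, num_points=201)
```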
Performance analysis

This section shows that the algorithm yields a time average penalty within O(1/V) of optimality, with a corresponding O(V) tradeoff in average queue size.[5]

Average penalty analysis

Define an ω-only policy to be a stationary and randomized policy for choosing the control action α(t) based only on the observed ω(t). That is, an ω-only policy specifies, for each possible random event ω ∈ Ω, a conditional probability distribution for selecting a control action α(t) ∈ A given that ω(t) = ω. Such a policy makes decisions independent of the current queue backlog. Assume there exists an ω-only policy α*(t) that satisfies the following:

\[ (\text{Eq. 3}) \qquad E[P(\alpha^*(t), \omega(t))] = p^* = \text{optimal time average penalty for the problem} \]
\[ (\text{Eq. 4}) \qquad E[Y_i(\alpha^*(t), \omega(t))] \leq 0 \quad \forall i \in \{1, \ldots, K\} \]

The expectations above are with respect to the random variable ω(t) for slot t and the random control action α(t) chosen on slot t after observing ω(t). Such a policy α*(t) can be shown to exist whenever the desired control problem is feasible and the event space for ω(t) and action space for α(t) are finite, or when mild closure properties are satisfied.[5]

Let α(t) represent the action taken by a C-additive approximation of the drift-plus-penalty algorithm of the previous section, for some non-negative constant C. To simplify terminology, we call this action the drift-plus-penalty action, rather than the C-additive approximate drift-plus-penalty action. Let α*(t) represent the ω-only decision:

\[ \alpha(t) = \text{drift-plus-penalty action for slot } t \]
\[ \alpha^*(t) = \omega\text{-only action that satisfies (Eq. 3)-(Eq. 4)} \]

Assume the drift-plus-penalty action α(t) is used on each and every slot. By (Eq. 2), the drift-plus-penalty expression under this α(t) action satisfies the following for each slot t:

\[ \begin{aligned} \Delta L(t) + V p(t) &\leq B + V p(t) + \sum_{i=1}^{K} Q_i(t) y_i(t) \\ &= B + V P(\alpha(t), \omega(t)) + \sum_{i=1}^{K} Q_i(t) Y_i(\alpha(t), \omega(t)) \\ &\leq B + C + V P(\alpha^*(t), \omega(t)) + \sum_{i=1}^{K} Q_i(t) Y_i(\alpha^*(t), \omega(t)) \end{aligned} \]

where the last inequality follows because the action α(t) comes within an additive constant C of minimizing the preceding expression over all other actions in the set A, including α*(t). Taking expectations of the above inequality gives:

\[ \begin{aligned} E[\Delta L(t) + V p(t)] &\leq B + C + V E[P(\alpha^*(t), \omega(t))] + \sum_{i=1}^{K} E\left[Q_i(t) Y_i(\alpha^*(t), \omega(t))\right] \\ &= B + C + V E[P(\alpha^*(t), \omega(t))] + \sum_{i=1}^{K} E[Q_i(t)] \, E[Y_i(\alpha^*(t), \omega(t))] && \alpha^*(t), \omega(t) \text{ are independent of } Q_i(t) \\ &\leq B + C + V p^* && \text{using (Eq. 3) and (Eq. 4)} \end{aligned} \]

Notice that the α*(t) action was never actually implemented. Its existence was used only for comparison purposes to reach the final inequality. Summing the above inequality over the first t > 0 slots gives:

\[ \begin{aligned} (B + C + V p^*) t &\geq \sum_{\tau=0}^{t-1} E[\Delta L(\tau) + V p(\tau)] \\ &= E[L(t)] - E[L(0)] + V \sum_{\tau=0}^{t-1} E[p(\tau)] && \Delta L(\tau) = L(\tau+1) - L(\tau) \\ &= E[L(t)] + V \sum_{\tau=0}^{t-1} E[p(\tau)] && \text{assuming } L(0) = 0 \\ &\geq V \sum_{\tau=0}^{t-1} E[p(\tau)] && L(t) \geq 0 \end{aligned} \]

Dividing the above by Vt yields the following result, which holds for all slots t > 0:

\[ \frac{1}{t} \sum_{\tau=0}^{t-1} E[p(\tau)] \leq p^* + \frac{B + C}{V}. \]

Thus, the time average expected penalty can be made arbitrarily close to the optimal value p* by choosing V suitably large. It can be shown that all virtual queues are mean rate stable, and so all desired constraints are satisfied.[5] The parameter V affects the size of the queues, which determines the speed at which the time average constraint functions converge to a non-positive number. A more detailed analysis of the size of the queues is given in the next subsection.

Average queue size analysis

Assume now there exists an ω-only policy α*(t), possibly different from the one that satisfies (Eq. 3)-(Eq. 4), that satisfies the following for some