Hamilton–Jacobi–Bellman equation – Wikipedia

Posted on April 23, 2022 by lordneo

The Hamilton–Jacobi–Bellman (HJB) equation is a partial differential equation that lies at the core of optimal control theory. Its solution is called the value function, and it gives the minimum value of the cost function associated with a given dynamical system.

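A local solution of the HJB equation gives a necessary condition for optimality, whereas solving it over the entire state space gives a necessary and sufficient condition. The solution is an open-loop control law, but closed-loop solutions can also be derived. The approach extends to stochastic systems, and it can also solve classical variational problems such as the brachistochrone problem.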

The HJB equation emerged from the theory of dynamic programming pioneered in the 1950s by Richard Bellman and his coworkers^[1]. Its discrete-time counterpart is usually called the Bellman equation.

In continuous time, it can be viewed as an extension of the Hamilton–Jacobi equation of classical physics (due to William Rowan Hamilton and Carl Gustav Jacob Jacobi).

Optimal control problem

Consider the following optimal control problem over the time interval $[0,T]$:

$$V(x(0),0)=\min_{u}\left\{\int_{0}^{T}C[x(t),u(t)]\,dt+D[x(T)]\right\}$$

Here $C[\,\cdot\,]$ is the scalar cost rate function, $D[\,\cdot\,]$ is a function giving the desirability (or economic value) of the terminal state, $x(t)$ is the state vector of the system, $x(0)$ is its initial value, and $u(t)$, $0\leq t\leq T$, is the control input vector that we want to find.

The system is assumed to obey the dynamics

$$\dot{x}(t)=F[x(t),u(t)],$$

where $F[\,\cdot\,]$ is a vector-valued function that gives the time evolution of the system state.
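As a minimal sketch of the problem statement, the following Python snippet evaluates the cost functional for one given control law by forward Euler integration. The scalar dynamics $F(x,u)=u$, the cost rate $C(x,u)=(x^{2}+u^{2})/2$, the zero terminal cost and the helper name `total_cost` are illustrative assumptions, not part of the original problem.

```python
def total_cost(policy, x0=1.0, horizon=1.0, steps=1000):
    """Approximate J = integral of C dt + D(x(T)) with forward Euler.

    Assumed toy problem (not from the article): F(x, u) = u,
    C(x, u) = (x**2 + u**2) / 2, D(x) = 0.
    """
    dt = horizon / steps
    x, cost = x0, 0.0
    for k in range(steps):
        u = policy(x, k * dt)               # control chosen at the current state and time
        cost += 0.5 * (x * x + u * u) * dt  # accumulate the cost rate C
        x += u * dt                         # Euler step of x' = F(x, u) = u
    return cost                             # the terminal cost D is zero here


cost_idle = total_cost(lambda x, t: 0.0)     # do nothing: x stays at 1
cost_feedback = total_cost(lambda x, t: -x)  # simple proportional feedback
```

Any particular policy only gives an upper bound on $V(x(0),0)$, since the value function is the minimum of this quantity over all admissible controls.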

HJB equation

The Hamilton–Jacobi–Bellman (HJB) equation for this system is the partial differential equation

$$\dot{V}(x,t)+\min_{u}\left\{\nabla V(x,t)\cdot F(x,u)+C(x,u)\right\}=0,$$

with the terminal condition

$$V(x,T)=D(x).$$

Here $a\cdot b$ denotes the inner product of the vectors $a$ and $b$, $\nabla$ is the gradient operator with respect to $x$, and $\dot{V}$ denotes the partial derivative of $V$ with respect to $t$.

The unknown scalar function $V(x,t)$ appearing in the equation above is called Bellman's value function. $V(x,t)$ is the minimum cost obtained when the system is controlled optimally from state $x$ at time $t$ until time $T$.
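As a worked illustration, consider a scalar toy problem chosen here purely as an assumption: $F(x,u)=u$, $C(x,u)=(x^{2}+u^{2})/2$ and $D(x)=0$. Writing $V_x$ for $\partial V/\partial x$, the HJB equation can then be solved in closed form with a quadratic ansatz:

```latex
\begin{aligned}
&\dot{V}+\min_{u}\left\{V_x\,u+\tfrac12 x^{2}+\tfrac12 u^{2}\right\}=0
 \quad\Longrightarrow\quad u^{*}=-V_x,\qquad \dot{V}-\tfrac12 V_x^{2}+\tfrac12 x^{2}=0,\\
&V(x,t)=\tfrac12 P(t)\,x^{2}
 \quad\Longrightarrow\quad \dot{P}=P^{2}-1,\quad P(T)=0
 \quad\Longrightarrow\quad P(t)=\tanh(T-t),\\
&V(x,t)=\tfrac12\tanh(T-t)\,x^{2},\qquad u^{*}(x,t)=-\tanh(T-t)\,x .
\end{aligned}
```

The terminal condition $V(x,T)=0$ holds because $\tanh(0)=0$, and the resulting control is a feedback law in the current state $x$.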

Derivation of the HJB equation

Intuitively, the HJB equation can be derived as follows. If $V(x(t),t)$ is the value function above (that is, the minimum cost), then by Richard Bellman's principle of optimality its change from time $t$ to $t+dt$ satisfies

$$V(x(t),t)=\min_{u}\left\{\int_{t}^{t+dt}C(x(s),u(s))\,ds+V(x(t+dt),t+dt)\right\}.$$

Note that the second term on the right-hand side can be Taylor-expanded as

$$V(x(t+dt),t+dt)=V(x(t),t)+\dot{V}(x(t),t)\,dt+\nabla V(x(t),t)\cdot\dot{x}(t)\,dt+o(dt),$$

where $o(dt)$ collects, in Landau notation, the terms of second and higher order in the Taylor expansion, which we neglect. Substituting this into the equation for the value function, cancelling $V(x(t),t)$ on both sides, dividing by $dt$, and letting $dt$ tend to zero yields the HJB equation above.
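Written out, the steps just described look as follows. This is only a heuristic sketch: it assumes that $V$ is differentiable, that the integral over $[t,t+dt]$ can be approximated by $C(x,u)\,dt$, and that $\dot{x}=F(x,u)$ from the system dynamics.

```latex
\begin{aligned}
V(x,t) &= \min_{u}\left\{C(x,u)\,dt + V(x,t) + \dot{V}(x,t)\,dt
          + \nabla V(x,t)\cdot F(x,u)\,dt + o(dt)\right\} \\
0 &= \dot{V}(x,t)\,dt + \min_{u}\left\{\nabla V(x,t)\cdot F(x,u)\,dt + C(x,u)\,dt\right\} + o(dt) \\
0 &= \dot{V}(x,t) + \min_{u}\left\{\nabla V(x,t)\cdot F(x,u) + C(x,u)\right\}
     \qquad (\text{divide by } dt,\ dt \to 0)
\end{aligned}
```

The terms $V(x,t)$ and $\dot{V}(x,t)\,dt$ do not depend on $u$, which is why they can be moved outside the minimization in the second line.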

Solving the HJB equation

The HJB equation is usually solved backward in time, from $t=T$ to $t=0$.

When solved over the entire state space, the HJB equation gives a necessary and sufficient condition for optimality^[2].

Once the equation has been solved for $V$, the control input $u$ that minimizes the cost function follows from it:

$$u(t)=\arg\min_{u}\left\{\nabla V(x,t)\cdot F(x,u)+C(x,u)\right\}$$
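The backward-in-time solution and the arg-min step can be sketched numerically. The snippet below is a simple grid-based dynamic-programming sweep (a semi-Lagrangian scheme with linear interpolation), not a production PDE solver. The toy problem $F(x,u)=u$, $C(x,u)=(x^{2}+u^{2})/2$, $D(x)=0$, the grid sizes and the function name `solve_hjb_backward` are all assumptions made for illustration.

```python
import numpy as np

def solve_hjb_backward(horizon=1.0, n_steps=50, n_x=201, n_u=41):
    """Backward dynamic-programming sweep for an assumed toy problem:
    F(x, u) = u, C(x, u) = (x**2 + u**2) / 2, D(x) = 0.
    """
    dt = horizon / n_steps
    xs = np.linspace(-2.0, 2.0, n_x)   # state grid
    us = np.linspace(-2.0, 2.0, n_u)   # candidate controls
    value = np.zeros(n_x)              # terminal condition V(x, T) = D(x) = 0
    policy = np.zeros(n_x)
    for _ in range(n_steps):           # march from t = T back to t = 0
        x_next = xs[:, None] + us[None, :] * dt   # Euler step for every (x, u) pair
        v_next = np.interp(x_next.ravel(), xs, value).reshape(x_next.shape)
        q = 0.5 * (xs[:, None] ** 2 + us[None, :] ** 2) * dt + v_next
        best = np.argmin(q, axis=1)               # discrete arg min over u
        value = q[np.arange(n_x), best]
        policy = us[best]                         # feedback law u*(x, t) on the grid
    return xs, value, policy


xs, value_t0, policy_t0 = solve_hjb_backward()
i = int(np.argmin(np.abs(xs - 1.0)))   # grid index closest to x = 1
```

For this particular toy problem the exact value function is $V(x,0)=\tfrac12\tanh(T)\,x^{2}$, so `value_t0[i]` can be compared with $\tfrac12\tanh(1)\approx 0.381$. Grid methods of this kind scale poorly with the state dimension, which is one reason closed-form cases such as the linear-quadratic one are valuable.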

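In general, the HJB equation does not have a classical (smooth) solution. Generalized notions of solution have been developed for such cases, including the viscosity solution (Pierre-Louis Lions and Michael Crandall) and the minimax solution (Andrei Izmailovich Subbotin).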

Extension to stochastic systems

The approach of applying Bellman's principle of optimality to a control problem and solving for the optimal control strategy backward in time extends to control problems for systems described by stochastic differential equations. Consider the following problem, which closely resembles the one above:

$$\min\operatorname{E}\left\{\int_{0}^{T}C(t,X_{t},u_{t})\,dt+D(X_{T})\right\}$$

Here we consider a (one-dimensional) stochastic process $(X_{t})_{t\in[0,T]}$ to be optimized and its control input $(u_{t})_{t\in[0,T]}$. The process $(X_{t})_{t\in[0,T]}$ follows the stochastic differential equation

$$dX_{t}=\mu(t,X_{t},u_{t})\,dt+\sigma(t,X_{t},u_{t})\,dw_{t},$$

Here $(w_{t})_{t\in[0,T]}$ is a standard Brownian motion (Wiener process), and $\mu$ and $\sigma$ are measurable functions satisfying the standard assumptions. Intuitively, the state variable $X$ changes instantaneously by $\mu\,dt$ while also being affected by the Gaussian noise $\sigma\,dw_{t}$. Applying Bellman's principle of optimality and then expanding the value function $V(X_{t},t)$ with Itô's rule yields the HJB equation for the value function:

$$-\frac{\partial V(x,t)}{\partial t}-\min_{u}\left\{\mathcal{A}^{u}V(x,t)+C(t,x,u)\right\}=0,$$

where $\mathcal{A}^{u}$ is a functional operator called the infinitesimal generator, given by

$$\mathcal{A}^{u}V(x,t):=\mu(t,x,u)\frac{\partial V(x,t)}{\partial x}+\frac{1}{2}\bigl(\sigma(t,x,u)\bigr)^{2}\frac{\partial^{2}V(x,t)}{\partial x^{2}}$$

The term in which $\sigma^{2}/2$ multiplies the second derivative of the value function $V(x,t)$ with respect to $x$ did not appear in the deterministic setting; it arises from Itô's formula. The terminal condition is

$$V(x,T)=D(x).$$

Note that the randomness has disappeared. In this case, a solution $V$ is merely a candidate for the optimal solution of the original problem, and further verification is required^{[note 1]}. This technique is widely used in financial engineering to determine optimal investment strategies in a market (for example, Merton's portfolio problem).
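The role of the second-order Itô term can be checked on a case with a known answer. The sketch below assumes an uncontrolled process with constant $\mu$ and $\sigma$, zero running cost and $D(x)=x^{2}$, so that the minimization over $u$ is trivial and $V(x,t)=\operatorname{E}[X_{T}^{2}\mid X_{t}=x]=(x+\mu(T-t))^{2}+\sigma^{2}(T-t)$. The parameter values and function names are illustrative assumptions.

```python
def expected_square(x, t, mu=0.3, sigma=0.5, horizon=1.0):
    """Closed form of E[X_T**2 | X_t = x] for dX = mu dt + sigma dw (constant mu, sigma)."""
    tau = horizon - t
    return (x + mu * tau) ** 2 + sigma ** 2 * tau


def hjb_residual(x, t, mu=0.3, sigma=0.5, h=1e-3):
    """Central-difference estimate of -V_t - (mu V_x + sigma**2 / 2 V_xx) with C = 0."""
    v = lambda xx, tt: expected_square(xx, tt, mu, sigma)
    v_t = (v(x, t + h) - v(x, t - h)) / (2 * h)
    v_x = (v(x + h, t) - v(x - h, t)) / (2 * h)
    v_xx = (v(x + h, t) - 2 * v(x, t) + v(x - h, t)) / h ** 2
    with_ito = -v_t - (mu * v_x + 0.5 * sigma ** 2 * v_xx)
    without_ito = -v_t - mu * v_x   # generator with the second-order term dropped
    return with_ito, without_ito


res_full, res_dropped = hjb_residual(x=0.7, t=0.2)
```

With the full generator the residual vanishes up to rounding error, whereas dropping the second-order term leaves a residual of $\sigma^{2}$, which is exactly the contribution of the noise to the expected terminal cost rate.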

Hamilton–Jacobi–Bellman–Isaacs equation

Consider a non-cooperative zero-sum game between two players, player 1 and player 2^[3]. The minimax principle holds in this setting as well, and with $u$ denoting player 1's control variable, player 1's optimal control problem is

$$\max_{u}\min_{v}\operatorname{E}\left\{\int_{0}^{T}C(t,X_{t},u_{t},v_{t})\,dt+D(X_{T})\right\}$$

Here the state variable $(X_{t})_{t\in[0,T]}$ follows the stochastic differential equation

$$dX_{t}=\mu(t,X_{t},u_{t},v_{t})\,dt+\sigma(t,X_{t},u_{t},v_{t})\,dw_{t}$$

In this problem, player 2's control variable $v$ has been introduced. The value function of player 1's problem is a viscosity solution of the following Hamilton–Jacobi–Bellman–Isaacs equation (HJBI equation)^{[note 2]}:

$$-\frac{\partial V(x,t)}{\partial t}-\max_{u}\min_{v}\left\{\mathcal{A}^{u,v}V(x,t)+C(t,x,u,v)\right\}=0,$$

Here $\mathcal{A}^{u,v}$ is the infinitesimal generator, given by

$$\mathcal{A}^{u,v}V(x,t):=\mu(t,x,u,v)\frac{\partial V(x,t)}{\partial x}+\frac{1}{2}\bigl(\sigma(t,x,u,v)\bigr)^{2}\frac{\partial^{2}V(x,t)}{\partial x^{2}}$$

The terminal condition is

$$V(x,T)=D(x).$$

The solutions of the maximization and minimization problems over $u,v$ contained in the HJBI equation constitute the (Markov) Nash equilibrium of this game.
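The pointwise max-min inside the HJBI equation can be illustrated on a grid. The sketch below uses assumed toy data that are not from the article: drift $\mu=u+v$, constant $\sigma$ (so the second-order term does not depend on $u,v$ and is left out), running payoff $C=-u^{2}/2+v^{2}/2$, and a fixed number `p` standing in for $\partial V/\partial x$.

```python
import numpy as np

def isaacs_values(p, n=201):
    """Grid evaluation of max_u min_v and min_v max_u of an assumed Hamiltonian
    H(u, v) = (u + v) * p - u**2 / 2 + v**2 / 2, where p stands for dV/dx.
    """
    us = np.linspace(-1.0, 1.0, n)
    vs = np.linspace(-1.0, 1.0, n)
    h = (us[:, None] + vs[None, :]) * p - 0.5 * us[:, None] ** 2 + 0.5 * vs[None, :] ** 2
    max_min = h.min(axis=1).max()   # player 1 maximizes the worst case over v
    min_max = h.max(axis=0).min()   # player 2 minimizes the best case over u
    return max_min, min_max


lower, upper = isaacs_values(p=0.5)
```

Because this assumed Hamiltonian is concave in $u$ and convex in $v$, the two orders of optimization agree and the maximizer and minimizer form a saddle point, which is the situation in which the pair of controls is an equilibrium. In general only `lower <= upper` is guaranteed.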

Optimal stopping problem

Consider the following optimal stopping problem^[4]:

$$\max_{\tau}\operatorname{E}\left\{\int_{0}^{\tau}C(t,X_{t})\,dt+D(X_{T})\mathbf{1}\{\tau=T\}+F(\tau,X_{\tau})\mathbf{1}\{\tau < T\}\right\}$$

Here $\mathbf{1}\{\,\cdot\,\}$ is the indicator function, which returns 1 if the event inside $\{\,\cdot\,\}$ occurs and 0 otherwise. The state variable $(X_{t})_{t\in[0,T]}$ follows the stochastic differential equation

$$dX_{t}=\mu(t,X_{t})\,dt+\sigma(t,X_{t})\,dw_{t}$$

Then the value function $V(x,t)$ satisfies the following HJB equation:

$$\min\left\{-\frac{\partial V(x,t)}{\partial t}-\mathcal{A}V(x,t)-C(t,x),\quad V(x,t)-F(t,x)\right\}=0,$$

Here the infinitesimal generator $\mathcal{A}$ is given by

$$\mathcal{A}V(x,t):=\mu(t,x)\frac{\partial V(x,t)}{\partial x}+\frac{1}{2}\bigl(\sigma(t,x)\bigr)^{2}\frac{\partial^{2}V(x,t)}{\partial x^{2}}$$

The terminal condition is

$$V(x,T)=D(x).$$

The optimal stopping time is given by

$$\tau^{*}:=\min\{\inf\{t\in[0,T]\;:\;V(X_{t},t)=F(t,X_{t})\},\;T\}$$

Optimal stopping problems arise, for example, in the pricing of American options.
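A discrete-time analogue of this stopping rule is the binomial-tree valuation of an American put, sketched below with the Cox-Ross-Rubinstein parametrization. At every node the value is the larger of the continuation value and the exercise payoff, so $V\geq F$ everywhere with equality in the stopping region. The put payoff, the discounting at a constant `rate` (which the formulation above does not include) and all parameter values are assumptions chosen for illustration.

```python
import math

def put_binomial(s0, strike, rate, sigma, maturity, steps, american=True):
    """Cox-Ross-Rubinstein tree for a put; early exercise is allowed if american=True."""
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    disc = math.exp(-rate * dt)
    q = (math.exp(rate * dt) - down) / (up - down)   # risk-neutral up probability
    # Terminal payoff at t = T, indexed by the number j of up moves.
    values = [max(strike - s0 * up ** j * down ** (steps - j), 0.0)
              for j in range(steps + 1)]
    for n in range(steps - 1, -1, -1):               # backward in time
        for j in range(n + 1):
            cont = disc * (q * values[j + 1] + (1.0 - q) * values[j])
            if american:
                exercise = max(strike - s0 * up ** j * down ** (n - j), 0.0)
                values[j] = max(cont, exercise)      # stop when exercising is worth more
            else:
                values[j] = cont
    return values[0]


american = put_binomial(100.0, 100.0, 0.05, 0.2, 1.0, 200, american=True)
european = put_binomial(100.0, 100.0, 0.05, 0.2, 1.0, 200, american=False)
```

The difference `american - european` is the value of the right to stop early; the first time the exercise payoff equals the node value along a path plays the role of $\tau^{*}$.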

Application to Linear Quadratic Gaussian (LQG) control

As an example, consider a linear stochastic system with a quadratic cost function. Let the system have the dynamics

$$dx_{t}=(ax_{t}+bu_{t})\,dt+\sigma\,dw_{t}.$$

If the cost rate function is $C(x_{t},u_{t})=r(t)u_{t}^{2}/2+q(t)x_{t}^{2}/2$, the HJB equation becomes

$$-\frac{\partial V(x,t)}{\partial t}=\frac{1}{2}q(t)x^{2}+\frac{\partial V(x,t)}{\partial x}ax-\frac{b^{2}}{2r(t)}\left(\frac{\partial V(x,t)}{\partial x}\right)^{2}+\frac{1}{2}\sigma^{2}\frac{\partial^{2}V(x,t)}{\partial x^{2}}.$$

Assuming a quadratic value function, we obtain, as in ordinary LQG control, the usual Riccati equation for the Hessian of the value function.
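A minimal sketch of that last step: substituting the ansatz $V(x,t)=\tfrac12 P(t)x^{2}+c(t)$ into the HJB equation and matching powers of $x$ gives $-\dot{P}=q+2aP-(b^{2}/r)P^{2}$ and $-\dot{c}=\tfrac12\sigma^{2}P$. The code below integrates both backward in time. Constant $q$ and $r$, the terminal cost $D(x)=\tfrac12 p_{T}x^{2}$ and the function name `riccati_backward` are assumptions, since the section does not specify them.

```python
import math

def riccati_backward(a, b, q, r, sigma, p_terminal, horizon, steps=2000):
    """Integrate the scalar Riccati ODE backward with classical RK4.

    Assumes V(x, t) = P(t) x**2 / 2 + c(t), constant q and r, and a terminal
    cost D(x) = p_terminal * x**2 / 2 (the terminal cost is an assumption).
    """
    def p_dot(p):
        # From the HJB equation: -P' = q + 2 a P - (b**2 / r) P**2
        return -(q + 2.0 * a * p - (b * b / r) * p * p)

    h = -horizon / steps   # negative step: from t = T down to t = 0
    p, c = p_terminal, 0.0
    for _ in range(steps):
        k1 = p_dot(p)
        k2 = p_dot(p + 0.5 * h * k1)
        k3 = p_dot(p + 0.5 * h * k2)
        k4 = p_dot(p + h * k3)
        p_new = p + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        c += -h * 0.5 * sigma ** 2 * 0.5 * (p + p_new)   # -c' = sigma**2 P / 2 (trapezoid rule)
        p = p_new
    return p, c


p0, c0 = riccati_backward(a=0.0, b=1.0, q=1.0, r=1.0, sigma=0.3,
                          p_terminal=0.0, horizon=1.0)
gain = -1.0 * p0 / 1.0   # feedback u* = -(b / r) P(0) x with b = r = 1
```

For these particular parameters the exact solution is $P(t)=\tanh(T-t)$, which gives a convenient check. The noise level $\sigma$ only affects the offset $c(t)$ and not the feedback gain, which reflects the certainty-equivalence property of linear-quadratic problems.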

Applications of the HJB equation

The HJB equation is the fundamental equation of continuous-time optimal control and is applied in a wide range of fields.

