[{"@context":"http:\/\/schema.org\/","@type":"BlogPosting","@id":"https:\/\/wiki.edu.vn\/en\/wiki19\/markov-kernel-wikipedia\/#BlogPosting","mainEntityOfPage":"https:\/\/wiki.edu.vn\/en\/wiki19\/markov-kernel-wikipedia\/","headline":"Markov kernel – Wikipedia","name":"Markov kernel – Wikipedia","description":"In probability theory, a Markov kernel (also known as a stochastic kernel or probability kernel) is a map that in","datePublished":"2020-02-18","dateModified":"2020-02-18","author":{"@type":"Person","@id":"https:\/\/wiki.edu.vn\/en\/wiki19\/author\/lordneo\/#Person","name":"lordneo","url":"https:\/\/wiki.edu.vn\/en\/wiki19\/author\/lordneo\/","image":{"@type":"ImageObject","@id":"https:\/\/secure.gravatar.com\/avatar\/c9645c498c9701c88b89b8537773dd7c?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/c9645c498c9701c88b89b8537773dd7c?s=96&d=mm&r=g","height":96,"width":96}},"publisher":{"@type":"Organization","name":"Enzyklop\u00e4die","logo":{"@type":"ImageObject","@id":"https:\/\/wiki.edu.vn\/wiki4\/wp-content\/uploads\/2023\/08\/download.jpg","url":"https:\/\/wiki.edu.vn\/wiki4\/wp-content\/uploads\/2023\/08\/download.jpg","width":600,"height":60}},"image":{"@type":"ImageObject","@id":"https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/0297c24d37da698d6c360440dd83c2f60a1ce3b6","url":"https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/0297c24d37da698d6c360440dd83c2f60a1ce3b6","height":"","width":""},"url":"https:\/\/wiki.edu.vn\/en\/wiki19\/markov-kernel-wikipedia\/","wordCount":16309,"articleBody":"In probability theory, a Markov kernel (also known as a stochastic kernel or probability kernel) is a map that in the general theory of Markov processes plays the role that the transition matrix does in the theory of Markov processes with a finite state space.[1]Table of ContentsFormal definition[edit]Examples[edit]Simple random walk on the integers[edit]General Markov processes with countable state space[edit]Markov kernel defined by a kernel function and a measure[edit]Measurable functions[edit]Galton\u2013Watson process[edit]Composition of Markov Kernels and the Markov Category[edit]Probability Space defined by Probability Distribution and a Markov Kernel[edit]Properties[edit]Semidirect product[edit]Regular conditional distribution[edit]Generalizations[edit]References[edit]Formal definition[edit]Let (X,A){displaystyle (X,{mathcal {A}})} and (Y,B){displaystyle (Y,{mathcal {B}})} be measurable spaces. A Markov kernel with source (X,A){displaystyle (X,{mathcal {A}})} and target (Y,B){displaystyle (Y,{mathcal {B}})} is a map \u03ba:B\u00d7X\u2192[0,1]{displaystyle kappa :{mathcal {B}}times Xto [0,1]} with the following properties:For every (fixed) B\u2208B{displaystyle Bin {mathcal {B}}}, the map x\u21a6\u03ba(B,x){displaystyle xmapsto kappa (B,x)} is A{displaystyle {mathcal {A}}}-measurableFor every (fixed) x\u2208X{displaystyle xin X}, the map B\u21a6\u03ba(B,x){displaystyle Bmapsto kappa (B,x)} is a probability measure on (Y,B){displaystyle (Y,{mathcal {B}})}In other words it associates to each point x\u2208X{displaystyle xin X} a probability measure \u03ba(dy|x):B\u21a6\u03ba(B,x){displaystyle kappa (dy|x):Bmapsto kappa (B,x)} on (Y,B){displaystyle (Y,{mathcal {B}})} such that, for every measurable set B\u2208B{displaystyle Bin {mathcal {B}}}, the map x\u21a6\u03ba(B,x){displaystyle xmapsto kappa (B,x)} is measurable with respect to the \u03c3{displaystyle sigma }-algebra A{displaystyle {mathcal {A}}}.[2]Examples[edit]Simple random walk on the integers[edit]Take X=Y=Z{displaystyle X=Y=mathbb {Z} }, and A=B=P(Z){displaystyle {mathcal {A}}={mathcal {B}}={mathcal {P}}(mathbb {Z} )} (the power set of Z{displaystyle mathbb {Z} }). Then a Markov kernel is fully determined by the probability it assigns to singletons {m},m\u2208Y=Z{displaystyle {m},,min Y=mathbb {Z} } for each n\u2208X=Z{displaystyle nin X=mathbb {Z} }:\u03ba(B|n)=\u2211m\u2208B\u03ba({m}|n),\u2200n\u2208Z,\u2200B\u2208B{displaystyle kappa (B|n)=sum _{min B}kappa ({m}|n),qquad forall nin mathbb {Z} ,,forall Bin {mathcal {B}}}.Now the random walk \u03ba{displaystyle kappa } that goes to the right with probability p{displaystyle p} and to the left with probability 1\u2212p{displaystyle 1-p} is defined by\u03ba({m}|n)=p\u03b4m,n+1+(1\u2212p)\u03b4m,n\u22121,\u2200n,m\u2208Z{displaystyle kappa ({m}|n)=pdelta _{m,n+1}+(1-p)delta _{m,n-1},quad forall n,min mathbb {Z} }where \u03b4{displaystyle delta } is the Kronecker delta. The transition probabilities P(m|n)=\u03ba({m}|n){displaystyle P(m|n)=kappa ({m}|n)} for the random walk are equivalent to the Markov kernel.General Markov processes with countable state space[edit]More generally take X{displaystyle X} and Y{displaystyle Y} both countable and A=P(X),\u00a0B=P(Y){displaystyle {mathcal {A}}={mathcal {P}}(X), {mathcal {B}}={mathcal {P}}(Y)}.Again a Markov kernel is defined by the probability it assigns to singleton sets for each i\u2208X{displaystyle iin X}\u03ba(B|i)=\u2211j\u2208B\u03ba({j}|i),\u2200i\u2208X,\u2200B\u2208B{displaystyle kappa (B|i)=sum _{jin B}kappa ({j}|i),qquad forall iin X,,forall Bin {mathcal {B}}},We define a Markov process by defining a transition probability P(j|i)=Kji{displaystyle P(j|i)=K_{ji}} where the numbers Kji{displaystyle K_{ji}} define a (countable) stochastic matrix (Kji){displaystyle (K_{ji})} i.e.Kji\u22650,\u2200(j,i)\u2208Y\u00d7X,\u2211j\u2208YKji=1,\u2200i\u2208X.{displaystyle {begin{aligned}K_{ji}&geq 0,qquad &forall (j,i)in Ytimes X,\\sum _{jin Y}K_{ji}&=1,qquad &forall iin X.\\end{aligned}}}We then define\u03ba({j}|i)=Kji=P(j|i),\u2200i\u2208X,\u2200B\u2208B{displaystyle kappa ({j}|i)=K_{ji}=P(j|i),qquad forall iin X,quad forall Bin {mathcal {B}}}.Again the transition probability, the stochastic matrix and the Markov kernel are equivalent reformulations.Markov kernel defined by a kernel function and a measure[edit]Let \u03bd{displaystyle nu } be a measure on (Y,B){displaystyle (Y,{mathcal {B}})}, and k:Y\u00d7X\u2192[0,\u221e]{displaystyle k:Ytimes Xto [0,infty ]} a measurable function with respect to the product \u03c3{displaystyle sigma }-algebra A\u2297B{displaystyle {mathcal {A}}otimes {mathcal {B}}} such that\u222bYk(y,x)\u03bd(dy)=1,\u2200x\u2208X{displaystyle int _{Y}k(y,x)nu (mathrm {d} y)=1,qquad forall xin X},then \u03ba(dy|x)=k(y,x)\u03bd(dy){displaystyle kappa (dy|x)=k(y,x)nu (dy)} i.e. the mapping{\u03ba:B\u00d7X\u2192[0,1]\u03ba(B|x)=\u222bBk(y,x)\u03bd(dy){displaystyle {begin{cases}kappa :{mathcal {B}}times Xto [0,1]\\kappa (B|x)=int _{B}k(y,x)nu (mathrm {d} y)end{cases}}}defines a Markov kernel.[3] This example generalises the countable Markov process example where \u03bd{displaystyle nu } was the counting measure. Moreover it encompasses other important examples such as the convolution kernels, in particular the Markov kernels defined by the heat equation. The latter example includes the Gaussian kernel on X=Y=R{displaystyle X=Y=mathbb {R} } with \u03bd(dx)=dx{displaystyle nu (dx)=dx} standard Lebesgue measure andkt(y,x)=12\u03c0te\u2212(y\u2212x)2\/(2t2){displaystyle k_{t}(y,x)={frac {1}{{sqrt {2pi }}t}}e^{-(y-x)^{2}\/(2t^{2})}}.Measurable functions[edit]Take (X,A){displaystyle (X,{mathcal {A}})} and (Y,B){displaystyle (Y,{mathcal {B}})} arbitrary measurable spaces, and let f:X\u2192Y{displaystyle f:Xto Y} be a measurable function. Now define \u03ba(dy|x)=\u03b4f(x)(dy){displaystyle kappa (dy|x)=delta _{f(x)}(dy)} i.e.\u03ba(B|x)=1B(f(x))=1f\u22121(B)(x)={1if\u00a0f(x)\u2208B0otherwise{displaystyle kappa (B|x)=mathbf {1} _{B}(f(x))=mathbf {1} _{f^{-1}(B)}(x)={begin{cases}1&{text{if }}f(x)in B\\0&{text{otherwise}}end{cases}}} for all B\u2208B{displaystyle Bin {mathcal {B}}}.Note that the indicator function 1f\u22121(B){displaystyle mathbf {1} _{f^{-1}(B)}} is A{displaystyle {mathcal {A}}}-measurable for all B\u2208B{displaystyle Bin {mathcal {B}}} iff f{displaystyle f} is measurable.This example allows us to think of a Markov kernel as a generalised function with a (in general) random rather than certain value. That is, it is a multivalued function where the values are not equally weighted.Galton\u2013Watson process[edit]As a less obvious example, take X=N,A=P(N){displaystyle X=mathbb {N} ,{mathcal {A}}={mathcal {P}}(mathbb {N} )}, and (Y,B){displaystyle (Y,{mathcal {B}})} the real numbers R{displaystyle mathbb {R} } with the standard sigma algebra of Borel sets. Then\u03ba(B|n)={1B(0)n=0Pr(\u03be1+\u22ef+\u03bex\u2208B)n\u22600{displaystyle kappa (B|n)={begin{cases}mathbf {1} _{B}(0)&n=0\\Pr(xi _{1}+cdots +xi _{x}in B)&nneq 0\\end{cases}}}[clarification needed]with i.i.d. random variables \u03bei{displaystyle xi _{i}} (usually with mean 0) and where 1B{displaystyle mathbf {1} _{B}} is the indicator function. For the simple case of coin flips this models the different levels of a Galton board.Composition of Markov Kernels and the Markov Category[edit]Given measurable spaces (X,A){displaystyle (X,{mathcal {A}})}, (Y,B){displaystyle (Y,{mathcal {B}})} we consider a Markov kernel \u03ba:B\u00d7X\u2192[0,1]{displaystyle kappa :{mathcal {B}}times Xto [0,1]} as a morphism \u03ba:X\u2192Y{displaystyle kappa :Xto Y}. Intuitively, rather than assigning to each x\u2208X{displaystyle xin X} a sharply defined point y\u2208Y{displaystyle yin Y} the kernel assigns a “fuzzy” point in Y{displaystyle Y} which is only known with some level of uncertainty, much like actual physical measurements. If we have a third measurable space (Z,C){displaystyle (Z,{mathcal {C}})}, and probability kernels \u03ba:X\u2192Y{displaystyle kappa :Xto Y} and \u03bb:Y\u2192Z{displaystyle lambda :Yto Z}, we can define a composition \u03bb\u2218\u03ba:X\u2192Z{displaystyle lambda circ kappa :Xto Z} by(\u03bb\u2218\u03ba)(dz|x)=\u222bY\u03bb(dz|y)\u03ba(dy|x){displaystyle (lambda circ kappa )(dz|x)=int _{Y}lambda (dz|y)kappa (dy|x)}.The composition is associative by the Monotone Convergence Theorem and the identity function considered as a Markov kernel (i.e. the delta measure \u03ba1(dx\u2032|x)=\u03b4x(dx\u2032){displaystyle kappa _{1}(dx’|x)=delta _{x}(dx’)}) is the unit for this composition.This composition defines the structure of a category on the measurable spaces with Markov kernels as morphisms first defined by Lawvere.[4] The category has the empty set as initial object and the one point set \u2217{displaystyle *} as the terminal object. From this point of view a probability space (\u03a9,A,P){displaystyle (Omega ,{mathcal {A}},mathbb {P} )} is the same thing as a pointed space \u2217\u2192\u03a9{displaystyle *to Omega } in the Markov category.Probability Space defined by Probability Distribution and a Markov Kernel[edit]A composition of a probability space (X,A,PX){displaystyle (X,{mathcal {A}},P_{X})} and a probability kernel \u03ba:(X,A)\u2192(Y,B){displaystyle kappa :(X,{mathcal {A}})to (Y,{mathcal {B}})} defines a probability space (Y,B,PY=\u03ba\u2218PX){displaystyle (Y,{mathcal {B}},P_{Y}=kappa circ P_{X})}, where the probability measure is given byPY(B)=\u222bX\u222bB\u03ba(dy|x)PX(dx)=\u222bX\u03ba(B|x)PX(dx)=EPX\u03ba(B|\u22c5).{displaystyle P_{Y}(B)=int _{X}int _{B}kappa (dy|x)P_{X}(dx)=int _{X}kappa (B|x)P_{X}(dx)=mathbb {E} _{P_{X}}kappa (B|cdot ).}Properties[edit]Semidirect product[edit]Let (X,A,P){displaystyle (X,{mathcal {A}},P)} be a probability space and \u03ba{displaystyle kappa } a Markov kernel from (X,A){displaystyle (X,{mathcal {A}})} to some (Y,B){displaystyle (Y,{mathcal {B}})}. Then there exists a unique measure Q{displaystyle Q} on (X\u00d7Y,A\u2297B){displaystyle (Xtimes Y,{mathcal {A}}otimes {mathcal {B}})}, such that:Q(A\u00d7B)=\u222bA\u03ba(B|x)P(dx),\u2200A\u2208A,\u2200B\u2208B.{displaystyle Q(Atimes B)=int _{A}kappa (B|x),P(dx),quad forall Ain {mathcal {A}},quad forall Bin {mathcal {B}}.}Regular conditional distribution[edit]Let (S,Y){displaystyle (S,Y)} be a Borel space, X{displaystyle X} a (S,Y){displaystyle (S,Y)}-valued random variable on the measure space (\u03a9,F,P){displaystyle (Omega ,{mathcal {F}},P)} and G\u2286F{displaystyle {mathcal {G}}subseteq {mathcal {F}}} a sub-\u03c3{displaystyle sigma }-algebra. Then there exists a Markov kernel \u03ba{displaystyle kappa } from (\u03a9,G){displaystyle (Omega ,{mathcal {G}})} to (S,Y){displaystyle (S,Y)}, such that \u03ba(\u22c5,B){displaystyle kappa (cdot ,B)} is a version of the conditional expectation E[1{X\u2208B}\u2223G]{displaystyle mathbb {E} [mathbf {1} _{{Xin B}}mid {mathcal {G}}]} for every B\u2208Y{displaystyle Bin Y}, i.e.P(X\u2208B\u2223G)=E[1{X\u2208B}\u2223G]=\u03ba(\u22c5,B),P-a.s.\u2200B\u2208G.{displaystyle P(Xin Bmid {mathcal {G}})=mathbb {E} left[mathbf {1} _{{Xin B}}mid {mathcal {G}}right]=kappa (cdot ,B),qquad P{text{-a.s.}},,forall Bin {mathcal {G}}.}It is called regular conditional distribution of X{displaystyle X} given G{displaystyle {mathcal {G}}} and is not uniquely defined.Generalizations[edit]Transition kernels generalize Markov kernels in the sense that for all x\u2208X{displaystyle xin X}, the mapB\u21a6\u03ba(B|x){displaystyle Bmapsto kappa (B|x)}can be any type of (non negative) measure, not necessarily a probability measure.References[edit]\u00a736. Kernels and semigroups of kernels"},{"@context":"http:\/\/schema.org\/","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"https:\/\/wiki.edu.vn\/en\/wiki19\/#breadcrumbitem","name":"Enzyklop\u00e4die"}},{"@type":"ListItem","position":2,"item":{"@id":"https:\/\/wiki.edu.vn\/en\/wiki19\/markov-kernel-wikipedia\/#breadcrumbitem","name":"Markov kernel – Wikipedia"}}]}]