[{"@context":"http:\/\/schema.org\/","@type":"BlogPosting","@id":"https:\/\/wiki.edu.vn\/en\/wiki40\/low-rank-matrix-approximations-wikipedia\/#BlogPosting","mainEntityOfPage":"https:\/\/wiki.edu.vn\/en\/wiki40\/low-rank-matrix-approximations-wikipedia\/","headline":"Low-rank matrix approximations – Wikipedia","name":"Low-rank matrix approximations – Wikipedia","description":"before-content-x4 Low-rank matrix approximations are essential tools in the application of kernel methods to large-scale learning problems.[1] after-content-x4 Kernel methods","datePublished":"2022-03-15","dateModified":"2022-03-15","author":{"@type":"Person","@id":"https:\/\/wiki.edu.vn\/en\/wiki40\/author\/lordneo\/#Person","name":"lordneo","url":"https:\/\/wiki.edu.vn\/en\/wiki40\/author\/lordneo\/","image":{"@type":"ImageObject","@id":"https:\/\/secure.gravatar.com\/avatar\/c9645c498c9701c88b89b8537773dd7c?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/c9645c498c9701c88b89b8537773dd7c?s=96&d=mm&r=g","height":96,"width":96}},"publisher":{"@type":"Organization","name":"Enzyklop\u00e4die","logo":{"@type":"ImageObject","@id":"https:\/\/wiki.edu.vn\/wiki4\/wp-content\/uploads\/2023\/08\/download.jpg","url":"https:\/\/wiki.edu.vn\/wiki4\/wp-content\/uploads\/2023\/08\/download.jpg","width":600,"height":60}},"image":{"@type":"ImageObject","@id":"https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/a601995d55609f2d9f5e233e36fbe9ea26011b3b","url":"https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/a601995d55609f2d9f5e233e36fbe9ea26011b3b","height":"","width":""},"url":"https:\/\/wiki.edu.vn\/en\/wiki40\/low-rank-matrix-approximations-wikipedia\/","wordCount":19061,"articleBody":" (adsbygoogle = window.adsbygoogle || []).push({});before-content-x4Low-rank matrix approximations are essential tools in the application of kernel methods to large-scale learning problems.[1] (adsbygoogle = window.adsbygoogle || []).push({});after-content-x4Kernel methods (for instance, support vector machines or Gaussian processes[2]) project data points into a high-dimensional or infinite-dimensional feature space and find the optimal splitting hyperplane. In the kernel method the data is represented in a kernel matrix (or, Gram matrix). Many algorithms can solve machine learning problems using the kernel matrix. The main problem of kernel method is its high computational cost associated with kernel matrices. The cost is at least quadratic in the number of training data points, but most kernel methods include computation of matrix inversion or eigenvalue decomposition and the cost becomes cubic in the number of training data. Large training sets cause large storage and computational costs. Despite low rank decomposition methods (Cholesky decomposition) reduce this cost, they continue to require computing the kernel matrix. One of the approaches to deal with this problem is low-rank matrix approximations. The most popular examples of them are Nystr\u00f6m method and the random features. 
Nyström approximation

Kernel methods become infeasible when the number of points $n$ is so large that the kernel matrix $\hat{K}$ cannot be stored in memory.

If $n$ is the number of training examples, the storage and computational costs required to find the solution of the problem using a general kernel method are $O(n^2)$ and $O(n^3)$ respectively. The Nyström approximation can allow a significant speed-up of the computations.[2][3] This speed-up is achieved by using, instead of the kernel matrix, its approximation $\tilde{K}$ of rank $q$. An advantage of the method is that it is not necessary to compute or store the whole kernel matrix, but only a submatrix of size $q \times n$. This reduces the storage and complexity requirements to $O(nq)$ and $O(nq^2)$ respectively.

The method is named "Nyström approximation" because it can be interpreted as a case of the Nyström method from integral equation theory.[3]

Theorem for kernel approximation

$\hat{K}$ is a kernel matrix for some kernel method. Consider the first $q < n$ points of the training set. Then there exists a matrix $\tilde{K}$ of rank $q$:

$$\tilde{K} = \hat{K}_{n,q}\,\hat{K}_{q}^{-1}\,\hat{K}_{n,q}^{\mathrm{T}},$$

where

$$(\hat{K}_{q})_{i,j} = K(x_{i}, x_{j}), \quad i, j = 1, \dots, q,$$

$\hat{K}_{q}$ is invertible, and

$$(\hat{K}_{n,q})_{i,j} = K(x_{i}, x_{j}), \quad i = 1, \dots, n \text{ and } j = 1, \dots, q.$$

Proof

Singular-value decomposition application

Applying the singular-value decomposition (SVD) to a matrix $A$ with dimensions $p \times m$ produces a singular system consisting of singular values and the corresponding left and right singular vectors.
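The theorem above translates almost line for line into code. The following sketch (Python with NumPy; the RBF kernel, the choice of the first $q$ points as landmarks, and the use of a pseudo-inverse for $\hat{K}_{q}$ are illustrative assumptions) computes only the $n \times q$ block $\hat{K}_{n,q}$, builds the rank-$q$ approximation $\tilde{K} = \hat{K}_{n,q}\hat{K}_{q}^{-1}\hat{K}_{n,q}^{\mathrm{T}}$, and compares it against the full kernel matrix, which is feasible here only because $n$ is small.

import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Same Gaussian (RBF) kernel helper as in the earlier sketch.
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
n, d, q = 2000, 10, 100
X = rng.normal(size=(n, d))

# Only the n x q block K_{n,q} is ever computed and stored: O(nq) memory.
K_nq = rbf_kernel(X, X[:q])
# Its top q x q block is K_q, the kernel matrix of the first q points.
K_q = K_nq[:q, :]

# Rank-q Nystrom approximation: K~ = K_{n,q} K_q^{-1} K_{n,q}^T.
# A pseudo-inverse is used in case K_q is numerically close to singular.
K_tilde = K_nq @ np.linalg.pinv(K_q) @ K_nq.T

# Sanity check against the full kernel matrix (possible only for small n).
K_full = rbf_kernel(X, X)
rel_err = np.linalg.norm(K_full - K_tilde) / np.linalg.norm(K_full)
print(f"rank-{q} Nystrom relative Frobenius error: {rel_err:.3e}")

Note that $\tilde{K}$ is formed explicitly here only for the error check; in a real application one keeps the factors $\hat{K}_{n,q}$ and $\hat{K}_{q}^{-1}$ and applies them to vectors, which preserves the $O(nq)$ storage and $O(nq^{2})$ computation quoted above.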