[{"@context":"http:\/\/schema.org\/","@type":"BlogPosting","@id":"https:\/\/wiki.edu.vn\/en\/wiki24\/q-q-plot-wikipedia\/#BlogPosting","mainEntityOfPage":"https:\/\/wiki.edu.vn\/en\/wiki24\/q-q-plot-wikipedia\/","headline":"Q\u2013Q plot – Wikipedia","name":"Q\u2013Q plot – Wikipedia","description":"Plot of the empirical distribution of p-values against the theoretical one A normal Q\u2013Q plot of randomly generated, independent","datePublished":"2021-10-18","dateModified":"2021-10-18","author":{"@type":"Person","@id":"https:\/\/wiki.edu.vn\/en\/wiki24\/author\/lordneo\/#Person","name":"lordneo","url":"https:\/\/wiki.edu.vn\/en\/wiki24\/author\/lordneo\/","image":{"@type":"ImageObject","@id":"https:\/\/secure.gravatar.com\/avatar\/c9645c498c9701c88b89b8537773dd7c?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/c9645c498c9701c88b89b8537773dd7c?s=96&d=mm&r=g","height":96,"width":96}},"publisher":{"@type":"Organization","name":"Enzyklop\u00e4die","logo":{"@type":"ImageObject","@id":"https:\/\/wiki.edu.vn\/wiki4\/wp-content\/uploads\/2023\/08\/download.jpg","url":"https:\/\/wiki.edu.vn\/wiki4\/wp-content\/uploads\/2023\/08\/download.jpg","width":600,"height":60}},"image":{"@type":"ImageObject","@id":"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/1\/11\/Normal_exponential_qq.svg\/300px-Normal_exponential_qq.svg.png","url":"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/1\/11\/Normal_exponential_qq.svg\/300px-Normal_exponential_qq.svg.png","height":"263","width":"300"},"url":"https:\/\/wiki.edu.vn\/en\/wiki24\/q-q-plot-wikipedia\/","wordCount":6727,"articleBody":"Plot of the empirical distribution of p-values against the theoretical one A normal Q\u2013Q plot of randomly generated, independent standard exponential data, (X ~ Exp(1)). 
This Q\u2013Q plot compares a sample of data on the vertical axis to a statistical population on the horizontal axis. The points follow a strongly nonlinear pattern, suggesting that the data are not distributed as a standard normal (X ~ N(0,1)). The offset between the line and the points suggests that the mean of the data is not 0. The median of the points can be determined to be near 0.7. A normal Q\u2013Q plot comparing randomly generated, independent standard normal data on the vertical axis to a standard normal population on the horizontal axis. The linearity of the points suggests that the data are normally distributed. A Q\u2013Q plot of a sample of data versus a Weibull distribution. The deciles of the distributions are shown in red. Three outliers are evident at the high end of the range. Otherwise, the data fit the Weibull(1,2) model well. A Q\u2013Q plot comparing the distributions of standardized daily maximum temperatures at 25 stations in the US state of Ohio in March and in July. The curved pattern suggests that the central quantiles are more closely spaced in July than in March, and that the July distribution is skewed to the left compared to the March distribution. The data cover the period 1893\u20132001.
In statistics, a Q\u2013Q plot (quantile-quantile plot) is a probability plot, a graphical method for comparing two probability distributions by plotting their quantiles against each other.[1] A point (x, y) on the plot corresponds to one of the quantiles of the second distribution (y-coordinate) plotted against the same quantile of the first distribution (x-coordinate). This defines a parametric curve where the parameter is the index of the quantile interval.
If the two distributions being compared are similar, the points in the Q\u2013Q plot will approximately lie on the identity line y = x. 
If the distributions are linearly related, the points in the Q\u2013Q plot will approximately lie on a line, but not necessarily on the line y = x. Q\u2013Q plots can also be used as a graphical means of estimating parameters in a location-scale family of distributions.
A Q\u2013Q plot is used to compare the shapes of distributions, providing a graphical view of how properties such as location, scale, and skewness are similar or different in the two distributions. Q\u2013Q plots can be used to compare collections of data, or theoretical distributions. The use of Q\u2013Q plots to compare two samples of data can be viewed as a non-parametric approach to comparing their underlying distributions. A Q\u2013Q plot is generally more diagnostic than comparing the samples’ histograms, but is less widely known. Q\u2013Q plots are commonly used to compare a data set to a theoretical model.[2][3] This can provide an assessment of goodness of fit that is graphical, rather than reducing to a numerical summary statistic. Q\u2013Q plots are also used to compare two theoretical distributions to each other.[4] Since Q\u2013Q plots compare distributions, there is no need for the values to be observed as pairs, as in a scatter plot, or even for the numbers of values in the two groups being compared to be equal.
The term “probability plot” sometimes refers specifically to a Q\u2013Q plot, sometimes to a more general class of plots, and sometimes to the less commonly used P\u2013P plot. 
The probability plot correlation coefficient (PPCC) is a quantity derived from the idea of Q\u2013Q plots; it measures the agreement of a fitted distribution with observed data and is sometimes used as a means of fitting a distribution to data.
Definition and construction
Q\u2013Q plot for first opening\/final closing dates of Washington State Route 20, versus a normal distribution.[5] Outliers are visible in the upper right corner.
A Q\u2013Q plot is a plot of the quantiles of two distributions against each other, or a plot based on estimates of the quantiles. The pattern of points in the plot is used to compare the two distributions.
The main step in constructing a Q\u2013Q plot is calculating or estimating the quantiles to be plotted. If one or both of the axes in a Q\u2013Q plot is based on a theoretical distribution with a continuous cumulative distribution function (CDF), all quantiles are uniquely defined and can be obtained by inverting the CDF. If a theoretical probability distribution with a discontinuous CDF is one of the two distributions being compared, some of the quantiles may not be defined, so an interpolated quantile may be plotted. If the Q\u2013Q plot is based on data, there are multiple quantile estimators in use. Rules for forming Q\u2013Q plots when quantiles must be estimated or interpolated are called plotting positions.
A simple case is where one has two data sets of the same size. In that case, to make the Q\u2013Q plot, one orders each set in increasing order, then pairs off and plots the corresponding values. 
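The equal-size case just described can be sketched in Python. This is a minimal illustration rather than a reference implementation; the function name qq_pairs_equal_size and the sample values are hypothetical.

```python
def qq_pairs_equal_size(xs, ys):
    # Order each data set, then pair values of equal rank.
    if len(xs) != len(ys):
        raise ValueError('both data sets must have the same size')
    return list(zip(sorted(xs), sorted(ys)))


# Each pair is one point (x, y) of the Q-Q plot.
pairs = qq_pairs_equal_size([3.1, 0.5, 2.2, 1.7], [10.0, 4.0, 8.5, 6.0])
```

Plotting the first coordinate of each pair against the second gives the points of the Q\u2013Q plot.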
A more complicated construction is the case where two data sets of different sizes are being compared. To construct the Q\u2013Q plot in this case, it is necessary to use an interpolated quantile estimate so that quantiles corresponding to the same underlying probability can be constructed.
More abstractly,[4] given two cumulative probability distribution functions F and G, with associated quantile functions F\u207b\u00b9 and G\u207b\u00b9 (the inverse function of the CDF is the quantile function), the Q\u2013Q plot draws the q-th quantile of F against the q-th quantile of G for a range of values of q. Thus, the Q\u2013Q plot is a parametric curve indexed over [0,1] with values in the real plane R\u00b2.
Interpretation
The points plotted in a Q\u2013Q plot are always non-decreasing when viewed from left to right. If the two distributions being compared are identical, the Q\u2013Q plot follows the 45\u00b0 line y = x. If the two distributions agree after linearly transforming the values in one of the distributions, then the Q\u2013Q plot follows some line, but not necessarily the line y = x. If the general trend of the Q\u2013Q plot is flatter than the line y = x, the distribution plotted on the horizontal axis is more dispersed than the distribution plotted on the vertical axis. Conversely, if the general trend of the Q\u2013Q plot is steeper than the line y = x, the distribution plotted on the vertical axis is more dispersed than the distribution plotted on the horizontal axis. Q\u2013Q plots are often arced, or “S” shaped, indicating that one of the distributions is more skewed than the other, or that one of the distributions has heavier tails than the other.
Although a Q\u2013Q plot is based on quantiles, in a standard Q\u2013Q plot it is not possible to determine which point in the Q\u2013Q plot determines a given quantile. For example, it is not possible to determine the median of either of the two distributions being compared by inspecting the Q\u2013Q plot. 
Some Q\u2013Q plots indicate the deciles to make determinations such as this possible.
The intercept and slope of a linear regression between the quantiles give a measure of the relative location and relative scale of the samples. If the median of the distribution plotted on the horizontal axis is 0, the intercept of a regression line is a measure of location, and the slope is a measure of scale. The distance between medians is another measure of relative location reflected in a Q\u2013Q plot. The “probability plot correlation coefficient” (PPCC) is the correlation coefficient between the paired sample quantiles. The closer the correlation coefficient is to one, the closer the distributions are to being shifted, scaled versions of each other. For distributions with a single shape parameter, the probability plot correlation coefficient plot (PPCC plot) provides a method for estimating the shape parameter \u2013 one simply computes the correlation coefficient for different values of the shape parameter, and uses the one with the best fit, just as if one were comparing distributions of different types.
Another common use of Q\u2013Q plots is to compare the distribution of a sample to a theoretical distribution, such as the standard normal distribution N(0,1), as in a normal probability plot. As in the case when comparing two samples of data, one orders the data (formally, computes the order statistics), then plots them against certain quantiles of the theoretical distribution.[3]
Plotting positions
The choice of quantiles from a theoretical distribution can depend upon context and purpose. One choice, given a sample of size n, is k \/ n for k = 1, \u2026, n, as these are the quantiles that the sampling distribution realizes. The last of these, n \/ n, corresponds to the 100th percentile \u2013 the maximum value of the theoretical distribution, which is sometimes infinite. 
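A minimal Python sketch of the k \/ n choice, assuming the theoretical distribution is the standard normal. The helper names normal_quantile and qq_points_k_over_n and the sample values are made up for illustration; statistics.NormalDist is part of the Python standard library (3.8 and later). It shows that the last position, n \/ n = 1, maps to an infinite theoretical quantile and so cannot be drawn.

```python
import math
from statistics import NormalDist


def normal_quantile(p):
    # Standard normal quantile; the 0th and 100th percentiles are infinite.
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    return NormalDist().inv_cdf(p)


def qq_points_k_over_n(sample):
    # Pair the k-th smallest value with the normal quantile at k / n.
    ordered = sorted(sample)
    n = len(ordered)
    return [(normal_quantile(k / n), x) for k, x in enumerate(ordered, start=1)]


points = qq_points_k_over_n([0.3, -1.2, 0.8, 2.1, -0.4])
# The largest observation is paired with math.inf because k / n equals 1.
```

Positions that stay strictly inside (0, 1) avoid the infinite endpoint.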
Other choices are (k \u2212 0.5) \/ n, or, to space the points evenly in the uniform distribution, k \/ (n + 1).[6]
Many other choices have been suggested, both formal and heuristic, based on theory or simulations relevant in context. The following subsections discuss some of these. A narrower question is choosing a maximum (estimation of a population maximum), known as the German tank problem, for which similar “sample maximum, plus a gap” solutions exist, most simply m + m\/n \u2212 1. A more formal application of this uniformization of spacing occurs in maximum spacing estimation of parameters.
Expected value of the order statistic for a uniform distribution
The k \/ (n + 1) approach equals that of plotting the points according to the probability that the last of (n + 1) randomly drawn values will not exceed the k-th smallest of the first n randomly drawn values.[7][8]
Expected value of the order statistic for a standard normal distribution
In using a normal probability plot, the quantiles one uses are the rankits, the quantile of the expected value of the order statistic of a standard normal distribution.
More generally, the Shapiro\u2013Wilk test uses the expected values of the order statistics of the given distribution; the resulting plot and line yield the generalized least squares estimate for location and scale (from the intercept and slope of the fitted line).[9]
Although this is not too important for the normal distribution (the location and scale are estimated by the mean and standard deviation, respectively), it can be useful for many other distributions.
However, this requires calculating the expected values of the order statistic, which may be difficult if the distribution is not normal.
Median of the order statistics
Alternatively, one may use estimates of the median of the order statistics, which one can compute based on estimates of the median of the order statistics of a uniform distribution and the quantile function 
of the distribution; this was suggested by Filliben (1975).[9]
This can be easily generated for any distribution for which the quantile function can be computed, but conversely the resulting estimates of location and scale are no longer precisely the least squares estimates, though these only differ significantly for small n.
Heuristics
Several different formulas have been used or proposed as affine symmetrical plotting positions. Such formulas have the form (k \u2212 a) \/ (n + 1 \u2212 2a) for some value of a in the range from 0 to 1, which gives a range between k \/ (n + 1) and (k \u2212 1) \/ (n \u2212 1).
Expressions include:
k \/ (n + 1)
(k \u2212 0.3)\u2009\/\u2009(n + 0.4).[10]
(k \u2212 0.3175)\u2009\/\u2009(n + 0.365).[11][note 1]
(k \u2212 0.326)\u2009\/\u2009(n + 0.348).[12]
(k \u2212 \u2153)\u2009\/\u2009(n + \u2153).[note 2]
(k \u2212 0.375)\u2009\/\u2009(n + 0.25).[note 3]
(k \u2212 0.4)\u2009\/\u2009(n + 0.2).[13]
(k \u2212 0.44)\u2009\/\u2009(n + 0.12).[note 4]
(k \u2212 0.5)\u2009\/\u2009n.[15]
(k \u2212 0.567)\u2009\/\u2009(n \u2212 0.134).[16]
(k \u2212 1)\u2009\/\u2009(n \u2212 1).[note 5]
For large sample size, n, there is little difference between these various expressions.
Filliben’s estimate
The order statistic medians are the medians of the order statistics of the distribution. These can be expressed in terms of the quantile function and the order statistic medians for the continuous uniform distribution by:
N(i) = G(U(i))
where U(i) are the uniform order statistic medians and G is the quantile function for the desired distribution. The quantile function is the inverse of the cumulative distribution function (probability that X is less than or equal to some value). That is, given a probability, we want the corresponding quantile of the cumulative distribution function.
James J. 
Filliben (1975) uses the following estimates for the uniform order statistic medians:
m(i) = \\begin{cases} 1 - 0.5^{1\/n} & i = 1 \\\\ \\dfrac{i - 0.3175}{n + 0.365} & i = 2, 3, \\ldots, n - 1 \\\\ 0.5^{1\/n} & i = n. \\end{cases}
The reason for this estimate is that the order statistic medians do not have a simple form.
Notes
[note 1] Note that this also uses a different expression for the first & last points. [1] cites the original work by Filliben (1975). This expression is an estimate of the medians of U(k).
[note 2] A simple (and easy to remember) formula for plotting positions; used in BMDP statistical package.
[note 3] This is Blom (1958)’s earlier approximation and is the expression used in MINITAB.
[note 4] This plotting position was used by Irving I. Gringorten[14] to plot points in tests for the Gumbel distribution.
[note 5] Used by Filliben (1975), these plotting points are equal to the modes of U(k).
References
Citations
[1] Wilk, M.B.; Gnanadesikan, R. (1968), “Probability plotting methods for the analysis of data”, Biometrika, Biometrika Trust, 55 (1): 1\u201317, doi:10.1093\/biomet\/55.1.1, JSTOR\u00a02334448, PMID\u00a05661047.
[2] Gnanadesikan (1977) p199.
[3] (Thode 2002, Section 2.2.2, Quantile-Quantile Plots, p. 21)
[4] (Gibbons & Chakraborti 2003, p. 144)
[5] “SR 20 \u2013 North Cascades Highway \u2013 Opening and Closing History”. North Cascades Passes. Washington State Department of Transportation. October 2009. Retrieved 8 February 2009.
[6] Weibull, Waloddi (1939), “The Statistical Theory of the Strength of Materials”, IVA Handlingar, Royal Swedish Academy of Engineering Sciences (151)
[7] Madsen, H.O.; et\u00a0al. (1986), Methods of Structural Safety
[8] Makkonen, L. (2008), “Bringing closure to the plotting position controversy”, Communications in Statistics \u2013 Theory and Methods, 37 (3): 460\u2013467, doi:10.1080\/03610920701653094, S2CID\u00a0122822135
[9] Testing for Normality, by Henry C. 
Thode, CRC Press, 2002, ISBN\u00a0978-0-8247-9613-6, p. 31
[10] Benard & Bos-Levenbach (1953). The plotting of observations on probability paper. Statistica Neerlandica, 7: 163\u2013173. doi:10.1111\/j.1467-9574.1953.tb00821.x. (in Dutch)
[11] “1.3.3.21. Normal Probability Plot”. itl.nist.gov. Retrieved 16 February 2022.
[12] Distribution free plotting position, Yu & Huang
[13] Cunnane (1978).
[14] Gringorten, Irving I. (1963). “A plotting rule for extreme probability paper”. Journal of Geophysical Research. 68 (3): 813\u2013814. Bibcode:1963JGR....68..813G. doi:10.1029\/JZ068i003p00813. ISSN\u00a02156-2202.
[15] Hazen, Allen (1914), “Storage to be provided in the impounding reservoirs for municipal water supply”, Transactions of the American Society of Civil Engineers (77): 1547\u20131550
[16] Larsen, Curran & Hunt (1980).
Sources
This article incorporates public domain material from the National Institute of Standards and Technology.
Blom, G. (1958), Statistical estimates and transformed beta variables, New York: John Wiley and Sons
Chambers, John; William Cleveland; Beat Kleiner; Paul Tukey (1983), Graphical methods for data analysis, Wadsworth
Cleveland, W.S. (1994) The Elements of Graphing Data, Hobart Press ISBN\u00a00-9634884-1-4
Filliben, J. J. (February 1975), “The Probability Plot Correlation Coefficient Test for Normality”, Technometrics, American Society for Quality, 17 (1): 111\u2013117, doi:10.2307\/1268008, JSTOR\u00a01268008.
Gibbons, Jean Dickinson; Chakraborti, Subhabrata (2003), Nonparametric statistical inference (4th\u00a0ed.), CRC Press, ISBN\u00a0978-0-8247-4052-8
Gnanadesikan, R. (1977) Methods for Statistical Analysis of Multivariate Observations, Wiley ISBN\u00a00-471-30845-5.
Thode, Henry C. 
(2002), Testing for normality, New York: Marcel Dekker, ISBN\u00a00-8247-9613-6"},{"@context":"http:\/\/schema.org\/","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"https:\/\/wiki.edu.vn\/en\/wiki24\/#breadcrumbitem","name":"Enzyklop\u00e4die"}},{"@type":"ListItem","position":2,"item":{"@id":"https:\/\/wiki.edu.vn\/en\/wiki24\/q-q-plot-wikipedia\/#breadcrumbitem","name":"Q\u2013Q plot – Wikipedia"}}]}]