P–P plot

In statistics, a P–P plot (probability–probability plot or percent–percent plot or P value plot) is a probability plot for assessing how closely two data sets agree; it plots the two cumulative distribution functions against each other. P–P plots are widely used to evaluate the skewness of a distribution. The Q–Q plot is more widely used, but both are referred to as 'the' probability plot, so the two are easily confused.

A P–P plot plots two cumulative distribution functions (cdfs) against each other: given two probability distributions with cdfs F and G, it plots (F(z), G(z)) as z ranges from −∞ to ∞. As a cdf has range [0, 1], the domain of this parametric graph is (−∞, ∞) and the range is the unit square [0, 1] × [0, 1]. Thus for input z the output is the pair of numbers giving what percentage of F and what percentage of G fall at or below z. The comparison line is the 45° line from (0,0) to (1,1): the distributions are equal if and only if the plot falls on this line, and any deviation indicates a difference between the distributions.

As an example, if the two distributions do not overlap, say F is below G, then the P–P plot will move from left to right along the bottom of the square (as z moves through the support of F, the cdf of F goes from 0 to 1, while the cdf of G stays at 0) and then move up the right side of the square (the cdf of F is now 1, as all points of F lie below all points of G, and the cdf of G now moves from 0 to 1 as z moves through the support of G). As this example illustrates, if two distributions are separated in space, the P–P plot will give very little information; it is only useful for comparing probability distributions that have nearby or equal location.
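As a minimal sketch of this construction, the following Python code estimates each cdf by an empirical cdf and evaluates both on the pooled sample values. The helper names `ecdf` and `pp_points` and the sample values are illustrative assumptions, and NumPy is assumed to be available. The two samples are chosen so that they do not overlap, which reproduces the path along the bottom of the unit square and then up its right side.

```python
import numpy as np

def ecdf(sample, z):
    """Fraction of `sample` that falls at or below each value in `z`."""
    sample = np.sort(np.asarray(sample, dtype=float))
    # side="right" counts the elements that are <= z
    return np.searchsorted(sample, z, side="right") / sample.size

def pp_points(sample_f, sample_g):
    """Return the P-P points (F(z), G(z)), with z running over the pooled data."""
    z = np.sort(np.concatenate([sample_f, sample_g]))
    return ecdf(sample_f, z), ecdf(sample_g, z)

# Hypothetical non-overlapping samples: every value of f lies below every value of g.
f = np.array([0.0, 1.0, 2.0, 3.0])
g = np.array([10.0, 11.0, 12.0, 13.0])
pf, pg = pp_points(f, g)

# Each point lies on the bottom edge (G = 0) or on the right edge (F = 1).
on_boundary = np.all((pg == 0.0) | (pf == 1.0))
```

Plotting `pf` against `pg`, together with the 45° line, would give the P–P plot; for samples from the same distribution the points would instead scatter around that line.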
Notably, the P–P plot passes through the point (1/2, 1/2) if and only if the two distributions have the same median. P–P plots are sometimes limited to comparisons between two samples, rather than comparison of a sample to a theoretical model distribution. However, they are of general use, particularly where observations are not all modelled with the same distribution.
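The median property can be checked numerically for continuous distributions. The sketch below is an illustration only: it assumes normal distributions, and the names `normal_cdf`, `same_median_height` and `shifted_height` are hypothetical. It takes F to be the standard normal cdf, whose median is 0, so the P–P plot reaches abscissa 1/2 at z = 0, and it evaluates G at that same z.

```python
import math

def normal_cdf(z, mu=0.0, sigma=1.0):
    """Cdf of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((z - mu) / (sigma * math.sqrt(2.0))))

# F is the standard normal, so F(0) = 1/2 and the plot has abscissa 1/2 at z = 0.
f_at_median = normal_cdf(0.0)

# G with the same median (0) but a different spread: the plot passes through (1/2, 1/2).
same_median_height = normal_cdf(0.0, mu=0.0, sigma=2.0)

# G with median 1: at z = 0 the plot is below 1/2, so it misses (1/2, 1/2).
shifted_height = normal_cdf(0.0, mu=1.0, sigma=1.0)
```

Here a difference in spread alone leaves the plot passing through the centre of the unit square, whereas a difference in location moves it away.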
