G-test

In statistics, G-tests are likelihood-ratio or maximum likelihood statistical significance tests that are increasingly being used in situations where chi-squared tests were previously recommended.

The general formula for G is

$$G = 2\sum_{i} O_i \ln\left(\frac{O_i}{E_i}\right),$$

where $O_i \geq 0$ is the observed count in a cell, $E_i > 0$ is the expected count under the null hypothesis, $\ln$ denotes the natural logarithm, and the sum is taken over all non-empty cells. Furthermore, the total observed count should be equal to the total expected count:

$$\sum_{i} O_i = \sum_{i} E_i.$$

G-tests have been recommended at least since the 1981 edition of Biometry, a statistics textbook by Robert R. Sokal and F. James Rohlf.

We can derive the value of the G-test from the log-likelihood ratio test where the underlying model is a multinomial model. Suppose we had a sample $x = (x_1, \ldots, x_m)$ where each $x_i$ is the number of times that an object of type $i$ was observed. Furthermore, let $n = \sum_{i=1}^{m} x_i$ be the total number of objects observed. If we assume that the underlying model is multinomial, then, writing $O_i = x_i$ for the observed counts and $E_i$ for the counts expected under the null hypothesis, the test statistic is defined by

$$\begin{aligned}
G &= -2\sum_{i=1}^{m} O_i \ln\left(\frac{E_i}{O_i}\right) \\
  &= 2\sum_{i=1}^{m} O_i \ln\left(\frac{O_i}{E_i}\right).
\end{aligned}$$

Given the null hypothesis that the observed frequencies result from random sampling from a distribution with the given expected frequencies, the distribution of G is approximately a chi-squared distribution, with the same number of degrees of freedom as in the corresponding chi-squared test.

For very small samples, the multinomial test for goodness of fit, Fisher's exact test for contingency tables, or even Bayesian hypothesis selection is preferable to the G-test. McDonald recommends always using an exact test (exact test of goodness-of-fit, Fisher's exact test) if the total sample size is less than 1000. There is nothing magical about a sample size of 1000; it is just a nice round number that is well within the range where an exact test, chi-squared test and G-test will give almost identical P values. Spreadsheets, web-page calculators, and SAS should not have any problem doing an exact test on a sample size of 1000.
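As a rough illustration of the statistic and its chi-squared approximation described above, the following Python sketch computes G for a goodness-of-fit problem and converts it to an approximate P value. The function names `g_statistic` and `chi2_sf`, the example counts, and the choice of m − 1 degrees of freedom (which assumes the expected proportions are fully specified in advance) are illustrative assumptions, not part of the source. The chi-squared tail probability is computed from the standard recurrence for the regularized upper incomplete gamma function so that only the standard library is needed.

```python
import math


def g_statistic(observed, expected):
    """Return G = 2 * sum(O_i * ln(O_i / E_i)) over the non-empty cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if not math.isclose(sum(observed), sum(expected)):
        raise ValueError("total observed count must equal total expected count")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected counts must be positive")
        if o > 0:  # empty cells contribute nothing to the sum
            total += o * math.log(o / e)
    return 2.0 * total


def chi2_sf(x, df):
    """Upper tail probability of a chi-squared variable with integer df >= 1.

    Uses Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1), starting from
    Q(1/2, z) = erfc(sqrt(z)) for odd df or Q(1, z) = exp(-z) for even df.
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Hypothetical counts for three categories (made up for illustration).
observed = [30, 50, 20]
expected = [25, 50, 25]

g = g_statistic(observed, expected)
df = len(observed) - 1  # assumption: no parameters estimated from the data
p_value = chi2_sf(g, df)
print(f"G = {g:.4f}, df = {df}, approximate P = {p_value:.4f}")
```

With only 100 observations, this example falls well below the sample size of 1000 mentioned above, so in practice an exact test would be the preferred choice; the sketch is meant only to show how the formula is evaluated.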
