statsmodels.stats.gof.powerdiscrepancy
statsmodels.stats.gof.powerdiscrepancy(observed, expected, lambd=0.0, axis=0, ddof=0)
Calculates the power discrepancy, a family of goodness-of-fit statistics that measure the discrepancy between observed and expected data.
This contains several goodness-of-fit tests as special cases; see the description of lambd, the exponent of the power discrepancy. The p-value is based on the asymptotic chi-square distribution of the test statistic.
freeman_tukey: D(x|\theta) = \sum_j (\sqrt{x_j} - \sqrt{e_j})^2
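To make the role of the exponent concrete, here is a minimal sketch of the Cressie-Read power divergence for a single series of counts, using only the standard library. The function name `power_divergence` and the parameter `lam` are hypothetical, the rescaling of `expected` to the observed total is an assumption inferred from the examples on this page, and all counts are assumed to be strictly positive. It is not the statsmodels implementation.

```python
import math


def power_divergence(observed, expected, lam=0.0):
    """Cressie-Read power divergence for one series (illustrative only)."""
    n_obs = float(sum(observed))
    n_exp = float(sum(expected))
    # Assumption: rescale expected so that both series have the same total,
    # so proportions such as 0.2 and counts such as 2.0 are equivalent.
    scaled = [e * n_obs / n_exp for e in expected]
    pairs = list(zip(observed, scaled))
    if abs(lam) < 1e-12:
        # Limit lam -> 0: log-likelihood ratio statistic
        return 2.0 * sum(o * math.log(o / e) for o, e in pairs)
    if abs(lam + 1.0) < 1e-12:
        # Limit lam -> -1: modified log-likelihood ratio statistic
        return 2.0 * sum(e * math.log(e / o) for o, e in pairs)
    # General case: 2 / (lam * (lam + 1)) * sum_j o_j * ((o_j / e_j)**lam - 1)
    scale = 2.0 / (lam * (lam + 1.0))
    return scale * sum(o * ((o / e) ** lam - 1.0) for o, e in pairs)


obs = [2.0, 4.0, 2.0, 1.0, 1.0]
exp = [0.2, 0.2, 0.2, 0.2, 0.2]
print(power_divergence(obs, exp, lam=1.0))      # Pearson case, about 3.0
print(power_divergence(obs, exp, lam=2.0 / 3))  # Cressie-Read case, about 2.897
```

Under this scaling, `lam = -0.5` evaluates to 4 * sum_j (sqrt(x_j) - sqrt(e_j))^2 when observed and expected share the same total, which agrees with the value 2.745166 reported for 'freeman_tukey' in the examples.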
Parameters: o : Iterable
Observed values
e : Iterable
Expected values
lambd : float or string
- float : exponent a for power discrepancy
- 'loglikeratio': a = 0
- 'freeman_tukey': a = -0.5
- 'pearson': a = 1 (standard chisquare test statistic)
- 'modified_loglikeratio': a = -1
- 'cressie_read': a = 2/3
- 'neyman': a = -2 (Neyman-modified chisquare, reference from a book?)
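The mapping from string names to exponents listed above can be sketched as a small lookup. The names `LAMBD_ALIASES` and `resolve_lambd` are hypothetical and only illustrate the documented correspondence; how statsmodels resolves the strings internally may differ.

```python
# Exponents taken from the lambd description above.
LAMBD_ALIASES = {
    "loglikeratio": 0.0,
    "freeman_tukey": -0.5,
    "pearson": 1.0,
    "modified_loglikeratio": -1.0,
    "cressie_read": 2.0 / 3.0,
    "neyman": -2.0,
}


def resolve_lambd(lambd):
    """Return the numeric exponent for a float or a named statistic."""
    if isinstance(lambd, str):
        if lambd not in LAMBD_ALIASES:
            raise ValueError("unknown lambd name: %r" % (lambd,))
        return LAMBD_ALIASES[lambd]
    return float(lambd)


print(resolve_lambd("cressie_read"))
print(resolve_lambd(1))
```

Passing `lambd='pearson'` and `lambd=1` should therefore select the same statistic.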
axis : int
axis for observations of one series
ddof : int
degrees of freedom correction
Returns: D_obs : Discrepancy of observed values
pvalue : pvalue
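As a rough illustration of how the p-value relates to the statistic, the sketch below evaluates the chi-square upper tail probability with k - 1 - ddof degrees of freedom, where k is the number of categories. The degrees-of-freedom rule is an assumption that is consistent with the example outputs on this page, and the helper names `chi2_sf` and `pvalue_from_statistic` are hypothetical. The tail probability is computed with a plain series expansion to stay within the standard library; in practice `scipy.stats.chi2.sf` would be the usual choice.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable via a series expansion."""
    a, z = df / 2.0, x / 2.0
    if z <= 0.0:
        return 1.0
    # Series for the regularized lower incomplete gamma function P(a, z):
    # P(a, z) = z**a * exp(-z) / Gamma(a) * sum_n z**n / (a * (a+1) * ... * (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


def pvalue_from_statistic(d_obs, n_categories, ddof=0):
    """Asymptotic p-value, assuming k - 1 - ddof degrees of freedom."""
    df = n_categories - 1 - ddof
    return chi2_sf(d_obs, df)


# Pearson statistic 3.0 with five categories, as in the examples on this page
print(pvalue_from_statistic(3.0, 5))  # about 0.5578
```

The series is adequate for moderate statistics like those in the examples; for very large statistics the subtraction from 1 loses precision, which is one reason to prefer a library routine.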
References
- Cressie, Noel and Timothy R. C. Read, Multinomial Goodness-of-Fit Tests, Journal of the Royal Statistical Society. Series B (Methodological), Vol. 46, No. 3 (1984), pp. 440-464
- Campbell B. Read, Freeman-Tukey chi-squared goodness-of-fit statistics, Statistics & Probability Letters 18 (1993), 271-278
- Nobuhiro Taneichi, Yuri Sekiya, Akio Suzukawa, Asymptotic Approximations for the Distributions of the Multinomial Goodness-of-Fit Statistics under Local Alternatives, Journal of Multivariate Analysis 81, 335-359 (2002)
- Steele, M., C. Hurst and J. Chaseling, Simulated Power of Discrete Goodness-of-Fit Tests for Likert Type Data
Examples
>>> import numpy as np
>>> from statsmodels.stats.gof import powerdiscrepancy
>>> observed = np.array([2., 4., 2., 1., 1.])
>>> expected = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
Checking for correct dimensions with multiple series:
>>> powerdiscrepancy(np.column_stack((observed, observed)).T, 10*expected, lambd='freeman_tukey', axis=1)
(array([[2.745166, 2.745166]]), array([[0.6013346, 0.6013346]]))
>>> powerdiscrepancy(np.column_stack((observed, observed)).T, 10*expected, axis=1)
(array([[2.77258872, 2.77258872]]), array([[0.59657359, 0.59657359]]))
>>> powerdiscrepancy(np.column_stack((observed, observed)).T, 10*expected, lambd=0, axis=1)
(array([[2.77258872, 2.77258872]]), array([[0.59657359, 0.59657359]]))
>>> powerdiscrepancy(np.column_stack((observed, observed)).T, 10*expected, lambd=1, axis=1)
(array([[3., 3.]]), array([[0.5578254, 0.5578254]]))
>>> powerdiscrepancy(np.column_stack((observed, observed)).T, 10*expected, lambd=2/3.0, axis=1)
(array([[2.89714546, 2.89714546]]), array([[0.57518277, 0.57518277]]))
>>> powerdiscrepancy(np.column_stack((observed, observed)).T, expected, lambd=2/3.0, axis=1)
(array([[2.89714546, 2.89714546]]), array([[0.57518277, 0.57518277]]))
>>> powerdiscrepancy(np.column_stack((observed, observed)), expected, lambd=2/3.0, axis=0)
(array([[2.89714546, 2.89714546]]), array([[0.57518277, 0.57518277]]))
Each random variable can have a different total count/sum:
>>> powerdiscrepancy(np.column_stack((observed, 2*observed)), expected, lambd=2/3.0, axis=0)
(array([[2.89714546, 5.79429093]]), array([[0.57518277, 0.21504648]]))
>>> powerdiscrepancy(np.column_stack((2*observed, 2*observed)), expected, lambd=2/3.0, axis=0)
(array([[5.79429093, 5.79429093]]), array([[0.21504648, 0.21504648]]))
>>> powerdiscrepancy(np.column_stack((2*observed, 2*observed)), 20*expected, lambd=2/3.0, axis=0)
(array([[5.79429093, 5.79429093]]), array([[0.21504648, 0.21504648]]))
>>> powerdiscrepancy(np.column_stack((observed, 2*observed)), np.column_stack((10*expected, 20*expected)), lambd=2/3.0, axis=0)
(array([[2.89714546, 5.79429093]]), array([[0.57518277, 0.21504648]]))
>>> powerdiscrepancy(np.column_stack((observed, 2*observed)), np.column_stack((10*expected, 20*expected)), lambd=-1, axis=0)
(array([[2.77258872, 5.54517744]]), array([[0.59657359, 0.2357868]]))