pytesmo.metrics.pairwise module
Pairwise metrics and analytical confidence intervals.
Metrics
The metric functions implemented here all have the signature:
def metric(x : np.ndarray, y : np.ndarray) -> float
Confidence intervals
Formulas for confidence intervals have in general been taken from Gilleland (2010), 10.5065/D6WD3XJM, https://opensky.ucar.edu/islandora/object/technotes:491
Other references are cited in the docstring of the respective function.
Analytical confidence interval functions implemented here are named <metric>_ci, e.g. for bias, the CI function is bias_ci.
The signature is:
def metric_ci(x : np.ndarray, y : np.ndarray, m : float,
              alpha : float = 0.05) -> Tuple[float, float]
where m is the metric value that has been calculated for x and y.
Typically, you should use pytesmo.metrics.confidence_intervals.with_analytical_ci() for calculating a metric CI.
- pytesmo.metrics.pairwise.aad(x, y)
Average (=mean) absolute deviation (AAD).
- Parameters:
x (numpy.ndarray) – First input vector.
y (numpy.ndarray) – Second input vector.
- Returns:
d – Mean absolute deviation.
- Return type:
- pytesmo.metrics.pairwise.bias_ci(x, y, b, alpha=0.05)
Confidence interval for bias.
The confidence interval is the same as the confidence interval for a mean.
- Parameters:
x (numpy.ndarray) – First input vector.
y (numpy.ndarray) – Second input vector.
b (float) – bias
alpha (float, optional) – 1 - confidence level, default is 0.05
- Returns:
lower, upper – Lower and upper confidence interval bounds.
- Return type:
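To illustrate what "the same as the confidence interval for a mean" can look like, the sketch below builds an interval around the mean of the differences x - y. It is not pytesmo's implementation: the name `bias_ci_sketch` is hypothetical, and the normal quantile from the standard library is an assumption that stands in for a Student-t quantile, so it is only reasonable for large samples.

```python
import numpy as np
from statistics import NormalDist


def bias_ci_sketch(x, y, b, alpha=0.05):
    """Hypothetical large-sample CI for the bias b = mean(x) - mean(y)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    n = d.size
    # Standard error of the mean difference (sample standard deviation).
    se = np.std(d, ddof=1) / np.sqrt(n)
    # Two-sided normal quantile; an assumption standing in for a t quantile.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return b - z * se, b + z * se


rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + 0.1 + rng.normal(scale=0.2, size=500)
b = float(np.mean(x) - np.mean(y))
lower, upper = bias_ci_sketch(x, y, b)
```

The interval is symmetric around `b` and narrows as the sample size grows or as the spread of x - y shrinks.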
- pytesmo.metrics.pairwise.index_of_agreement(o, p)
Index of agreement was proposed by Willmott (1981) to overcome the insensitivity of Nash-Sutcliffe efficiency E and R^2 to differences in the observed and predicted means and variances (Legates and McCabe, 1999). The index of agreement is one minus the ratio of the mean square error to the potential error (Willmott, 1984). The potential error in the denominator represents the largest value that the squared difference of each pair can attain. The range of d is similar to that of R^2 and lies between 0 (no agreement) and 1 (perfect fit).
- Parameters:
o (numpy.ndarray) – Observations.
p (numpy.ndarray) – Predictions.
- Returns:
d – Index of agreement.
- Return type:
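The description above can be sketched with NumPy as follows. This follows Willmott's published formula, d = 1 - sum((o - p)^2) / sum((|p - mean(o)| + |o - mean(o)|)^2); the function name `index_of_agreement_sketch` is hypothetical, and whether pytesmo computes it in exactly this way is an assumption.

```python
import numpy as np


def index_of_agreement_sketch(o, p):
    """Hypothetical re-implementation of Willmott's index of agreement."""
    o = np.asarray(o, dtype=float)
    p = np.asarray(p, dtype=float)
    o_mean = o.mean()
    # Numerator: sum of squared errors between observations and predictions.
    sse = np.sum((o - p) ** 2)
    # Denominator: potential error, measured relative to the observed mean.
    potential = np.sum((np.abs(p - o_mean) + np.abs(o - o_mean)) ** 2)
    return float(1.0 - sse / potential)


o = np.array([1.0, 2.0, 3.0, 4.0])
d_perfect = index_of_agreement_sketch(o, o)        # identical series -> 1.0
d_shifted = index_of_agreement_sketch(o, o + 1.0)  # constant offset -> 0.84
```

A constant offset lowers d even though the correlation between the two series is still perfect, which is the sensitivity to mean differences that the index was designed to provide.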
- pytesmo.metrics.pairwise.kendall_tau(x, y)
Wrapper for scipy.stats.kendalltau
- Parameters:
x (numpy.ndarray) – First input vector.
y (numpy.ndarray) – Second input vector.
- Returns:
tau – Kendall’s tau statistic
- Return type:
See also
- pytesmo.metrics.pairwise.kendall_tau_ci(x, y, tau, alpha=0.05)
Confidence interval for Kendall’s rank coefficient.
- Parameters:
x (numpy.ndarray) – First input vector
y (numpy.ndarray) – Second input vector
tau (float) – Kendall tau for this data
alpha (float, optional) – 1 - confidence level, default is 0.05
- Returns:
lower, upper – Lower and upper confidence interval bounds.
- Return type:
References
Bonett, D. G., & Wright, T. A. (2000). Sample size requirements for estimating Pearson, Kendall and Spearman correlations. Psychometrika, 65(1), 23-28.
- pytesmo.metrics.pairwise.mad(x, y)
Median absolute deviation (MAD).
- Parameters:
x (numpy.ndarray) – First input vector.
y (numpy.ndarray) – Second input vector.
- Returns:
d – Median absolute deviation.
- Return type:
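The difference between AAD and MAD is easiest to see with an outlier. The sketch below assumes that both metrics are computed from the absolute pairwise differences |x - y| (mean for AAD, median for MAD); the variable names are illustrative only.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.5, 2.5, 2.5, 4.5, 15.0])  # the last pair is an outlier

# Absolute pairwise differences: 0.5, 0.5, 0.5, 0.5, 10.0
abs_dev = np.abs(x - y)

aad_value = float(np.mean(abs_dev))    # mean is pulled up by the outlier -> 2.4
mad_value = float(np.median(abs_dev))  # median ignores the outlier -> 0.5
```

The median-based value stays at the typical deviation, while the mean-based value is dominated by the single large difference.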
- pytesmo.metrics.pairwise.msd(x, y)
Mean square deviation/mean square error.
For validation, MSD (same as MSE) is defined as
MSD = \frac{1}{n}\sum\limits_{i=1}^n (x_i - y_i)^2
MSD can be decomposed into a term describing the deviation of x and y attributable to non-perfect correlation (r < 1), a term depending on the difference in standard deviations between x and y, and a term depending on the difference in means between x and y (bias).
\begin{aligned} MSD &= MSD_{corr} + MSD_{var} + MSD_{bias}\\ &= 2\sigma_x\sigma_y (1-r) + (\sigma_x - \sigma_y)^2 + (\mu_x - \mu_y)^2 \end{aligned}
This function calculates the full MSD; the functions msd_corr, msd_var, and msd_bias can be used to calculate the individual components.
- Parameters:
x (numpy.ndarray) – First input vector
y (numpy.ndarray) – Second input vector
- Returns:
msd – Mean square deviation
- Return type:
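The decomposition can be checked numerically. The sketch below uses plain NumPy rather than the pytesmo functions; the variables `msd_corr`, `msd_var`, and `msd_bias` are local values that mirror the three terms of the formula, and the synthetic data are an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=0.3, scale=1.0, size=1000)
y = 0.8 * x + rng.normal(loc=0.1, scale=0.5, size=1000)

# Population (ddof=0) standard deviations match the 1/n factor in the MSD.
sigma_x, sigma_y = np.std(x), np.std(y)
r = np.corrcoef(x, y)[0, 1]

msd_full = np.mean((x - y) ** 2)
msd_corr = 2 * sigma_x * sigma_y * (1 - r)   # imperfect correlation
msd_var = (sigma_x - sigma_y) ** 2           # different spread
msd_bias = (np.mean(x) - np.mean(y)) ** 2    # different means

# The three components add up to the full MSD (up to rounding).
components_sum = msd_corr + msd_var + msd_bias
```

The identity only holds exactly when the standard deviations are computed with ddof=0, that is, with the same 1/n normalization as the MSD itself.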
- pytesmo.metrics.pairwise.nash_sutcliffe(o, p)
Nash Sutcliffe model efficiency coefficient E. The range of E lies between 1.0 (perfect fit) and -inf.
- Parameters:
o (numpy.ndarray) – Observations.
p (numpy.ndarray) – Predictions.
- Returns:
E – Nash Sutcliffe model efficiency coefficient E.
- Return type:
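The range of E can be illustrated with the textbook definition E = 1 - sum((o - p)^2) / sum((o - mean(o))^2). The function name `nash_sutcliffe_sketch` is hypothetical, and this is a minimal sketch rather than the library code.

```python
import numpy as np


def nash_sutcliffe_sketch(o, p):
    """Hypothetical re-implementation of the Nash-Sutcliffe efficiency."""
    o = np.asarray(o, dtype=float)
    p = np.asarray(p, dtype=float)
    # Squared prediction errors relative to the variability of the observations.
    return float(1.0 - np.sum((o - p) ** 2) / np.sum((o - o.mean()) ** 2))


o = np.array([1.0, 2.0, 3.0, 4.0])
e_perfect = nash_sutcliffe_sketch(o, o)                     # perfect fit -> 1.0
e_mean = nash_sutcliffe_sketch(o, np.full(4, o.mean()))     # predicting the mean -> 0.0
e_reversed = nash_sutcliffe_sketch(o, o[::-1])              # worse than the mean -> -3.0
```

A value of 0 means the predictions are no better than the mean of the observations; negative values mean they are worse.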
- pytesmo.metrics.pairwise.nrmsd(x, y, ddof=0)
Normalized root-mean-square deviation (nRMSD).
This normalizes RMSD by max(max(x), max(y)) - min(min(x), min(y)).
- Parameters:
x (numpy.ndarray) – First input vector.
y (numpy.ndarray) – Second input vector.
ddof (int, optional) – Delta degree of freedom.The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero. DEPRECATED: ddof is deprecated and might be removed in future versions.
- Returns:
nrmsd – Normalized root-mean-square deviation (nRMSD).
- Return type:
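A minimal sketch of the normalization described above, written with plain NumPy. The name `nrmsd_sketch` is hypothetical, and the deprecated ddof argument is left out, which corresponds to the default ddof=0.

```python
import numpy as np


def nrmsd_sketch(x, y):
    """Hypothetical nRMSD: RMSD divided by the joint value range of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rmsd = np.sqrt(np.mean((x - y) ** 2))
    # Joint range: max(max(x), max(y)) - min(min(x), min(y)).
    value_range = max(x.max(), y.max()) - min(x.min(), y.min())
    return float(rmsd / value_range)


x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0, 4.0])
result = nrmsd_sketch(x, y)  # RMSD 1.0 over a joint range of 4.0 -> 0.25
```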
- pytesmo.metrics.pairwise.pearson_r(x, y)
Pearson’s linear correlation coefficient.
- Parameters:
x (numpy.ndarray) – First input vector.
y (numpy.ndarray) – Second input vector.
- Returns:
r – Pearson’s correlation coefficient.
- Return type:
See also
- pytesmo.metrics.pairwise.pearson_r_ci(x, y, r, alpha=0.05)
Confidence interval for Pearson correlation coefficient.
- Parameters:
x (numpy.ndarray) – First input vector
y (numpy.ndarray) – Second input vector
r (float) – Pearson r for this data
alpha (float, optional) – 1 - confidence level, default is 0.05
- Returns:
lower, upper – Lower and upper confidence interval bounds.
- Return type:
References
Bonett, D. G., & Wright, T. A. (2000). Sample size requirements for estimating Pearson, Kendall and Spearman correlations. Psychometrika, 65(1), 23-28.
- pytesmo.metrics.pairwise.rmsd(x, y, ddof=0)
Root-mean-square deviation (RMSD).
This is the root of MSD (see pytesmo.metrics.msd()). If x and y have the same mean (i.e. mean(x - y) = 0), RMSD corresponds to the square root of the variance of x - y.
- Parameters:
x (numpy.ndarray) – First input vector.
y (numpy.ndarray) – Second input vector.
ddof (int, optional, DEPRECATED) – Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero. DEPRECATED: ddof is deprecated and might be removed in future versions.
- Returns:
rmsd – Root-mean-square deviation.
- Return type:
- pytesmo.metrics.pairwise.spearman_r(x, y)
Spearman’s rank correlation coefficient.
- Parameters:
x (numpy.ndarray) – First input vector.
y (numpy.ndarray) – Second input vector.
- Returns:
rho – Spearman correlation coefficient
- Return type:
See also
scipy.stats.spearmanr
- pytesmo.metrics.pairwise.spearman_r_ci(x, y, r, alpha=0.05)
Confidence interval for Spearman rank correlation coefficient.
- Parameters:
x (numpy.ndarray) – First input vector
y (numpy.ndarray) – Second input vector
r (float) – Spearman’s r for this data
alpha (float, optional) – 1 - confidence level, default is 0.05
- Returns:
lower, upper – Lower and upper confidence interval bounds.
- Return type:
References
Bonett, D. G., & Wright, T. A. (2000). Sample size requirements for estimating Pearson, Kendall and Spearman correlations. Psychometrika, 65(1), 23-28.
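The Pearson, Spearman, and Kendall confidence intervals all cite Bonett & Wright (2000), who build the interval on the Fisher z-transformed coefficient with a coefficient-specific variance. The sketch below follows the approximations from that paper: 1/(n - 3) for Pearson, (1 + r^2/2)/(n - 3) for Spearman, and 0.437/(n - 4) for Kendall. The function `fisher_z_ci_sketch` and its `kind` parameter are hypothetical, and whether pytesmo uses exactly these expressions is an assumption.

```python
import math
from statistics import NormalDist


def fisher_z_ci_sketch(r, n, kind="pearson", alpha=0.05):
    """Hypothetical Fisher-z CI for a correlation coefficient r from n pairs."""
    # Approximate variance of the z-transformed coefficient.
    if kind == "pearson":
        var = 1.0 / (n - 3)
    elif kind == "spearman":
        var = (1.0 + r ** 2 / 2.0) / (n - 3)
    elif kind == "kendall":
        var = 0.437 / (n - 4)
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    z = math.atanh(r)  # Fisher z-transform
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * math.sqrt(var)
    # Transform the bounds back to the correlation scale.
    return math.tanh(z - half_width), math.tanh(z + half_width)


lower, upper = fisher_z_ci_sketch(0.6, n=100, kind="pearson")
lower_s, upper_s = fisher_z_ci_sketch(0.6, n=100, kind="spearman")
```

Because the back-transformation is nonlinear, the resulting interval is not symmetric around r, and it always stays within (-1, 1).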
- pytesmo.metrics.pairwise.ubrmsd(x, y, ddof=0)
Unbiased root-mean-square deviation (uRMSD).
This corresponds to RMSD with mean biases removed beforehand, that is
ubRMSD = \sqrt{\frac{1}{n}\sum\limits_{i=1}^n \left((x_i - \bar{x}) - (y_i - \bar{y})\right)^2}
NOTE: If you are scaling the data beforehand to have zero mean bias, this is exactly the same as RMSD.
- Parameters:
x (numpy.ndarray) – First input vector.
y (numpy.ndarray) – Second input vector.
ddof (int, optional) – Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero. DEPRECATED: ddof is deprecated and might be removed in future versions.
- Returns:
ubrmsd – Unbiased root-mean-square deviation (uRMSD).
- Return type:
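The relation between RMSD and the unbiased variant can be checked numerically. The sketch below uses plain NumPy with ddof=0; the helper `rmsd_sketch` and the synthetic data are assumptions made for illustration, not part of pytesmo.

```python
import numpy as np


def rmsd_sketch(a, b):
    """Hypothetical helper: root of the mean squared difference."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = x + 0.5 + rng.normal(scale=0.3, size=200)  # y carries a mean offset

# Remove each series' mean first, then take the RMSD.
ubrmsd_value = rmsd_sketch(x - x.mean(), y - y.mean())
# Equivalent formulation: population standard deviation of the differences.
std_of_diff = float(np.std(x - y))
# RMSD on the raw data additionally contains the squared mean bias.
rmsd_value = rmsd_sketch(x, y)
bias = float(x.mean() - y.mean())
```

Here `rmsd_value ** 2` equals `ubrmsd_value ** 2 + bias ** 2`, so the two metrics coincide only when the mean bias is zero.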
- pytesmo.metrics.pairwise.ubrmsd_ci(x, y, ubrmsd, alpha=0.05)
Confidence interval for unbiased root-mean-square deviation (uRMSD).
- Parameters:
x (numpy.ndarray) – First input vector
y (numpy.ndarray) – Second input vector
ubrmsd (float) – ubRMSD for this data
alpha (float, optional) – 1 - confidence level, default is 0.05
- Returns:
lower, upper – Lower and upper confidence interval bounds.
- Return type: