pytesmo.interpolate namespace

Submodules

pytesmo.interpolate.dctpls module

This module provides a Python 3 implementation of the DCT-PLS algorithm (Garcia 2010), based on:

References:

Garcia, D. (2010) ‘Robust smoothing of gridded data in one and higher dimensions with missing values’, Computational Statistics & Data Analysis, 54(4), pp. 1167–1178. Available at: https://doi.org/10.1016/j.csda.2009.09.020

pytesmo.interpolate.dctpls.RobustWeights(residuals, h, mask=None, method: Literal['cauchy', 'talworth', 'bisquare'] = 'cauchy', simple=True)

Compute weights from residuals.

Parameters:
  • residuals (np.ndarray) – Residuals between the previous and the current smoothing run. Larger residuals result in smaller weights. NaNs get weight 0.

  • h (float) – todo

  • mask (np.ndarray, optional (default: None)) – Boolean array of same shape as residuals. True indicates residuals to ignore when creating new weights. By default, all residuals are used.

  • method (str, optional (default: 'cauchy')) – Method used to compute the weights. One of “cauchy”, “talworth”, “bisquare”. Note that currently only “cauchy” is implemented.

  • simple (bool, optional (default: True)) – Use numpy absolute to compute the residuals.

Returns:

weights – Weights based on the passed residuals and the chosen method. NaNs have weight 0.

Return type:

np.ndarray
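As a rough illustration of Cauchy weighting, the sketch below standardises the residuals by their median absolute deviation (MAD) and a leverage term sqrt(1 - h), then applies w = 1 / (1 + (u / c)**2). The constants 1.4826 and c = 2.385, and reading h as the average leverage, follow the formulation associated with Garcia (2010); they are assumptions here, since h is not documented above. The function name cauchy_weights is hypothetical, and mask and simple are not modelled.

```python
import numpy as np


def cauchy_weights(residuals, h, c=2.385):
    """Hypothetical sketch of Cauchy robust weights (not pytesmo's RobustWeights).

    Assumes h < 1 is an average leverage and that the MAD of the residuals is > 0.
    """
    r = np.asarray(residuals, dtype=float)
    valid = ~np.isnan(r)
    # median absolute deviation of the finite residuals
    med = np.median(r[valid])
    mad = np.median(np.abs(r[valid] - med))
    # studentised residuals (1.4826 * MAD estimates the standard deviation)
    u = np.abs(r / (1.4826 * mad) / np.sqrt(1.0 - h))
    # Cauchy weights: close to 1 for small residuals, towards 0 for outliers
    w = 1.0 / (1.0 + (u / c) ** 2)
    w[np.isnan(w)] = 0.0  # NaN residuals get weight 0
    return w


w = cauchy_weights(np.array([0.1, -0.2, 0.05, 5.0, np.nan]), h=0.5)
```

The outlier (5.0) receives a much smaller weight than the small residuals, and the NaN residual receives weight 0.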

class pytesmo.interpolate.dctpls.ValueRange(min: float, max: float)

Bases: object

max: float
min: float
pytesmo.interpolate.dctpls.calc_init_guess(y: ndarray, mask: ndarray | None = None, coeff: float | None = 0.1, sampling: float | Iterable[float] | None = None, return_distances: bool | None = True) → Tuple[ndarray, ndarray | None]

Compute the initial smoothing guess. This also fills any gaps in the data using a nearest-neighbour search. The initial result is later improved using DCT-PLS. The sampling parameter allows a different unit for each dimension.

Parameters:
  • y (np.ndarray) – Data (with gaps) to interpolate

  • mask (np.ndarray, optional (default: None)) – Boolean array indicating missing values

  • coeff (float, optional (default: 0.1)) – Fraction (0-1) of the DCT coefficients used to generate the initial guess. By default, the first 10% are used.

  • sampling (float | Iterable[float] | None, optional (default: None)) – Spacing of elements along each dimension. If a sequence, it must have a length equal to the input rank; if a single number, it is used for all axes. If not specified, a grid spacing of unity is implied, so no axis is preferred. For example, pass (0.5, 1, 1) for 3D data to prefer temporal neighbours over spatial neighbours (assuming that dim=0 refers to different time stamps).

  • return_distances (bool, optional (default: True)) – Return Euclidean distance to the nearest valid data point in addition to initial interpolation and smoothing result.

Returns:

  • z (np.ndarray) – Initial guess of the smoothed and interpolated data

  • dist (np.ndarray or None) – Euclidean distance to the nearest observation (only if mask is passed). Only returned if return_distances=True; otherwise this is None.
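The gap-filling part of the initial guess can be sketched as a nearest-neighbour search in physical coordinates, where each axis index is scaled by its sampling value. This brute-force version (hypothetical name nearest_fill) only covers the fill and distance step, not the DCT smoothing, and is not pytesmo's implementation. The sampling description above matches that of scipy.ndimage.distance_transform_edt, which would be the efficient choice for large grids; the version here avoids the dependency and is only suitable for small arrays.

```python
import numpy as np


def nearest_fill(y, mask, sampling=1.0):
    """Hypothetical sketch: fill masked cells with their nearest valid neighbour.

    mask is True where values are missing; sampling is a scalar or one value per axis.
    Returns the filled array and the Euclidean distance to the nearest valid cell.
    """
    y = np.asarray(y, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    spacing = np.broadcast_to(np.asarray(sampling, dtype=float), (y.ndim,))
    # physical coordinates of every cell, shape (n_cells, ndim)
    coords = np.indices(y.shape).reshape(y.ndim, -1).T * spacing
    flat_mask = mask.ravel()
    valid, gaps = coords[~flat_mask], coords[flat_mask]
    # pairwise distances from every gap cell to every valid cell
    d = np.sqrt(((gaps[:, None, :] - valid[None, :, :]) ** 2).sum(axis=-1))
    nearest = d.argmin(axis=1)  # ties resolve to the first valid cell in C order
    filled = y.ravel().copy()
    filled[flat_mask] = y.ravel()[~flat_mask][nearest]
    dist = np.zeros(y.size)
    dist[flat_mask] = d.min(axis=1)
    return filled.reshape(y.shape), dist.reshape(y.shape)


y = np.array([[1.0, np.nan, 3.0], [4.0, 5.0, 6.0]])
# a smaller spacing along axis 0 makes neighbours along that axis "closer"
filled, dist = nearest_fill(y, np.isnan(y), sampling=(0.5, 1.0))
```

With sampling=(0.5, 1.0) the gap at row 0, column 1 is filled from the cell directly below it (value 5.0, distance 0.5) rather than from its left or right neighbour.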

pytesmo.interpolate.dctpls.dctNd(data: ndarray, type: int | None = 2, norm: str | None = 'ortho', inverse: bool | None = False) → ndarray

Apply the discrete cosine transform (DCT) along up to 3 dimensions of the data. NaNs in the data are ignored.

Parameters:
  • data (np.ndarray) – 1-, 2- or 3-dimensional data to calculate the DCT for.

  • type (int, optional (default: 2)) – DCT type to use (see the scipy DCT docs).

  • norm (str or None, optional (default: ‘ortho’)) – Normalization mode for the DCT (see the scipy DCT docs).

  • inverse (bool, optional (default: False)) – Use the inverse function.
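For the default type=2 and norm='ortho', the transform along one axis is multiplication by an orthonormal matrix, so the inverse is its transpose and the N-dimensional transform is the same 1-D transform applied along every axis. The numpy-only sketch below (hypothetical names dct_matrix, dct_nd_ortho, truncate_dct) should match scipy.fft.dctn(x, type=2, norm='ortho') up to floating-point error, but it does not reproduce the NaN handling of dctNd. truncate_dct illustrates keeping only the first fraction of coefficients along each axis, as the coeff parameter of calc_init_guess describes; how pytesmo selects the coefficients exactly is an assumption.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (rows index frequencies)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # the DC row gets the 1/sqrt(n) normalisation
    return c


def dct_nd_ortho(data, inverse=False):
    """Type-2 orthonormal DCT (or its inverse) along every axis of data."""
    x = np.asarray(data, dtype=float)
    for axis in range(x.ndim):
        c = dct_matrix(x.shape[axis])
        if inverse:
            c = c.T  # orthonormal transform: the inverse is the transpose
        # contract the matrix with the chosen axis, then move that axis back in place
        x = np.moveaxis(np.tensordot(c, x, axes=([1], [axis])), 0, axis)
    return x


def truncate_dct(data, coeff=0.1):
    """Low-pass filter: keep the first fraction coeff of coefficients per axis."""
    x = dct_nd_ortho(data)
    keep = tuple(slice(0, max(1, int(np.ceil(coeff * n)))) for n in x.shape)
    truncated = np.zeros_like(x)
    truncated[keep] = x[keep]
    return dct_nd_ortho(truncated, inverse=True)


rng = np.random.default_rng(42)
a = rng.normal(size=(4, 5, 6))
coeffs = dct_nd_ortho(a)
restored = dct_nd_ortho(coeffs, inverse=True)
```

Because the transform is orthonormal, the round trip restores the data and the sum of squares is preserved (Parseval).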

pytesmo.interpolate.dctpls.gcv(p: float, Lambda: ndarray, DCTy: ndarray, y: ndarray, smoothOrder: int | None = 2, Wtot: ndarray | None = None, score_only: bool | None = False)

Compute the generalised cross-validation (GCV) score for the smoothing parameter 10 ** p. The best smoothing parameter is the one that minimises this score.


Parameters:
  • p (float) – Exponent defining the smoothing parameter as 10 ** p.

  • Lambda (np.ndarray) – Diagonal eigenvalue matrix of D as defined by Yueh (2005), with lambda_i = -2 + 2*cos((i-1) * pi / n).

  • DCTy (np.ndarray) – Output array of dctNd.

  • y (np.ndarray) – Input data array.

  • smoothOrder (int, optional (default: 2)) – Exponent of Lambda used for smoothing. Either 1 or 2.

  • Wtot (np.ndarray, optional (default: None)) – Weights for each value in y (same shape as y). Elements of Wtot whose counterpart in y is NaN are ignored. If None is passed, the unweighted implementation, which is much faster, is used.

  • score_only (bool, optional (default: False)) – Return only the score for p, not Gamma and TrH. This is required for the bounded minimisation, which needs a single return value.

Returns:

  • score (np.ndarray or None) – GCV score; 0-, 1- or 2-dimensional, with ndim = Lambda.ndim - 1.

  • smooth (float or None) – 10**p, the smoothing parameter.

  • Gamma (np.ndarray or None) – Gamma obtained when applying the smoothing parameter s = 10**p.

  • TrH (np.ndarray or None) – Trace of the hat matrix when applying s.
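To show what the score measures, here is an unweighted 1-D sketch, assuming smoothOrder = 2, Gamma_i = 1 / (1 + s * lambda_i**2) with s = 10**p, and GCV(s) = (RSS / n) / (1 - Tr(H) / n)**2 with Tr(H) = sum(Gamma), as in Garcia (2010). The names smooth_1d and gcv_score are hypothetical and Wtot is not modelled. The score_only option suggests that pytesmo hands the score to a bounded scalar minimiser; this sketch uses a plain grid search over p instead.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (rows index frequencies)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def smooth_1d(y, s):
    """Unweighted DCT-domain smoothing of 1-D data with parameter s (smoothOrder = 2)."""
    n = y.size
    # eigenvalues lambda_i = -2 + 2*cos((i-1)*pi/n) for i = 1..n
    lam = -2.0 + 2.0 * np.cos(np.arange(n) * np.pi / n)
    gamma = 1.0 / (1.0 + s * lam ** 2)
    c = dct_matrix(n)
    dcty = c @ y
    z = c.T @ (gamma * dcty)
    return z, gamma, dcty


def gcv_score(p, y):
    """Hypothetical unweighted GCV score for the smoothing parameter s = 10**p."""
    n = y.size
    _, gamma, dcty = smooth_1d(y, 10.0 ** p)
    # orthonormal DCT: the residual sum of squares can be computed in the DCT domain
    rss = np.sum((dcty * (gamma - 1.0)) ** 2)
    tr_h = np.sum(gamma)  # trace of the hat matrix
    return rss / n / (1.0 - tr_h / n) ** 2


# Usage: noisy sine wave, grid search over p
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
truth = np.sin(2 * np.pi * t)
y = truth + rng.normal(scale=0.3, size=t.size)
ps = np.linspace(-3.0, 4.0, 71)
scores = np.array([gcv_score(p, y) for p in ps])
best_p = ps[np.argmin(scores)]
z, _, _ = smooth_1d(y, 10.0 ** best_p)
```

The smoothed series z obtained with the GCV-optimal p should be closer to the underlying sine than the noisy input.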

pytesmo.interpolate.dctpls.smoothn(data: ~numpy.ndarray, smooth: float | None = None, axis: ~typing.Tuple[int, ...] | int | None = None, data_weights: ~numpy.ndarray | None = None, smoothOrder: int | None = 2, init_guess: ~numpy.ndarray | None = None, isrobust: bool | None = True, MaxIter: int | None = 100, TolZ: float = 0.001, gap_value: float | int | None = nan, exclusion_mask: ~numpy.ndarray | None = None, data_sampling: float | ~typing.Tuple[float, ...] | None = 1.0, debug_mode: bool | None = False, return_stats: ~typing.Tuple[str, ...] | None = None) -> (<class 'numpy.ndarray'>, <class 'float'>, <class 'int'>, <class 'dict'>)

Robust spline smoothing for 1-D to 3-D data: a fast, automated and robust discretised smoothing spline.

When using this algorithm, refer to:

Garcia, D. (2010) ‘Robust smoothing of gridded data in one and higher dimensions with missing values’, Computational Statistics & Data Analysis, 54(4), pp. 1167–1178. Available at: https://doi.org/10.1016/j.csda.2009.09.020

Parameters:
  • data (np.ndarray) – 1-, 2- or 3-dimensional array of values to smooth. Data gaps to fill should have the value given as gap_value.

  • smooth (float, optional (default: None)) – Smoothing parameter. If given, smooth must be a real positive scalar. The larger smooth is, the smoother the output will be. If None is passed, smooth is determined automatically from the data by minimising the generalised cross-validation (GCV) score.

  • axis (int or tuple[int,...], optional (default: None)) – Axes along which the smoothing is applied. If not given, smoothing is applied along all axes of data.

  • data_weights (np.ndarray, optional (default: None)) – Weight (normally between 0 and 1) for each value in data. Weights must be given as an array of the same shape as data. A weight of 0 means that the value is ignored when fitting the model. NaNs in data are assigned the weight 0.

  • smoothOrder (int, optional (default: 2)) – Exponent of lambda and gamma used for smoothing. A value of 1 corresponds to a linear interpolation.

  • init_guess (np.ndarray, optional (default: None)) – First guess for the interpolated data, i.e. the initial value for the iterative process. If passed, it must have the same shape as data. If no initial guess is passed, it is generated from the passed data.

  • isrobust (bool, optional (default: True)) – Apply robust weights to reduce the influence of outliers.

  • MaxIter (int, optional (default: 100)) – Maximum number of iterations allowed.

  • TolZ (float, optional (default: 0.001)) – Termination tolerance on z. TolZ must lie in the open interval (0, 1).

  • exclusion_mask (np.ndarray, optional (default: None)) – A boolean mask of elements in data that are excluded from calculating the smoothing function. Must have the same shape as data. Elements in data where the corresponding exclusion mask is True will still be considered in the interpolation (so that the original distances and neighbourhoods are retained), but in the final output they will be removed again and replaced with NaNs.

  • data_sampling (float or tuple, optional (default: 1.0)) – Sampling unit for each dimension of data. If a tuple is given, it must have the same length as the number of dimensions of data. If a single number is given, it is applied to all dimensions.

  • debug_mode (bool, optional (default: False)) – All debug messages are logged by the pytesmo logger (logging.getLogger('pytesmo')). If this setting is activated, a handler that logs to stdout is added so that the debug messages of the DCT-PLS function are printed.

  • return_stats (tuple, optional (default: None)) –

    Select which side products should be kept and returned. Removing some from the list will reduce memory usage.

    • 'initial_guess':

      Return the initial guess for smoothing.

    • 'euclidean_distance':

      Return a measure of the gap size (only when an exclusion mask is set).

    • 'final_weights':

      Return the weight that was finally assigned to each data element.

    • 'gcv_score':

      Return the GCV score.

Returns:

  • z (np.ndarray) – Smoothed and interpolated version of input data

  • smooth (float) – Final parameter that was used

  • exit_flag (int) – 1: OK; 2: fewer than 2 data samples; 3: inner loop did not converge

  • stats (dict) – Side products that were selected in return_stats
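The iterative core of the method can be illustrated with a heavily simplified 1-D sketch: in each step, observed values are kept, gaps take the current estimate, and the result is smoothed in the DCT domain with Gamma = 1 / (1 + s * Lambda**2), following Garcia (2010), until the relative change falls below a tolerance (compare TolZ and MaxIter). This is not pytesmo's smoothn. The name fill_gaps_1d is hypothetical, and the sketch assumes a fixed s (no GCV search), smoothOrder = 2, binary weights, no robust reweighting, a linear-interpolation initial guess instead of calc_init_guess, and no relaxation factor.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (rows index frequencies)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def fill_gaps_1d(y, s, tol=1e-6, max_iter=500):
    """Hypothetical 1-D gap filler with a fixed smoothing parameter s (not pytesmo's API)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    w = np.isfinite(y).astype(float)  # weight 0 for gaps (NaN), 1 otherwise
    y0 = np.where(w > 0, y, 0.0)      # gap values are irrelevant because their weight is 0
    idx = np.arange(n)
    # initial guess: linear interpolation across the gaps (assumption)
    z = np.interp(idx, idx[w > 0], y[w > 0])
    lam = -2.0 + 2.0 * np.cos(idx * np.pi / n)
    gamma = 1.0 / (1.0 + s * lam ** 2)  # smoothOrder = 2
    c = dct_matrix(n)
    for it in range(1, max_iter + 1):
        # observed points pull towards y, gaps keep the current estimate
        z_new = c.T @ (gamma * (c @ (w * (y0 - z) + z)))
        change = np.linalg.norm(z_new - z) / np.linalg.norm(z_new)
        z = z_new
        if change < tol:
            break
    return z, it


t = np.linspace(0.0, 1.0, 100)
truth = np.sin(2 * np.pi * t)
y = truth.copy()
y[40:45] = np.nan  # artificial gap
z, n_iter = fill_gaps_1d(y, s=10.0)
```

The filled values inside the gap should stay close to the underlying sine, and the output contains no NaNs.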

pytesmo.interpolate.dctpls.sumnd(x)

Apply numpy sum over all available dimensions

Parameters:

x (np.ndarray) – 1, 2 or 3-dimensional data array

Returns:

sum – Sum over all available dimensions

Return type:

float