pytesmo.validation_framework package
Submodules
pytesmo.validation_framework.adapters module
pytesmo.validation_framework.data_manager module
- class pytesmo.validation_framework.data_manager.DataManager(datasets, ref_name, period=None, read_ts_names='read', upscale_parms=None)
Bases: MixinReadTs
Class to handle the data management.
- Parameters:
datasets (dict of dicts) –
Keys: string, dataset names. Values: dict containing the following fields:
- ‘class’: object
Class containing the method read for reading the data.
- ‘columns’: list
List of columns which will be used in the validation process.
- ‘args’: list, optional
Args that are passed to the reading function.
- ‘kwargs’: dict, optional
Kwargs that are passed to the reading function.
- ‘grids_compatible’: boolean, optional
If set to True, the grid point index is used directly when reading the other dataset; if False, lon and lat are used and a nearest neighbour search is necessary. Default: False
- ‘use_lut’: boolean, optional
If set to True, the grid point index (obtained from a lookup table (lut) calculated between the reference and the other dataset) is used when reading the other dataset; if False, lon and lat are used and a nearest neighbour search is necessary. Default: False
- ‘max_dist’: float, optional
Maximum allowed distance in meters for the lut calculation. Default: None
ref_name (string) – Name of the reference dataset. The reference dataset is used as the spatial reference, i.e. all other datasets will be interpolated to the locations of the reference dataset.
period (list, optional) – Of type [datetime start, datetime end]. If given then the two input datasets will be truncated to start <= dates <= end.
read_ts_names (string or dict of strings, optional) – If a method other than ‘read’ should be used for reading the data, it can be specified here. If it is a dict, specify a method name for each dataset.
upscale_parms (dict, optional, default: None) – Dictionary with parameters for the upscaling methods. Keys:
- ‘upscaling_method’: method for upscaling
- ‘temporal_stability’: bool for using temporal stability
- ‘upscaling_lut’: dict of shape {‘other_name’: {ref gpi: [other gpis]}}
- use_lut(other_name)
Returns the lut between the reference and the other dataset if use_lut was set to True for the other dataset.
- get_result_names()
Return results names based on reference and others names.
- property ds_dict
- get_data(gpi, lon, lat)
Get all the data from this manager for a certain grid point, longitude, latitude combination.
- Parameters:
- Returns:
df_dict – Dictionary with dataset names as the key and pandas.DataFrames containing the data for the point as values. The dict will be empty if no data is available.
- Return type:
dict of pandas.DataFrames
- get_luts()
Returns luts between reference and others if use_lut for other datasets was set to True.
- Returns:
luts – Keys: names of the other datasets. Values: lut between the reference and the other dataset, or None.
- Return type:
- get_other_data(gpi, lon, lat)
Get all the data for non-reference datasets from this manager for a certain grid point, longitude, latitude combination.
- Parameters:
- Returns:
other_dataframes – Dictionary with dataset names as the key and pandas.DataFrames containing the data for the point as values. The dict will be empty if no data is available.
- Return type:
dict of pandas.DataFrames
- read_other(name, *args)
Function to read and prepare non-reference datasets.
Calls read of the dataset.
Takes either 1 (gpi) or 2 (lon, lat) arguments.
- Parameters:
- Returns:
data_df – Data DataFrame.
- Return type:
pandas.DataFrame or None
- read_reference(*args)
Function to read and prepare the reference dataset.
Calls read of the dataset. Takes either 1 (gpi) or 2 (lon, lat) arguments.
- Parameters:
- Returns:
ref_df – Reference dataframe.
- Return type:
pandas.DataFrame or None
- pytesmo.validation_framework.data_manager.get_result_combinations(ds_dict, n=2)
Get all possible combinations of dataset columns.
- Parameters:
ds_dict (dict) – Dict of lists containing the dataset names as keys and a list of the columns to read from the dataset as values.
n (int) – Number of datasets to combine with each other. If n=2, two datasets are always combined into one result; if n=3, three datasets are always combined into one result, and so on. n has to be <= the total number of datasets.
- Returns:
results_names – Containing all possible combinations of (dataset_x.column, dataset_y.column) for all datasets in ds_dict
- Return type:
list of tuples
- pytesmo.validation_framework.data_manager.get_result_names(ds_dict, refkey, n=2)
Return result names for all possible combinations that include the reference dataset.
- Parameters:
ds_dict (dict) – Dict of lists containing the dataset names as keys and a list of the columns to read from the dataset as values.
refkey (string) – dataset name to use as a reference
n (int) – Number of datasets to combine with each other. If n=2, two datasets are always combined into one result; if n=3, three datasets are always combined into one result, and so on. n has to be <= the total number of datasets.
- Returns:
results_names – Containing all combinations of (referenceDataset.column, otherDataset.column)
- Return type:
list of tuples
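To make the combination rules of the two helpers above concrete, here is a minimal stdlib sketch. It assumes that each combination takes exactly one column from each of n distinct datasets, and that get_result_names keeps only the combinations that contain the reference, with the reference pair placed first. The functions result_combinations and result_names are hypothetical stand-ins, not the pytesmo code, and the ordering of the real output may differ.

```python
from itertools import combinations, product


def result_combinations(ds_dict, n=2):
    """Hypothetical sketch: all combinations of n (dataset, column) pairs,
    taking one column from each of n distinct datasets."""
    if n > len(ds_dict):
        raise ValueError("n must be <= the number of datasets")
    results = []
    # choose n distinct datasets, then one column from each of them
    for names in combinations(sorted(ds_dict), n):
        column_lists = [[(name, col) for col in ds_dict[name]] for name in names]
        results.extend(product(*column_lists))
    return results


def result_names(ds_dict, refkey, n=2):
    """Hypothetical sketch: keep only combinations containing the reference,
    with the reference pair moved to the front of each tuple."""
    names = []
    for combo in result_combinations(ds_dict, n):
        ref_pairs = [pair for pair in combo if pair[0] == refkey]
        if ref_pairs:
            others = [pair for pair in combo if pair[0] != refkey]
            names.append(tuple(ref_pairs + others))
    return names


ds = {"ISMN": ["soil_moisture"], "ASCAT": ["sm"], "SMOS": ["sm", "sm_err"]}
print(result_names(ds, "ISMN"))
```

With n=3 the same input yields two combinations, one per SMOS column, each containing one column from every dataset.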
pytesmo.validation_framework.data_scalers module
Data scaler classes to be used together with the validation framework.
- class pytesmo.validation_framework.data_scalers.CDFStoreParamsScaler(path, grid, percentiles=[0, 5, 10, 30, 50, 70, 90, 95, 100], **matcher_kwargs)
Bases: object
CDF scaling using stored parameters if available. If stored parameters are not available, they are calculated and written to disk.
- Parameters:
path (string) – Path where the data is/should be stored
grid (pygeogrids.grids.CellGrid instance) – Grid on which the data is stored. Should be the same as the spatial reference grid of the validation framework instance in which this scaler is used.
percentiles (list or np.ndarray) – Percentiles to use for CDF matching.
**matcher_kwargs (keyword arguments) – Passed on to pytesmo.cdf_matching.CDFMatching.
- calc_parameters(data, reference_index)
Calculate the percentiles used for CDF matching.
- Parameters:
data (pandas.DataFrame) – temporally matched dataset
reference_index (int) – Index of the reference column in the dataset.
- Returns:
matchers – Keys: names of columns in the input data frame. Values: nbins x 3 numpy.ndarrays with columns x_perc, y_perc, percentiles.
- Return type:
dictionary
- get_parameters(data, reference_index, gpi)
Get the scaling parameters. They are loaded if available; otherwise they are calculated and stored.
- Parameters:
data (pandas.DataFrame) – temporally matched dataset
reference_index (int) – Index of the reference column in the dataset.
gpi (int) – grid point index of self.grid
- Returns:
params – Keys: names of columns in the input data frame. Values: numpy.ndarrays with the percentiles.
- Return type:
dictionary
- scale(data, reference_index, gpi_info)
Scale all columns in data to the column at the reference_index.
- Parameters:
data (pandas.DataFrame) – temporally matched dataset
reference_index (int) – Which column of the data contains the scaling reference.
gpi_info (tuple) – Tuple of at least (gpi, lon, lat), where gpi has to be a grid point index of the grid of this scaler.
- Raises:
ValueError – if scaling is not successful
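CDFStoreParamsScaler works with percentiles of the data and of the reference. As an illustration only, here is a simplified stdlib sketch of percentile-based CDF matching: percentiles are computed with linear interpolation between order statistics (the same rule as NumPy's default), and each value is mapped piecewise-linearly from the data percentiles onto the reference percentiles. The helpers percentile and cdf_match are hypothetical; pytesmo's CDFMatching handles further details (e.g. extrapolation and ties) that this sketch does not reproduce.

```python
def percentile(values, q):
    """Percentile q (0-100) with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def cdf_match(src, ref, percentiles=(0, 5, 10, 30, 50, 70, 90, 95, 100)):
    """Hypothetical sketch: rescale src so that its percentiles match ref's."""
    x_perc = [percentile(src, p) for p in percentiles]
    y_perc = [percentile(ref, p) for p in percentiles]
    scaled = []
    for v in src:
        # find the percentile bin containing v (clamped to the outermost bins)
        i = 0
        while i < len(x_perc) - 2 and v > x_perc[i + 1]:
            i += 1
        dx = x_perc[i + 1] - x_perc[i]
        frac = (v - x_perc[i]) / dx if dx else 0.0
        # linear interpolation inside the matching reference bin
        scaled.append(y_perc[i] + frac * (y_perc[i + 1] - y_perc[i]))
    return scaled


print(cdf_match([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```

Because the mapping is monotone within and across bins, the rank order of the input values is preserved; only their distribution is changed.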
- class pytesmo.validation_framework.data_scalers.DefaultScaler(method)
Bases: object
Scaling class that implements the scaling based on a given method from the pytesmo.scaling module.
- Parameters:
method (string) – The data will be scaled into the reference space using the method specified by this string.
- scale(data, reference_index, gpi_info)
Scale all columns in data to the column at the reference_index.
- Parameters:
data (pandas.DataFrame) – temporally matched dataset
reference_index (int) – Which column of the data contains the scaling reference.
gpi_info (tuple) – Tuple of at least (gpi, lon, lat), where gpi has to be a grid point index of the grid of this scaler.
- Raises:
ValueError – if scaling is not successful
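To illustrate what "scale all columns to the column at the reference_index" means, here is a minimal sketch using one common method, mean/standard-deviation scaling, in plain Python. The columns are represented as lists rather than a pandas.DataFrame, and mean_std_scale and scale_columns are hypothetical helpers, not the functions in pytesmo.scaling; which method names DefaultScaler accepts is not reproduced here.

```python
from statistics import mean, pstdev


def mean_std_scale(src, ref):
    """Shift and stretch src so it has the mean and standard deviation of ref."""
    src_mean, src_std = mean(src), pstdev(src)
    if src_std == 0:
        # like the documented scale(), signal an unsuccessful scaling with ValueError
        raise ValueError("cannot scale a constant series")
    ref_mean, ref_std = mean(ref), pstdev(ref)
    return [(v - src_mean) / src_std * ref_std + ref_mean for v in src]


def scale_columns(columns, reference_index, method=mean_std_scale):
    """Hypothetical analogue of scale(): scale every column to the column at
    reference_index and leave the reference column itself unchanged."""
    ref = columns[reference_index]
    return [col if i == reference_index else method(col, ref)
            for i, col in enumerate(columns)]


print(scale_columns([[1, 2, 3], [10, 20, 30]], reference_index=1))
```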
pytesmo.validation_framework.error_handling module
Definition of exceptions and error return codes.
The error codes are integer values. They should be non-negative if the error was handled properly. An error code of -1 indicates failure in handling the error; 0 indicates success.
- exception pytesmo.validation_framework.error_handling.DataManagerError
Bases: ValidationError
- return_code = 8
- exception pytesmo.validation_framework.error_handling.MetricsCalculationError
Bases: ValidationError
- return_code = 2
- exception pytesmo.validation_framework.error_handling.NoGpiDataError
Bases: ValidationError
- return_code = 7
- exception pytesmo.validation_framework.error_handling.NoTempMatchedDataError
Bases: ValidationError
- return_code = 4
- exception pytesmo.validation_framework.error_handling.ScalingError
Bases: ValidationError
- return_code = 5
- exception pytesmo.validation_framework.error_handling.TemporalMatchingError
Bases: ValidationError
- return_code = 3
- exception pytesmo.validation_framework.error_handling.ValidationError
Bases: Exception
- return_code = -1
- exception pytesmo.validation_framework.error_handling.ValidationFailedError
Bases: ValidationError
- return_code = 6
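The return codes are meant to be recorded instead of raised when errors are handled. The sketch below mirrors part of the hierarchy above with local stand-in classes (in real code they would be imported from pytesmo.validation_framework.error_handling) and shows a hypothetical driver, validate_point, that returns 0 on success, re-raises in 'raise' mode, and returns the exception's return_code in 'ignore' mode, in the spirit of the handle_errors option of Validation.calc.

```python
# Local stand-ins mirroring part of the documented hierarchy.
class ValidationError(Exception):
    return_code = -1


class TemporalMatchingError(ValidationError):
    return_code = 3


class NoTempMatchedDataError(ValidationError):
    return_code = 4


def validate_point(gpi, step, handle_errors="raise"):
    """Hypothetical driver: run one validation step for a gpi, return a code."""
    try:
        step(gpi)
    except ValidationError as err:
        if handle_errors == "raise":
            raise
        # 'ignore': keep the error's return code and move on to the next gpi
        return err.return_code
    return 0  # success


def ok_step(gpi):
    pass


def failing_step(gpi):
    raise NoTempMatchedDataError(f"no temporally matched data at gpi {gpi}")


codes = [validate_point(g, failing_step if g == 2 else ok_step, handle_errors="ignore")
         for g in (1, 2, 3)]
print(codes)
```

Because every specific error subclasses ValidationError, a single except clause is enough to catch them all while still reporting the specific code.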
pytesmo.validation_framework.metric_calculators module
pytesmo.validation_framework.metric_calculators_adapters module
pytesmo.validation_framework.results_manager module
The results manager stores validation results in netcdf format.
- class pytesmo.validation_framework.results_manager.PointDataResults(filename, zlib=True, read_only=False)
Bases: Dataset
- add_metrics_results(lons, lats, results, attr=None)
Add metrics results (not time series) for new locations.
- Parameters:
lons (np.array) – Array of location longitudes; the shape must match the shape of the arrays in results.
lats (np.array) – Array of location latitudes; the shape must match the shape of the arrays in results.
results (dict) – Variable names as dict keys and data arrays as values. As returned by the metric calculators, except the RollingMetrics. Shape of data arrays must match lons/lats.
attr (dict, optional (default: None)) – Variable names as keys and attributes as dicts for each variable. Only used when the variable is created, not if it is already in the dataset.
- Returns:
idx – Indices of the new locations, which can be used to add time series results.
- Return type:
indices
- add_result(lon, lat, data, ts_vars=None, times=None, attr=None)
Add all results (time series and location metrics) for a single point.
- Parameters:
lon (float) – Longitude of the point
lat (float) – Latitude of the point
data (dict) – Dict of metric names and values. For normal (not rolling) metrics this is an array of size 1; otherwise it has the same size as times.
times (np.array, optional (default: None)) – Time values; the length must match all time series (rolling) metrics in data.
attr (dict, optional (default: None)) –
- add_ts_results(idx, times, results, attr=None)
Add observations over time to the results of previously added locations.
- Parameters:
idx (int) – Location index, as returned when adding metrics results.
times (pd.DatetimeIndex or np.array) – Datetime index as in the validation results from rolling metrics
results (dict) – Variable names as dict keys and data arrays as values. Data arrays must have same size as times.
attr (dict, optional (default: None)) – Variable names as keys and attributes as dicts for each variable. Only used when the variable is created, not if it is already in the dataset.
- read_loc(idx: int | array | None = None) → DataFrame
Read location data for one, multiple, or all points.
- property variables: array
Names of all variables in the data set
- pytesmo.validation_framework.results_manager.build_filename(root, key)
Create a save path/filename that does not exceed 255 characters.
- pytesmo.validation_framework.results_manager.netcdf_results_manager(results, save_path, filename: dict | None = None, ts_vars: list | None = None, zlib=True, attr=None)
Write validation results to netcdf file.
- Parameters:
results (dict) – Validation results as returned by the metrics calculator. Keys are tuples that define the dataset names that were used. Values contain ‘lon’ and ‘lat’ keys for defining the points, and optionally ‘time’, which sets the time stamps for each location (if there are metrics over time in the results, e.g. due to RollingMetrics).
save_path (str) – Directory where the netcdf file(s) are created.
filename (dict, optional (default: None)) – Filename(s) (value), for each dataset combination in results (key). By default (if None is passed) the keys in results are used to generate a file name.
ts_vars (list, optional (default: None)) – List of variables in results that are treated as time series
zlib (bool, optional (default: True)) – Activate compression
attr (dict, optional (default: None)) – Variable attributes, variable names as keys, attributes as another dict in values.
pytesmo.validation_framework.start_validation module
pytesmo.validation_framework.temporal_matchers module
Created on Sep 24, 2013
@author: Christoph.Paulik@geo.tuwien.ac.at
- class pytesmo.validation_framework.temporal_matchers.BasicTemporalMatching(window=0.5)
Bases: object
Temporal matching object
- Parameters:
window (float) – Window size in days to use for temporal matching. A match in the other dataset will only be found if it is within ± window days of a timestamp in the reference.
- combinatory_matcher(df_dict, refkey, n=2, **kwargs)
Basic temporal matcher that always matches one DataFrame to the reference DataFrame, resulting in matched DataFrame pairs.
If the input dict has the keys ‘data1’ and ‘data2’, then the output dict will have the key (‘data1’, ‘data2’). The new key is stored as a tuple to avoid any issues with string concatenation.
During matching the column names of the dataframes will be transformed into MultiIndex to ensure unique names.
- Parameters:
- Returns:
matched – Dictionary containing matched DataFrames. The key is put together from the keys of the input dict as a tuple of the keys of the datasets this dataframe contains.
- Return type:
dict of pandas.DataFrames
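The window rule of BasicTemporalMatching (a match counts only if it lies within ± window days of a reference timestamp) can be sketched with the standard library. This is a simplified nearest-neighbour illustration under stated assumptions: the other dataset is a {datetime: value} dict instead of a DataFrame, and match_nearest is a hypothetical helper rather than pytesmo's implementation.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def match_nearest(ref_times, other, window_days=0.5):
    """Hypothetical sketch: for each reference timestamp, take the value of the
    nearest observation in `other` if it lies within +- window_days.

    ref_times: iterable of datetimes; other: {datetime: value}
    """
    window = timedelta(days=window_days)
    times = sorted(other)
    matched = {}
    for t in ref_times:
        i = bisect_left(times, t)
        # the nearest neighbour is either just before t or at/after t
        candidates = [times[j] for j in (i - 1, i) if 0 <= j < len(times)]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda c: abs(c - t))
        if abs(nearest - t) <= window:
            matched[t] = other[nearest]
    return matched


ref = [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 5)]
other = {datetime(2020, 1, 1, 3): 0.30, datetime(2020, 1, 2, 20): 0.40}
print(match_nearest(ref, other))  # only 2020-01-01 has a match within 12 h
```

Widening the window to one day would also match 2020-01-02 (the nearest observation is 20 hours away), while 2020-01-05 stays unmatched.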
- pytesmo.validation_framework.temporal_matchers.df_name_multiindex(df, name)
Rename columns of a DataFrame by using new column names that are tuples of (name, column_name) to ensure unique column names that can also be split again. This transforms the columns to a MultiIndex.
- pytesmo.validation_framework.temporal_matchers.dfdict_combined_temporal_collocation(dfs, refname, k, window=None, n=None, **kwargs)
Applies combined_temporal_collocation() on a dictionary of dataframes.
- Parameters:
dfs (dict) – Dictionary of pd.DataFrames containing the dataframes to be collocated.
refname (str) – Name of the reference frame in dfs.
k (int) – Number of columns that will be put together in the output dictionary. The output will consist of all combinations of size k.
window (pd.Timedelta or float, optional) – Window around reference timestamps in which to look for data. Floats are interpreted as number of days. If it is not given, defaults to 1 hour to mimic the behaviour of BasicTemporalMatching.combinatory_matcher.
**kwargs – Keyword arguments passed to combined_temporal_collocation().
- Returns:
matched_dict (dict) – Dictionary where the keys are tuples of (refname, othernames...).
- pytesmo.validation_framework.temporal_matchers.make_combined_temporal_matcher(window)
Matches multiple dataframes together so that they only have common timestamps.
See pytesmo.temporal_matching.dfdict_combined_temporal_collocation() for more details.
- Parameters:
window (pd.Timedelta or float, optional) – Window around reference timestamps in which to look for data. Floats are interpreted as number of days. If it is not given, defaults to 1 hour to mimic the behaviour of BasicTemporalMatching.combinatory_matcher.
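The difference between combined collocation and pairwise matching is that only timestamps available in all datasets survive, and the output is then split into size-k combinations keyed by (refname, othernames...). A minimal sketch, assuming each other dataset has already been matched to the reference timestamps within the window and is given as a {reference timestamp: value} dict; combined_collocation is a hypothetical helper, not the pytesmo function.

```python
from itertools import combinations


def combined_collocation(ref_name, ref, matched, k=2):
    """Hypothetical sketch of combined collocation.

    ref:     {timestamp: value} for the reference dataset
    matched: {dataset name: {reference timestamp: value}}, each other dataset
             already matched to the reference timestamps within the window
    Returns {(ref_name, other, ...): {timestamp: (ref value, other values...)}}
    for every size-k combination, using only timestamps common to all datasets.
    """
    # keep only reference timestamps that were matched in every other dataset
    common = set(ref)
    for values in matched.values():
        common &= set(values)
    times = sorted(common)

    result = {}
    for others in combinations(sorted(matched), k - 1):
        key = (ref_name,) + others
        result[key] = {t: (ref[t],) + tuple(matched[o][t] for o in others)
                       for t in times}
    return result


ref = {1: 0.1, 2: 0.2, 3: 0.3}
matched = {"A": {1: 10, 2: 20}, "B": {2: 200, 3: 300}}
print(combined_collocation("ref", ref, matched, k=2))
```

Timestamps 1 and 3 are dropped from every output because they are missing in one of the other datasets, even for the combinations that do not include that dataset.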
pytesmo.validation_framework.upscaling module
- class pytesmo.validation_framework.upscaling.MixinReadTs
Bases: object
Mixin class to provide the reading function in DataAverager and DataManager.
- read_ds(name, *args)
Function to read and prepare a dataset.
Calls read_ts of the dataset.
Takes either 1 (gpi) or 2 (lon, lat) arguments.
- Parameters:
name (string) – Name of the other dataset.
args (either gpi or (lon, lat)) –
gpi (int): Grid point index
lon (float): Longitude of point
lat (float): Latitude of point
- Returns:
data_df – Data DataFrame.
- Return type:
pandas.DataFrame or None
- class pytesmo.validation_framework.upscaling.Upscaling(ref_class, others_class, upscaling_lut, manager_parms)
Bases: MixinReadTs
This class provides methods to combine the measurements of validation datasets (others) that fall under the same grid point of the dataset being validated (reference).
The goal is to include here all identified upscaling methods to provide an estimate at the reference footprint scale.
Implemented methods:
- time-stability filtering
- simple averaging
- Parameters:
ref_class (<reader object> of the reference) – Class containing the method read_ts for reading the data of the reference
others_class (dict) – Dict of shape {‘other_name’: <reader object>} for the other dataset
upscaling_lut (dict) – Dict of shape {‘other_name’:{ref gpi: [other gpis]}}
manager_parms (dict) – Dict of DataManager attributes
- get_upscaled_ts(gpi, other_name, upscaling_method='average', temporal_stability=False, **kwargs) → None | DataFrame
Find the upscaled estimate time series with the given method for a certain reference gpi.
- Parameters:
gpi (int) – gpi value of the reference point
other_name (str) – name of the non-reference dataset to be upscaled
upscaling_method (str) – Method to use for upscaling:
- ‘average’ takes the simple mean of all time series
temporal_stability (bool, default is False) – if True, the values are filtered using the time stability concept
kwargs (keyword arguments) – arguments for the temporal window or time stability thresholds
- Returns:
upscaled – upscaled time series; if there are no points under the specific gpi, None is returned
- Return type:
pd.DataFrame or None
- static temporal_match(to_match, hours=6, drop_missing=False, **kwargs) → DataFrame
Temporally match to the longest time series.
- Parameters:
- Returns:
matched – dataframe with temporally matched timeseries
- Return type:
pd.DataFrame
- static tstability_filter(df, r_min=0.6, see_max=0.05, min_n=4, **kwargs) → DataFrame
Uses time stability concepts to filter point measurements (pms). Determines whether the upscaled measurement, based on a simple average of all the pms, is in sufficient agreement with each pm, and if not, eliminates that pm from the pool.
Thresholds are based on: Wagner W, Pathe C, Doubkova M, Sabel D, Bartsch A, Hasenauer S, Blöschl G, Scipal K, Martínez-Fernández J, Löw A. Temporal Stability of Soil Moisture and Radar Backscatter Observed by the Advanced Synthetic Aperture Radar (ASAR). Sensors. 2008; 8(2):1174-1197. https://doi.org/10.3390/s80201174
- Parameters:
- Returns:
filtered – filtered input
- Return type:
pd.DataFrame
- upscale(df, method='average', **kwargs) → Series
Handle the column names and return the upscaled DataFrame with the specified method.
- Parameters:
df (pd.DataFrame) – Dataframe of values to upscale using method
method (str) – averaging method
kwargs (keyword arguments) – Arguments for some upscaling functions
- Returns:
upscaled – dataframe with “upscaled” column
- Return type:
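The two upscaling ideas above, simple averaging and dropping point measurements that disagree with the average, can be sketched in plain Python. This is a deliberately simplified, hypothetical illustration: series are equal-length lists without gaps, agreement is judged only by the Pearson correlation with the average against r_min, and the see_max and min_n criteria of tstability_filter are not reproduced. The helper names are not part of pytesmo.

```python
from math import sqrt


def average(series):
    """Simple mean of the point measurements at each timestamp."""
    return [sum(vals) / len(vals) for vals in zip(*series)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def stability_filtered_average(series, r_min=0.6):
    """Hypothetical simplified filter: drop point measurements whose
    correlation with the simple average is below r_min, then re-average."""
    avg = average(series)
    kept = [s for s in series if pearson_r(s, avg) >= r_min]
    return average(kept) if kept else None


a = [0.10, 0.20, 0.30, 0.40]
b = [0.12, 0.22, 0.31, 0.41]
c = [0.40, 0.10, 0.40, 0.10]  # disagrees with the others
print(stability_filtered_average([a, b, c]))
```

Here series c correlates poorly with the three-station average and is removed, so the result is the average of a and b only.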
pytesmo.validation_framework.validation module
- class pytesmo.validation_framework.validation.Validation(datasets, spatial_ref, metrics_calculators, temporal_matcher=None, temporal_window=0.041666666666666664, temporal_ref=None, masking_datasets=None, period=None, scaling='cdf_match', scaling_ref=None)
Bases: object
Class for the validation process.
- Parameters:
datasets (dict of dicts or pytesmo.validation_framework.data_manager.DataManager) –
- Keys:
string, dataset names
- Values:
dict containing the following fields:
- ‘class’: object
Class containing the method read_ts for reading the data.
- ‘columns’: list
List of columns which will be used in the validation process.
- ‘args’: list, optional
Args for reading the data.
- ‘kwargs’: dict, optional
Kwargs for reading the data.
- ‘grids_compatible’: boolean, optional
If set to True, the grid point index is used directly when reading the other dataset; if False, lon and lat are used and a nearest neighbour search is necessary.
- ‘use_lut’: boolean, optional
If set to True, the grid point index (obtained from a lut calculated between the reference and the other dataset) is used when reading the other dataset; if False, lon and lat are used and a nearest neighbour search is necessary.
- ‘lut_max_dist’: float, optional
Maximum allowed distance in meters for the lut calculation.
spatial_ref (string) – Name of the dataset used as the spatial, temporal and scaling reference. The temporal and scaling references can be changed if needed; see the optional parameters temporal_ref and scaling_ref.
metrics_calculators (dict of functions) –
The keys of the dict are tuples with the structure (n, k), with n >= 2 and n >= k. Currently, n must equal the number of datasets. n is the number of datasets that should be temporally matched to the reference dataset, and k is how many columns the metric calculator receives at once. This makes it possible, e.g., to temporally match 3 datasets with 3 columns in total and then pass the combinations of these columns to the metric calculator in sets of 2 by specifying the dictionary as:
{ (3, 2): metric_calculator}
The values are functions that take an input DataFrame with the columns ‘ref’ for the reference and ‘n1’, ‘n2’ and so on for other datasets as well as a dictionary mapping the column names to the names of the original datasets. In this way multiple metric calculators can be applied to different combinations of n input datasets.
temporal_matcher (function, optional) – function that takes a dict of dataframes and a reference_key. It performs the temporal matching on the data and returns a dictionary of matched DataFrames that should be evaluated together by the metric calculator.
temporal_window (float, optional) – Window to allow in temporal matching in days. The window is allowed on both sides of the timestamp of the temporal reference data. Only used with the standard temporal matcher.
temporal_ref (string, optional) – If the temporal matching should use another dataset than the spatial reference as a reference dataset then give the dataset name here.
period (list, optional) – Of type [datetime start, datetime end]. If given then the two input datasets will be truncated to start <= dates <= end.
masking_datasets (dict of dictionaries) – Same format as the datasets with the difference that the read method of these datasets has to return pandas.DataFrames with only boolean columns. True means that the observations at this timestamp should be masked and False means that it should be kept.
scaling (str or None or class instance) –
If set, the data will be scaled into the reference space using the method specified by the string, via the pytesmo.validation_framework.data_scalers.DefaultScaler class. If set to None, no scaling will be performed.
It can also be set to a class instance that implements a scale(self, data, reference_index, gpi_info) method. See pytesmo.validation_framework.data_scalers.DefaultScaler for an example.
scaling_ref (string, optional) – If the scaling should be done to another dataset than the spatial reference then give the dataset name here.
- calc(gpis, lons, lats, *args, rename_cols=True, only_with_reference=False, handle_errors='raise') → Mapping[Tuple[str], Mapping[str, ndarray]]
The argument iterables (lists or numpy.ndarrays) are processed one after the other in tuples of the form (gpis[n], lons[n], lats[n], arg1[n], ..).
- Parameters:
gpis (iterable) – The grid point indices are identifiers by which the spatial reference dataset can be read. This is either a list, a numpy.ndarray, or any other iterable containing these identifiers.
lons (iterable) – Longitudes of the points identified by the gpis. Has to be the same size as gpis.
lats (iterable) – Latitudes of the points identified by the gpis. Has to be the same size as gpis.
args (iterables) – Any additional arguments have to have the same size as the gpis iterable. They are passed to the metrics calculators as metadata. A common use is e.g. the long name or network name of an in situ station.
rename_cols (bool, optional) – Whether to rename the columns to “ref”, “k1”, … before passing the dataframe to the metrics calculators. Default is True.
only_with_reference (bool, optional) – If this is enabled, only combinations that include the reference dataset (from the data manager) are calculated.
handle_errors (str, optional (default: 'raise')) –
Governs how to handle errors:
- `raise`: If an error occurs during validation, raise an exception.
- `ignore`: If an error occurs, assign the correct return code to the result template and continue with the next GPI.
- Returns:
compact_results –
- Keys:
result names, combinations of (referenceDataset.column, otherDataset.column)
- Values:
dict containing the elements returned by metrics_calculator
- Return type:
dict of dicts
- dummy_validation_result(gpi_info, rename_cols=True, only_with_reference=False) → Mapping[Tuple[str], List[Mapping[str, ndarray]]]
Creates an empty result dictionary to be used if perform_validation fails.
- get_data_for_result_tuple(n_matched_data, result_tuple)
Extract a dataframe for a given result tuple from the matched dataframes.
- Parameters:
- Returns:
data – pandas DataFrame with columns extracted from the temporally matched datasets
- Return type:
pd.DataFrame
- get_processing_jobs()
Returns processing jobs that this process can understand.
- Returns:
jobs – List of cells or gpis to process.
- Return type:
- k_datasets_from(n_matched_data, result_names, include_scaling_ref=True)
Extract k datasets from n temporally matched ones.
This is used to send combinations of k datasets to metrics calculators expecting only k datasets.
- Parameters:
n_matched_data (dict of pandas.DataFrames) – DataFrames in which n datasets were temporally matched. The key is a tuple of the dataset names.
result_names (list) – result names to extract
include_scaling_ref (boolean, optional) – If set, the scaling reference will always be included. Should only be disabled for getting the masking datasets.
- Yields:
data (pd.DataFrame) – pandas DataFrame with k columns extracted from the temporally matched datasets
result (tuple) – Tuple describing which datasets and columns are in the returned data. ((dataset_name, column_name), (dataset_name2, column_name2))
- mask_dataset(ref_df, gpi_info)
Mask the temporal reference dataset with the data read through the masking datasets.
- Parameters:
gpi_info (tuple) – tuple of at least, (gpi, lon, lat)
- Returns:
mask – boolean array of the size of the temporal reference read
- Return type:
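The masking convention described for masking_datasets (True means mask, False means keep) can be sketched as one boolean per reference timestamp, set if any masking dataset flags it. This is a hypothetical illustration with dicts instead of DataFrames: mask_flags is not the pytesmo method, the masks are assumed to be temporally matched to the reference already, and a timestamp missing from a mask is treated as "keep", which is an assumption.

```python
def mask_flags(ref_times, masks):
    """Hypothetical sketch: one boolean per reference timestamp, True if any
    masking dataset flags that timestamp (True = mask, False = keep).
    Timestamps missing from a mask are treated as 'keep' (assumption)."""
    return [any(mask.get(t, False) for mask in masks) for t in ref_times]


ref = {1: 0.1, 2: 0.2, 3: 0.3}
masks = [{1: False, 2: True}, {3: False}]
flags = mask_flags(list(ref), masks)
# keep only the reference observations that are not masked
kept = {t: v for (t, v), masked in zip(ref.items(), flags) if not masked}
print(flags, kept)
```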
- perform_validation(df_dict, gpi_info, rename_cols=True, only_with_reference=False, handle_errors='raise') → Mapping[Tuple[str], List[Mapping[str, ndarray]]]
Perform the validation for one grid point index and return the matched datasets as well as the calculated metrics.
- Parameters:
df_dict (dict of pandas.DataFrames) – DataFrames read by the data readers for each dataset
gpi_info (tuple) – tuple of at least, (gpi, lon, lat)
rename_cols (bool, optional) – Whether to rename the columns to “ref”, “k1”, … before passing the dataframe to the metrics calculators. Default is True.
only_with_reference (bool, optional (default: False)) – Only compute metrics for dataset combinations where the reference is included.
- Returns:
matched_n (dict of pandas.DataFrames) – temporally matched data stored by (n, k) tuples
results (dict) – Dictionary of calculated metrics stored by dataset combination tuples.
used_data (dict) – The DataFrame used for calculation of each set of metrics.
- Raises:
eh.TemporalMatchingError – If temporal matching failed.
eh.NoTempMatchedDataError – If there is insufficient data or the temporal matching did not return data.
eh.ScalingError – If scaling failed.
- temporal_match_datasets(df_dict)
Temporally match all the requested combinations of datasets.
- temporal_match_masking_data(ref_df, gpi_info)
Temporally match the masking data to the reference DataFrame.
- Parameters:
ref_df (pandas.DataFrame) – Reference data
- Returns:
matched_masking – Contains temporally matched masking data. This dict has only one key being a tuple that contains the matched datasets.
- Return type:
dict of pandas.DataFrames