pytesmo.validation_framework.data_manager module
- class pytesmo.validation_framework.data_manager.DataManager(datasets, ref_name, period=None, read_ts_names='read', upscale_parms=None)
Bases: MixinReadTs
Class to handle the data management.
- Parameters:
datasets (dict of dicts) –
Keys: string, dataset names. Values: dict, containing the following fields:
- 'class' : object
Class containing the method read for reading the data.
- 'columns' : list
List of columns which will be used in the validation process.
- 'args' : list, optional
Args that are passed to the reading function.
- 'kwargs' : dict, optional
Kwargs that are passed to the reading function.
- 'grids_compatible' : boolean, optional
If set to True, the grid point index is used directly when reading the other dataset. If False, lon, lat is used and a nearest neighbour search is necessary. Default: False
- 'use_lut' : boolean, optional
If set to True, the grid point index (obtained from a calculated lut between reference and other) is used when reading the other dataset. If False, lon, lat is used and a nearest neighbour search is necessary. Default: False
- 'max_dist' : float, optional
Maximum allowed distance in meters for the lut calculation. Default: None
ref_name (string) – Name of the reference dataset. The reference dataset is used as the spatial reference, i.e. all other datasets will be interpolated to the locations of the reference dataset.
period (list, optional) – Of type [datetime start, datetime end]. If given, the input datasets will be truncated to start <= dates <= end.
read_ts_names (string or dict of strings, optional) – If a method name other than 'read' should be used for reading the data, it can be specified here. If it is a dict, specify a method name for each dataset.
upscale_parms (dict, optional) – Dictionary with parameters for the upscaling methods. Default: None. Keys:
- 'upscaling_method': method for upscaling
- 'temporal_stability': bool for using temporal stability
- 'upscaling_lut': dict of shape {'other_name': {ref gpi: [other gpis]}}
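The parameter layout above can be illustrated with a small configuration sketch. It does not import pytesmo. `DummyReader`, the dataset names, the column names, and the grid point numbers are all hypothetical, and passing a reader instance under 'class' is an assumption based on the "object" type given above.

```python
from datetime import datetime


class DummyReader:
    """Hypothetical stand-in for a dataset reader; not part of pytesmo."""

    def read(self, *args, **kwargs):
        # A real reader would return the time series for the requested
        # grid point (gpi) or (lon, lat) location.
        raise NotImplementedError


# One entry per dataset; the keys are the dataset names.
datasets = {
    "reference_ds": {
        "class": DummyReader(),  # assumption: an object exposing read()
        "columns": ["soil_moisture"],
    },
    "other_ds": {
        "class": DummyReader(),
        "columns": ["sm"],
        "kwargs": {},
        "grids_compatible": False,
        "use_lut": True,
        "max_dist": 25000,  # meters, used for the lut calculation
    },
}

# [datetime start, datetime end]
period = [datetime(2010, 1, 1), datetime(2014, 12, 31)]

# Shape of an 'upscaling_lut': {'other_name': {ref gpi: [other gpis]}}
# (the grid point indices below are made up for illustration)
upscaling_lut = {"other_ds": {1001: [2001, 2002], 1002: [2003]}}

# With pytesmo installed, these would be passed along the lines of:
# DataManager(datasets, ref_name="reference_ds", period=period)
```

Only 'class' and 'columns' are required per dataset; the remaining fields fall back to the defaults listed above.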
- use_lut(other_name)
Returns the lut between the reference and the other dataset if use_lut was set to True for that dataset.
- get_result_names()
Return result names based on the names of the reference and the other datasets.
- property ds_dict
- get_data(gpi, lon, lat)
Get all the data from this manager for a certain grid point, longitude, latitude combination.
- Parameters:
gpi – Grid point index.
lon – Longitude of the point.
lat – Latitude of the point.
- Returns:
df_dict – Dictionary with dataset names as the key and pandas.DataFrames containing the data for the point as values. The dict will be empty if no data is available.
- Return type:
dict of pandas.DataFrames
- get_luts()
Returns luts between reference and others if use_lut for other datasets was set to True.
- Returns:
luts – Keys: names of the other datasets. Values: lut between reference and other, or None.
- Return type:
dict
- get_other_data(gpi, lon, lat)
Get all the data for non-reference datasets from this manager for a certain grid point, longitude, latitude combination.
- Parameters:
gpi – Grid point index.
lon – Longitude of the point.
lat – Latitude of the point.
- Returns:
other_dataframes – Dictionary with dataset names as the key and pandas.DataFrames containing the data for the point as values. The dict will be empty if no data is available.
- Return type:
dict of pandas.DataFrames
- read_other(name, *args)
Function to read and prepare non-reference datasets.
Calls read of the dataset.
Takes either 1 (gpi) or 2 (lon, lat) arguments.
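- Parameters:
name – Name of the non-reference dataset to read.
args – Either one argument (gpi) or two arguments (lon, lat).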
- Returns:
data_df – Data DataFrame.
- Return type:
pandas.DataFrame or None
- read_reference(*args)
Function to read and prepare the reference dataset.
Calls read of the dataset. Takes either 1 (gpi) or 2 (lon, lat) arguments.
- Parameters:
args – Either one argument (gpi) or two arguments (lon, lat).
- Returns:
ref_df – Reference dataframe.
- Return type:
pandas.DataFrame or None
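The calling convention shared by read_reference and read_other, one argument for a grid point index or two for a longitude, latitude pair, can be sketched as follows. This is a minimal illustration and not pytesmo's implementation: `read_by_location` and `EchoReader` are hypothetical names, and the sketch raises an error on a bad argument count, whereas the documented methods may return None.

```python
def read_by_location(reader, *args):
    """Hypothetical helper showing the 1-or-2 argument convention."""
    if len(args) == 1:
        (gpi,) = args  # single argument: grid point index
        return reader.read(gpi)
    if len(args) == 2:
        lon, lat = args  # two arguments: longitude, latitude
        return reader.read(lon, lat)
    raise ValueError("expected either (gpi) or (lon, lat)")


class EchoReader:
    """Hypothetical reader that returns its arguments unchanged."""

    def read(self, *args):
        return args


by_gpi = read_by_location(EchoReader(), 1234)
by_lonlat = read_by_location(EchoReader(), 16.37, 48.21)
```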
- pytesmo.validation_framework.data_manager.get_result_combinations(ds_dict, n=2)
Get all possible combinations of dataset columns.
- Parameters:
ds_dict (dict) – Dict of lists containing the dataset names as keys and a list of the columns to read from the dataset as values.
n (int) – Number of datasets to combine with each other. If n=2, two datasets are always combined into one result. If n=3, three datasets are always combined into one result, and so on. n has to be <= the total number of datasets.
- Returns:
results_names – Containing all possible combinations of (dataset_x.column, dataset_y.column) for all datasets in ds_dict
- Return type:
list of tuples
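As a rough illustration of the kind of output described, the following sketch enumerates column combinations with itertools. It is a stand-in written for this page, not the pytesmo function: the name `sketch_result_combinations` is hypothetical, and representing each entry as a (dataset, column) pair is an assumption about the result layout.

```python
from itertools import combinations, product


def sketch_result_combinations(ds_dict, n=2):
    """Illustrative stand-in: combine the columns of every n datasets."""
    results = []
    # Every group of n dataset names, in dict insertion order.
    for names in combinations(ds_dict, n):
        # For each dataset in the group, list its (dataset, column) pairs.
        column_lists = [
            [(name, col) for col in ds_dict[name]] for name in names
        ]
        # One result per choice of a single column from each dataset.
        results.extend(product(*column_lists))
    return results


ds_dict = {"A": ["a1"], "B": ["b1", "b2"], "C": ["c1"]}
pairs = sketch_result_combinations(ds_dict, n=2)
triples = sketch_result_combinations(ds_dict, n=3)
```

With the example dict, n=2 yields five pairs (two for A with B, one for A with C, two for B with C) and n=3 yields two triples.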
- pytesmo.validation_framework.data_manager.get_result_names(ds_dict, refkey, n=2)
Return result names for all possible combinations that involve a reference dataset.
- Parameters:
ds_dict (dict) – Dict of lists containing the dataset names as keys and a list of the columns to read from the dataset as values.
refkey (string) – dataset name to use as a reference
n (int) – Number of datasets to combine with each other. If n=2, two datasets are always combined into one result. If n=3, three datasets are always combined into one result, and so on. n has to be <= the total number of datasets.
- Returns:
results_names – Containing all combinations of (referenceDataset.column, otherDataset.column)
- Return type:
list of tuples
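The reference-based variant can be sketched the same way: only groups that contain the reference dataset are kept, with the reference listed first. Again, this is an illustrative stand-in rather than the pytesmo function. The name `sketch_result_names`, the example dataset names, and the (dataset, column) pair layout are assumptions.

```python
from itertools import combinations, product


def sketch_result_names(ds_dict, refkey, n=2):
    """Illustrative stand-in: combinations that include the reference."""
    others = [name for name in ds_dict if name != refkey]
    results = []
    # Pick n - 1 other datasets to go with the reference dataset.
    for group in combinations(others, n - 1):
        names = (refkey,) + group
        column_lists = [
            [(name, col) for col in ds_dict[name]] for name in names
        ]
        results.extend(product(*column_lists))
    return results


ds_dict = {"ref": ["sm"], "B": ["b1", "b2"], "C": ["c1"]}
names = sketch_result_names(ds_dict, "ref", n=2)
```

Every entry starts with a column of the reference dataset, so combinations among the other datasets alone (B with C here) do not appear.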