{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Temporal Collocation of irregularly sampled timeseries\n", "\n", "Satellite observations usually have an irregular temporal sampling pattern (intervals between 6-36 hours), which is mostly controlled by the orbit of the satellite and the instrument measurement geometry. On the other hand, in-situ instruments or land surface models generally sample on regular time intervals (commonly every 1, 3, 6, 12 or 24 hours). \n", "In order to compute error/performance statistics (such as RMSD, bias, correlation) between the time series coming from different sources, it is required that observation pairs (or triplets, etc.) are found which (nearly) coincide in time.\n", "\n", "A simple way to identify such pairs is by using a nearest neighbor search. First, one time series needs to be selected as temporal reference (i.e. all other time series will be matched to this reference) and second, a tolerance window (typically around 1-12 hours) has to be defined characterizing the temporal correlation of neighboring observation (i.e. observations outside of the tolerance window are no longer be considered as representative neighbors). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Temporal collocation in pytesmo\n", "\n", "Pytesmo contains the function `pytesmo.temporal_matching.temporal_collocation` for temporally collocating timeseries. Currently, it implements nearest neighbour matching and a windowed mean. It requires a reference index (can also be a DataFrame or a Series), a DataFrame (or Series) to be collocated, and a window.\n", "\n", "```\n", "collocated = temporal_collocation(reference, input_frame, window)\n", "```\n", "\n", "The window argument corresponds to the time intervals that are included in the nearest neighbour search in each direction, e.g. if the reference time is $t$ and the window $\\Delta$, the nearest neighbour inside $[t-\\Delta, t+\\Delta]$ is returned. 
If no neighbour is found `np.nan` is used as replacement. NaNs can be dropped from the returned dataframe by providing the optional keyword argument ``dropna=True`` to the function.\n", "\n", "Below are two simple examples which demonstrate the usage. The first example assumes that the index of data to be collocated is shifted by 3 hours with respect to the reference, while using a 6 hour window. The second example uses an index that is randomly shifted by $\\pm12$ hours with respect to the reference. The second example also uses a 6 hour window, which results in some missing values in the resulting dataframe." ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "from pytesmo.temporal_matching import temporal_collocation, combined_temporal_collocation" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": " shifted_0 shifted_1 shifted_2\n2020-01-01 03:00:00 -0.687795 -0.626649 0.237109\n2020-01-02 03:00:00 -0.514778 -1.981137 0.354644\n2020-01-03 03:00:00 -0.600629 -0.761766 0.169777\n2020-01-04 03:00:00 -0.650058 -0.548499 0.548560\n2020-01-05 03:00:00 1.331785 -1.611482 -1.325902", "text/html": "
" }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create reference time series\n", "ref = pd.date_range(\"2020-01-01\", \"2020-12-31\", freq=\"D\")\n", "# temporal_collocation can also take a DataFrame or Series as reference input,\n", "# in case their index is a DatetimeIndex.\n", "\n", "# create other time series as dataframe\n", "values = np.random.randn(len(ref), 3)\n", "shifted = pd.DataFrame(values, index=ref + pd.Timedelta(hours=3), \n", " columns=list(map(lambda x: f\"shifted_{x}\", range(3))))\n", "random_shift = np.random.uniform(-12, 12, len(ref))\n", "random = pd.DataFrame(values, index=ref + pd.to_timedelta(random_shift, \"H\"),\n", " columns=list(map(lambda x: f\"random_{x}\", range(3))))\n", "\n", "shifted.head()" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": " random_0 random_1 random_2\n2019-12-31 21:00:12.990837600 -0.687795 -0.626649 0.237109\n2020-01-02 03:18:11.512674000 -0.514778 -1.981137 0.354644\n2020-01-03 04:15:20.703038399 -0.600629 -0.761766 0.169777\n2020-01-03 22:47:25.851343200 -0.650058 -0.548499 0.548560\n2020-01-05 01:02:46.090482000 1.331785 -1.611482 -1.325902", "text/html": "
" }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "random.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now match the shifted timeseries to the reference index by using a 6-hour window, either for a nearest neighbour search, or for taking a windowed mean. Both should return unchanges timeseries, except for the index." ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": " shifted_0 shifted_1 shifted_2\n2020-01-01 -0.687795 -0.626649 0.237109\n2020-01-02 -0.514778 -1.981137 0.354644\n2020-01-03 -0.600629 -0.761766 0.169777\n2020-01-04 -0.650058 -0.548499 0.548560\n2020-01-05 1.331785 -1.611482 -1.325902\n... ... ... ...\n2020-12-27 -1.030740 -0.520146 -0.049081\n2020-12-28 -1.221772 -0.770726 0.585857\n2020-12-29 0.420476 -2.558351 -0.797550\n2020-12-30 0.146506 1.014053 -0.629454\n2020-12-31 -1.296130 2.000808 -1.084081\n\n[366 rows x 3 columns]", "text/html": "
" }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# match the regularly shifted data\n", "window = pd.Timedelta(hours=6)\n", "matched_shifted_nn = temporal_collocation(ref, shifted, window, method=\"nearest\")\n", "matched_shifted_mean = temporal_collocation(ref, shifted, window, method=\"mean\")\n", "\n", "# the data should be the same before and after matching for both methods\n", "assert np.all(shifted.values == matched_shifted_nn.values)\n", "assert np.all(shifted.values == matched_shifted_mean.values)\n", "\n", "matched_shifted_nn\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can do the same for the randomly shifted timeseries. Here we should see some changes, because sometimes there's no value inside the window that we are looking at. However, the result of mean and nearest neighbour should be the same" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": " random_0 random_1 random_2\n2020-01-01 -0.687795 -0.626649 0.237109\n2020-01-02 -0.514778 -1.981137 0.354644\n2020-01-03 -0.600629 -0.761766 0.169777\n2020-01-04 -0.650058 -0.548499 0.548560\n2020-01-05 1.331785 -1.611482 -1.325902\n... ... ... ...\n2020-12-27 -1.030740 -0.520146 -0.049081\n2020-12-28 -1.221772 -0.770726 0.585857\n2020-12-29 0.420476 -2.558351 -0.797550\n2020-12-30 0.146506 1.014053 -0.629454\n2020-12-31 -1.296130 2.000808 -1.084081\n\n[366 rows x 3 columns]", "text/html": "
" }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# match the randomly shifted data\n", "matched_random_mean = temporal_collocation(ref, random, window, method=\"mean\")\n", "matched_random_nn = temporal_collocation(ref, random, window, method=\"nearest\")\n", "\n", "# the data should be the same as before matching at the locations where the shift\n", "# was below 6 hours, and should be np.nan when shift was larger\n", "should_be_nan = np.abs(random_shift) > 6\n", "\n", "assert np.all(matched_random_nn[~should_be_nan].values == random[~should_be_nan].values)\n", "assert np.all(np.isnan(matched_random_nn[should_be_nan].values))\n", "\n", "for c in matched_random_nn.columns:\n", " df = pd.DataFrame(index=range(366),\n", " data={'mean': matched_random_mean[c].values,\n", " 'nn': matched_random_nn[c].values}).dropna()\n", " np.testing.assert_almost_equal(\n", " df.diff(axis=1).iloc[:, 1].values,\n", " np.zeros(df.index.size),\n", " decimal=4)\n", "\n", "matched_random_nn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Returning the original index\n", "\n", "`temporal_collocation` can also return the original index of the data that was matched as a separate column in the resulting DataFrame, if required, and can additionally also calculate the distance to the reference. The column names are \"index_other\" and \"distance_other\", respectively." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": " shifted_0 shifted_1 shifted_2 index_other \\\n2020-01-01 -0.687795 -0.626649 0.237109 2020-01-01 03:00:00 \n2020-01-02 -0.514778 -1.981137 0.354644 2020-01-02 03:00:00 \n2020-01-03 -0.600629 -0.761766 0.169777 2020-01-03 03:00:00 \n2020-01-04 -0.650058 -0.548499 0.548560 2020-01-04 03:00:00 \n2020-01-05 1.331785 -1.611482 -1.325902 2020-01-05 03:00:00 \n... ... ... ... ... 
\n2020-12-27 -1.030740 -0.520146 -0.049081 2020-12-27 03:00:00 \n2020-12-28 -1.221772 -0.770726 0.585857 2020-12-28 03:00:00 \n2020-12-29 0.420476 -2.558351 -0.797550 2020-12-29 03:00:00 \n2020-12-30 0.146506 1.014053 -0.629454 2020-12-30 03:00:00 \n2020-12-31 -1.296130 2.000808 -1.084081 2020-12-31 03:00:00 \n\n distance_other \n2020-01-01 0 days 03:00:00 \n2020-01-02 0 days 03:00:00 \n2020-01-03 0 days 03:00:00 \n2020-01-04 0 days 03:00:00 \n2020-01-05 0 days 03:00:00 \n... ... \n2020-12-27 0 days 03:00:00 \n2020-12-28 0 days 03:00:00 \n2020-12-29 0 days 03:00:00 \n2020-12-30 0 days 03:00:00 \n2020-12-31 0 days 03:00:00 \n\n[366 rows x 5 columns]", "text/html": "
" }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# also return original index and distance\n", "matched_shifted = temporal_collocation(\n", " ref, shifted, window, \n", " method=\"nearest\", \n", " return_index=True, \n", " return_distance=True\n", ")\n", "\n", "# the index should be the same as unmatched, and the distance should be 3 hours\n", "assert np.all(matched_shifted[\"index_other\"].values == shifted.index.values)\n", "assert np.all(matched_shifted[\"distance_other\"] == pd.Timedelta(hours=3))\n", "\n", "matched_shifted" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Flags\n", "\n", "Satellite data often contains flags indicating quality issues with the data. With `temporal_collocation` it is possible to use this information. Flags can either be provided as array (of the same length as the input DataFrame), or the name of a column in the DataFrame to be used as flag can be provided as string. Any non-zero flag is interpreted as indicating invalid data. By default this will not be used, but when passing ``use_invalid=True``, the invalid values will be used in case no valid match was found.\n", "\n", "For the following example, we reuse the input data shifted by 3 hours, but we will now assume that the first 3 observations had quality issues." 
] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": "array([ True, True, True, False, False, False, False, False, False,\n False])" }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# flag the first 3 observations as invalid\n", "flag = np.zeros(len(ref), dtype=bool)\n", "flag[0:3] = True\n", "flag[0:10]" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": " shifted_0 shifted_1 shifted_2\n2020-01-01 NaN NaN NaN\n2020-01-02 NaN NaN NaN\n2020-01-03 NaN NaN NaN\n2020-01-04 -0.650058 -0.548499 0.548560\n2020-01-05 1.331785 -1.611482 -1.325902\n... ... ... ...\n2020-12-27 -1.030740 -0.520146 -0.049081\n2020-12-28 -1.221772 -0.770726 0.585857\n2020-12-29 0.420476 -2.558351 -0.797550\n2020-12-30 0.146506 1.014053 -0.629454\n2020-12-31 -1.296130 2.000808 -1.084081\n\n[366 rows x 3 columns]", "text/html": "
" }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "matched_flagged = temporal_collocation(ref, shifted, window, flag=flag)\n", "\n", "# the first 3 values should be NaN, otherwise the result should be the same as matched_shifted\n", "assert np.all(np.isnan(matched_flagged.values[0:3, :]))\n", "assert np.all(matched_flagged.values[3:, :] == matched_shifted.values[3:, 0:3]) # excluding additonal columns\n", "matched_flagged" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": " shifted_0 shifted_1 shifted_2 my_flag\n2020-01-01 NaN NaN NaN NaN\n2020-01-02 NaN NaN NaN NaN\n2020-01-03 NaN NaN NaN NaN\n2020-01-04 -0.650058 -0.548499 0.548560 False\n2020-01-05 1.331785 -1.611482 -1.325902 False\n... ... ... ... ...\n2020-12-27 -1.030740 -0.520146 -0.049081 False\n2020-12-28 -1.221772 -0.770726 0.585857 False\n2020-12-29 0.420476 -2.558351 -0.797550 False\n2020-12-30 0.146506 1.014053 -0.629454 False\n2020-12-31 -1.296130 2.000808 -1.084081 False\n\n[366 rows x 4 columns]", "text/html": "
" }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# This also works when the flag is already in the input frame, but note that\n", "# in the output frame the nonzero flag values are replaced by NaN\n", "flagged = shifted.assign(my_flag=flag)\n", "matched_flagged = temporal_collocation(ref, flagged, window, flag=\"my_flag\")\n", "\n", "# the first 3 values should be NaN, otherwise the result should be the same as matched_shifted\n", "assert np.all(np.isnan(matched_flagged.iloc[0:3, 0:3].values))\n", "assert np.all(matched_flagged.iloc[3:, 0:3].values == matched_shifted.values[3:, 0:3]) # excluding additonal columns\n", "matched_flagged" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Combined collocation\n", "\n", "It is also possible to match multiple timeseries together against a reference dataset using the function `pytesmo.temporal_matching.combined_temporal_collocation`. With the keyword argument `combined_dropna` it's possible to drop data where one of the input datasets has missing values from all other input datasets." ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": " random_0 random_1 random_2 shifted_0 shifted_1 shifted_2\n2020-01-01 -0.687795 -0.626649 0.237109 -0.687795 -0.626649 0.237109\n2020-01-02 -0.514778 -1.981137 0.354644 -0.514778 -1.981137 0.354644\n2020-01-03 -0.600629 -0.761766 0.169777 -0.600629 -0.761766 0.169777\n2020-01-04 -0.650058 -0.548499 0.548560 -0.650058 -0.548499 0.548560\n2020-01-05 1.331785 -1.611482 -1.325902 1.331785 -1.611482 -1.325902\n... ... ... ... ... ... 
...\n2020-12-27 -1.030740 -0.520146 -0.049081 -1.030740 -0.520146 -0.049081\n2020-12-28 -1.221772 -0.770726 0.585857 -1.221772 -0.770726 0.585857\n2020-12-29 0.420476 -2.558351 -0.797550 0.420476 -2.558351 -0.797550\n2020-12-30 0.146506 1.014053 -0.629454 0.146506 1.014053 -0.629454\n2020-12-31 -1.296130 2.000808 -1.084081 -1.296130 2.000808 -1.084081\n\n[366 rows x 6 columns]", "text/html": "
" }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "combined_match = combined_temporal_collocation(ref, (random, shifted), window, combined_dropna=True)\n", "# matched dataframe should have same length as matched_random_nn without NaNs\n", "assert len(combined_match == len(matched_random_nn.dropna()))\n", "combined_match" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:pytesmo] *", "language": "python", "name": "conda-env-pytesmo-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.9" } }, "nbformat": 4, "nbformat_minor": 4 }