Elastic Measures¶

Elastic measures are popular for time series analysis. Using elastic measures, the two time series do not need to be the same length, providing much more flexibility. The elastic measures implemented in this library include Dynamic Time Warping (DTW), its variations such as weighted-DTW and derivative-DTW, and edit-based distance measures such as Longest Common Subsequence (LCSS), Edit Distance with Real Penalty (ERP), Edit Distance on Real Sequences (EDR), Move Split Merge (MSM), and Time Warp Edit Distance (TWED).

General Formula¶

In general, elastic measures use a dynamic programming algorithm with the goal of minimize the total distance between the two series. This has an effect of “aligning” the time series which is useful for time-series that are out-of-sync or otherwise unaligned in ways that non-elastic measures could not distinguish.

Through this section, we will have two time series X and Y represented as such:

\[\begin{split}\begin{aligned} X = x_1,\ x_2,\ x_3,\ ...,\ x_n \\ Y = y_1,\ y_2,\ y_3,\ ...,\ y_m \\ \end{aligned}\end{split}\]

Additionally, we will have a array of size $(m,n)$ represented as:

\[\begin{equation*} DP = d_{1,1},\ d_{1,2},\ ...,\ d_{1,m},\ d_{2,m},\ ...,\ d_{n,m} \end{equation*}\]

As the elastic measures runs, it produces a warping path $W = w_1, w_2, ..., w_k = (x_1,y_1), (x_i, y_i), ..., (x_n,y_m)$ where $max(m,n) \leq k \leq m + n - 1$. In warping path element, the algorithm takes one of three steps. If $w_{k-1} = (x_{i-1}, y_i)$, then the step was horizontal. If $w_{k-1} = (x_{i}, y_{i-1})$, then the step was vertical. And if $w_{k-1} = (x_{i-1}, y_{i-1})$, the step was diagonal. Along with this, each elastic measure uses a distance function D to determine the distance between the elements of each time series. D can be divided into 4 components, $D^{v},D^{h},D^d$, and $D^u$ or distance from a vertical step, distance from a horizontal step, distance from a diagonal step, and undirectional distance. These components are the same in some measures and differ in others. The elastic distance of the two time series is $\sum_{i = 1}^{k}D(w_i)$ where $D$ is one of the component listed above. Alternatively, the elastic distance can be written as:

\[\begin{equation} ElasticDistance(X,Y) = \pi(d_{n,m}) \end{equation}\]

where

\[\begin{split}\begin{equation} d_{i,j} = \begin{cases} D^u(x_i,y_j) & \text{if $i,j = 1$}\\ d_{i-1,j} + D^v(x_i,y_j) & \text{if $i \neq 1$ and $j = 1$} \\ d_{i,j-1} + D^h(x_i,y_j) & \text{if $i = 1$ and $j \neq 1$} \\ min(d_{i-1,j-1} + D^d(x_i,y_j), d_{i-1,j}\\ + D^h(x_i,y_j), d_{i,j-1} + D^v(x_i,y_j)) & \text{if $i, j \neq 1$, $i \leq n$, $j \leq m$} \end{cases} \end{equation}\end{split}\]

The function $\pi$ is a final function depending on the measure. Throughout this paper, we will define $\pi$ and D as a simplistic way to define each function. If the the distance function differs depending on the direction that will be detailed in the measure. Additionally, certain measures will have other parameters that must be specified. These will be discussed in where applicable. In the worst case, elastic measures will take $O(mn)$ time to complete as each item within the time series are compared to each other before the final comparison is computed. There are various methods to improve this run-time, such as restricting the range of comparing values. Additionally, the space complexity in the worst case is an array of $O(mn)$ but this can be reduced to a linear size complexity as only the immediately previous values are used.

Dynamic Time Warping (DTW) and DTW variants¶

tsdistance.elastic.dtw(x, y, w=None, fast=True)¶

Dynamic Time Warping (DTW) [1]_ utilizes dynamic programming to find the optimal alignment between elements of times series $X = (x_{1}, x_{2}, ..., x_{n})$ and $Y = (y_{1}, y_{2}, ..., y_m)$ by constructing a distance matrix $M$ of shape $(n, m)$ with the following forumla:

\[\begin{split}M_{i, j} = \begin{cases} 0, \ i = j = 0 \\ d_{x_i, y_j} + min \begin{cases} M_{i-1, j}\\ M_{i, j-1} \\ M_{i-1, j-1}\\ \end{cases} where \ d_{x_i, y_j}=|x_i - y_j| \end{cases}\end{split}\]

and return the DTW distance $M_{n, m}$.

Parameters:

X (np.array) – a time series
Y (np.array) – another time series
w (float, optional) – If constraint = "Sakoe-Chiba" , w is the largest temporal shift allowed between two time series. w defaults to None, in which case the warping window is the length of the longest time series.
fast (bool, optional) – whether or not to use fast (Numba) implementation, default to True

Returns:

DTW distance

Example:

Input:

from tsdistance.ElasticMeasures import dtw
import numpy as np

X = np.array([3, 2, 4, 5, 5, 2, 4, 7, 9, 8])
Y = np.array([3, 3, 3, 1, 6, 9, 9])

dtw_dist = dtw(X, Y, 'Sakoe-Chiba', 3)
dtw_dist

Output:

4.123105625617661

References

tsdistance.elastic.ddtw(x, y, constraint=None, w=None)¶

DTW Lower Bounds¶

tsdistance.elastic.glb_dtw(y, x, w)¶

The Generalized Lower Bound (GLB) variant for DTW [1] that achieves high pruning power while caches reusable computations. GLB_DTW is a state-of-the-art DTW lower bound for DTW. The function can be imported using from tsdistance.elastic import glb_dtw.

\[\begin{split}\begin{equation} GLB\_DTW = \sqrt{(x_i - y_i)^2 + (x_{L_{\textbf{x}}} - y_{L_{\textbf{y}}})^2 + max\begin{cases} \delta_{DTW}(\textbf{y}, \textbf{x}, w) \\ \delta_{DTW}(\textbf{x}, \textbf{y}, w) \\ \end{cases}} \end{equation}\end{split}\]

where the delta functions are defined as:

\[ \begin{align}\begin{aligned}\begin{split}\delta_{DTW}(\textbf{x}, \textbf{y}, w)= \sum_{j=2}^{L_{\textbf{y}} -1}\begin{cases} (y_j - UE(\textbf{x})_j)^2 \textup{ for } y_j \geq{UE(\textbf{x})_j} \\ (y_j- LE(\textbf{x})_j)^2 \textup{ for } y_j \leq{LE(\textbf{x})_j} \\ 0 \textup{ otherwise } \end{cases}\end{split}\\\begin{split}\delta_{DTW}(\textbf{y}, \textbf{x}, w) = \sum_{i=2}^{L_{\textbf{x}} -1}\begin{cases} (x_i - UE(\textbf{y})_i)^2 \textup{ for } x_i \geq{UE(\textbf{y})_i} \\ (x_i - LE(\textbf{y})_i)^2 \textup{ for } x_i \leq{LE(\textbf{y})_i} \\ 0 \textup{ otherwise } \end{cases}\end{split}\end{aligned}\end{align} \]

Parameters:

y (np.array) – a time series
x (np.array) – another time series
w (int) – w is the largest temporal shift allowed between two time series

Returns:

The GLB_DTW distance

Return type:

float

References

tsdistance.elastic.lb_kim(y, x, fast=True)¶

LB_Kim [1]_ is the fastest DTW lower bound with $O(1)$ complexity, defined as the following:

\[\begin{split}\begin{equation} LB\_Kim(X, Y) = max\begin{cases} |X_1 - Y_1| \\ |X_L - Y_L| \\ |X_{max} - Y_{max}| \\ |X_{min} - Y_{min}| \end{cases} \end{equation}\end{split}\]

Parameters:

y (np.array) – a time series
x (np.array) – another time series
fast (bool, optional) – whether or not to use fast (Numba) implementation, default to True

Returns:

LB_Kim distance

Example:

Input:

from tsdistance.elastic import lb_kim
import numpy as np

X = np.array([3, 2, 4, 5, 5, 2, 4, 7, 9, 8])
Y = np.array([3, 3, 3, 1, 6, 9, 9])

lb_kim_dist = lb_kim(X, Y)
print(lb_kim_dist)

Output:

References

tsdistance.elastic.lb_keogh(y, x, w=None, fast=True)¶

LB_Keogh [1]_ is one of the most popular dtw lower bounds. LB_Keogh constructs upper and lower envelopes of query series and compute distance between these query envelopes and the candidate series, while utilizing a window :math:w to limit warping (shifted alignements). LB_Keogh is formally defined as:

\[\begin{split}\begin{equation} LB\_Keogh(X, Y) = \sqrt{\sum_{i=1}^{L_X}\begin{cases} (Y_{i} - UE_{i} )^2 \mbox{ if } Y_{i} > UE_{i} \\ (Y_{i} - LE_{i})^2 \mbox{ if } Y_{i} < LE_{i} \\ 0 \mbox{ otherwise } \end{cases}} \end{equation}\end{split}\]

Parameters:

y (np.array) – a time series
x (np.array) – another time series
w (float, optional) – w is the largest temporal shift allowed between two time series, defaulting to None
fast (bool, optional) – whether or not to use fast (Numba) implementation, defaulting to True

Returns:

LB_Keogh distance

Example:

Input:

from elastic import lb_keogh
import numpy as np

X = np.array([3, 2, 14, 5, 35, 2, 4, 7, 19, 8])
Y = np.array([3, -13, 3, 1, -6, 9, 9])

lb_keogh_dist = lb_keogh(Y, X, 4)
print(lb_keogh_dist)

Output:

17.029386365926403

References

tsdistance.elastic.lb_new(y, x, w=None, fast=True)¶

LB_New [1]_ is a tighter and more expensive lower bound of DTW relative to LB_Keogh. LB_New is formally defined as:

\[\begin{equation} LB\_New = \sqrt{(x_1 - y_1)^2 + (x_{L_X}-y_{L_Y})^2 + \sum_{j=2}^{L_x-1}\delta(x_j, Y_i)} \end{equation}\]

Parameters:

y (np.array) – a time series
x (np.array) – another time series
w (float, optional) – w is the largest temporal shift allowed between two time series, defaulting to None
fast (bool, optional) – whether or not to use fast (Numba) implementation, defaulting to True

Returns:

LB_New distance

Example:

Input:

from tsdistance.elastic import lb_new
import numpy as np

X = np.array([3, 2, 14, 5, 35, 2, 4, 7, 19, 8])
Y = np.array([3, -13, 3, 1, -6, 9, 9])

lb_new_dist = lb_new(Y, X, 4)
print(lb_new_dist)

Output:

17.08800749063506

References

tsdistance.elastic.lb_improved(x, y, w=None, fast=True)¶

Longest Common Subsequence (LCSS) and Derivative LCSS (DLCSS)¶

tsdistance.elastic.lcss(x, y, epsilon, w=None, fast=True)¶

tsdistance.elastic.dlcss(x, y, epsilon, constraint=None, w=None)¶

LCSS Lower Bounds¶

tsdistance.elastic.glb_lcss(y, x, epsilon, w)¶

The Generalized Lower Bound (GLB) variant for LCSS [1] that achieves high pruning power while caching reusable computations. GLB_LCSS is a state-of-the-art LCSS lower bound. The function can be imported using from tsdistance.elastic import glb_lcss.

Parameters:

y (np.array) – a time series
x (np.array) – another time series
epsilon (float) – the matching threshold
w (int) – w is the largest temporal shift allowed between two time series

Returns:

The GLB_LCSS distance

Return type:

float

References

tsdistance.elastic.lb_keogh_lcss(y, x, epsilon, w=None, fast=True)¶

The LCSS lower bound, LB_Keogh_LCSS [1]_ is adapted from LB_Keogh [2] and considers the LCSS matching threshold :math:`epsilon`in constructing envelopes.

\[\begin{split}\begin{equation} LB\_LCSS(X, Y) = 1 - \frac{1}{L_Y}\overset{L_Y}{\underset{i=1}{\sum}}\begin{cases} 1 \mbox{ if } LE_i \leq{Y_i} \leq{UE_i} \\ 0 \mbox{ otherwise } \end{cases} \end{equation}\end{split}\]

Parameters:

y (np.array) – a time series
x (np.array) – another time series
epsilon (float) – the matching threshold
w (float, optional) – w is the largest temporal shift allowed between two time series, defaulting to None
fast (bool, optional) – whether or not to use fast (Numba) implementation, defaulting to True

Returns:

LB_Keogh_LCSS distance

Example:

Input:

from elastic import lb_keogh_lcss
import numpy as np

X = np.array([3, 2, 14, 5, 35, 2, 4])
Y = np.array([3, -13, 3, 1, -6, 9, 9])
lb_keogh_lcss_dist = lb_keogh_lcss(Y, X, 4)
print(lb_keogh_lcss_dist)

Output:

0.2857142857142857

References

Sequence Weighted Alignment (SWALE)¶

tsdistance.elastic.swale(x, y, p, r, epsilon, constraint=None, w=None, fast=True)¶

SWALE Lower Bound¶

tsdistance.elastic.glb_swale(x, y, p, r, epsilon, w)¶

The Generalized Lower Bound (GLB) variant for SWALE [1] that achieves high pruning power while caching reusable computations. The function can be imported using from tsdistance.elastic import glb_swale.

Parameters:

x (np.array) – a time series
y (np.array) – another time series
p (float) – the penalty for mismatches
r (float) – the reward for matches
epsilon (float) – the matching threshold
w (int) – the largest temporal shift allowed between two time series

Returns:

The GLB_SWALE distance

Return type:

float

References

Edit Distance with Real Penalty (ERP)¶

tsdistance.elastic.erp(x, y, m, constraint=None, w=None, fast=True)¶

ERP Lower Bounds¶

tsdistance.elastic.glb_erp(y, x, m, w)¶

tsdistance.elastic.lb_erp(x, y, fast=True)¶

tsdistance.elastic.lb_kim_erp(y, x, m, fast=True)¶

tsdistance.elastic.lb_keogh_erp(y, x, m, w=None, fast=True)¶

Edit Distance on Real Sequences (EDR)¶

tsdistance.elastic.edr(x, y, m=0, constraint=None, w=100, fast=True)¶

EDR Lower Bound¶

tsdistance.elastic.glb_edr(x, y, epsilon, w)¶

The Generalized Lower Bound (GLB) variant for EDR [1] that achieves high pruning power while caching reusable computations. The function can be imported using from tsdistance.elastic import glb_edr.

Parameters:

x (np.array) – a time series
y (np.array) – another time series
epsilon (float) – the matching threshold
w (int) – the largest temporal shift allowed between two time series

Returns:

The GLB_EDR distance

Return type:

float

References

Time Warp Edit Distance (TWED)¶

tsdistance.elastic.twed(x, timesx, y, timesy, lamb, nu, constraint=None, w=None, fast=True)¶

TWED Lower Bound¶

tsdistance.elastic.glb_twed(x, y, lamb, w)¶

tsdistance.elastic.lb_twed(y, x, lamb, nu, w=None, fast=True)¶

Move-Split-Merge (MSM)¶

tsdistance.elastic.msm(x, y, c, constraint=None, w=None, fast=True)¶

MSM Lower Bound¶

tsdistance.elastic.glb_msm(x, y, c, w)¶

The Generalized Lower Bound (GLB) variant for MSM [1] that achieves high pruning power while caching reusable computations. GLB_MSM is a state-of-the-art MSM lower bound, which can be imported using from tsdistance.elastic import glb_msm.

Parameters:

x (np.array) – a time series
y (np.array) – another time series
c (float) – the cost for move or split operation for MSM
w (int) – the largest temporal shift allowed between two time series

Returns:

The GLB_MSM distance

Return type:

float

References

tsdistance.elastic.lb_msm(y, x, c, w=None, fast=True)¶

LB_MSM is an MSM lower bound which constructs upper and lower envelopes, formally defined as:

\[\begin{split}\begin{equation} LB\_MSM(X, Y) = |X_1 - Y_1| + \overset{L_Y}{\underset{i=2}{\sum}}\begin{cases} min(|Y_i - X_{max}|, c) \textup{ if } Y_{i-1}\geq{Y_{i}} > X_{max} \\ min(|Y_i - X_{min}|, c) \textup{ if } Y_{i-1}\leq{Y_{i}} < X_{min} \\ 0 \textup{ otherwise } \end{cases} \end{equation}\end{split}\]

Parameters:

y (np.array) – a time series
x (np.array) – another time series
c (float) – the cost for one move or split* operation
w (float, optional) – w is the largest temporal shift allowed between two time series, defaulting to None
fast (bool, optional) – whether or not to use fast (Numba) implementation, defaulting to True

Returns:

LB_MSM distance

Example:

Input:

from elastic import lb_msm
import numpy as np

X = np.array([-3, 2, -4, -5, -3, -2, -93])
Y = np.array([3, 13, 3, 1, 6, 9, 9])
lb_msm_dist = lb_msm(Y, X, c = 2, w = 3)
print(lb_msm_dist)

Output:

9.0

References