Lockstep Measures
Lockstep measures compare two time series element by element, so both series must have the same length. Despite this restriction, lockstep measures are versatile because there are so many of them. Below we give the equation and a short description of each one in our library.
All lockstep measures are called the same way: pass the two time series as NumPy arrays (Minkowski additionally takes \(p\)).
Below is an example using Manhattan distance:
```python
from tsdistance.lockstep import manhattan
import numpy as np

ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
dist_manhattan = manhattan(ts1, ts2)
print(dist_manhattan)
```
Output:
```
38
```
Minkowski Functions
The Minkowski functions include Euclidean, Manhattan, and Chebyshev distance. Each is a special case of the general Minkowski distance:
- tsdistance.lockstep.minkowski(x, y, p)¶
The formula for the Minkowski distance is: \((\sum_{i=1}^n |X_i - Y_i|^p)^{\frac{1}{p}}\). It is a metric for \(p \geq 1\).
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
p (float) – parameter for \(p\) in the formula above
- Returns:
the Minkowski distance
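To make the formula concrete, here is a minimal NumPy sketch of it. It is not the library's implementation, and `minkowski_sketch` is a hypothetical name used only for illustration. With the series from the usage example, \(p = 1\) reproduces the Manhattan result of 38 and \(p = 2\) gives the Euclidean distance.

```python
import numpy as np


def minkowski_sketch(x, y, p):
    """Hypothetical helper: (sum |x_i - y_i|^p)^(1/p) for equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("lockstep measures need series of the same length")
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
print(minkowski_sketch(ts1, ts2, 1))  # 38.0, the Manhattan distance
print(minkowski_sketch(ts1, ts2, 2))  # sqrt(236), the Euclidean distance
```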
- tsdistance.lockstep.abs_euclidean(x, y)¶
Euclidean distance is the Minkowski distance with \(p = 2\). It is the most intuitive way of defining distance, because it is the straight-line distance we use in the physical world. The formula is: \(\sqrt{\sum_{i=1}^n(X_i - Y_i)^2}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Euclidean distance
- tsdistance.lockstep.manhattan(x, y)¶
Manhattan distance is the Minkowski distance with \(p = 1\). It is often called city-block distance because, in two dimensions, it is the distance travelled along a grid of city blocks. Because differences are neither squared nor maximized, outliers skew Manhattan distance less than Euclidean or Chebyshev distance. The formula is: \(\sum_{i=1}^n |X_i - Y_i|\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Manhattan distance
- tsdistance.lockstep.chebyshev(x, y)¶
Chebyshev distance is the limit of the Minkowski distance as \(p\) tends to infinity, which reduces to the largest absolute element-wise difference: \(\max_i |X_i - Y_i| = \lim_{p \rightarrow \infty} (\sum_{i=1}^n |X_i - Y_i|^p)^{\frac{1}{p}}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Chebyshev distance
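The limit can be checked numerically. The sketch below uses plain NumPy rather than the library, and both helper names are hypothetical. It computes the largest absolute difference directly and shows the Minkowski distance decreasing toward it as \(p\) grows.

```python
import numpy as np


def chebyshev_sketch(x, y):
    """Hypothetical helper: largest absolute element-wise difference."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.max(np.abs(x - y)))


def minkowski_sketch(x, y, p):
    """Hypothetical helper: (sum |x_i - y_i|^p)^(1/p)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
print(chebyshev_sketch(ts1, ts2))  # 8.0
for p in (1, 2, 10, 100):
    # The Minkowski values decrease toward 8.0 as p grows
    print(p, minkowski_sketch(ts1, ts2, p))
```

For very large \(p\) the powers overflow floating point, which is one reason to compute Chebyshev distance directly as a maximum rather than through the limit.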
L1 Functions
The \(L_1\) functions all build on the Manhattan (\(L_1\)) distance in some way; see the formula of each function below for details.
- tsdistance.lockstep.sorensen(x, y)¶
Sorensen distance is the \(L_1\) distance divided by the sum of both time series. For non-negative data \(|X_i - Y_i| \leq X_i + Y_i\), so the Sorensen distance lies in \([0,1]\). It is often used in ecology and environmental sciences. The formula is: \(\frac{\sum_{i=1}^n |X_i - Y_i|}{\sum_{i=1}^n(X_i + Y_i)}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Sorensen distance
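A minimal NumPy sketch of the Sorensen formula follows. It is an illustration, not the library's code; `sorensen_sketch` is a hypothetical name, and the inputs are assumed non-negative so that the denominator is positive.

```python
import numpy as np


def sorensen_sketch(x, y):
    """Hypothetical helper: sum |x_i - y_i| / sum (x_i + y_i); non-negative input assumed."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y)) / np.sum(x + y))


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
print(sorensen_sketch(ts1, ts2))  # 38 / 70, about 0.543
print(sorensen_sketch(ts1, ts1))  # 0.0 for identical series
```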
- tsdistance.lockstep.gower(x, y)¶
Gower distance is the average absolute difference between the element pairs, that is, the \(L_1\) distance divided by the length \(n\). It is often used for mixed qualitative and quantitative data. The formula is: \(\frac{1}{n} \sum_{i=1}^n |X_i - Y_i|\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Gower distance
- tsdistance.lockstep.soergel(x, y)¶
Soergel distance is the \(L_1\) distance divided by the sum of the element-wise maxima. The formula is: \(\frac{\sum_{i=1}^n |X_i - Y_i|}{\sum_{i=1}^n \max(X_i,Y_i)}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Soergel distance
- tsdistance.lockstep.Kulczynski(x, y)¶
Kulczynski distance is similar to Soergel distance, except that the \(L_1\) distance is divided by the sum of the element-wise minima. The formula is: \(\frac{\sum_{i=1}^n|X_i - Y_i|}{\sum_{i=1}^n \min(X_i,Y_i)}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Kulczynski distance
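Soergel and Kulczynski distance share a numerator and differ only in the denominator, which is easiest to see side by side. This is a minimal NumPy sketch with hypothetical helper names, not the library's code. Because each maximum is at least the corresponding minimum, Soergel never exceeds Kulczynski on non-negative data; Kulczynski is undefined if every minimum is zero.

```python
import numpy as np


def soergel_sketch(x, y):
    """Hypothetical helper: L1 distance over the sum of element-wise maxima."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y)) / np.sum(np.maximum(x, y)))


def kulczynski_sketch(x, y):
    """Hypothetical helper: L1 distance over the sum of element-wise minima."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y)) / np.sum(np.minimum(x, y)))


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
print(soergel_sketch(ts1, ts2))     # 38 / 54
print(kulczynski_sketch(ts1, ts2))  # 38 / 16 = 2.375
```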
- tsdistance.lockstep.canberra(x, y)¶
Canberra distance is the \(L_1\) distance with each absolute difference divided by the sum of its element pair. It is often used for data scattered about an origin. The formula is: \(\sum_{i=1}^n \frac{|X_i - Y_i|}{X_i + Y_i}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Canberra distance
- tsdistance.lockstep.lorentzian(x, y)¶
Lorentzian distance is the sum of the natural logarithms of the absolute element differences between two time series. Adding 1 inside each logarithm avoids \(\ln(0)\) and guarantees non-negative distances. The formula is: \(\sum_{i=1}^n \ln(1 + |X_i - Y_i|)\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Lorentzian distance
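The following minimal NumPy sketch (hypothetical helper name, not the library's code) shows the role of the added 1: identical series give \(\ln(1) = 0\) for every term, and the logarithm dampens large differences, so a difference of 8 contributes \(\ln 9 \approx 2.2\) instead of 8 as in Manhattan distance.

```python
import numpy as np


def lorentzian_sketch(x, y):
    """Hypothetical helper: sum of ln(1 + |x_i - y_i|)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # np.log1p(d) evaluates ln(1 + d); the +1 keeps every term finite and >= 0
    return float(np.sum(np.log1p(np.abs(x - y))))


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
print(lorentzian_sketch(ts1, ts1))  # 0.0: identical series
print(lorentzian_sketch(ts1, ts2))  # ln(8 * 8 * 7 * 4 * 3 * 9 * 6)
```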
- tsdistance.lockstep.Intersection(x, y)¶
Intersection Functions
The intersection family of functions is closely related to the \(L_1\) family. Many intersection functions can be converted to \(L_1\) functions using the identity \(\min(X_i,Y_i) = \frac{X_i + Y_i - |X_i - Y_i|}{2}\). What the intersection functions have in common is that they use the element-wise minimum of the two time series.
- tsdistance.lockstep.wave_hedges(x, y)¶
Wave Hedges distance is the length of the time series minus the sum of the ratios of the minimum to the maximum of each element pair. The formula is: \(\sum_{i=1}^n \left(1 - \frac{\min(X_i,Y_i)}{\max(X_i,Y_i)}\right)\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Wave Hedges distance
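A minimal NumPy sketch of Wave Hedges distance follows, assuming strictly positive values so that no maximum is zero; the helper name is hypothetical and this is not the library's code. For positive data each term lies in \([0, 1)\), so the result is bounded by the series length.

```python
import numpy as np


def wave_hedges_sketch(x, y):
    """Hypothetical helper: sum of 1 - min/max over element pairs.

    Assumes strictly positive values, so no maximum is zero.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(1.0 - np.minimum(x, y) / np.maximum(x, y)))


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
print(wave_hedges_sketch(ts1, ts2))  # at most len(ts1)
print(wave_hedges_sketch(ts1, ts1))  # 0.0 for identical series
```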
- tsdistance.lockstep.czekanowski(x, y)¶
Czekanowski distance is the intersection counterpart of Sorensen distance. It is twice the sum of the element-wise minima divided by the sum of all elements of both series. The formula is: \(\frac{2\sum_{i=1}^n\min(X_i,Y_i)}{\sum_{i=1}^n(X_i + Y_i)}\). Because \(X_i + Y_i - 2\min(X_i,Y_i) = |X_i - Y_i|\), one minus this quantity equals the Sorensen distance.
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Czekanowski distance
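The link between the intersection and \(L_1\) families can be verified numerically. This minimal NumPy sketch (hypothetical helper names, not the library's code) evaluates the Czekanowski formula exactly as written above, checks the element-wise identity \(X_i + Y_i - 2\min(X_i,Y_i) = |X_i - Y_i|\), and shows that one minus the Czekanowski value matches the Sorensen distance.

```python
import numpy as np


def czekanowski_sketch(x, y):
    """Hypothetical helper: 2 * sum(min) / sum(x + y), as in the formula above."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(2 * np.sum(np.minimum(x, y)) / np.sum(x + y))


def sorensen_sketch(x, y):
    """Hypothetical helper: sum |x - y| / sum (x + y)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y)) / np.sum(x + y))


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
# Element-wise identity linking the two families
print(np.array_equal(ts1 + ts2 - 2 * np.minimum(ts1, ts2), np.abs(ts1 - ts2)))  # True
print(1 - czekanowski_sketch(ts1, ts2))  # 38 / 70 up to rounding
print(sorensen_sketch(ts1, ts2))         # 38 / 70
```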
- tsdistance.lockstep.motyka(x, y)¶
- tsdistance.lockstep.tanimoto(x, y)¶
Inner Product Functions
The inner product functions all use the sum of the element-wise products of the two time series.
- tsdistance.lockstep.innerproduct(x, y)¶
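Inner Product distance is the dot product of two time series. Note that it behaves as a similarity score rather than a true distance: it is not zero when the two series are identical. The formula is: \(\sum_{i=1}^n X_iY_i\)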
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Inner Product distance
- tsdistance.lockstep.harmonicmean(x, y)¶
- tsdistance.lockstep.kumarhassebrook(x, y)¶
Kumar-Hassebrook distance divides the inner product by the sum of the squared elements of both series, reduced by the inner product itself. The formula is: \(\frac{\sum_{i=1}^nX_iY_i}{\sum_{i=1}^nX_i^2 + \sum_{i=1}^nY_i^2 - \sum_{i=1}^n X_iY_i}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Kumar-Hassebrook distance
- tsdistance.lockstep.jaccard(x, y)¶
- tsdistance.lockstep.cosine(x, y)¶
Cosine distance is the complement of the cosine similarity, which measures the angle between two vectors. Unlike Inner Product distance, Cosine distance ignores the magnitude of the time series. The formula is: \(1 - \frac{\sum_{i=1}^n X_iY_i}{\sqrt{\sum_{i=1}^nX_i^2}\sqrt{\sum_{i=1}^nY_i^2}}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Cosine distance
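The difference between Inner Product and Cosine distance shows up when a series is rescaled. In this minimal NumPy sketch (hypothetical helper names, not the library's code), multiplying a series by 3 triples the inner product but leaves the cosine distance at zero, because the direction of the vector does not change.

```python
import numpy as np


def inner_product_sketch(x, y):
    """Hypothetical helper: dot product of two series."""
    return float(np.dot(np.asarray(x, dtype=float), np.asarray(y, dtype=float)))


def cosine_sketch(x, y):
    """Hypothetical helper: 1 - cosine similarity; assumes neither series is all zeros."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
scaled = 3 * ts1
print(inner_product_sketch(ts1, ts1), inner_product_sketch(ts1, scaled))  # 185.0 555.0
print(cosine_sketch(ts1, scaled))  # ~0: same direction, different magnitude
```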
- tsdistance.lockstep.dice(x, y)¶
Dice distance is the complement of the Dice similarity. It is not a metric, but it is widely used in biological taxonomy. The formula is: \(1 - \frac{2\sum_{i=1}^nX_iY_i}{\sum_{i=1}^nX_i^2 + \sum_{i=1}^nY_i^2}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Dice distance
Squared Chord Functions
The Squared Chord functions are built on the geometric mean \(\sqrt{X_iY_i}\) of each element pair. As a result, they are not defined for negative values in either time series.
- tsdistance.lockstep.fidelity(x, y)¶
Fidelity distance is the sum of the square roots of the element-wise products of the two time series. The formula is: \(\sum_{i = 1}^n \sqrt{X_iY_i}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Fidelity distance
- tsdistance.lockstep.bhattacharyya(x, y)¶
- tsdistance.lockstep.Square_chord(x, y)¶
Squared Chord distance is the sum of the squared differences between the square roots of the element pairs. This exaggerates the more dissimilar features. The formula is: \(\sum_{i=1}^n(\sqrt{X_i}-\sqrt{Y_i})^2\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Squared Chord distance
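Expanding the square gives \(\sum_{i=1}^n X_i + \sum_{i=1}^n Y_i - 2\sum_{i=1}^n\sqrt{X_iY_i}\), so for two series that each sum to 1 the Squared Chord distance equals \(2 - 2\) times the Fidelity. The minimal NumPy sketch below (hypothetical helper names, not the library's code, non-negative input assumed) normalizes both series and checks this.

```python
import numpy as np


def fidelity_sketch(x, y):
    """Hypothetical helper: sum of sqrt(x_i * y_i); needs non-negative input."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.sqrt(x * y)))


def squared_chord_sketch(x, y):
    """Hypothetical helper: sum of (sqrt(x_i) - sqrt(y_i))^2; needs non-negative input."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum((np.sqrt(x) - np.sqrt(y)) ** 2))


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
# Normalise each series so it sums to 1, like a probability distribution
px = ts1 / ts1.sum()
py = ts2 / ts2.sum()
print(squared_chord_sketch(px, py))
print(2 - 2 * fidelity_sketch(px, py))  # same value, because sum(px) = sum(py) = 1
```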
- tsdistance.lockstep.hellinger(x, y)¶
- tsdistance.lockstep.matusita(x, y)¶
Squared L2 Functions
The squared \(L_2\) functions are a group of distance measures that all build on the squared difference \((X_i - Y_i)^2\).
- tsdistance.lockstep.squared_euclidean(x, y)¶
Squared Euclidean distance is the square of the Euclidean distance. The formula is: \(\sum_{i=1}^n (X_i - Y_i)^2\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Squared Euclidean distance
- tsdistance.lockstep.clark(x, y)¶
- tsdistance.lockstep.neyman(x, y)¶
Neyman Chi Squared distance is the sum of the squared differences of the element pairs, each divided by the corresponding element of the first time series. The formula is: \(\sum_{i=1}^n\frac{(X_i - Y_i)^2}{X_i}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Neyman Chi Squared distance
- tsdistance.lockstep.pearson(x, y)¶
Pearson Chi Squared distance is the sum of the squared differences of the element pairs, each divided by the corresponding element of the second time series. Notably, \(Pearson(X,Y)\) equals \(Neyman(Y,X)\). The formula is: \(\sum_{i=1}^n\frac{(X_i - Y_i)^2}{Y_i}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Pearson Chi Squared distance
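The relation \(Pearson(X,Y) = Neyman(Y,X)\) follows because \((X_i - Y_i)^2 = (Y_i - X_i)^2\). This minimal NumPy sketch (hypothetical helper names, not the library's code, no zero elements assumed) checks it and shows that neither measure is symmetric on its own.

```python
import numpy as np


def neyman_sketch(x, y):
    """Hypothetical helper: sum (x_i - y_i)^2 / x_i; assumes x has no zeros."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum((x - y) ** 2 / x))


def pearson_sketch(x, y):
    """Hypothetical helper: sum (x_i - y_i)^2 / y_i; assumes y has no zeros."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum((x - y) ** 2 / y))


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
print(pearson_sketch(ts1, ts2), neyman_sketch(ts2, ts1))  # equal
print(pearson_sketch(ts1, ts2), neyman_sketch(ts1, ts2))  # differ: neither is symmetric
```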
- tsdistance.lockstep.squared_chi(x, y)¶
Squared Chi distance is the sum of the squared differences of the element pairs, each divided by the sum of the pair. It can be considered a symmetric version of the Neyman Chi Squared distance. The formula is: \(\sum_{i=1}^n\frac{(X_i - Y_i)^2}{X_i + Y_i}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Squared Chi distance
- tsdistance.lockstep.K_divergence(x, y)¶
Divergence distance is twice the sum of the squared differences of the element pairs, each divided by the squared sum of the pair. Divergence distance is not a metric. The formula is: \(2\sum_{i=1}^n\frac{(X_i - Y_i)^2}{(X_i + Y_i)^2}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Divergence distance
- tsdistance.lockstep.additive_symm_chi(x, y)¶
Additive Symmetric Chi distance is the sum, over the element pairs, of the squared difference multiplied by the sum of the pair and divided by the product of the pair. The formula is: \(\sum_{i=1}^n\frac{(X_i - Y_i)^2(X_i + Y_i)}{X_iY_i}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Additive Symmetric Chi distance
- tsdistance.lockstep.prob_symmetric_chi(x, y)¶
Probabilistic Symmetric Chi distance is Squared Chi distance multiplied by 2. The formula is: \(2\sum_{i=1}^n\frac{(X_i - Y_i)^2}{X_i + Y_i}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Probabilistic Symmetric Chi distance
Shannon’s Entropy Functions
The following functions are based on Shannon’s entropy, which measures how much information a random variable contains, that is, the uncertainty of its outcome.
- tsdistance.lockstep.kullback(x, y)¶
Kullback-Leibler distance, also known as KL divergence or information deviation, measures how different two probability distributions are. It is not symmetric. The formula is: \(\sum_{i=1}^nX_i\ln(\frac{X_i}{Y_i})\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Kullback-Leibler distance
- tsdistance.lockstep.jeffrey(x, y)¶
Jeffreys distance is considered to be the symmetric version of Kullback-Leibler distance. The formula is: \(\sum_{i=1}^n(X_i-Y_i)ln(\frac{X_i}{Y_i})\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Jeffreys distance
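Jeffreys distance is symmetric because it splits into two KL terms: \(\sum_{i=1}^n(X_i-Y_i)\ln(\frac{X_i}{Y_i}) = \sum_{i=1}^n X_i\ln(\frac{X_i}{Y_i}) + \sum_{i=1}^n Y_i\ln(\frac{Y_i}{X_i})\). The minimal NumPy sketch below checks this using the textbook KL divergence \(\sum_{i=1}^n X_i\ln(X_i/Y_i)\). The helper names are hypothetical, this is not the library's code, and strictly positive inputs are assumed.

```python
import numpy as np


def kl_textbook(x, y):
    """Hypothetical helper: textbook KL divergence sum x_i * ln(x_i / y_i).

    Assumes strictly positive inputs.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(x * np.log(x / y)))


def jeffreys_sketch(x, y):
    """Hypothetical helper: sum (x_i - y_i) * ln(x_i / y_i)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum((x - y) * np.log(x / y)))


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
px, py = ts1 / ts1.sum(), ts2 / ts2.sum()  # normalise to probability vectors
print(kl_textbook(px, py), kl_textbook(py, px))  # generally different
print(jeffreys_sketch(px, py))                   # equals the sum of the two
print(kl_textbook(px, py) + kl_textbook(py, px))
```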
- tsdistance.lockstep.K_divergence(x, y)¶
K divergence distance is the Kullback-Leibler divergence of \(X\) from the element-wise average \(\frac{X + Y}{2}\) of the two series; like Kullback-Leibler, it is not symmetric. The formula is: \(\sum_{i=1}^nX_i\ln(\frac{2X_i}{X_i + Y_i})\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the K divergence distance
- tsdistance.lockstep.topsoe(x, y)¶
Topsoe distance is a symmetric version of K divergence distance. The formula is: \(\sum_{i=1}^n\left(X_i\ln(\frac{2X_i}{X_i + Y_i}) + Y_i\ln(\frac{2Y_i}{X_i + Y_i})\right)\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Topsoe distance
- tsdistance.lockstep.jensen_shannon(x, y)¶
Jensen-Shannon distance is Topsoe distance divided by 2. The formula is: \(\frac{\sum_{i=1}^nX_iln(\frac{2X_i}{X_i + Y_i}) + Y_iln(\frac{2Y_i}{Y_i + X_i})}{2}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Jensen-Shannon distance
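Topsoe and Jensen-Shannon distance differ only by a factor of 2. The minimal NumPy sketch below (hypothetical helper names, not the library's code, strictly positive inputs assumed) shows that Topsoe is symmetric in its arguments and that Jensen-Shannon is half of it.

```python
import numpy as np


def topsoe_sketch(x, y):
    """Hypothetical helper: sum x_i ln(2x_i/(x_i+y_i)) + y_i ln(2y_i/(x_i+y_i)).

    Assumes strictly positive inputs.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    m = x + y
    return float(np.sum(x * np.log(2 * x / m) + y * np.log(2 * y / m)))


def jensen_shannon_sketch(x, y):
    """Hypothetical helper: half of the Topsoe distance."""
    return topsoe_sketch(x, y) / 2


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
px, py = ts1 / ts1.sum(), ts2 / ts2.sum()
print(topsoe_sketch(px, py), topsoe_sketch(py, px))  # equal: Topsoe is symmetric
print(jensen_shannon_sketch(px, py))                 # half of the value above
print(topsoe_sketch(px, px))                         # 0.0 for identical series
```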
- tsdistance.lockstep.jensen_difference(x, y)¶
The formula for Jensen Difference distance is: \(\sum_{i=1}^n \left(\frac{X_i\ln(X_i) + Y_i\ln(Y_i)}{2} - \frac{X_i + Y_i}{2}\ln(\frac{X_i + Y_i}{2})\right)\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Jensen Difference Distance
Vicissitude Functions
This group of functions is based on the Vicis-Wave Hedges distance.
- tsdistance.lockstep.vicis_wave_hedges(x, y)¶
Vicis-Wave Hedges distance is a variant of Wave Hedges distance and can be considered an \(L_1\) function. The formula is: \(\sum_{i=1}^n \frac{|X_i - Y_i|}{\min(X_i,Y_i)}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Vicis-Wave Hedges distance
- tsdistance.lockstep.emanon2(x, y)¶
Emanon 2 distance is a variant of Vicis-Wave Hedges distance in which both the difference and the minimum are squared. The formula is: \(\sum_{i=1}^n \frac{(X_i - Y_i)^2}{\min(X_i,Y_i)^2}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Emanon 2 distance
- tsdistance.lockstep.emanon3(x, y)¶
Emanon 3 distance is another variant of Vicis-Wave Hedges distance in which only the difference is squared. The formula is: \(\sum_{i=1}^n \frac{(X_i - Y_i)^2}{\min(X_i,Y_i)}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Emanon 3 distance
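Emanon 2 and Emanon 3 differ only in whether the element-wise minimum is squared. In this minimal NumPy sketch (hypothetical helper names, not the library's code, strictly positive inputs assumed), every minimum of the example pairs is at least 1, so squaring it can only shrink each term and Emanon 2 comes out smaller. When minima fall below 1 the relationship reverses and Emanon 2 grows faster.

```python
import numpy as np


def emanon2_sketch(x, y):
    """Hypothetical helper: sum (x_i - y_i)^2 / min(x_i, y_i)^2; needs positive input."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum((x - y) ** 2 / np.minimum(x, y) ** 2))


def emanon3_sketch(x, y):
    """Hypothetical helper: sum (x_i - y_i)^2 / min(x_i, y_i); needs positive input."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum((x - y) ** 2 / np.minimum(x, y)))


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
# Every element-wise minimum here is >= 1, so Emanon 2 <= Emanon 3
print(emanon2_sketch(ts1, ts2))
print(emanon3_sketch(ts1, ts2))
```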
- tsdistance.lockstep.emanon4(x, y)¶
- tsdistance.lockstep.max_symmetric_chi(x, y)¶
Max-Symmetric Chi distance takes the maximum of the Pearson and Neyman distances. The formula is: \(max(\sum_{i=1}^n\frac{(X_i - Y_i)^2}{X_i},\sum_{i=1}^n\frac{(X_i - Y_i)^2}{Y_i})\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Max-Symmetric Chi distance
- tsdistance.lockstep.min_symmetric_chi(x, y)¶
Min-Symmetric Chi distance takes the minimum of the Pearson and Neyman distances. The formula is: \(min(\sum_{i=1}^n\frac{(X_i - Y_i)^2}{X_i},\sum_{i=1}^n\frac{(X_i-Y_i)^2}{Y_i})\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Min-Symmetric Chi distance
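Swapping the two series swaps the Pearson and Neyman values, so taking their maximum or minimum yields a symmetric measure. The minimal NumPy sketch below (hypothetical helper names, not the library's code, no zero elements assumed) checks this.

```python
import numpy as np


def neyman_sketch(x, y):
    """Hypothetical helper: sum (x_i - y_i)^2 / x_i; assumes x has no zeros."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum((x - y) ** 2 / x))


def pearson_sketch(x, y):
    """Hypothetical helper: sum (x_i - y_i)^2 / y_i; assumes y has no zeros."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum((x - y) ** 2 / y))


def max_symmetric_chi_sketch(x, y):
    """Hypothetical helper: larger of the Pearson and Neyman values."""
    return max(pearson_sketch(x, y), neyman_sketch(x, y))


def min_symmetric_chi_sketch(x, y):
    """Hypothetical helper: smaller of the Pearson and Neyman values."""
    return min(pearson_sketch(x, y), neyman_sketch(x, y))


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
# Swapping the arguments gives the same results
print(max_symmetric_chi_sketch(ts1, ts2), max_symmetric_chi_sketch(ts2, ts1))
print(min_symmetric_chi_sketch(ts1, ts2), min_symmetric_chi_sketch(ts2, ts1))
```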
Combination Functions
The combination functions combine ideas from several of the families above.
- tsdistance.lockstep.taneja(x, y)¶
Taneja distance utilizes both the arithmetic and geometric mean. The formula is: \(\sum_{i=1}^n\frac{(X_i + Y_i)}{2} * ln(\frac{X_i + Y_i}{2\sqrt{X_iY_i}})\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Taneja distance
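Each Taneja term compares the arithmetic mean \(\frac{X_i + Y_i}{2}\) of a pair with its geometric mean \(\sqrt{X_iY_i}\). Since the arithmetic mean is never smaller than the geometric mean for non-negative values, every logarithm is non-negative, and it is zero exactly when \(X_i = Y_i\). This minimal NumPy sketch (hypothetical helper name, not the library's code, strictly positive inputs assumed) makes that structure explicit.

```python
import numpy as np


def taneja_sketch(x, y):
    """Hypothetical helper: sum AM * ln(AM / GM) over element pairs; needs positive input."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    am = (x + y) / 2       # arithmetic mean of each pair
    gm = np.sqrt(x * y)    # geometric mean of each pair
    return float(np.sum(am * np.log(am / gm)))


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
print(taneja_sketch(ts1, ts1))  # 0.0: AM equals GM for identical pairs
print(taneja_sketch(ts1, ts2))  # positive
```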
- tsdistance.lockstep.kumar_johnson(x, y)¶
The formula for Kumar-Johnson distance is: \(\sum_{i=1}^n\frac{(X_i^2 - Y_i^2)^2}{2(X_iY_i)^{\frac{3}{2}}}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Kumar-Johnson distance
- tsdistance.lockstep.avg_l1_linf(x, y)¶
Avg(\(L_1\), \(L_\infty\)) is the average of the \(L_1\) distance and the Chebyshev (\(L_\infty\)) distance. The formula is: \(\frac{\sum_{i=1}^n|X_i - Y_i| + \max_i|X_i - Y_i|}{2}\)
- Parameters:
x (np.array) – a time series
y (np.array) – another time series
- Returns:
the Avg(\(L_1\), \(L_\infty\)) distance
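A minimal NumPy sketch of this average follows, taking \(L_\infty\) as \(\max_i |X_i - Y_i|\), the Chebyshev distance defined earlier. The helper name is hypothetical and this is not the library's code. On the series from the usage example it averages the Manhattan distance of 38 and the Chebyshev distance of 8.

```python
import numpy as np


def avg_l1_linf_sketch(x, y):
    """Hypothetical helper: (L1 distance + Chebyshev distance) / 2."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    d = np.abs(x - y)
    return float((np.sum(d) + np.max(d)) / 2)


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
print(avg_l1_linf_sketch(ts1, ts2))  # (38 + 8) / 2 = 23.0
```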