Lockstep Measures

Lockstep measures compare two time series element by element, so both series must have the same length. Despite this restriction, lockstep measures are very versatile because there are so many of them. Below we give the equation and a short description of each one in our library.

All lockstep measures are called the same way. Below is an example using Manhattan distance:

from tsdistance.lockstep import manhattan
import numpy as np

ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])

dist_manhattan = manhattan(ts1, ts2)
print(dist_manhattan)

Output:

38

Minkowski Functions

The Minkowski functions include Euclidean, Manhattan, and Chebyshev distance. They are all variations of:

\[\left(\sum_{i=1}^n |X_i - Y_i|^p\right)^{\frac{1}{p}}\]
tsdistance.lockstep.minkowski(x, y, p)

The formula for the Minkowski distance is: \(\left(\sum_{i=1}^n |X_i - Y_i|^p\right)^{\frac{1}{p}}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

  • p (float) – parameter for \(p\) in the formula above

Returns:

the Minkowski distance
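To make the formula concrete, here is a minimal NumPy sketch that assumes 1-D numeric arrays of equal length. `minkowski_sketch` is a hypothetical helper written for illustration and is not necessarily how `tsdistance.lockstep.minkowski` is implemented. Using the series from the usage example, \(p = 1\) reproduces the Manhattan result of 38.

```python
import numpy as np


def minkowski_sketch(x, y, p):
    """Hypothetical helper illustrating the Minkowski formula (not tsdistance's code)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("lockstep measures require series of equal length")
    # Sum the p-th powers of the absolute differences, then take the p-th root.
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])

print(minkowski_sketch(ts1, ts2, 1))  # 38.0, the Manhattan result above
print(minkowski_sketch(ts1, ts2, 2))  # Euclidean distance, sqrt(236)
```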

tsdistance.lockstep.abs_euclidean(x, y)

Euclidean distance is the most intuitive way of defining distance, since it is how we measure straight-line distance in the physical world. It is the Minkowski distance with \(p = 2\). The formula is: \(\sqrt{\sum_{i=1}^n(X_i - Y_i)^2}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Euclidean distance

tsdistance.lockstep.manhattan(x, y)

Manhattan distance is the Minkowski distance with \(p = 1\). It is often called city-block distance because, in two dimensions, it is the distance traveled along a grid of city blocks. Its advantage is that outliers skew the result less than in Chebyshev or Euclidean distance. The formula is: \(\sum_{i=1}^n |X_i - Y_i|\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Manhattan distance

tsdistance.lockstep.chebyshev(x, y)

Chebyshev distance is the limit of the Minkowski distance as \(p\) tends to infinity, which equals the largest absolute difference between any element pair. The formula is: \(\max_i |X_i - Y_i| = \lim_{p \rightarrow \infty} \left(\sum_{i=1}^n |X_i - Y_i|^p\right)^{\frac{1}{p}}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Chebyshev distance
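The sketch below checks the limit numerically on the series from the usage example. `minkowski_from_diffs` is a hypothetical helper, not part of tsdistance. As \(p\) grows, the Minkowski value decreases toward the single largest absolute difference, which is 8 here.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 9, 7], dtype=float)
y = np.array([8, 9, 9, 7, 3, 1, 2], dtype=float)
d = np.abs(x - y)  # element-wise absolute differences: [7, 7, 6, 3, 2, 8, 5]


def minkowski_from_diffs(d, p):
    """Hypothetical helper: (sum d_i^p)^(1/p) for precomputed absolute differences."""
    return np.sum(d ** p) ** (1.0 / p)


for p in (1, 2, 4, 16, 64):
    # The value shrinks toward the largest difference as p grows.
    print(p, minkowski_from_diffs(d, p))

chebyshev_value = d.max()
print("Chebyshev:", chebyshev_value)  # 8.0
```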

L1 Functions

The \(L_1\) functions all build on the Manhattan metric in some way; see the formula for each function below for details.

tsdistance.lockstep.sorensen(x, y)

Sorensen distance is the \(L_1\) distance divided by the sum of both time series. Because \(|X_i - Y_i| \le X_i + Y_i\) for non-negative values, the Sorensen distance of non-negative time series lies in \([0,1]\). It is often used in ecology and environmental sciences. The formula is: \(\frac{\sum_{i=1}^n |X_i - Y_i|}{\sum_{i=1}^n(X_i + Y_i)}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Sorensen distance
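A minimal NumPy sketch of the Sorensen formula follows, assuming non-negative inputs whose total is not zero. `sorensen_sketch` is a hypothetical name used only for illustration. For the usage-example series, the \(L_1\) distance is 38 and the two series sum to 70.

```python
import numpy as np


def sorensen_sketch(x, y):
    """Hypothetical helper illustrating the Sorensen formula (not tsdistance's code)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # L1 distance normalized by the combined total of both series.
    return np.sum(np.abs(x - y)) / np.sum(x + y)


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
print(sorensen_sketch(ts1, ts2))  # 38 / 70, about 0.543
print(sorensen_sketch(ts1, ts1))  # 0.0 for identical series
```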

tsdistance.lockstep.gower(x, y)

Gower distance is the mean absolute difference between element pairs, that is, the \(L_1\) distance divided by \(n\). It is often used for mixed qualitative and quantitative data. The formula is: \(\frac{1}{n} \sum_{i=1}^n |X_i - Y_i|\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Gower distance

tsdistance.lockstep.soergel(x, y)

Soergel distance is the \(L_1\) distance divided by the sum of the maximum of each element pair. The formula is: \(\frac{\sum_{i=1}^n |X_i - Y_i|}{\sum_{i=1}^n max(X_i,Y_i)}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Soergel distance

tsdistance.lockstep.Kulczynski(x, y)

Kulczynski distance is similar to Soergel distance, but the \(L_1\) distance is divided by the sum of the minimum of each element pair. The formula is: \(\frac{\sum_{i=1}^n|X_i - Y_i|}{\sum_{i=1}^n \min(X_i,Y_i)}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Kulczynski distance
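Soergel and Kulczynski differ only in their denominators, so they are easiest to compare side by side. The NumPy sketch below assumes non-negative inputs, and `soergel_sketch` and `kulczynski_sketch` are hypothetical names, not tsdistance functions. For the usage-example series, the element-wise maximums sum to 54 and the minimums sum to 16.

```python
import numpy as np


def soergel_sketch(x, y):
    """Hypothetical helper: L1 distance over the sum of element-wise maximums."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(np.abs(x - y)) / np.sum(np.maximum(x, y))


def kulczynski_sketch(x, y):
    """Hypothetical helper: L1 distance over the sum of element-wise minimums."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Undefined (division by zero) when the element-wise minimums sum to 0.
    return np.sum(np.abs(x - y)) / np.sum(np.minimum(x, y))


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
print(soergel_sketch(ts1, ts2))     # 38 / 54, about 0.704
print(kulczynski_sketch(ts1, ts2))  # 38 / 16 = 2.375
```

For non-negative data, \(|X_i - Y_i| \le \max(X_i, Y_i)\), so Soergel stays in \([0,1]\), while Kulczynski can exceed 1, as it does here.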

tsdistance.lockstep.canberra(x, y)

Canberra distance is a weighted \(L_1\) distance in which each absolute difference is divided by the sum of its element pair. It is often used for data scattered about an origin. The formula is: \(\sum_{i=1}^n \frac{|X_i - Y_i|}{X_i + Y_i}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Canberra distance
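The per-element weighting is easiest to see in code. Below is a minimal NumPy sketch of the formula above, assuming no element pair sums to zero. `canberra_sketch` is a hypothetical helper, not tsdistance's implementation.

```python
import numpy as np


def canberra_sketch(x, y):
    """Hypothetical helper illustrating the Canberra formula above (not tsdistance's code)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Each absolute difference is scaled by its pair's sum, so for non-negative
    # data every term lies in [0, 1]; a pair summing to 0 divides by zero.
    return np.sum(np.abs(x - y) / (x + y))


print(canberra_sketch([1, 2, 3], [3, 2, 1]))  # 2/4 + 0/4 + 2/4 = 1.0
```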

tsdistance.lockstep.lorentzian(x, y)

Lorentzian distance takes the natural logarithm of each absolute difference and sums the results. Adding 1 inside the logarithm avoids \(\ln(0)\) and guarantees non-negative distances. The formula is: \(\sum_{i=1}^n \ln(1 + |X_i - Y_i|)\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Lorentzian distance
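A minimal NumPy sketch of the Lorentzian formula is shown below. `lorentzian_sketch` is a hypothetical helper. It uses `np.log1p`, which computes \(\ln(1 + d)\) directly. Because each term grows only logarithmically, a single large difference affects the total less than it would in the plain \(L_1\) sum.

```python
import math

import numpy as np


def lorentzian_sketch(x, y):
    """Hypothetical helper illustrating the Lorentzian formula (not tsdistance's code)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.log1p(d) computes ln(1 + d); the +1 keeps each term finite and >= 0.
    return np.sum(np.log1p(np.abs(x - y)))


# Differences of 1 and e - 1 give ln(2) + ln(e) = ln(2) + 1.
print(lorentzian_sketch([0, 0], [1, math.e - 1]), math.log(2) + 1)
```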

Intersection Functions

The intersection family of functions is closely related to the \(L_1\) family. Many intersection functions can be converted to \(L_1\) form using the identity \(\min(X_i,Y_i) = \frac{X_i + Y_i - |X_i - Y_i|}{2}\). What the intersection functions have in common is the element-wise minimum of the two time series.

tsdistance.lockstep.Intersection(x, y)

tsdistance.lockstep.wave_hedges(x, y)

Wave Hedges distance is the length of the time series minus the sum of the ratios of the minimum to the maximum of each element pair. The formula is: \(\sum_{i=1}^n\left(1 - \frac{\min(X_i,Y_i)}{\max(X_i,Y_i)}\right)\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Wave Hedges distance
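The sketch below is a minimal NumPy version of the Wave Hedges formula, assuming positive values. `wave_hedges_sketch` is a hypothetical helper. Each term is 0 when the pair is equal and approaches 1 as the two elements diverge.

```python
import numpy as np


def wave_hedges_sketch(x, y):
    """Hypothetical helper illustrating the Wave Hedges formula (not tsdistance's code)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assumes positive values (a pair whose maximum is 0 would divide by zero).
    return np.sum(1.0 - np.minimum(x, y) / np.maximum(x, y))


print(wave_hedges_sketch([1, 2, 4], [2, 2, 1]))  # 0.5 + 0.0 + 0.75 = 1.25
```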

tsdistance.lockstep.czekanowski(x, y)

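Czekanowski distance is the intersection equivalent of Sorensen distance. It is twice the sum of the minimum of each element pair, divided by the sum of both time series. By the identity \(\min(X_i,Y_i) = \frac{X_i + Y_i - |X_i - Y_i|}{2}\), one minus this quantity equals the Sorensen distance. The formula is: \(\frac{2\sum_{i=1}^n \min(X_i,Y_i)}{\sum_{i=1}^n(X_i + Y_i)}\)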

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Czekanowski distance
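The sketch below checks the relationship between Czekanowski and Sorensen numerically. It relies on the identity \(\min(X_i,Y_i) = \frac{X_i + Y_i - |X_i - Y_i|}{2}\), which holds for every element pair. Both `czekanowski_sketch` and `sorensen_sketch` are hypothetical helpers, not tsdistance functions.

```python
import numpy as np


def czekanowski_sketch(x, y):
    """Hypothetical helper illustrating the Czekanowski formula (not tsdistance's code)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 2.0 * np.sum(np.minimum(x, y)) / np.sum(x + y)


def sorensen_sketch(x, y):
    """Hypothetical helper: L1 distance over the sum of both series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(np.abs(x - y)) / np.sum(x + y)


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
print(czekanowski_sketch(ts1, ts2))        # 32 / 70
print(1.0 - czekanowski_sketch(ts1, ts2))  # 38 / 70, the Sorensen value
print(sorensen_sketch(ts1, ts2))
```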

tsdistance.lockstep.motyka(x, y)
tsdistance.lockstep.tanimoto(x, y)

Inner Product Functions

The inner product functions are all built on the sum of the element-wise products of the two time series.

tsdistance.lockstep.innerproduct(x, y)

Inner Product distance is the dot product between two time series. The formula is: \(\sum_{i=1}^n (X_iY_i)\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Inner Product distance

tsdistance.lockstep.harmonicmean(x, y)
tsdistance.lockstep.kumarhassebrook(x, y)

Kumar-Hassebrook distance divides the inner product by the sum of the squared elements of both time series minus the inner product. The formula is: \(\frac{\sum_{i=1}^nX_iY_i}{\sum_{i=1}^n X_i^2 + \sum_{i=1}^n Y_i^2 - \sum_{i=1}^n X_iY_i}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Kumar-Hassebrook distance

tsdistance.lockstep.jaccard(x, y)
tsdistance.lockstep.cosine(x, y)

Cosine distance is one minus the cosine similarity, which is the cosine of the angle between two vectors. Unlike the Inner Product distance, Cosine distance does not take the magnitude of the time series into account. The formula is: \(1 - \frac{\sum_{i=1}^n X_iY_i}{\sqrt{\sum_{i=1}^nX_i^2}\sqrt{\sum_{i=1}^nY_i^2}}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Cosine distance
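The claim that Cosine distance ignores magnitude can be checked directly. In the NumPy sketch below, scaling a series by 10 leaves its cosine distance at about 0, while the inner product scales by 10. `cosine_sketch` is a hypothetical helper, and nonzero input vectors are assumed.

```python
import numpy as np


def cosine_sketch(x, y):
    """Hypothetical helper illustrating the Cosine formula (not tsdistance's code)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # One minus the dot product divided by the product of the Euclidean norms.
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))


x = np.array([1.0, 2.0, 3.0])
print(cosine_sketch(x, 10 * x))         # about 0: same direction, 10x the magnitude
print(np.dot(x, x), np.dot(x, 10 * x))  # 14.0 140.0: the inner product does change
print(cosine_sketch([1, 0], [0, 1]))    # 1.0 for orthogonal vectors
```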

tsdistance.lockstep.dice(x, y)

Dice distance is one minus the Dice similarity. It is not a metric, but it is widely used in biological taxonomy. The formula is: \(1 - \frac{2\sum_{i=1}^nX_iY_i}{\sum_{i=1}^nX_i^2 + \sum_{i=1}^nY_i^2}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Dice distance
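The statement that Dice distance is not a metric can be shown with three small vectors that violate the triangle inequality. The sketch below uses NumPy, and `dice_sketch` is a hypothetical helper, not tsdistance's implementation.

```python
import numpy as np


def dice_sketch(x, y):
    """Hypothetical helper illustrating the Dice formula (not tsdistance's code)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 1.0 - 2.0 * np.dot(x, y) / (np.dot(x, x) + np.dot(y, y))


# Three vectors that break the triangle inequality, showing Dice is not a metric.
a, b, c = [1, 0], [1, 1], [0, 1]
direct = dice_sketch(a, c)                      # 1.0
detour = dice_sketch(a, b) + dice_sketch(b, c)  # 1/3 + 1/3
print(direct, detour, direct > detour)          # the direct distance exceeds the detour
```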

Squared Chord Functions

The Squared Chord functions are based on geometric means, i.e. square roots of the elements or of their products. As a result, they are not defined for negative values in either time series.

tsdistance.lockstep.fidelity(x, y)

Fidelity distance is the sum of the square root of the element-wise product of elements from two time series. The formula is: \(\sum_{i = 1}^n \sqrt{X_iY_i}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Fidelity distance

tsdistance.lockstep.bhattacharyya(x, y)
tsdistance.lockstep.Square_chord(x, y)

Squared Chord distance is the sum of the square of the differences of the square roots of each element. This exaggerates more dissimilar features. The formula is: \(\sum_{i=1}^n(\sqrt{X_i}-\sqrt{Y_i})^2\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Squared Chord distance
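Below is a minimal NumPy sketch of the Squared Chord formula. It rejects negative inputs explicitly, since the square roots would not be real. `squared_chord_sketch` is a hypothetical helper, not tsdistance's implementation.

```python
import numpy as np


def squared_chord_sketch(x, y):
    """Hypothetical helper illustrating the Squared Chord formula (not tsdistance's code)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Square roots are only real for non-negative inputs, so reject negatives up front.
    if np.any(x < 0) or np.any(y < 0):
        raise ValueError("Squared Chord distance is undefined for negative values")
    return np.sum((np.sqrt(x) - np.sqrt(y)) ** 2)


print(squared_chord_sketch([1, 4, 9], [4, 1, 9]))  # (1-2)^2 + (2-1)^2 + 0 = 2.0
```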

tsdistance.lockstep.hellinger(x, y)
tsdistance.lockstep.matusita(x, y)

Squared L2 Functions

The squared \(L_2\) distance functions are a group of distance measures that all have \((X_i - Y_i)^2\) as the base.

tsdistance.lockstep.squared_euclidean(x, y)

Squared Euclidean distance is the square of the Euclidean distance. The formula is: \(\sum_{i=1}^n (X_i - Y_i)^2\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Squared Euclidean distance

tsdistance.lockstep.clark(x, y)
tsdistance.lockstep.neyman(x, y)

Neyman Chi Squared distance is the sum of squared difference of the element pairs divided by the element in the first time series. The formula is \(\sum_{i=1}^n(\frac{(X_i - Y_i)^2}{X_i})\).

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Neyman Chi Squared distance

tsdistance.lockstep.pearson(x, y)

Pearson Chi Squared distance is the sum of squared difference of the element pairs divided by the element in the second time series. Notably, \(Pearson(X,Y)\) is equal to \(Neyman(Y,X)\). The formula is: \(\sum_{i=1}^n(\frac{(X_i - Y_i)^2}{Y_i})\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Pearson Chi Squared distance
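The identity \(Pearson(X,Y) = Neyman(Y,X)\) can be checked with two short NumPy sketches. The example assumes the dividing series contain no zeros. `neyman_sketch` and `pearson_sketch` are hypothetical helpers, not tsdistance functions.

```python
import numpy as np


def neyman_sketch(x, y):
    """Hypothetical helper: squared differences divided by the first series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - y) ** 2 / x)


def pearson_sketch(x, y):
    """Hypothetical helper: squared differences divided by the second series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - y) ** 2 / y)


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
# Swapping the arguments of Neyman gives Pearson.
print(pearson_sketch(ts1, ts2), neyman_sketch(ts2, ts1))
```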

tsdistance.lockstep.squared_chi(x, y)

Squared Chi distance is the sum of the squared differences of the element pairs, each divided by the sum of the pair. It can be considered a symmetric version of the Neyman Chi Squared distance. The formula is: \(\sum_{i=1}^n\frac{(X_i - Y_i)^2}{X_i + Y_i}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Squared Chi distance
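The sketch below shows what "symmetric version" means in practice. Swapping the arguments changes the Neyman value but leaves the Squared Chi value unchanged, because its denominator \(X_i + Y_i\) treats both series alike. `squared_chi_sketch` and `neyman_sketch` are hypothetical NumPy helpers, and positive inputs are assumed.

```python
import numpy as np


def squared_chi_sketch(x, y):
    """Hypothetical helper illustrating the Squared Chi formula (not tsdistance's code)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - y) ** 2 / (x + y))


def neyman_sketch(x, y):
    """Hypothetical helper: squared differences divided by the first series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - y) ** 2 / x)


a = np.array([1.0, 2.0, 4.0])
b = np.array([2.0, 2.0, 1.0])
print(neyman_sketch(a, b), neyman_sketch(b, a))            # 3.25 9.5: Neyman is asymmetric
print(squared_chi_sketch(a, b), squared_chi_sketch(b, a))  # equal: Squared Chi is symmetric
```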

tsdistance.lockstep.K_divergence(x, y)

Divergence distance is twice the sum of the squared differences of the element pairs, each divided by the squared sum of the pair. Divergence distance is not a metric. The formula is: \(2\sum_{i=1}^n\frac{(X_i - Y_i)^2}{(X_i + Y_i)^2}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Divergence distance

tsdistance.lockstep.additive_symm_chi(x, y)

Additive Symmetric Chi distance is the sum of the square of the difference of the element pairs multiplied by the sum of the element pairs divided by the product of the element pairs. The formula is: \(2\sum_{i=1}^n\frac{(X_i - Y_i)^2(X_i + Y_i)}{X_iY_i}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Additive Symmetric Chi distance

tsdistance.lockstep.prob_symmetric_chi(x, y)

Probabilistic Symmetric Chi distance is Squared Chi distance multiplied by 2. The formula is: \(2\sum_{i=1}^n\frac{(X_i - Y_i)^2}{X_i + Y_i}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Probabilistic Symmetric Chi distance

Shannon’s Entropy Functions

The following functions are based on Shannon’s entropy, which measures how much information a variable contains, i.e. the uncertainty of its probability distribution.

tsdistance.lockstep.kullback(x, y)

Kullback-Leibler distance, also known as KL divergence or information deviation, measures how much one probability distribution differs from another. It is not symmetric. The formula is: \(\sum_{i=1}^nX_i\ln(\frac{X_i}{Y_i})\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Kullback-Leibler distance

tsdistance.lockstep.jeffrey(x, y)

Jeffreys distance is considered to be the symmetric version of Kullback-Leibler distance. The formula is: \(\sum_{i=1}^n(X_i-Y_i)ln(\frac{X_i}{Y_i})\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Jeffreys distance
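The symmetry of Jeffreys distance can be checked against the textbook KL divergence \(\sum_{i=1}^n X_i\ln(\frac{X_i}{Y_i})\). Adding KL in both directions gives \(\sum_{i=1}^n(X_i - Y_i)\ln(\frac{X_i}{Y_i})\). The NumPy sketch below assumes strictly positive inputs, and `kl_sketch` and `jeffreys_sketch` are hypothetical helpers, not tsdistance functions.

```python
import numpy as np


def kl_sketch(x, y):
    """Hypothetical helper: textbook KL divergence sum(x * ln(x / y)), positive inputs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(x * np.log(x / y))


def jeffreys_sketch(x, y):
    """Hypothetical helper illustrating the Jeffreys formula (not tsdistance's code)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - y) * np.log(x / y))


p = np.array([0.2, 0.3, 0.5])
q = np.array([0.1, 0.6, 0.3])
# Jeffreys equals KL(p || q) + KL(q || p), which is why it is symmetric.
print(jeffreys_sketch(p, q), kl_sketch(p, q) + kl_sketch(q, p))
```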

tsdistance.lockstep.topsoe(x, y)

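Topsoe distance is a symmetric version of the K divergence \(\sum_{i=1}^nX_i\ln(\frac{2X_i}{X_i + Y_i})\): it adds the K divergence to the same expression with \(X\) and \(Y\) swapped. The formula is: \(\sum_{i=1}^n\left[X_i\ln(\frac{2X_i}{X_i + Y_i}) + Y_i\ln(\frac{2Y_i}{Y_i + X_i})\right]\)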

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Topsoe distance

tsdistance.lockstep.jensen_shannon(x, y)

Jensen-Shannon distance is Topsoe distance divided by 2. The formula is: \(\frac{1}{2}\sum_{i=1}^n\left[X_i\ln(\frac{2X_i}{X_i + Y_i}) + Y_i\ln(\frac{2Y_i}{Y_i + X_i})\right]\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Jensen-Shannon distance
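The NumPy sketch below implements Topsoe and derives Jensen-Shannon from it as half its value. It also confirms that both are symmetric and vanish for identical inputs. Strictly positive inputs are assumed, and `topsoe_sketch` and `jensen_shannon_sketch` are hypothetical helpers, not tsdistance's code.

```python
import numpy as np


def topsoe_sketch(x, y):
    """Hypothetical helper illustrating the Topsoe formula (not tsdistance's code)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = x + y
    # Assumes strictly positive values; a zero element would produce 0 * ln(0) = nan.
    return np.sum(x * np.log(2 * x / m) + y * np.log(2 * y / m))


def jensen_shannon_sketch(x, y):
    """Hypothetical helper: half of the Topsoe distance."""
    return topsoe_sketch(x, y) / 2.0


p = np.array([0.2, 0.3, 0.5])
q = np.array([0.1, 0.6, 0.3])
print(topsoe_sketch(p, q), topsoe_sketch(q, p))  # symmetric: same value both ways
print(jensen_shannon_sketch(p, q))
print(jensen_shannon_sketch(p, p))               # 0.0 for identical inputs
```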

tsdistance.lockstep.jensen_difference(x, y)

The formula for Jensen Difference distance is: \(\sum_{i=1}^n \left[\frac{X_i\ln(X_i) + Y_i\ln(Y_i)}{2} - \frac{X_i + Y_i}{2}\ln(\frac{X_i + Y_i}{2})\right]\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Jensen Difference Distance

Vicissitude Functions

This group of functions is based on the Vicis-Wave Hedges distance.

tsdistance.lockstep.vicis_wave_hedges(x, y)

Vicis-Wave Hedges distance is a variant of the Wave Hedges distance and can be considered an \(L_1\) function: each absolute difference is divided by the smaller element of its pair. The formula is: \(\sum_{i=1}^n \frac{|X_i - Y_i|}{\min(X_i,Y_i)}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Vicis-Wave Hedges distance

tsdistance.lockstep.emanon2(x, y)

Emanon 2 distance is a variant of Vicis-Wave Hedges in which both the differences and the minimums are squared. The formula is: \(\sum_{i=1}^n \frac{(X_i - Y_i)^2}{\min(X_i,Y_i)^2}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Emanon 2 distance

tsdistance.lockstep.emanon3(x, y)

Emanon 3 distance is another variant of Vicis-Wave Hedges in which only the differences are squared. The formula is: \(\sum_{i=1}^n \frac{(X_i - Y_i)^2}{\min(X_i,Y_i)}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Emanon 3 distance
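The two Emanon variants differ only in whether the minimum in the denominator is squared. The NumPy sketch below computes both on the same small input. It assumes positive values, and `emanon2_sketch` and `emanon3_sketch` are hypothetical helpers, not tsdistance's implementations.

```python
import numpy as np


def emanon2_sketch(x, y):
    """Hypothetical helper: squared differences over squared element-wise minimums."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - y) ** 2 / np.minimum(x, y) ** 2)


def emanon3_sketch(x, y):
    """Hypothetical helper: squared differences over element-wise minimums."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - y) ** 2 / np.minimum(x, y))


# Both assume positive values; a zero minimum would divide by zero.
print(emanon2_sketch([1, 4], [2, 2]))  # 1/1 + 4/4 = 2.0
print(emanon3_sketch([1, 4], [2, 2]))  # 1/1 + 4/2 = 3.0
```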

tsdistance.lockstep.emanon4(x, y)
tsdistance.lockstep.max_symmetric_chi(x, y)

Max-Symmetric Chi distance takes the maximum of the Pearson and Neyman distances. The formula is: \(max(\sum_{i=1}^n\frac{(X_i - Y_i)^2}{X_i},\sum_{i=1}^n\frac{(X_i - Y_i)^2}{Y_i})\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Max-Symmetric Chi distance

tsdistance.lockstep.min_symmetric_chi(x, y)

Min-Symmetric Chi distance takes the minimum of the Pearson and Neyman distances. The formula is: \(min(\sum_{i=1}^n\frac{(X_i - Y_i)^2}{X_i},\sum_{i=1}^n\frac{(X_i-Y_i)^2}{Y_i})\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Min-Symmetric Chi distance
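Both measures pick one of the two asymmetric chi-squared sums, so they can share a single helper. In the NumPy sketch below, `_chi_pair`, `max_symmetric_chi_sketch` and `min_symmetric_chi_sketch` are hypothetical names, and positive inputs are assumed. Because swapping the arguments only swaps the Neyman and Pearson sums, both results are symmetric.

```python
import numpy as np


def _chi_pair(x, y):
    """Hypothetical helper returning the (Neyman, Pearson) sums for positive inputs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d2 = (x - y) ** 2
    return np.sum(d2 / x), np.sum(d2 / y)


def max_symmetric_chi_sketch(x, y):
    """Hypothetical helper: the larger of the Neyman and Pearson sums."""
    return max(_chi_pair(x, y))


def min_symmetric_chi_sketch(x, y):
    """Hypothetical helper: the smaller of the Neyman and Pearson sums."""
    return min(_chi_pair(x, y))


x, y = [1, 2], [2, 4]
neyman_value, pearson_value = _chi_pair(x, y)
print(neyman_value, pearson_value)     # 1/1 + 4/2 = 3.0 and 1/2 + 4/4 = 1.5
print(max_symmetric_chi_sketch(x, y))  # 3.0
print(min_symmetric_chi_sketch(x, y))  # 1.5
```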

Combination Functions

The combination functions combine approaches from several of the families described above.

tsdistance.lockstep.taneja(x, y)

Taneja distance utilizes both the arithmetic and geometric mean. The formula is: \(\sum_{i=1}^n\frac{(X_i + Y_i)}{2} * ln(\frac{X_i + Y_i}{2\sqrt{X_iY_i}})\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Taneja distance

tsdistance.lockstep.kumar_johnson(x, y)

The formula for Kumar-Johnson distance is: \(\sum_{i=1}^n\frac{(X_i^2 - Y_i^2)^2}{2(X_iY_i)^{\frac{1}{2}}}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Kumar-Johnson distance

tsdistance.lockstep.avg_l1_linf(x, y)

Avg(\(L_1\), \(L_\infty\)) is the average of the \(L_1\) (Manhattan) distance and the \(L_\infty\) (Chebyshev) distance. The formula is: \(\frac{\sum_{i=1}^n|X_i - Y_i| + \max_i|X_i - Y_i|}{2}\)

Parameters:
  • x (np.array) – a time series

  • y (np.array) – another time series

Returns:

the Avg(\(L_1\), \(L_\infty\)) distance
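For the series in the usage example, the Manhattan distance is 38 and the largest single difference is 8, so the average is 23. The NumPy sketch below reproduces that value. `avg_l1_linf_sketch` is a hypothetical helper, not tsdistance's implementation.

```python
import numpy as np


def avg_l1_linf_sketch(x, y):
    """Hypothetical helper illustrating Avg(L1, L_inf) (not tsdistance's code)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.abs(x - y)
    # Average of the Manhattan distance (sum) and the Chebyshev distance (max).
    return (np.sum(d) + np.max(d)) / 2.0


ts1 = np.array([1, 2, 3, 4, 5, 9, 7])
ts2 = np.array([8, 9, 9, 7, 3, 1, 2])
print(avg_l1_linf_sketch(ts1, ts2))  # (38 + 8) / 2 = 23.0
```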