pdmlabs.utils.distance

Distance measure classes for time-series anomaly detection.

This module provides advanced distance metrics designed for time-series anomaly detection, particularly for reconstruction-based and distance-based methods. Each class measures the dissimilarity between observed and predicted/expected subsequences.

Classes:

Euclidean: Lp norm-based distance with optional normalization
Mahalanobis: Covariance-based distance for multivariate time-series
Garch: Volatility-adjusted distance using ARCH/GARCH models
SSA_DISTANCE: Singular Spectrum Analysis distance for contextual anomalies

Usage:

The distance classes are typically used with time-series anomaly detectors that have train data (X_train_), a window size, and estimation/prediction errors.

Steps:

1. Create the distance object: dist = Euclidean(power=2, neighborhood=100)
2. Assign the detector to dist.detector, then call dist.set_param()
3. Measure: score = dist.measure(X_real, X_predicted, index)

Example

>>> from pdmlabs.utils.distance import Euclidean
>>> euclidean_dist = Euclidean(power=2, norm=True)
>>> euclidean_dist.detector = my_detector  # Set detector first
>>> euclidean_dist.set_param()
>>> score = euclidean_dist.measure(X_obs, X_pred, sample_index)
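
The steps above follow a small protocol: assign `detector`, call `set_param()`, then call `measure()`. The sketch below mimics that protocol with two hypothetical classes, `StubDetector` and `AbsErrorDistance`. Neither is part of pdmlabs, and the real classes may keep different state; the point is only the order of the calls.

```python
import numpy as np


class StubDetector:
    """Hypothetical stand-in for a fitted detector (not a pdmlabs class)."""

    def __init__(self, X_train, window):
        self.X_train_ = np.asarray(X_train, dtype=float)
        self.window = window


class AbsErrorDistance:
    """Hypothetical measure following the detector / set_param / measure pattern."""

    def __init__(self):
        self.detector = None
        self.decision_scores_ = []

    def set_param(self):
        # Copy what the measure needs from the detector; fail early if it is missing.
        if self.detector is None:
            raise AttributeError("assign .detector before calling set_param()")
        self.window = self.detector.window
        return self

    def measure(self, X, Y, index):
        # Mean absolute error between the observed and predicted subsequences.
        diff = np.asarray(X, dtype=float) - np.asarray(Y, dtype=float)
        score = float(np.mean(np.abs(diff)))
        self.decision_scores_.append((index, score))
        return score


detector = StubDetector(X_train=np.arange(50.0), window=3)
dist = AbsErrorDistance()
dist.detector = detector
score = dist.set_param().measure([1.0, 2.0, 3.0], [1.0, 2.5, 3.0], index=10)
```

Calling `set_param()` before a detector is assigned raises `AttributeError` in this sketch, which mirrors the ordering requirement stated in the steps.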

Classes

`DTW`([method])	The function class for dynamic time warping measure
`EDRS`([method, ep, vol])	The function class for edit distance on real sequences
`Euclidean`([power, neighborhood, window, norm])	Lp norm distance measure for reconstruction-based anomaly detection.
`Fourier`([power])	The function class for the Fourier measure, suited to contextual anomalies.
`Garch`([p, q, mean, vol])	GARCH-based distance measure using volatility modeling.
`Mahalanobis`([probability])	Mahalanobis distance measure accounting for covariance structure.
`SSA_DISTANCE`([method, e])	The function class for the SSA measure, suited to contextual anomalies.
`TWED`([gamma, v])	Function class for Time-warped edit distance(TWED) measure

class pdmlabs.utils.distance.DTW(method='L2')#

Bases: object

The function class for dynamic time warping measure

method: available options are “L2”, “L1”, and custom.

decision_scores_#

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type:: numpy array of shape (n_samples,)

detector#

the anomaly detector that is used

Type:: Object classifier

measure(X1, X2, start_index)#

Obtain the DTW dissimilarity score.

Parameters:

X1 (numpy array of shape (n, )) – The reference time series.
X2 (numpy array of shape (n, )) – The tested time series.
start_index (int) – Current index of the subsequence being measured.

Returns:: score – the higher the score, the more dissimilar the two curves.
Return type:: float

set_param()#: Update the parameters from the detector in use. Since this measure does not need the detector's attributes or the characteristics of X_train, the step is omitted.
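
For readers unfamiliar with dynamic time warping, the following is a minimal sketch of the classic dynamic-programming recurrence with an "L1" or "L2" pointwise cost. The function name `dtw_distance` and the choice to take a square root in the "L2" case are assumptions made for illustration; the `DTW` class may differ in both.

```python
import numpy as np


def dtw_distance(x1, x2, method="L2"):
    """Classic O(n*m) dynamic time warping between two 1-D sequences."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    if method not in ("L1", "L2"):
        raise ValueError("this sketch supports only 'L1' and 'L2'")

    n, m = len(x1), len(x2)
    # acc[i, j] holds the cheapest cost of aligning x1[:i] with x2[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            gap = x1[i - 1] - x2[j - 1]
            cost = abs(gap) if method == "L1" else gap ** 2
            # Extend the cheapest of: step in x1, step in x2, step in both.
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])

    total = acc[n, m]
    return float(np.sqrt(total)) if method == "L2" else float(total)
```

Because warping lets one point align with several, a sequence and a copy of it with a repeated sample score zero, which a plain Lp norm cannot do for sequences of different length.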

class pdmlabs.utils.distance.EDRS(method='L1', ep=False, vol=False)#

Bases: object

The function class for edit distance on real sequences

method: available options are “L2”, “L1”, and custom.

ep: float, optional (default = 0.1): the threshold value to decide Di_j
vol: boolean, optional (default = False): whether to adapt ep to the changing volatility estimated by GARCH at different windows.

decision_scores_#

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type:: numpy array of shape (n_samples,)

detector#

the anomaly detector that is used

Type:: Object classifier

measure(X1, X2, start_index)#

Obtain the EDRS dissimilarity score.

Parameters:

X1 (numpy array of shape (n, )) – The reference time series.
X2 (numpy array of shape (n, )) – The tested time series.
start_index (int) – Current index of the subsequence being measured.

Returns:: score – the higher the score, the more dissimilar the two curves.
Return type:: float

set_param()#: Update ep based on the volatility of the model.
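
As an illustration of how the threshold ep enters an edit distance on real sequences, here is a minimal sketch of the standard EDR recurrence: two samples count as a match when they differ by at most ep, and every mismatch, insertion, or deletion costs 1. The function name `edr_distance` is hypothetical, and the sketch ignores the volatility-based adaptation of ep.

```python
import numpy as np


def edr_distance(x1, x2, ep=0.1):
    """Edit distance on real sequences with a fixed matching threshold ep."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n, m = len(x1), len(x2)

    # d[i, j] is the edit cost between x1[:i] and x2[:j].
    d = np.zeros((n + 1, m + 1))
    d[:, 0] = np.arange(n + 1)  # delete everything from x1
    d[0, :] = np.arange(m + 1)  # insert everything from x2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0.0 if abs(x1[i - 1] - x2[j - 1]) <= ep else 1.0
            d[i, j] = min(
                d[i - 1, j - 1] + mismatch,  # match or substitute
                d[i - 1, j] + 1.0,           # delete from x1
                d[i, j - 1] + 1.0,           # insert from x2
            )
    return float(d[n, m])
```

A larger ep tolerates more noise before a point counts as an edit, which is why adapting it to local volatility can be useful.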

class pdmlabs.utils.distance.Euclidean(power=1, neighborhood=100, window=20, norm=False)#

Bases: object

Lp norm distance measure for reconstruction-based anomaly detection.

Computes the Lp norm (L1, L2, etc.) between observed and predicted time-series subsequences. Can optionally normalize by local neighborhood statistics to account for varying baseline behavior.

Parameters:

power (int, default=1) – The power p for the Lp norm. power=1 is Manhattan distance, power=2 is Euclidean.
neighborhood (int, default=100) – Size of neighborhood window (±samples around each point) used to compute normalization factor D (difference between max and min in neighborhood). If None, skips normalization.
window (int, default=20) – Length of each subsequence being compared.
norm (bool, default=False) – If True, normalizes distance by local neighborhood statistics (dividing by D). If False, returns raw Lp norm.

decision_scores_#

List of (index, score) tuples generated during measure() calls.

Type:: list

detector#

Reference to the anomaly detector providing X_train_, window, neighborhood.

Type:: object

Examples

>>> # Unnormalized L2 distance
>>> euclidean = Euclidean(power=2, norm=False)
>>> score = euclidean.measure(observed_X, predicted_X, start_index)

>>> # Normalized distance with neighborhood statistics
>>> euclidean_norm = Euclidean(power=2, neighborhood=50, norm=True)
>>> euclidean_norm.detector = detector  # Must set detector first
>>> euclidean_norm.set_param()
>>> score = euclidean_norm.measure(X_real, X_pred, idx)

Notes

Without normalization, the metric treats all points equally regardless of their local context. Normalization accounts for natural variations in the data by dividing by the range observed in nearby samples.
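
The normalization described above can be sketched in plain NumPy. The function `lp_score` and its arguments are hypothetical names chosen for illustration; the sketch assumes the neighborhood spans ±`neighborhood` samples of the training data around `index` and that the Lp norm is divided by the max-minus-min range D of that span.

```python
import numpy as np


def lp_score(x, y, power=1, train=None, index=0, neighborhood=100):
    """Lp norm between x and y, optionally divided by a local range D."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dist = float(np.sum(np.abs(x - y) ** power) ** (1.0 / power))
    if train is None:
        return dist  # raw Lp norm, no normalization

    train = np.asarray(train, dtype=float)
    # Clip the neighborhood to the bounds of the training data.
    lo = max(0, index - neighborhood)
    hi = min(len(train), index + neighborhood + 1)
    d_range = train[lo:hi].max() - train[lo:hi].min()
    # Guard against a flat neighborhood, where the range is zero.
    return dist / d_range if d_range > 0 else dist
```

With normalization, the same absolute error scores lower in a region where the signal naturally swings widely and higher where it is normally flat.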

measure(X, Y, index)#

Calculate distance anomaly score between observed and expected subsequences.

Computes the Lp norm between X and Y, optionally normalized by neighborhood statistics. Higher scores indicate greater dissimilarity (anomaly).

Parameters:

X (np.ndarray) – Observed/actual subsequence values (1D array).
Y (np.ndarray) – Expected/predicted subsequence values (1D array, same length as X).
index (int) – Position in the training data where this window starts. Used to identify the neighborhood region for normalization.

Returns:

Anomaly score (distance). Range depends on normalization:
Without norm: typically 0 to infinity
With norm: typically 0 to 1 (normalized)

Return type:

float

Notes

Stores result in self.decision_scores_ list for analysis
For normalized mode: D = max(neighborhood) - min(neighborhood)
Handles edge cases: empty arrays, single dimensions, boundary indices

Examples

>>> import numpy as np
>>> dist = Euclidean(power=2)
>>> dist.detector = detector
>>> dist.set_param()
>>> X_real = np.array([1.0, 2.0, 3.0])
>>> X_pred = np.array([1.1, 2.1, 2.9])
>>> score = dist.measure(X_real, X_pred, 100)  # ~0.17 for a raw L2 norm

set_param()#

Initialize parameters from detector object.

Extracts window, neighborhood, training data, and other properties from the detector to ensure consistency. Must be called after setting the detector attribute.

Returns:: Returns self for method chaining.
Return type:: self
Raises:: AttributeError – If detector is None or lacks required attributes.

class pdmlabs.utils.distance.Fourier(power=2)#

Bases: object

The function class for the Fourier measure, suited to contextual anomalies.

Parameters:

power (int, optional, default=2) – Lp norm used for the dissimilarity measure.

decision_scores_#

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type:: numpy array of shape (n_samples,)

detector#

the anomaly detector that is used

Type:: Object classifier

measure(X2, X3, start_index)#

Obtain the Fourier dissimilarity score.

Parameters:

X2 (numpy array of shape (n, )) – The reference time series.
X3 (numpy array of shape (n, )) – The tested time series.
start_index (int) – Current index of the subsequence being measured.

Returns:: score – the higher the score, the more dissimilar the two curves.
Return type:: float

set_param()#: Update the parameters from the detector in use. Since the FFT measure doesn't need the detector's attributes or the characteristics of X_train, the step is omitted.
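
One plausible reading of a Fourier measure is an Lp norm between the magnitude spectra of the two subsequences. The sketch below implements that reading with `numpy.fft.rfft`; the function name `fourier_distance` and the choice to compare magnitudes only (discarding phase) are assumptions, not a description of what the `Fourier` class does internally.

```python
import numpy as np


def fourier_distance(x, y, power=2):
    """Lp norm between the FFT magnitude spectra of two equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # rfft returns the non-negative-frequency half of the spectrum for real input.
    mag_x = np.abs(np.fft.rfft(x))
    mag_y = np.abs(np.fft.rfft(y))
    return float(np.sum(np.abs(mag_x - mag_y) ** power) ** (1.0 / power))
```

Because only magnitudes are compared, a circular shift of the same signal scores close to zero, while a change in dominant frequency scores high. That property is what makes a spectral comparison attractive for contextual anomalies.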

class pdmlabs.utils.distance.Garch(p=1, q=1, mean='zero', vol='garch')#

Bases: object

GARCH-based distance measure using volatility modeling.

Models residual volatility using ARCH/GARCH (Generalized AutoRegressive Conditional Heteroskedasticity) models. Divides reconstruction error by the modeled volatility to create volatility-adjusted anomaly scores.

Parameters:

p (int, default=1) – ARCH order (number of lagged squared errors in volatility equation).
q (int, default=1) – GARCH order (number of lagged variance terms in volatility equation).
mean (str, default='zero') – Mean model: ‘Zero’, ‘Constant’, ‘AR’, ‘ARX’, ‘ARMAX’, etc.
vol (str, default='garch') – Volatility model, for example ‘GARCH’ or ‘ARCH’.

decision_scores_#

List of (index, score) tuples.

Type:: list

volatility#

Estimated volatility at each timepoint.

Type:: np.ndarray

Examples

>>> garch_dist = Garch(p=1, q=1, mean='zero', vol='garch')
>>> garch_dist.detector = detector
>>> garch_dist.set_param()  # Fits GARCH model on training residuals
>>> score = garch_dist.measure(X_real, X_pred, idx)

Notes

GARCH models are effective for time-series with changing volatility
Residuals are scaled by 10 during fitting for numerical stability
Volatility-adjusted errors better capture anomalies in volatile regimes
More computationally expensive than simpler distance measures
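
To make the idea of a volatility-adjusted score concrete, the sketch below runs the GARCH(1,1) variance recursion with fixed coefficients and divides the reconstruction error by the resulting volatility. The function names and the coefficients `omega`, `alpha`, and `beta` are hypothetical; the `Garch` class fits its model on training residuals rather than taking coefficients as input, and its exact scoring formula is not shown here.

```python
import numpy as np


def garch11_volatility(residuals, omega, alpha, beta):
    """Conditional volatility from a GARCH(1,1) recursion with given coefficients."""
    r = np.asarray(residuals, dtype=float)
    sigma2 = np.empty_like(r)
    sigma2[0] = r.var()  # start from the sample variance
    for t in range(1, len(r)):
        # One lagged squared error (ARCH term) and one lagged variance (GARCH term).
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return np.sqrt(sigma2)


def volatility_adjusted_score(x, y, volatility, index):
    """Mean absolute error of a window, scaled by the volatility at those positions."""
    err = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    vol = volatility[index:index + len(err)]
    return float(np.mean(err / vol))


residuals = np.array([1.0, -1.0, 1.0, -1.0])
vol = garch11_volatility(residuals, omega=0.1, alpha=0.2, beta=0.7)
score = volatility_adjusted_score([2.0, 2.0], [1.0, 3.0], vol, index=1)
```

The same absolute error yields a smaller score where the modeled volatility is high, so a spike during a turbulent regime is penalized less than the same spike during a calm one.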

measure(X, Y, index)#

Derive the decision score based on the given distance measure.

Parameters:

X (numpy array of shape (n_samples, )) – The real input samples subsequence.
Y (numpy array of shape (n_samples, )) – The estimated input samples subsequence.
index (int) – The index of the starting point in the subsequence.

Returns:: score – dissimilarity score between the two subsequences
Return type:: float

set_param()#: update the parameters with the detector that is used

class pdmlabs.utils.distance.Mahalanobis(probability=False)#

Bases: object

Mahalanobis distance measure accounting for covariance structure.

Computes Mahalanobis distance which measures dissimilarity while accounting for correlations between dimensions. Uses covariance matrix computed from residuals (X_train - estimation) in a neighborhood window.

Parameters:

probability (bool, default=False) – If False, returns the Mahalanobis distance (quadratic form with inverse covariance). If True, returns a probability via the multivariate/univariate normal PDF.

detector#

Reference to anomaly detector with X_train_, estimation, window, n_initial_.

Type:: object

decision_scores_#

List of (index, score) tuples.

Type:: list

cov#

Covariance matrix of residuals (computed in set_param).

Type:: np.ndarray

mu#

Mean vector (typically zeros).

Type:: np.ndarray

Examples

>>> # Mahalanobis distance
>>> maha = Mahalanobis(probability=False)
>>> maha.detector = detector
>>> maha.set_param()
>>> score = maha.measure(X_obs, X_pred, idx)

>>> # Probability-based scoring
>>> maha_prob = Mahalanobis(probability=True)
>>> maha_prob.detector = detector
>>> maha_prob.set_param()
>>> prob_score = maha_prob.measure(X_obs, X_pred, idx)

Notes

Mahalanobis distance is particularly useful when features have different scales or when correlations between features are important for anomaly detection.
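
The two scoring modes can be illustrated with a short NumPy sketch. The function `mahalanobis_score` is a hypothetical name, the covariance matrix is passed in directly instead of being computed from detector residuals, and the sketch returns the square root of the quadratic form; the `Mahalanobis` class may return the quadratic form itself.

```python
import numpy as np


def mahalanobis_score(x, y, cov, probability=False):
    """Mahalanobis distance of the residual x - y, or its zero-mean normal density."""
    diff = np.atleast_1d(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    cov_inv = np.linalg.inv(cov)
    quad = float(diff @ cov_inv @ diff)  # quadratic form r^T Sigma^{-1} r
    if not probability:
        return float(np.sqrt(quad))

    # Zero-mean multivariate normal density evaluated at the residual.
    k = diff.size
    norm_const = np.sqrt((2.0 * np.pi) ** k * np.linalg.det(cov))
    return float(np.exp(-0.5 * quad) / norm_const)
```

Note the opposite orientation of the two modes: a larger distance means more anomalous, while a larger density means more typical.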

measure(X, Y, index)#

Derive the decision score based on the given distance measure.

Parameters:

X (numpy array of shape (n_samples, )) – The real input samples subsequence.
Y (numpy array of shape (n_samples, )) – The estimated input samples subsequence.
index (int) – The index of the starting point in the subsequence.

Returns:: score – dissimilarity score between the two subsequences
Return type:: float

norm_pdf_multivariate(x)#: multivariate normal density function

normpdf(x)#: univariate normal density function

set_param()#: update the parameters with the detector that is used

class pdmlabs.utils.distance.SSA_DISTANCE(method='linear', e=1)#

Bases: object

The function class for the SSA measure, suited to contextual anomalies.

Parameters:

method (string, optional, default='linear') – The method used to fit the line and derive the SSA score.
e (float, optional, default=1) – The upper bound to start a new line search for the linear method.

decision_scores_#

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type:: numpy array of shape (n_samples,)

detector#

the anomaly detector that is used

Type:: Object classifier

Linearization(X2)#

Obtain the linearized curve.

Parameters:

X2 (numpy array of shape (n, )) – The time series curve to be fitted.
e (float, integer, or numpy array) – weights to obtain the

Returns:: fit – parameters for the fitted linear curve

measure(X2, X3, start_index)#

Obtain the SSA similarity score.

Parameters:

X2 (numpy array of shape (n, )) – The reference time series.
X3 (numpy array of shape (n, )) – The tested time series.
e (float, integer, or numpy array) – weights to obtain the

Returns:: score – the higher the score, the more dissimilar the two curves.
Return type:: float

set_param()#: Update the parameters from the detector in use. Since the SSA measure doesn't need the detector's attributes or the characteristics of X_train, the step is omitted.

class pdmlabs.utils.distance.TWED(gamma=0.1, v=0.1)#

Bases: object

Function class for Time-warped edit distance(TWED) measure

Available: “L2”, “L1”, and custom

gamma: float, optional (default = 0.1): mismatch penalty
v: float, optional (default = 0.1): stiffness parameter

decision_scores_#

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type:: numpy array of shape (n_samples,)

detector#

the anomaly detector that is used

Type:: Object classifier

measure(A, B, start_index)#

Obtain the TWED dissimilarity score.

Parameters:

A (numpy array of shape (n, )) – The reference time series.
B (numpy array of shape (n, )) – The tested time series.
start_index (int) – Current index of the subsequence being measured.

Returns:: score – the higher the score, the more dissimilar the two curves.
Return type:: float

set_param()#: No parameter update is needed for this measure.
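
For reference, the following is a minimal sketch of the standard time-warped edit distance recurrence on evenly spaced samples. It assumes that `gamma` plays the role of the constant edit penalty and `v` the role of the stiffness weight on time-stamp differences, and it uses an absolute-difference point cost; the function name `twed_distance` is hypothetical and the `TWED` class may be implemented differently.

```python
import numpy as np


def twed_distance(a, b, gamma=0.1, v=0.1):
    """Time-warped edit distance between two 1-D series sampled at unit steps."""
    # Pad both series with a leading zero sample at time 0, as in the usual formulation.
    a = np.concatenate(([0.0], np.asarray(a, dtype=float)))
    b = np.concatenate(([0.0], np.asarray(b, dtype=float)))
    ta = np.arange(len(a), dtype=float)
    tb = np.arange(len(b), dtype=float)

    n, m = len(a), len(b)
    dp = np.full((n, m), np.inf)
    dp[0, 0] = 0.0
    for i in range(1, n):
        for j in range(1, m):
            # Delete a sample from a: pay its jump, the elapsed time, and the penalty.
            delete_a = dp[i - 1, j] + abs(a[i] - a[i - 1]) + v * (ta[i] - ta[i - 1]) + gamma
            # Delete a sample from b, symmetrically.
            delete_b = dp[i, j - 1] + abs(b[j] - b[j - 1]) + v * (tb[j] - tb[j - 1]) + gamma
            # Match current and previous samples, weighting time-stamp gaps by v.
            match = (
                dp[i - 1, j - 1]
                + abs(a[i] - b[j])
                + abs(a[i - 1] - b[j - 1])
                + v * (abs(ta[i] - tb[j]) + abs(ta[i - 1] - tb[j - 1]))
            )
            dp[i, j] = min(delete_a, delete_b, match)
    return float(dp[n - 1, m - 1])
```

Raising `v` makes warping along the time axis more expensive, pushing the measure toward a pointwise comparison; raising `gamma` discourages deletions.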
