pdmlabs.utils.distance#
Distance measure classes for time-series anomaly detection.
This module provides advanced distance metrics designed for time-series anomaly detection, particularly for reconstruction-based and distance-based methods. Each class measures the dissimilarity between observed and predicted/expected subsequences.
- Classes:
Euclidean: Lp norm-based distance with optional normalization
Mahalanobis: Covariance-based distance for multivariate time-series
Garch: Volatility-adjusted distance using ARCH/GARCH models
SSA_DISTANCE: Singular Spectrum Analysis distance for contextual anomalies
- Usage:
The distance classes are typically used with time-series anomaly detectors that have train data (X_train_), a window size, and estimation/prediction errors.
Steps:
1. Create the distance object: dist = Euclidean(power=2, neighborhood=100)
2. Set the detector: assign dist.detector, then call dist.set_param()
3. Measure: score = dist.measure(X_real, X_predicted, index)
Example
>>> from pdmlabs.utils.distance import Euclidean
>>> euclidean_dist = Euclidean(power=2, norm=True)
>>> euclidean_dist.detector = my_detector # Set detector first
>>> euclidean_dist.set_param()
>>> score = euclidean_dist.measure(X_obs, X_pred, sample_index)
Classes
| DTW | Function class for the dynamic time warping measure. |
| EDRS | Function class for the edit distance on real sequences measure. |
| Euclidean | Lp norm distance measure for reconstruction-based anomaly detection. |
| Fourier | Function class for the Fourier measure, suited to contextual anomalies. |
| Garch | GARCH-based distance measure using volatility modeling. |
| Mahalanobis | Mahalanobis distance measure accounting for covariance structure. |
| SSA_DISTANCE | Function class for the SSA measure, suited to contextual anomalies. |
| TWED | Function class for the time-warped edit distance (TWED) measure. |
- class pdmlabs.utils.distance.DTW(method='L2')#
Bases:
object
Function class for the dynamic time warping (DTW) measure.
Available methods: “L2”, “L1”, and custom.
- decision_scores_#
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type:
numpy array of shape (n_samples,)
- detector#
the anomaly detector that is used
- Type:
Object classifier
- measure(X1, X2, start_index)#
Obtain the DTW dissimilarity score.
- Parameters:
X1 (numpy array of shape (n, )) – the reference time series
X2 (numpy array of shape (n, )) – the tested time series
start_index (int) – current index of the subsequence that is being measured
- Returns:
score – the higher the score, the more dissimilar the two curves
- Return type:
float
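As a rough illustration of what a DTW score computes, the sketch below implements the textbook dynamic-programming recurrence in NumPy. It is not the pdmlabs implementation: the function name `dtw_distance` is hypothetical, and mapping `method` to a squared (“L2”) or absolute (“L1”) point cost is an assumption.

```python
import numpy as np


def dtw_distance(x1, x2, method="L2"):
    """Classic O(n*m) dynamic time warping cost between two 1D series.

    Hypothetical helper for illustration; not part of pdmlabs.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    if method not in ("L1", "L2"):
        raise ValueError("this sketch only supports 'L1' and 'L2'")
    n, m = len(x1), len(x2)
    # acc[i, j] = cheapest cost of aligning x1[:i] with x2[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = abs(x1[i - 1] - x2[j - 1])
            step = diff if method == "L1" else diff ** 2
            acc[i, j] = step + min(acc[i - 1, j],      # x1 advances
                                   acc[i, j - 1],      # x2 advances
                                   acc[i - 1, j - 1])  # both advance
    total = acc[n, m]
    # Assumed convention: take the root of the summed squares for "L2".
    return float(np.sqrt(total)) if method == "L2" else float(total)
```

Because warping can absorb repeated samples, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero even though the two series differ in length.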
- set_param()#
Update the parameters from the detector in use. Since this measure doesn’t need the attributes of the detector or the characteristics of X_train, the process is omitted.
- class pdmlabs.utils.distance.EDRS(method='L1', ep=False, vol=False)#
Bases:
object
Function class for the edit distance on real sequences measure.
Available methods: “L2”, “L1”, and custom.
- ep: float, optional (default = 0.1)
the threshold value used to decide Di_j
- vol: boolean, optional (default = False)
whether to adapt ep to the changing volatilities estimated by GARCH at different windows.
- decision_scores_#
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type:
numpy array of shape (n_samples,)
- detector#
the anomaly detector that is used
- Type:
Object classifier
- measure(X1, X2, start_index)#
Obtain the edit distance on real sequences dissimilarity score.
- Parameters:
X1 (numpy array of shape (n, )) – the reference time series
X2 (numpy array of shape (n, )) – the tested time series
start_index (int) – current index of the subsequence that is being measured
- Returns:
score – the higher the score, the more dissimilar the two curves
- Return type:
float
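The role of the threshold ep can be seen in a small sketch of the standard edit-distance-on-real-sequences recurrence: two samples count as a match when they differ by at most ep, and every mismatch, insertion, or deletion costs 1. The function name `edr_distance` is hypothetical, and the sketch ignores the volatility-adaptive option; it is not the pdmlabs implementation.

```python
import numpy as np


def edr_distance(x1, x2, ep=0.1):
    """Edit distance on real sequences with a fixed matching threshold ep.

    Hypothetical helper for illustration; not part of pdmlabs.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n, m = len(x1), len(x2)
    d = np.zeros((n + 1, m + 1))
    d[:, 0] = np.arange(n + 1)  # delete all remaining samples of x1
    d[0, :] = np.arange(m + 1)  # insert all remaining samples of x2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Substitution is free when the two samples are within ep.
            sub = 0.0 if abs(x1[i - 1] - x2[j - 1]) <= ep else 1.0
            d[i, j] = min(d[i - 1, j - 1] + sub,
                          d[i - 1, j] + 1.0,
                          d[i, j - 1] + 1.0)
    return float(d[n, m])
```

A larger ep makes the measure more tolerant of noise, which is presumably why the class offers a volatility-based adjustment of ep.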
- set_param()#
Update ep based on the volatility of the model.
- class pdmlabs.utils.distance.Euclidean(power=1, neighborhood=100, window=20, norm=False)#
Bases:
object
Lp norm distance measure for reconstruction-based anomaly detection.
Computes the Lp norm (L1, L2, etc.) between observed and predicted time-series subsequences. Can optionally normalize by local neighborhood statistics to account for varying baseline behavior.
- Parameters:
power (int, default=1) – The power p for the Lp norm. power=1 is Manhattan distance, power=2 is Euclidean.
neighborhood (int, default=100) – Size of neighborhood window (±samples around each point) used to compute normalization factor D (difference between max and min in neighborhood). If None, skips normalization.
window (int, default=20) – Length of each subsequence being compared.
norm (bool, default=False) – If True, normalizes distance by local neighborhood statistics (dividing by D). If False, returns raw Lp norm.
- decision_scores_#
List of (index, score) tuples generated during measure() calls.
- Type:
list
Examples
>>> # Unnormalized L2 distance
>>> euclidean = Euclidean(power=2, norm=False)
>>> score = euclidean.measure(observed_X, predicted_X, start_index)
>>> # Normalized distance with neighborhood statistics
>>> euclidean_norm = Euclidean(power=2, neighborhood=50, norm=True)
>>> euclidean_norm.detector = detector  # Must set detector first
>>> euclidean_norm.set_param()
>>> score = euclidean_norm.measure(X_real, X_pred, idx)
Notes
Without normalization, the metric treats all points equally regardless of their local context. Normalization accounts for natural variations in the data by dividing by the range observed in nearby samples.
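To make the normalization concrete, here is a minimal NumPy sketch of an Lp norm that is optionally divided by the range D of the training data around the window. The function `lp_distance` and its arguments are hypothetical, and the exact way pdmlabs selects the neighborhood and applies D is an assumption.

```python
import numpy as np


def lp_distance(x, y, power=2, train=None, index=0, neighborhood=100):
    """Lp norm of x - y, optionally divided by the local range of `train`.

    Hypothetical helper for illustration; not part of pdmlabs.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dist = float(np.sum(np.abs(x - y) ** power) ** (1.0 / power))
    if train is None:
        return dist  # raw Lp norm, no normalization
    train = np.asarray(train, dtype=float)
    # Assumed neighborhood: `neighborhood` samples on each side of the window.
    lo = max(0, index - neighborhood)
    hi = min(len(train), index + len(x) + neighborhood)
    local = train[lo:hi]
    d = float(local.max() - local.min())  # D = max - min in the neighborhood
    return dist / d if d > 0 else dist
```

With this convention the same absolute error yields a smaller score in a region where the signal naturally swings over a wide range.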
- measure(X, Y, index)#
Calculate distance anomaly score between observed and expected subsequences.
Computes the Lp norm between X and Y, optionally normalized by neighborhood statistics. Higher scores indicate greater dissimilarity (anomaly).
- Parameters:
X (np.ndarray) – Observed/actual subsequence values (1D array).
Y (np.ndarray) – Expected/predicted subsequence values (1D array, same length as X).
index (int) – Position in the training data where this window starts. Used to identify the neighborhood region for normalization.
- Returns:
Anomaly score (distance). Range depends on normalization:
Without norm: typically 0 to infinity
With norm: typically 0 to 1 (normalized)
- Return type:
float
Notes
Stores result in self.decision_scores_ list for analysis
For normalized mode: D = max(neighborhood) - min(neighborhood)
Handles edge cases: empty arrays, single dimensions, boundary indices
Examples
>>> import numpy as np
>>> dist = Euclidean(power=2)
>>> dist.detector = detector
>>> dist.set_param()
>>> X_real = np.array([1.0, 2.0, 3.0])
>>> X_pred = np.array([1.1, 2.1, 2.9])
>>> score = dist.measure(X_real, X_pred, 100)  # about 0.17 if the raw L2 norm is returned
- set_param()#
Initialize parameters from detector object.
Extracts window, neighborhood, training data, and other properties from the detector to ensure consistency. Must be called after setting the detector attribute.
- Returns:
Returns self for method chaining.
- Return type:
self
- Raises:
AttributeError – If detector is None or lacks required attributes.
- class pdmlabs.utils.distance.Fourier(power=2)#
Bases:
object
Function class for the Fourier measure, suited to contextual anomalies.
- power: int, optional (default = 2)
Lp norm considered for the dissimilarity measure
- decision_scores_#
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type:
numpy array of shape (n_samples,)
- detector#
the anomaly detector that is used
- Type:
Object classifier
- measure(X2, X3, start_index)#
Obtain the Fourier dissimilarity score.
- Parameters:
X2 (numpy array of shape (n, )) – the reference time series
X3 (numpy array of shape (n, )) – the tested time series
start_index (int) – current index of the subsequence that is being measured
- Returns:
score – the higher the score, the more dissimilar the two curves
- Return type:
float
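One plausible reading of a Fourier measure with an Lp `power` is an Lp norm between the magnitude spectra of the two windows. The sketch below follows that reading; it is an assumption, not the documented pdmlabs behavior, and `fourier_distance` is a hypothetical name.

```python
import numpy as np


def fourier_distance(x_ref, x_test, power=2):
    """Lp norm between the FFT magnitude spectra of two equal-length series.

    Hypothetical helper for illustration; not part of pdmlabs.
    """
    x_ref = np.asarray(x_ref, dtype=float)
    x_test = np.asarray(x_test, dtype=float)
    if x_ref.shape != x_test.shape:
        raise ValueError("both series must have the same length")
    # rfft returns the non-redundant half of the spectrum for real input.
    mag_ref = np.abs(np.fft.rfft(x_ref))
    mag_test = np.abs(np.fft.rfft(x_test))
    return float(np.sum(np.abs(mag_ref - mag_test) ** power) ** (1.0 / power))


t = np.arange(32)
wave = np.sin(2 * np.pi * t / 8)
shifted = np.roll(wave, 3)          # same shape, circularly shifted
faster = np.sin(2 * np.pi * t / 4)  # different frequency content
```

Comparing magnitudes only makes the score insensitive to a circular shift (`wave` versus `shifted`) while still reacting to a change in frequency content (`wave` versus `faster`).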
- set_param()#
Update the parameters from the detector in use. Since the FFT measure doesn’t need the attributes of the detector or the characteristics of X_train, the process is omitted.
- class pdmlabs.utils.distance.Garch(p=1, q=1, mean='zero', vol='garch')#
Bases:
object
GARCH-based distance measure using volatility modeling.
Models residual volatility using ARCH/GARCH (Generalized AutoRegressive Conditional Heteroskedasticity) models. Divides reconstruction error by the modeled volatility to create volatility-adjusted anomaly scores.
- Parameters:
p (int, default=1) – ARCH order (number of lagged squared errors in volatility equation).
q (int, default=1) – GARCH order (number of lagged variance terms in volatility equation).
mean (str, default='zero') – Mean model: ‘Zero’, ‘Constant’, ‘AR’, ‘ARX’, ‘ARMAX’, etc.
vol (str, default='garch') – Volatility model, for example ‘GARCH’ or ‘ARCH’.
- decision_scores_#
List of (index, score) tuples.
- Type:
list
- volatility#
Estimated volatility at each timepoint.
- Type:
np.ndarray
Examples
>>> garch_dist = Garch(p=1, q=1, mean='zero', vol='garch')
>>> garch_dist.detector = detector
>>> garch_dist.set_param()  # Fits GARCH model on training residuals
>>> score = garch_dist.measure(X_real, X_pred, idx)
Notes
GARCH models are effective for time-series with changing volatility
Residuals are scaled by 10 during fitting for numerical stability
Volatility-adjusted errors better capture anomalies in volatile regimes
More computationally expensive than simpler distance measures
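The idea of dividing the reconstruction error by a modeled volatility can be sketched with the GARCH(1,1) variance recursion written out by hand. The parameters omega, alpha, and beta are supplied directly here, whereas the class fits them; the function names, and the choice of averaging the scaled errors over the window, are assumptions for illustration only.

```python
import numpy as np


def garch11_volatility(residuals, omega, alpha, beta):
    """Conditional standard deviation from the GARCH(1,1) recursion.

    sigma2[t] = omega + alpha * residuals[t-1]**2 + beta * sigma2[t-1]
    Hypothetical helper; parameters are given, not estimated.
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a finite long-run variance")
    r = np.asarray(residuals, dtype=float)
    sigma2 = np.empty(len(r))
    sigma2[0] = omega / (1.0 - alpha - beta)  # start at the long-run variance
    for t in range(1, len(r)):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return np.sqrt(sigma2)


def volatility_adjusted_score(x, y, volatility, index):
    """Mean absolute error of the window, scaled by the local volatility."""
    err = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    vol = np.asarray(volatility, dtype=float)[index:index + len(err)]
    return float(np.mean(err / vol))
```

After a large residual the modeled volatility rises, so an error of the same size scores lower in a turbulent stretch than in a calm one.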
- measure(X, Y, index)#
Derive the decision score based on the given distance measure.
- Parameters:
X (numpy array of shape (n_samples, )) – The real input samples subsequence.
Y (numpy array of shape (n_samples, )) – The estimated input samples subsequence.
index (int) – the index of the starting point in the subsequence
- Returns:
score – dissimilarity score between the two subsequences
- Return type:
float
- set_param()#
update the parameters with the detector that is used
- class pdmlabs.utils.distance.Mahalanobis(probability=False)#
Bases:
object
Mahalanobis distance measure accounting for covariance structure.
Computes Mahalanobis distance which measures dissimilarity while accounting for correlations between dimensions. Uses covariance matrix computed from residuals (X_train - estimation) in a neighborhood window.
- Parameters:
probability (bool, default=False) – Selects the returned score:
If False: returns the Mahalanobis distance (quadratic form with inverse covariance).
If True: returns a probability via the multivariate/univariate normal PDF.
- detector#
Reference to anomaly detector with X_train_, estimation, window, n_initial_.
- Type:
object
- decision_scores_#
List of (index, score) tuples.
- Type:
list
- cov#
Covariance matrix of residuals (computed in set_param).
- Type:
np.ndarray
- mu#
Mean vector (typically zeros).
- Type:
np.ndarray
Examples
>>> # Mahalanobis distance
>>> maha = Mahalanobis(probability=False)
>>> maha.detector = detector
>>> maha.set_param()
>>> score = maha.measure(X_obs, X_pred, idx)
>>> # Probability-based scoring
>>> maha_prob = Mahalanobis(probability=True)
>>> maha_prob.detector = detector
>>> maha_prob.set_param()
>>> prob_score = maha_prob.measure(X_obs, X_pred, idx)
Notes
Mahalanobis distance is particularly useful when features have different scales or when correlations between features are important for anomaly detection.
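The quadratic form behind the distance can be sketched in a few lines of NumPy. The helper names are hypothetical, the covariance is estimated from a plain matrix of residual windows rather than from a detector, and returning the square root of the quadratic form is an assumption about the convention used.

```python
import numpy as np


def residual_covariance(residual_windows):
    """Covariance of residuals; rows are windows, columns are positions."""
    data = np.asarray(residual_windows, dtype=float)
    return np.atleast_2d(np.cov(data, rowvar=False))


def mahalanobis_score(x, y, cov, mu=None):
    """sqrt((e - mu)^T cov^{-1} (e - mu)) for the error vector e = x - y.

    Hypothetical helper for illustration; not part of pdmlabs.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    if mu is not None:
        diff = diff - np.asarray(mu, dtype=float)
    cov = np.atleast_2d(cov)
    # Solving the linear system avoids forming the explicit inverse.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
```

With an identity covariance the score reduces to the Euclidean norm of the error; a larger variance along one dimension shrinks the contribution of errors in that dimension.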
- measure(X, Y, index)#
Derive the decision score based on the given distance measure.
- Parameters:
X (numpy array of shape (n_samples, )) – The real input samples subsequence.
Y (numpy array of shape (n_samples, )) – The estimated input samples subsequence.
index (int) – the index of the starting point in the subsequence
- Returns:
score – dissimilarity score between the two subsequences
- Return type:
float
- norm_pdf_multivariate(x)#
Multivariate normal density function.
- normpdf(x)#
univariate normal
- set_param()#
update the parameters with the detector that is used
- class pdmlabs.utils.distance.SSA_DISTANCE(method='linear', e=1)#
Bases:
object
Function class for the SSA measure, suited to contextual anomalies.
- method: string, optional (default = 'linear')
The method used to fit the line and derive the SSA score
- e: float, optional (default = 1)
The upper bound at which a new line search starts for the linear method
- decision_scores_#
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type:
numpy array of shape (n_samples,)
- detector#
the anomaly detector that is used
- Type:
Object classifier
- Linearization(X2)#
Obtain the linearized curve.
- Parameters:
X2 (numpy array of shape (n, )) – the time series curve to be fitted
e (float, integer, or numpy array) – weights to obtain the
- Returns:
fit – parameters for the fitted linear curve
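The description of e as an upper bound that triggers a new line search suggests a greedy piecewise-linear fit. The sketch below shows that general technique only: it extends a least-squares line until the worst residual exceeds e, then starts a new segment. Whether Linearization works this way is an assumption, and `piecewise_linear_fit` is a hypothetical name.

```python
import numpy as np


def piecewise_linear_fit(x, e=1.0):
    """Greedy segmentation: extend a line until its worst residual exceeds e.

    Returns a list of (start, end, slope, intercept) with inclusive indices;
    consecutive segments share their boundary point.
    Hypothetical helper for illustration; not part of pdmlabs.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    segments = []
    start = 0
    while start < n - 1:
        end = start + 1
        # Two points are always fitted exactly by the line through them.
        slope = x[end] - x[start]
        intercept = x[start] - slope * start
        while end + 1 < n:
            t = np.arange(start, end + 2)
            s, c = np.polyfit(t, x[start:end + 2], 1)
            if np.max(np.abs(s * t + c - x[start:end + 2])) > e:
                break  # error bound exceeded: start a new line search
            slope, intercept = s, c
            end += 1
        segments.append((start, end, float(slope), float(intercept)))
        start = end
    return segments


segments = piecewise_linear_fit([0, 1, 2, 3, 2, 1, 0], e=0.1)
```

For the triangular series above the fit yields one rising and one falling segment; a smaller e produces more, shorter segments.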
- measure(X2, X3, start_index)#
Obtain the SSA similarity score.
- Parameters:
X2 (numpy array of shape (n, )) – the reference time series
X3 (numpy array of shape (n, )) – the tested time series
e (float, integer, or numpy array) – weights to obtain the
- Returns:
score – the higher the score, the more dissimilar the two curves
- Return type:
float
- set_param()#
Update the parameters from the detector in use. Since the SSA measure doesn’t need the attributes of the detector or the characteristics of X_train, the process is omitted.
- class pdmlabs.utils.distance.TWED(gamma=0.1, v=0.1)#
Bases:
object
Function class for the time-warped edit distance (TWED) measure.
Available “L2”, “L1”, and custom
- gamma: float, optional (default = 0.1)
mismatch penalty
- v: float, optional (default = 0.1)
stiffness parameter
- decision_scores_#
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type:
numpy array of shape (n_samples,)
- detector#
the anomaly detector that is used
- Type:
Object classifier
- measure(A, B, start_index)#
Obtain the TWED dissimilarity score.
- Parameters:
A (numpy array of shape (n, )) – the reference time series
B (numpy array of shape (n, )) – the tested time series
start_index (int) – current index of the subsequence that is being measured
- Returns:
score – the higher the score, the more dissimilar the two curves
- Return type:
float
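For orientation, the standard TWED recurrence can be written as a short dynamic program in which gamma is the constant penalty for a delete operation and v weights differences in timestamps. The sketch assumes unit-spaced timestamps and an absolute-difference point cost; `twed_distance` is a hypothetical name, and the code is not the pdmlabs implementation.

```python
import numpy as np


def twed_distance(a, b, gamma=0.1, v=0.1):
    """Time-warped edit distance with unit-spaced timestamps.

    gamma: mismatch (delete) penalty; v: stiffness on timestamp differences.
    Hypothetical helper for illustration; not part of pdmlabs.
    """
    # Pad both series with a leading zero sample at time 0.
    a = np.concatenate(([0.0], np.asarray(a, dtype=float)))
    b = np.concatenate(([0.0], np.asarray(b, dtype=float)))
    ta = np.arange(len(a), dtype=float)
    tb = np.arange(len(b), dtype=float)
    n, m = len(a), len(b)
    dp = np.full((n, m), np.inf)
    dp[0, 0] = 0.0
    for i in range(1, n):
        for j in range(1, m):
            # Delete a sample from a.
            del_a = dp[i - 1, j] + abs(a[i - 1] - a[i]) + v * (ta[i] - ta[i - 1]) + gamma
            # Delete a sample from b.
            del_b = dp[i, j - 1] + abs(b[j - 1] - b[j]) + v * (tb[j] - tb[j - 1]) + gamma
            # Match the current and previous samples of both series.
            match = (dp[i - 1, j - 1]
                     + abs(a[i] - b[j]) + abs(a[i - 1] - b[j - 1])
                     + v * (abs(ta[i] - tb[j]) + abs(ta[i - 1] - tb[j - 1])))
            dp[i, j] = min(del_a, del_b, match)
    return float(dp[n - 1, m - 1])
```

A larger v makes alignments between samples that are far apart in time more expensive, so the measure behaves more like a pointwise distance; a small v allows more elastic matching.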
- set_param()#
No parameters need to be updated from the detector for this measure.