pdmlabs.utils.automatic_parameter_generation

Automatic parameter space generation for anomaly detection methods.

This module provides functions to automatically generate hyperparameter search spaces for various anomaly detection techniques (IF, LOF, KNN, LSTM, OCSVM, etc.) across different learning paradigms (online, offline, unsupervised, supervised, semi-supervised).

Parameter generation adapts to constraints such as the maximum profile length and whether the data is multivariate or univariate.

Key Functions:
  • uniform: Generate uniformly spaced parameters (linear)

  • get_exponential_parameters: Generate exponentially spaced parameters (quadratic)

  • uniform_even_numbers: Generate even-valued uniform parameters

  • online_technique: Parameters for online anomaly detection methods

  • incremental_technique: Parameters for incremental learning methods

  • unsupervised_technique: Parameters for unsupervised methods

  • semi_technique: Parameters for semi-supervised learning

  • supervised_technique: Parameters for supervised learning

  • post_proccessing_params: Parameters for post-processing methods

  • pre_proccessing_params: Parameters for pre-processing methods

Example

>>> from pdmlabs.utils.automatic_parameter_generation import online_technique
>>> # Get parameter space for Isolation Forest online detection
>>> max_profile = 500
>>> param_space = online_technique('IF', maximum_profile=max_profile)
>>> print(param_space)
{'n_estimators': [50, 100, 150, 200], 'max_samples': [...], ...}

Functions

default_TSB_semi(name, maximum_profile)

Generate default/baseline parameter sets for semi-supervised learning.

get_exponential_parameters(min_val, max_val, ...)

Generate exponentially spaced parameters (quadratic growth).

incremental_technique(name, maximum_profile[, ...])

Generate parameter space for incremental anomaly detection methods.

incremental_windows(max_wait)

Generate incremental learning window parameters.

online_technique(name, maximum_profile[, ...])

Generate parameter space for online anomaly detection methods.

post_proccessing_params(name, maximum_profile)

Generate parameter space for post-processor methods.

pre_proccessing_params(name, maximum_profile)

Generate parameter space for pre-processor methods.

profile_values(max_wait[, moment])

Generate profile length (historical buffer) parameters for incremental learning.

semi_technique(name, maximum_profile[, ...])

Generate parameter space for semi-supervised anomaly detection methods.

supervised_technique(name, maximum_profile[, ...])

Generate parameter space for supervised anomaly detection methods.

uniform(min_val, max_val, num_params[, to_int])

Generate uniformly spaced parameters in linear space.

uniform_even_numbers(min_val, max_val, ...)

Generate uniformly spaced even-numbered parameters.

unsupervised_technique(name, maximum_profile[, ...])

Generate parameter space for unsupervised anomaly detection methods.

pdmlabs.utils.automatic_parameter_generation.default_TSB_semi(name, maximum_profile)

Generate default/baseline parameter sets for semi-supervised learning.

Returns a single pre-tuned value for each hyperparameter instead of a search space. This makes evaluation faster than searching a parameter grid, at the cost of flexibility, and is useful for baseline comparisons.

Parameters:
  • name (str) – Algorithm name.

  • maximum_profile (int) – Maximum historical samples (used to scale some parameters).

Returns:

Parameter dictionary with single values (not lists) for each hyperparameter.

Return type:

dict
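
Because the returned dictionary holds single values rather than lists, code written for the list-valued search spaces of the other functions may need an adapter. The sketch below shows one way to do that. The helper name as_search_space and the sample dictionary are hypothetical and not part of pdmlabs.

```python
def as_search_space(defaults):
    """Wrap each scalar default in a one-element list (hypothetical helper)."""
    return {
        key: value if isinstance(value, list) else [value]
        for key, value in defaults.items()
    }


# Hypothetical defaults; not the actual output of default_TSB_semi.
baseline = {"n_estimators": 100, "max_samples": 0.5}
print(as_search_space(baseline))
```

After wrapping, a baseline can go through the same grid-search code path as a full search space and yields exactly one configuration.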

pdmlabs.utils.automatic_parameter_generation.get_exponential_parameters(min_val, max_val, num_params, to_int=False)

Generate exponentially spaced parameters (quadratic growth).

Creates values in the range [min_val, max_val] that are uniformly spaced in square-root space, so the gaps between consecutive values widen as the values grow. Despite the function name, the growth is quadratic rather than truly exponential. Useful for parameters such as window sizes or model capacity, where small values need finer resolution than large ones.

Algorithm:
  1. Take the square root of min_val and max_val.

  2. Generate num_params uniformly spaced values in sqrt-space.

  3. Square them to get quadratically spaced values.
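
The three steps can be sketched in plain Python as follows. This is an illustrative re-implementation under stated assumptions, not the library's code: the name sqrt_spaced is hypothetical, values are clamped to the range instead of filtered, and the rounding mode for to_int is a guess.

```python
import math


def sqrt_spaced(min_val, max_val, num_params, to_int=False):
    """Hypothetical sketch: values uniformly spaced in square-root space."""
    # Step 1: move the bounds into sqrt-space.
    lo, hi = math.sqrt(min_val), math.sqrt(max_val)
    # Step 2: uniformly spaced points between the transformed bounds.
    if num_params == 1:
        roots = [lo]
    else:
        step = (hi - lo) / (num_params - 1)
        roots = [lo + i * step for i in range(num_params)]
    # Step 3: square back. Clamping guards against floating-point drift
    # at the endpoints (an assumption; the documented function filters).
    values = [min(max(r * r, min_val), max_val) for r in roots]
    if to_int:
        # The rounding mode is an assumption.
        values = [round(v) for v in values]
    # Duplicates created by rounding collapse here.
    return sorted(set(values))


print(sqrt_spaced(1, 81, 5, to_int=True))  # [1, 9, 25, 49, 81]
```

In this run the gaps between consecutive values are 8, 16, 24 and 32: they grow linearly, which is what makes the sequence quadratic.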

Parameters:
  • min_val (float) – Minimum parameter value.

  • max_val (float) – Maximum parameter value.

  • num_params (int) – Number of parameter values to generate.

  • to_int (bool, default=False) – If True, convert to integers.

Returns:

Sorted list of unique parameter values within [min_val, max_val]. Values outside this range are filtered out.

Return type:

list

Examples

>>> # Window sizes (quadratic growth)
>>> windows = get_exponential_parameters(10, 200, 5, to_int=True)
>>> print(windows)
[10, 25, 62, 100, 185]  # Approximately quadratic growth
>>> # Compare with linear spacing:
>>> linear = uniform(10, 200, 5, to_int=True)
>>> print(linear)
[10, 57, 105, 152, 200]  # Linear growth

Notes

  • May return fewer than num_params values when to_int=True, because rounding can produce duplicates that are then removed

  • Particularly useful for time-window and buffer size parameters

  • Samples small values more densely than large ones, so a wide range is covered with few candidates

pdmlabs.utils.automatic_parameter_generation.incremental_technique(name, maximum_profile, multivariate=True)

Generate parameter space for incremental anomaly detection methods.

Incremental methods learn from streaming data one sample at a time and continuously update their models. Parameter spaces accommodate limited memory.

Parameters:
  • name (str) – Algorithm name. Supported: ‘IF’, ‘OCSVM’, ‘PB’, ‘KNN’, ‘NP’, ‘LOF’, ‘LTSF’, ‘TRANAD’, ‘USAD’, ‘HBOS’, ‘PCA’

  • maximum_profile (int) – Maximum historical samples available.

  • multivariate (bool, default=True) – Whether data is multivariate or univariate.

Returns:

Parameter space with algorithm-specific hyperparameters. Includes deep learning parameters (epochs, learning rate) for neural methods.

Return type:

dict

Supported Algorithms

  • Traditional: IF, LOF, OCSVM, KNN, NP, HBOS, PCA

  • Time-series: LTSF (Linear Time-Series Forecasting)

  • Deep Learning: TRANAD, USAD (both neural anomaly detectors)

Examples

>>> params = incremental_technique('LOF', maximum_profile=500)
>>> print(params)
{'n_neighbors': [1, 2, 4, 8, 16, 32, 64]}
>>> params_ltsf = incremental_technique('LTSF', maximum_profile=1000)
>>> print('learning_rate' in params_ltsf)
True

pdmlabs.utils.automatic_parameter_generation.incremental_windows(max_wait)

Generate incremental learning window parameters.

For online methods that process data in sliding windows, generates candidate values for the slide (step size), the initial window length, and the steady-state window length.

Parameters:

max_wait (int) – Maximum allowed wait time (bounds window sizes).

Returns:

(incremental_slide, initial_incremental_window_length, incremental_window_length)

  • incremental_slide: Step sizes between windows

  • initial_incremental_window_length: Initial buffer size for warm-up

  • incremental_window_length: Steady-state window size

Return type:

tuple[list, list, list]

Notes

Removes 0 and 1 from slide candidates to avoid invalid window configurations.

pdmlabs.utils.automatic_parameter_generation.online_technique(name, maximum_profile, multivariate=True)

Generate parameter space for online anomaly detection methods.

Online methods process data in a streaming fashion with limited memory. Parameter spaces are adapted to fit within maximum_profile constraints.

Parameters:
  • name (str) – Anomaly detection algorithm name. Supported: ‘CNN’, ‘IF’, ‘OCSVM’, ‘PB’, ‘KNN’, ‘NP’

  • maximum_profile (int) – Maximum number of historical samples available for learning. Used to bound window sizes, sample counts, etc.

  • multivariate (bool, default=True) – Whether data has multiple features (True) or univariate (False). Affects parameter ranges for sequence-based methods.

Returns:

Parameter space dictionary where keys are hyperparameter names and values are lists of candidate values for grid search.

Return type:

dict

Supported Algorithms

  • CNN: Convolutional Neural Network

  • IF: Isolation Forest

  • OCSVM: One-Class Support Vector Machine

  • PB: Prophet-Based (placeholder)

  • KNN: K-Nearest Neighbors

  • NP: Nearest Neighbors Polynomial/Ball

Examples

>>> # Isolation Forest parameters for profile length 500
>>> param_space_IF = online_technique('IF', maximum_profile=500)
>>> print(param_space_IF.keys())
dict_keys(['n_estimators', 'max_samples', 'random_state', 'max_features', 'bootstrap'])
>>> # CNN parameters for multivariate data
>>> param_space_CNN = online_technique('CNN', maximum_profile=600, multivariate=True)
>>> print('window_size' in param_space_CNN)
True

Notes

  • Window/buffer sizes scale with maximum_profile (typically maximum_profile/4 to maximum_profile)

  • Exponential parameter generation used for window sizes and sample counts

  • Random seed fixed to 42 for reproducibility

  • Some methods have empty parameter dictionaries (e.g., PB)
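
Since the returned dictionary maps each hyperparameter name to a list of candidates, a grid search has to enumerate every combination. The sketch below shows one way to do that with the standard library. The helper expand_grid and the sample dictionary are hypothetical, and the values shown are not the actual output of online_technique.

```python
from itertools import product


def expand_grid(param_space):
    """Yield one {name: value} dict per combination of candidate values."""
    names = list(param_space)
    for combo in product(*(param_space[name] for name in names)):
        yield dict(zip(names, combo))


# Hypothetical parameter space, for illustration only.
example_space = {
    "n_estimators": [50, 100],
    "max_samples": [64, 128, 256],
    "random_state": [42],
}
configs = list(expand_grid(example_space))
print(len(configs))  # 6 combinations: 2 * 3 * 1
print(configs[0])
```

With this approach an empty parameter dictionary, as returned for PB, expands to exactly one empty configuration, so the method still runs once with its defaults.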

pdmlabs.utils.automatic_parameter_generation.post_proccessing_params(name, maximum_profile)

Generate parameter space for post-processor methods.

Post-processors refine raw anomaly scores through smoothing, normalization, or recalibration.

Parameters:
  • name (str) – Post-processor name. Supported: ‘Default’, ‘Dynamic Threshold’, ‘Moving2T’, ‘SelfTuning’, ‘Moving Average’

  • maximum_profile (int) – Maximum historical samples (scales window parameters).

Returns:

Parameter space for post-processing.

Return type:

dict

Supported Methods

  • Default: Identity (no post-processing)

  • Dynamic Threshold: NASA dynamic thresholding

  • Moving2T: Two-pass moving threshold

  • SelfTuning: Adaptive normalization

  • Moving Average: Rolling window smoothing

pdmlabs.utils.automatic_parameter_generation.pre_proccessing_params(name, maximum_profile)

Generate parameter space for pre-processor methods.

Pre-processors clean and transform raw data before anomaly detection.

Parameters:
  • name (str) – Pre-processor name. Supported: ‘Default’, ‘Keep Features’, ‘MinMax Scaler (semi)’, ‘Windowing (one column)’, ‘Mean Aggregator’

  • maximum_profile (int) – Maximum historical samples.

Returns:

Parameter space for pre-processing.

Return type:

dict

Supported Methods

  • Default: Identity

  • Keep Features: Feature selection

  • MinMax Scaler (semi): Normalization

  • Windowing (one column): Lagged feature creation

  • Mean Aggregator: Downsampling

pdmlabs.utils.automatic_parameter_generation.profile_values(max_wait, moment=False)

Generate profile length (historical buffer) parameters for incremental learning.

The profile is the amount of historical data kept in memory to learn or calibrate anomaly detection models.

Parameters:
  • max_wait (int) – Maximum wait time (controls the range of profile lengths).

  • moment (bool, default=False) – If False: Generate search space of profile lengths. If True: Return fixed moment-based profile (1027 samples).

Returns:

List of profile length candidates, or the single-element list [1027] when moment is True.

Return type:

list[int]

Notes

  • Without moment: Returns 16 exponentially-spaced values from max_wait/10 to max_wait

  • With moment: Returns [1027] for moment-based methods

pdmlabs.utils.automatic_parameter_generation.semi_technique(name, maximum_profile, multivariate=True)

Generate parameter space for semi-supervised anomaly detection methods.

Semi-supervised methods use a small amount of labeled data (typically failures) combined with large amounts of unlabeled normal data.

Parameters:
  • name (str) – Algorithm. Supported: ‘IF’, ‘OCSVM’, ‘PB’, ‘KNN’, ‘NP’, ‘LOF’, ‘LTSF’, ‘TRANAD’, ‘USAD’, ‘HBOS’, ‘PCA’, ‘CNN’, ‘LSTM’

  • maximum_profile (int) – Maximum historical samples.

  • multivariate (bool, default=True) – Whether data is multivariate (True) or univariate (False).

Returns:

Parameter space combining supervised and unsupervised parameters.

Return type:

dict

pdmlabs.utils.automatic_parameter_generation.supervised_technique(name, maximum_profile, multivariate=True)

Generate parameter space for supervised anomaly detection methods.

Supervised methods learn from labeled failure/non-failure examples.

Parameters:
  • name (str) – Algorithm name. Supported: ‘XGBOOST’

  • maximum_profile (int) – Maximum historical samples.

  • multivariate (bool, default=True) – Whether data is multivariate (True) or univariate (False).

Returns:

Parameter space for supervised learning.

Return type:

dict

Supported Methods

  • XGBOOST: Gradient boosting with tree-based learners

Notes

  • XGBoost parameters adapted to maximum_profile for tree depth and leaf count

pdmlabs.utils.automatic_parameter_generation.uniform(min_val, max_val, num_params, to_int=False)

Generate uniformly spaced parameters in linear space.

Creates evenly distributed values between min_val and max_val. Useful for parameters without natural exponential scaling.

Parameters:
  • min_val (float) – Minimum parameter value.

  • max_val (float) – Maximum parameter value.

  • num_params (int) – Number of values to generate.

  • to_int (bool, default=False) – If True, convert values to integers.

Returns:

Sorted list of unique parameter values.

Return type:

list
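
A minimal sketch of linear spacing with optional integer conversion is shown below. The name linear_spaced is hypothetical, and integer conversion by truncation is an assumption, chosen because it reproduces the documented result of uniform(10, 200, 5, to_int=True).

```python
def linear_spaced(min_val, max_val, num_params, to_int=False):
    """Hypothetical sketch: evenly spaced values from min_val to max_val."""
    if num_params == 1:
        values = [min_val]
    else:
        step = (max_val - min_val) / (num_params - 1)
        values = [min_val + i * step for i in range(num_params)]
    if to_int:
        # Truncation is an assumption about the integer conversion.
        values = [int(v) for v in values]
    # Duplicates created by the integer conversion collapse here.
    return sorted(set(values))


print(linear_spaced(10, 200, 5, to_int=True))  # [10, 57, 105, 152, 200]
print(linear_spaced(1, 3, 5, to_int=True))     # [1, 2, 3]
```

The second call shows why the result can be shorter than num_params: on a narrow integer range several of the evenly spaced values truncate to the same integer.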

Examples

>>> # Float parameters
>>> alphas = uniform(0.1, 0.9, 5)
>>> print(alphas)
[0.1, 0.3, 0.5, 0.7, 0.9]
>>> # Integer parameters
>>> batch_sizes = uniform(32, 256, 5, to_int=True)
>>> print(batch_sizes)
[32, 88, 144, 200, 256]

pdmlabs.utils.automatic_parameter_generation.uniform_even_numbers(min_val, max_val, num_params)

Generate uniformly spaced even-numbered parameters.

Parameters:
  • min_val (int) – Minimum value (will be rounded to nearest even number).

  • max_val (int) – Maximum value (will be rounded to nearest even number).

  • num_params (int) – Number of parameter values to generate.

Returns:

Sorted list of unique even-valued parameters.

Return type:

list[int]
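
One way to obtain even-valued candidates is to space values linearly and snap each one to the nearest even integer. The sketch below assumes that approach. The name even_spaced is hypothetical and the library's actual rounding rule may differ.

```python
def even_spaced(min_val, max_val, num_params):
    """Hypothetical sketch: linear spacing snapped to even integers."""
    if num_params == 1:
        values = [min_val]
    else:
        step = (max_val - min_val) / (num_params - 1)
        values = [min_val + i * step for i in range(num_params)]
    # Snap to the nearest even integer (assumed rounding rule).
    evens = {2 * round(v / 2) for v in values}
    return sorted(evens)


print(even_spaced(10, 30, 6))  # [10, 14, 18, 22, 26, 30]
```

As with the other generators, snapping can merge neighbouring candidates, so the result may contain fewer than num_params values.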

Examples

>>> params = uniform_even_numbers(10, 30, 6)
>>> print(params)
[10, 14, 18, 22, 26, 30]

pdmlabs.utils.automatic_parameter_generation.unsupervised_technique(name, maximum_profile, multivariate=True)

Generate parameter space for unsupervised anomaly detection methods.

Unsupervised methods learn from unlabeled data without any failure information. Useful for discovery of unknown anomaly types.

Parameters:
  • name (str) – Algorithm name. Supported: ‘NP’, ‘DAMP’, ‘KNN’, ‘IF’, ‘LOF’, ‘SAND’, ‘HBOS’, ‘PCA’, ‘CHRONOS’

  • maximum_profile (int) – Maximum historical samples.

  • multivariate (bool, default=True) – Multivariate (True) or univariate (False) data.

Returns:

Parameter space for unsupervised learning.

Return type:

dict

Notes

  • Parameters include window/buffer configuration for online processing

  • LOF, IF, SAND include sliding window and overlap strategies

  • CHRONOS (probabilistic forecasting) has context length and sampling parameters