pdmlabs.experiment.batch.supervised_experiment

Classes

SupervisedPdMExperiment(experiment_name, ...)

Supervised anomaly detection with labeled anomaly windows.

class pdmlabs.experiment.batch.supervised_experiment.SupervisedPdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)

Bases: PdMExperiment

Supervised anomaly detection with labeled anomaly windows.

This experiment flavor is designed for scenarios where you have:

  • Training data with explicit anomaly labels (ranges or boolean arrays)

  • A supervised method that can learn from these labels

  • Target data to evaluate on
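Labels can be supplied either as ranges or as boolean arrays. A minimal sketch of how anomaly windows expand into a per-row boolean label array follows. The helper name ranges_to_mask and the half-open, positional [start, end) window convention are assumptions for illustration; pdmlabs may use inclusive ends or timestamps instead.

```python
from typing import List, Sequence, Tuple


def ranges_to_mask(n_rows: int, ranges: Sequence[Tuple[int, int]]) -> List[bool]:
    """Expand (start, end) anomaly windows into a per-row boolean label list.

    Hypothetical helper: assumes half-open positional windows [start, end).
    """
    mask = [False] * n_rows
    for start, end in ranges:
        if not 0 <= start <= end <= n_rows:
            raise ValueError(f"window ({start}, {end}) is outside 0..{n_rows}")
        for i in range(start, end):
            mask[i] = True
    return mask


# Ten rows of training data with two labeled anomaly windows.
labels = ranges_to_mask(10, [(2, 4), (7, 9)])
print(labels)  # rows 2, 3, 7 and 8 are marked anomalous
```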

It implements a train-once, test-many approach:

  1. Fits the method, preprocessor, and postprocessor on ALL historic labeled data (once).

  2. Applies them to each target scenario independently.

Unlike the semi-supervised flavor, which fits per scenario, this guarantees that every test scenario is scored by the same trained model.
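To make the train-once, test-many idea concrete, here is a toy sketch in plain Python. MidpointMethod is a hypothetical stand-in, not a pdmlabs class: it learns one cutoff halfway between the mean normal value and the mean anomalous value. The point is the control flow: all labeled historic sources are pooled, fit is called exactly once, and the same fitted model then scores every target without refitting.

```python
from statistics import mean
from typing import List, Optional


class MidpointMethod:
    """Toy supervised detector (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.cutoff: Optional[float] = None
        self.fit_calls = 0

    def fit(self, values: List[float], labels: List[bool]) -> "MidpointMethod":
        normal = [v for v, y in zip(values, labels) if not y]
        anomalous = [v for v, y in zip(values, labels) if y]
        self.cutoff = (mean(normal) + mean(anomalous)) / 2
        self.fit_calls += 1
        return self

    def predict(self, values: List[float]) -> List[bool]:
        return [v > self.cutoff for v in values]


# Two labeled historic sources (toy data).
historic = [[1.0, 1.2, 9.0, 1.1], [0.9, 8.0, 1.0]]
historic_labels = [[False, False, True, False], [False, True, False]]

# Train once: pool all historic sources into a single training set.
pooled_x = [v for series in historic for v in series]
pooled_y = [y for series in historic_labels for y in series]
method = MidpointMethod().fit(pooled_x, pooled_y)

# Test many: the same fitted model is applied to every target, never refit.
targets = [[1.0, 7.5, 1.3], [8.8, 0.8]]
predictions = [method.predict(t) for t in targets]
print(method.cutoff, predictions)
```

A per-scenario (semi-supervised style) variant would instead call fit inside the loop over targets, so each target could end up with a different cutoff.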

pipeline

The dataset must contain an ‘anomaly_labels’ key holding a list of label arrays, one per entry in historic_data, each matching its DataFrame in length and dimensionality.

Type:

PdMPipeline

param_space

Hyperparameter search space.

Type:

dict
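The example below uses a discrete search space (a dict mapping parameter names to candidate values), and the constructor also accepts constraint_function and constraint_max_retries. The following sketch shows one plausible way these fit together: rejection sampling, where a candidate is redrawn until the constraint accepts it or the retry budget runs out. That interaction is an assumption, not documented pdmlabs behavior. The helper sample_params, the extra parameter method_window, and the example constraint are all hypothetical.

```python
import random
from typing import Any, Callable, Dict, List, Optional


def sample_params(
    param_space: Dict[str, List[Any]],
    constraint_function: Optional[Callable[[Dict[str, Any]], bool]] = None,
    constraint_max_retries: int = 10,
    rng: Optional[random.Random] = None,
) -> Dict[str, Any]:
    """Draw one candidate from a discrete space, rejecting invalid ones.

    Hypothetical helper: assumes the constraint is enforced by resampling.
    """
    rng = rng or random.Random()
    for _ in range(constraint_max_retries):
        candidate = {name: rng.choice(values) for name, values in param_space.items()}
        if constraint_function is None or constraint_function(candidate):
            return candidate
    raise RuntimeError(
        f"no candidate satisfied the constraint after {constraint_max_retries} tries"
    )


# 'method_window' is a made-up parameter used only to give the constraint two inputs.
space = {"method_nu": [0.05, 0.1, 0.2], "method_window": [10, 50, 100]}


def small_nu_for_large_window(p: Dict[str, Any]) -> bool:
    # Hypothetical rule: the largest window is only allowed with nu <= 0.1.
    return p["method_window"] < 100 or p["method_nu"] <= 0.1


params = sample_params(space, small_nu_for_large_window, rng=random.Random(42))
print(params)
```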

Raises:
  • ValueError – If dataset lacks ‘anomaly_labels’ key.

  • ValueError – If anomaly_labels length does not match historic_data.

  • IncompatibleMethodException – If method does not implement SupervisedMethodInterface.
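The conditions listed under Raises can be mirrored by a small pre-flight check. The sketch below is hypothetical: the interface and exception are local stand-ins that only borrow the documented names, plain lists stand in for DataFrames (len() of a DataFrame is also its row count), and only lengths are checked, not full dimensionality.

```python
from typing import Any, Dict


class SupervisedMethodInterface:
    """Stand-in for the pdmlabs interface of the same name (not the real class)."""


class IncompatibleMethodException(Exception):
    """Stand-in for the pdmlabs exception of the same name."""


def check_supervised_inputs(dataset: Dict[str, Any], method: Any) -> None:
    """Mirror the documented Raises conditions (hypothetical helper)."""
    if "anomaly_labels" not in dataset:
        raise ValueError("dataset lacks 'anomaly_labels' key")
    historic = dataset.get("historic_data", [])
    labels = dataset["anomaly_labels"]
    # One label array per historic source...
    if len(labels) != len(historic):
        raise ValueError(f"{len(labels)} label arrays for {len(historic)} historic sources")
    # ...and each label array as long as its source.
    for i, (source, source_labels) in enumerate(zip(historic, labels)):
        if len(source_labels) != len(source):
            raise ValueError(f"source {i}: {len(source_labels)} labels for {len(source)} rows")
    if not isinstance(method, SupervisedMethodInterface):
        raise IncompatibleMethodException(
            f"{type(method).__name__} is not a supervised method"
        )


class ToyMethod(SupervisedMethodInterface):
    """Hypothetical method that declares the supervised interface."""


ok = {"historic_data": [[0.1, 0.2, 0.3]], "anomaly_labels": [[False, False, True]]}
check_supervised_inputs(ok, ToyMethod())  # passes silently
```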

Examples

>>> dataset = {
...     'historic_data': [df_train],
...     'target_data': [df_test],
...     'anomaly_labels': [label_array],  # 1D array of same length as df_train
...     # plus any other keys your pipeline's dataset requires
... }
>>> from pdmlabs.experiment.batch.supervised_experiment import SupervisedPdMExperiment
>>> experiment = SupervisedPdMExperiment(
...     experiment_name='supervised-demo',
...     pipeline=pipeline,
...     param_space={'method_nu': [0.05, 0.1, 0.2]},
...     num_iteration=20
... )
>>> results = experiment.execute()

execute() → dict

Run supervised experiment with labeled anomaly training data.

Trains a supervised method once on all labeled historic data, then evaluates on each target scenario:

  1. Preprocesses all historic data (fit once)

  2. Fits method on all labeled historic data (single consolidated training)

  3. Fits postprocessor on labeled data

  4. For each target:
       a. Preprocesses it using the fitted preprocessor
       b. Applies the fitted method to get anomaly scores
       c. Postprocesses the scores
       d. Thresholds the scores to get binary predictions
       e. Evaluates the predictions against ground truth

  5. Returns best parameters found

Because the model is trained only once, every test scenario is scored by the same model; in the semi-supervised flavor, by contrast, the model is refit for each scenario.
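Steps 4c to 4e can be sketched for a single target in a few lines of Python. This is an illustration under assumptions, not the pdmlabs implementation: min-max scaling stands in for the postprocessor, a score at or above th counts as anomalous, and the evaluation uses simple point-wise precision, recall, and F1 in place of the library's own metrics (such as the default optimization_param 'AD1_AUC'). The scores and labels are toy values.

```python
from typing import List, Tuple


def minmax(scores: List[float]) -> List[float]:
    """Stand-in postprocessor: rescale scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def apply_threshold(scores: List[float], th: float) -> List[bool]:
    """Binary predictions: True where the score reaches the threshold."""
    return [s >= th for s in scores]


def pointwise_prf(pred: List[bool], truth: List[bool]) -> Tuple[float, float, float]:
    """Point-wise precision, recall and F1 (a simpler metric than the library's)."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


raw_scores = [0.2, 0.4, 3.0, 2.6, 0.3]    # step b: scores from the fitted method
truth = [False, False, True, True, True]  # ground-truth labels for this target
scaled = minmax(raw_scores)               # step c: postprocess
pred = apply_threshold(scaled, th=0.5)    # step d: threshold
print(pointwise_prf(pred, truth))         # step e: evaluate
```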

Returns:

Result dictionary with:
  • ‘best_params’: Best parameter combination found

  • ‘best_objective’: Best metric value

  • ‘th’: Best decision threshold

Return type:

dict
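The shape of the result dictionary can be illustrated by reducing a list of trial records to the best one, honoring the maximize flag from the constructor. The trial record format and the helper best_result are hypothetical, and the objective and threshold values are made-up numbers for illustration, not real results.

```python
from typing import Any, Dict, List


def best_result(trials: List[Dict[str, Any]], maximize: bool = True) -> Dict[str, Any]:
    """Pick the trial with the best objective (hypothetical trial records)."""
    pick = max if maximize else min
    best = pick(trials, key=lambda t: t["objective"])
    return {
        "best_params": best["params"],
        "best_objective": best["objective"],
        "th": best["th"],
    }


# Illustrative, made-up trial outcomes.
trials = [
    {"params": {"method_nu": 0.05}, "objective": 0.71, "th": 0.60},
    {"params": {"method_nu": 0.1}, "objective": 0.78, "th": 0.64},
    {"params": {"method_nu": 0.2}, "objective": 0.69, "th": 0.55},
]
print(best_result(trials))
```

With maximize=False, the same call would return the trial with the lowest objective, which suits loss-like metrics.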

Raises:
  • ValueError – If the anomaly_labels dimensions do not match the data.

  • IncompatibleMethodException – If the method does not implement SupervisedMethodInterface.

Examples

>>> results = experiment.execute()
>>> print(f"Best threshold: {results['th']:.3f}")
Best threshold: 0.645