pdmlabs.experiment.batch.supervised_experiment
Classes
| SupervisedPdMExperiment | Supervised anomaly detection with labeled anomaly windows. |
- class pdmlabs.experiment.batch.supervised_experiment.SupervisedPdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)
Bases:
PdMExperiment
Supervised anomaly detection with labeled anomaly windows.
This experiment flavor is designed for scenarios where you have:
- Training data with explicit anomaly labels (ranges or boolean arrays)
- A supervised method that can learn from these labels
- Target data to evaluate on
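As a rough illustration of the two label forms mentioned above, the sketch below expands index ranges into a boolean array. The helper name `ranges_to_mask` and the half-open `[start, end)` convention are assumptions made for this example; they are not part of the documented pdmlabs API.

```python
def ranges_to_mask(n_samples, ranges):
    """Expand (start, end) index ranges into a boolean list of length n_samples.

    Hypothetical helper: ranges are treated as half-open [start, end),
    which is an assumption, not a documented pdmlabs convention.
    """
    mask = [False] * n_samples
    for start, end in ranges:
        # Clip to valid bounds so windows that overrun the data do not raise.
        for i in range(max(start, 0), min(end, n_samples)):
            mask[i] = True
    return mask


# Two anomaly windows in a series of ten samples.
label_array = ranges_to_mask(10, [(2, 4), (7, 9)])
```

Whichever form the labels start in, the result should have one entry per training sample.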
It implements a train-once, test-many approach:
1. Fits method, preprocessor, postprocessor on ALL historic labeled data (once)
2. Then applies to each target scenario independently
This differs from the semi-supervised flavor, which fits per scenario, and ensures that every target scenario is evaluated against the same trained model.
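The train-once, test-many idea can be sketched without the library. `MeanDistanceScorer` below is a toy stand-in invented for this example (it is not a SupervisedMethodInterface implementation), and plain lists replace DataFrames.

```python
class MeanDistanceScorer:
    """Toy method: scores a point by its distance from the mean of normal samples."""

    def __init__(self):
        self.fit_calls = 0
        self.center = 0.0

    def fit(self, values, labels):
        self.fit_calls += 1
        # Use the labels to keep only samples marked as normal.
        normal = [v for v, is_anomaly in zip(values, labels) if not is_anomaly]
        self.center = sum(normal) / len(normal)
        return self

    def score(self, values):
        return [abs(v - self.center) for v in values]


historic = [[1.0, 1.2, 9.0], [0.8, 1.0, 8.5]]
labels = [[False, False, True], [False, False, True]]
targets = [[1.1, 7.0], [0.9, 1.0, 9.5]]

# Train once: pool all labeled historic data into a single fit.
all_values = [v for series in historic for v in series]
all_labels = [flag for series in labels for flag in series]
model = MeanDistanceScorer().fit(all_values, all_labels)

# Test many: reuse the same fitted model on every target scenario.
scores_per_target = [model.score(t) for t in targets]
```

A per-scenario design would instead call `fit` inside the loop over targets, so each scenario would be scored by a different model.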
- pipeline
Pipeline whose dataset must have an ‘anomaly_labels’ key: a list of arrays matching historic_data in length and dimensionality.
- Type:
PdMPipeline
- param_space
Hyperparameter search space.
- Type:
dict
- Raises:
ValueError – If dataset lacks ‘anomaly_labels’ key.
ValueError – If anomaly_labels length does not match historic_data.
IncompatibleMethodException – If method does not implement SupervisedMethodInterface.
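The two ValueError conditions above can be checked before an experiment is constructed. This is a minimal sketch of such a check, not the library's own validation: `validate_anomaly_labels` is a hypothetical helper, and the per-dataset row check reflects one reading of "length and dimensionality".

```python
def validate_anomaly_labels(dataset):
    """Hypothetical pre-flight check mirroring the documented ValueError cases."""
    if 'anomaly_labels' not in dataset:
        raise ValueError("dataset lacks 'anomaly_labels' key")

    labels = dataset['anomaly_labels']
    historic = dataset['historic_data']
    if len(labels) != len(historic):
        raise ValueError(
            f"got {len(labels)} label arrays for {len(historic)} historic datasets"
        )

    # Assumption: each label array has one entry per row of its dataset.
    for i, (label_array, data) in enumerate(zip(labels, historic)):
        if len(label_array) != len(data):
            raise ValueError(
                f"anomaly_labels[{i}] has {len(label_array)} entries, "
                f"but historic_data[{i}] has {len(data)} rows"
            )


# Plain lists stand in for DataFrames here; len() counts rows on both.
dataset = {
    'historic_data': [[0.1, 0.2, 5.0]],
    'anomaly_labels': [[False, False, True]],
}
validate_anomaly_labels(dataset)  # passes silently
```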
Examples
>>> dataset = {
...     'historic_data': [df_train],
...     'target_data': [df_test],
...     'anomaly_labels': [label_array],  # 1D array of same length as df_train
...     ...
... }
>>> from pdmlabs.experiment.batch.supervised_experiment import SupervisedPdMExperiment
>>> experiment = SupervisedPdMExperiment(
...     experiment_name='supervised-demo',
...     pipeline=pipeline,
...     param_space={'method_nu': [0.05, 0.1, 0.2]},
...     num_iteration=20
... )
>>> results = experiment.execute()
- execute() → dict
Run supervised experiment with labeled anomaly training data.
Trains a supervised method once on all labeled historic data, then evaluates on each target scenario:
1. Preprocesses all historic data (fit once)
2. Fits method on all labeled historic data (single consolidated training)
3. Fits postprocessor on labeled data
4. For each target:
   a. Preprocesses using the fitted preprocessor
   b. Applies fitted method to get anomaly scores
   c. Postprocesses scores
   d. Thresholds to get binary predictions
   e. Evaluates against ground truth
5. Returns best parameters found
This approach ensures the model is trained consistently across all test scenarios, unlike the semi-supervised flavor, where the model is fitted per scenario.
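The per-target steps a to e can be sketched in plain Python. Everything here is an illustrative stand-in: `fit_minmax` plays the preprocessor, the scaled value itself plays the anomaly score, clipping plays the postprocessor, and F1 replaces whatever metric `optimization_param` selects. None of these names come from pdmlabs.

```python
def fit_minmax(values):
    """Fit a min-max scaler on historic values and return the transform."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero on constant data
    return lambda xs: [(x - lo) / span for x in xs]


def f1(pred, truth):
    """F1 score for two equal-length boolean sequences."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


historic = [0.0, 1.0, 2.0, 10.0]
preprocess = fit_minmax(historic)  # fitted once, on historic data only

# Each target scenario: (values, ground-truth labels).
targets = [
    ([1.0, 9.0, 2.0], [False, True, False]),
    ([0.5, 0.2, 8.0], [False, False, True]),
]
th = 0.5  # decision threshold (one candidate value)

f1_per_target = []
for values, truth in targets:
    scaled = preprocess(values)                        # a. reuse fitted preprocessor
    scores = scaled                                    # b. stand-in for the fitted method
    scores = [min(max(s, 0.0), 1.0) for s in scores]   # c. postprocess (clip to [0, 1])
    pred = [s >= th for s in scores]                   # d. threshold to binary predictions
    f1_per_target.append(f1(pred, truth))              # e. evaluate against ground truth
```

Nothing inside the loop refits anything; the loop only applies objects that were fitted before it started.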
- Returns:
Result dictionary with:
- 'best_params': Best parameter combination found
- 'best_objective': Best metric value
- 'th': Best decision threshold
- Return type:
dict
- Raises:
ValueError – If the dimensions of anomaly_labels do not match the data.
IncompatibleMethodException – If method does not implement SupervisedMethodInterface.
Examples
>>> results = experiment.execute()
>>> print(f"Best threshold: {results['th']:.3f}")
Best threshold: 0.645
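To make the shape of the returned dictionary concrete, the sketch below assembles one from a list of finished trials. The trial records and their values are made up for illustration, and `summarize_trials` is a hypothetical helper. How the library tracks trials internally is not documented here, so treat the `maximize` handling as an assumption modeled on the constructor's `maximize` parameter.

```python
def summarize_trials(trials, maximize=True):
    """Pick the best trial and return it under the documented result keys."""
    pick = max if maximize else min
    best = pick(trials, key=lambda trial: trial['objective'])
    return {
        'best_params': best['params'],
        'best_objective': best['objective'],
        'th': best['th'],
    }


# Illustrative trial records only; these are not real experiment results.
trials = [
    {'params': {'method_nu': 0.05}, 'objective': 0.81, 'th': 0.70},
    {'params': {'method_nu': 0.1}, 'objective': 0.88, 'th': 0.645},
    {'params': {'method_nu': 0.2}, 'objective': 0.79, 'th': 0.60},
]
summary = summarize_trials(trials)
```

With `maximize=False` the same helper returns the trial with the lowest objective, which suits error-style metrics.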