pdmlabs.experiment

Experiment classes for automated PdM model evaluation and hyperparameter tuning.

This module provides the experiment framework that orchestrates:

  • Parameter space exploration via Mango (Bayesian/random optimization)

  • Cross-validation and temporal evaluation

  • MLflow tracking and artifact management

  • PdM-aware metric computation

Core Abstractions:

PdMExperiment (experiment.py)

Abstract base class for all experiment flavors. Defines the common interface: execute() -> dict.
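The contract can be illustrated with a minimal sketch. ToyExperiment and FixedChoiceExperiment are hypothetical classes, not part of pdmlabs; they only show how an abc.ABC with an abstract execute() prevents the base class from being instantiated and forces each flavor to return a result dictionary (here with the 'best_params', 'best_objective' and 'th' keys documented under execute()).

```python
from abc import ABC, abstractmethod


class ToyExperiment(ABC):
    """Hypothetical stand-in for PdMExperiment: only the execute() contract is modeled."""

    def __init__(self, experiment_name: str, param_space: dict):
        self.experiment_name = experiment_name
        self.param_space = param_space

    @abstractmethod
    def execute(self) -> dict:
        """Run the search and return a result dictionary."""


class FixedChoiceExperiment(ToyExperiment):
    """Hypothetical flavor that simply picks the first candidate of every parameter."""

    def execute(self) -> dict:
        best_params = {name: values[0] for name, values in self.param_space.items()}
        return {"best_params": best_params, "best_objective": None, "th": None}


try:
    ToyExperiment("abstract", {})  # abstract classes cannot be instantiated
except TypeError as exc:
    print("cannot instantiate:", type(exc).__name__)

result = FixedChoiceExperiment("demo", {"profile_size": [10, 20, 50]}).execute()
print(result["best_params"])  # {'profile_size': 10}
```

A real flavor replaces the body of execute() with the optimization loop; the base class only fixes the signature and return type.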

Batch Experiments (batch/)

Designed for offline/retrospective evaluation. Best for production validation and performance reporting.

  • AutoProfileSemiSupervisedPdMExperiment: Auto-tuned profile size

  • SemiSupervisedPdMExperiment: Per-scenario invariance

  • SupervisedPdMExperiment: Labeled data training

  • UnsupervisedPdMExperiment: No labels, pattern-based detection

  • IncrementalSemiSupervisedPdMExperiment: Online-style incremental fitting

  • SupervisedRULPdMExperiment: Remaining useful life regression

  • Supervised_SA_PdMExperiment: Survival analysis

Streaming Experiments (streaming/)

Early-stage stubs for real-time scenarios. Currently not production-ready; use batch for now.

  • StreamingSemiSupervisedPdMExperiment

  • StreamingUnsupervisedPdMExperiment

Typical Workflow:

  1. Prepare dataset (dict with features, labels, events, sources)

  2. Create PdMPipeline specifying method, preprocessor, postprocessor, thresholder

  3. Define param_space for Mango optimization

  4. Choose experiment flavor (e.g., AutoProfileSemiSupervisedPdMExperiment)

  5. Call experiment.execute() to run optimization

  6. Access best_params and metrics from result dict

  7. View runs in MLflow UI

Example

>>> from pdmlabs.pipeline.pipeline import PdMPipeline
>>> from pdmlabs.experiment.batch import AutoProfileSemiSupervisedPdMExperiment
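>>> # my_dataset and the component classes (IsolationForest, StandardScaler,
>>> # NoPostprocessor, StaticThreshold) must already be defined or imported;
>>> # see pdmlabs.method and the preprocessing/postprocessing/thresholding modules.
>>>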
>>> pipeline = PdMPipeline(
...     dataset=my_dataset,
...     method=IsolationForest,
...     preprocessor=StandardScaler,
...     postprocessor=NoPostprocessor,
...     thresholder=StaticThreshold
... )
>>> param_space = {'profile_size': [10, 20, 50], 'method_contamination': [0.01, 0.05]}
>>> experiment = AutoProfileSemiSupervisedPdMExperiment(
...     experiment_name='demo-auto-profile',
...     pipeline=pipeline,
...     param_space=param_space,
...     num_iteration=30,
...     n_jobs=4
... )
>>> results = experiment.execute()
>>> print(f"Best profile size: {results['best_params']['profile_size']}")
Best profile size: 20

See also

  • pdmlabs.pipeline: PdMPipeline and data contract definition

  • pdmlabs.method: Available anomaly detection methods

  • pdmlabs.preprocessing, postprocessing, thresholding: Pipeline components

  • pdmlabs.evaluation: PdM-aware evaluation metrics

  • pdmlabs.mango: Mango tuner configuration

class pdmlabs.experiment.PdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)

Bases: ABC

Base abstract class for all predictive maintenance experiment flavors.

This class orchestrates the automated execution of anomaly detection experiments, using Bayesian optimization (via Mango) to search the parameter space and MLflow to track runs for reproducibility.

An experiment combines a PdMPipeline (which defines the processing steps) with a parameter space to search over. It performs hyperparameter optimization by:

  1. Registering an MLflow experiment

  2. Running objective evaluations with different parameter combinations

  3. Training, predicting, and evaluating across train/test splits

  4. Returning the best found parameters and their performance metrics
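Steps 2 to 4 can be sketched as a plain search loop. This is a hypothetical, simplified stand-in: it samples parameter combinations at random instead of using Mango's Bayesian proposals, replaces the train/predict/evaluate cycle with a toy objective, and omits MLflow logging entirely. random_search and toy_objective are illustrative names, not pdmlabs functions.

```python
import random


def toy_objective(params: dict) -> float:
    # Hypothetical score standing in for one train -> predict -> evaluate run (e.g. AD1_AUC).
    return 1.0 - abs(params["profile_size"] - 20) / 100 - params["method_contamination"]


def random_search(param_space: dict, objective, num_iteration: int = 20,
                  random_state: int = 42) -> dict:
    """Maximize objective over a discrete param_space by seeded random sampling."""
    rng = random.Random(random_state)  # fixed seed, so repeated runs draw the same candidates
    history = []
    for _ in range(num_iteration):
        # Draw one candidate value per parameter; Mango would propose points from a surrogate model instead.
        params = {name: rng.choice(values) for name, values in param_space.items()}
        history.append((params, objective(params)))
    best_params, best_objective = max(history, key=lambda trial: trial[1])
    return {"best_params": best_params, "best_objective": best_objective, "history": history}


space = {"profile_size": [10, 20, 50], "method_contamination": [0.01, 0.05]}
result = random_search(space, toy_objective, num_iteration=30)
print(result["best_params"], round(result["best_objective"], 3))
```

Keeping the full history (rather than only the running best) mirrors what a tracking backend such as MLflow would record per run.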

Concrete implementations (e.g., AutoProfileSemiSupervisedPdMExperiment, SupervisedPdMExperiment) override the abstract execute() method to implement experiment-specific logic (e.g., semi-supervised, supervised, RUL prediction).

experiment_name

Name of the experiment (MLflow experiment identifier).

Type:

str

pipeline

Pipeline defining dataset, preprocessing, method, postprocessing, and thresholding steps.

Type:

PdMPipeline

param_space

Parameter space for Mango optimization. Keys are parameter names (e.g., ‘method_alpha’, ‘preprocessor_scale’) and values are parameter ranges or lists of candidate values.

Type:

dict
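The key naming convention suggests that each parameter is routed to a pipeline component by its prefix. The helper below is a hypothetical illustration of that routing, not the pdmlabs implementation; the prefix list and the rule that unprefixed keys (such as profile_size in the module example) stay experiment-level are assumptions.

```python
# Assumed component prefixes, taken from the pipeline steps named in this page.
COMPONENT_PREFIXES = ("method", "preprocessor", "postprocessor", "thresholder")


def split_params(params: dict) -> dict:
    """Hypothetical helper: group flat parameter names by pipeline component prefix."""
    grouped = {"experiment": {}, **{prefix: {} for prefix in COMPONENT_PREFIXES}}
    for key, value in params.items():
        prefix, sep, name = key.partition("_")  # split at the first underscore only
        if sep and prefix in COMPONENT_PREFIXES:
            grouped[prefix][name] = value
        else:
            # Unprefixed keys (e.g. profile_size) are treated as experiment-level settings.
            grouped["experiment"][key] = value
    return grouped


sample = {"profile_size": 20, "method_alpha": 0.1, "preprocessor_scale": True}
print(split_params(sample))
# {'experiment': {'profile_size': 20}, 'method': {'alpha': 0.1}, 'preprocessor': {'scale': True}, 'postprocessor': {}, 'thresholder': {}}
```

Splitting only at the first underscore keeps multi-word parameter names such as method_n_estimators intact.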

optimization_param

Metric to optimize, e.g. ‘AD1_AUC’ (the default), ‘AD2_AUC’ or ‘AD3_AUC’.

Type:

str

initial_random

Number of initial random exploration steps before Bayesian optimization.

Type:

int

num_iteration

Total number of optimization iterations.

Type:

int

n_jobs

Number of parallel jobs for optimization.

Type:

int
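How n_jobs is applied internally is not specified here. A plausible reading, sketched below as an assumption, is that up to n_jobs workers evaluate the configurations of one batch (compare the batch_size constructor argument) concurrently. score and evaluate_batch are hypothetical names, and the thread pool is an arbitrary choice; CPU-bound model fitting would more typically use processes.

```python
from concurrent.futures import ThreadPoolExecutor


def score(params: dict) -> float:
    # Hypothetical objective; a real run would fit, predict and evaluate the pipeline here.
    return params["profile_size"] / 100


def evaluate_batch(batch: list[dict], n_jobs: int = 1) -> list[float]:
    """Evaluate one batch of candidate configurations with up to n_jobs workers."""
    if n_jobs == 1:
        return [score(params) for params in batch]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # map() preserves input order, so scores line up with their configurations.
        return list(pool.map(score, batch))


batch = [{"profile_size": size} for size in (10, 20, 50)]
print(evaluate_batch(batch, n_jobs=4))  # [0.1, 0.2, 0.5]
```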

random_state

Random seed for reproducibility.

Type:

int

maximize

Whether to maximize (True) or minimize (False) optimization_param.

Type:

bool
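The effect of the maximize flag on selecting the best trial can be shown in a few lines. pick_best is a hypothetical helper and the scores are made-up numbers for illustration only; maximize=True suits scores where higher is better (such as the default AUC metric), while maximize=False suits error-style metrics.

```python
def pick_best(trials: list[tuple[dict, float]], maximize: bool = True) -> tuple[dict, float]:
    """Return the (params, score) pair with the best score in the requested direction."""
    chooser = max if maximize else min
    return chooser(trials, key=lambda trial: trial[1])


# Illustrative scores only, not results from any real run.
trials = [({"profile_size": 10}, 0.71), ({"profile_size": 20}, 0.84), ({"profile_size": 50}, 0.66)]
print(pick_best(trials))                  # ({'profile_size': 20}, 0.84)
print(pick_best(trials, maximize=False))  # ({'profile_size': 50}, 0.66)
```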

debug

If True, generates debug plots and logs them to MLflow.

Type:

bool

event_data

Event mappings from the pipeline (failures, resets, sources).

Raises:
  • ValueError – If required dataset keys are missing (e.g., ‘anomaly_labels’ for supervised).

  • IncompatibleMethodException – If the selected method is incompatible with the experiment flavor.

abstract execute() -> dict

Execute the parameter optimization loop and return results.

This method must be implemented by subclasses to define experiment-specific logic (e.g., semi-supervised, supervised, RUL prediction). It typically:

  1. Uses the Mango tuner to search the parameter space

  2. For each parameter combination:

     • Creates pipeline components (method, preprocessor, postprocessor, thresholder)

     • Fits on historic/training data

     • Predicts on target/test data

     • Evaluates using PdM metrics

  3. Returns the best parameters and their performance
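The per-combination work in step 2 can be sketched with toy stand-ins for each component. Everything below is hypothetical: a z-score "preprocessor + method" fitted on historic data, a static threshold read from an assumed thresholder_value parameter, and F1 as a simple substitute for the PdM-aware metrics that pdmlabs actually computes.

```python
from statistics import mean, pstdev


def objective(params: dict, historic: list[float], target: list[float], labels: list[int]) -> float:
    """Hypothetical single-configuration evaluation: fit, predict, threshold, evaluate."""
    # "Preprocessor": learn standardization statistics on historic (training) data only.
    mu, sigma = mean(historic), pstdev(historic) or 1.0
    # "Method": anomaly score = absolute z-score of each target point.
    scores = [abs((x - mu) / sigma) for x in target]
    # "Thresholder": static threshold taken from the parameter combination.
    predictions = [int(s > params["thresholder_value"]) for s in scores]
    # "Evaluation": F1 against ground-truth labels.
    tp = sum(1 for p, l in zip(predictions, labels) if p and l)
    fp = sum(1 for p, l in zip(predictions, labels) if p and not l)
    fn = sum(1 for p, l in zip(predictions, labels) if l and not p)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


historic = [10, 11, 9, 10, 10, 11, 9, 10]
target = [10, 10.5, 14, 9.5, 15]
labels = [0, 0, 1, 0, 1]
print(objective({"thresholder_value": 3}, historic, target, labels))  # 1.0
print(objective({"thresholder_value": 6}, historic, target, labels))  # 0.6666666666666666
```

Fitting statistics only on the historic data and scoring only the target data keeps the evaluation free of leakage, which is the point of the historic/target split.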

Returns:

Result dictionary containing:
  • ‘best_params’: dict of best found parameters

  • ‘best_objective’: best optimization metric value

  • ‘th’: best threshold value

  • Additional experiment-specific results (e.g., ‘per_method’ for batch flavors)

Return type:

dict

Raises:

NotImplementedError – This is an abstract method and must be overridden.

Examples

See subclasses like SemiSupervisedPdMExperiment.execute() for concrete examples.

plot_SA_of_RUL(plot_test_preds, result_labels, is_rtf)

Generate RUL survival-analysis plots comparing predictions with ground-truth labels, and log them.

For each test set, overlays predicted RUL trajectories against ground-truth labels. Color indicates predicted status (red for failure, black for normal).

Parameters:
  • plot_test_preds (list) – List of prediction arrays, one per target source.

  • result_labels (list) – Corresponding ground-truth RUL label arrays.

  • is_rtf (list) – Run-to-failure flags, one per array (1=RTF scenario, 0=otherwise).

Examples

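>>> preds = [[10, 20, 15], [100, 101, 102]]  # one prediction array per target source
>>> labels = [[5, 4, 3], [98, 97, 96]]  # matching ground-truth RUL arrays
>>> flags = [1, 0]  # first source is a run-to-failure scenario
>>> experiment.plot_SA_of_RUL(preds, labels, flags)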

Modules

batch

Batch experiment classes for offline/retrospective PdM evaluation.

experiment

streaming

Streaming experiment classes for online/real-time PdM evaluation (experimental).