pdmlabs.experiment
Experiment classes for automated PdM model evaluation and hyperparameter tuning.
This module provides the experiment framework that orchestrates:
- Parameter space exploration via Mango (Bayesian/random optimization)
- Cross-validation and temporal evaluation
- MLflow tracking and artifact management
- PdM-aware metric computation
Core Abstractions:
- PdMExperiment (experiment.py)
Abstract base class for all experiment flavors. Defines the common interface: execute() -> dict
- Batch Experiments (batch/)
Designed for offline/retrospective evaluation. Best for production validation and performance reporting.
  - AutoProfileSemiSupervisedPdMExperiment: Auto-tuned profile size
  - SemiSupervisedPdMExperiment: Per-scenario invariance
  - SupervisedPdMExperiment: Labeled data training
  - UnsupervisedPdMExperiment: No labels, pattern-based detection
  - IncrementalSemiSupervisedPdMExperiment: Online-style incremental fitting
  - SupervisedRULPdMExperiment: Remaining useful life regression
  - Supervised_SA_PdMExperiment: Survival analysis
- Streaming Experiments (streaming/)
Early-stage stubs for real-time scenarios. These are not yet production-ready; use the batch experiments instead.
  - StreamingSemiSupervisedPdMExperiment
  - StreamingUnsupervisedPdMExperiment
Typical Workflow:
1. Prepare the dataset (dict with features, labels, events, sources)
2. Create a PdMPipeline specifying method, preprocessor, postprocessor, thresholder
3. Define param_space for Mango optimization
4. Choose an experiment flavor (e.g., AutoProfileSemiSupervisedPdMExperiment)
5. Call experiment.execute() to run the optimization
6. Access best_params and metrics from the result dict
7. View runs in the MLflow UI
Example
>>> from pdmlabs.pipeline.pipeline import PdMPipeline
>>> from pdmlabs.experiment.batch import AutoProfileSemiSupervisedPdMExperiment
>>>
>>> pipeline = PdMPipeline(
...     dataset=my_dataset,
...     method=IsolationForest,
...     preprocessor=StandardScaler,
...     postprocessor=NoPostprocessor,
...     thresholder=StaticThreshold
... )
>>> param_space = {'profile_size': [10, 20, 50], 'method_contamination': [0.01, 0.05]}
>>> experiment = AutoProfileSemiSupervisedPdMExperiment(
...     experiment_name='demo-auto-profile',
...     pipeline=pipeline,
...     param_space=param_space,
...     num_iteration=30,
...     n_jobs=4
... )
>>> results = experiment.execute()
>>> print(f"Best profile size: {results['best_params']['profile_size']}")
Best profile size: 20
See also
pdmlabs.pipeline: PdMPipeline and data contract definition
pdmlabs.method: Available anomaly detection methods
pdmlabs.preprocessing, postprocessing, thresholding: Pipeline components
pdmlabs.evaluation: PdM-aware evaluation metrics
pdmlabs.mango: Mango tuner configuration
- class pdmlabs.experiment.PdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)
Bases: ABC
Base abstract class for all predictive maintenance experiment flavors.
This class orchestrates the automated execution of anomaly detection experiments using Bayesian optimization (via Mango) to search parameter spaces and MLflow for run tracking and reproducibility.
An experiment combines a PdMPipeline (which defines the processing steps) with a parameter space to search over. It performs hyperparameter optimization by:
1. Registering an MLflow experiment
2. Running objective evaluations with different parameter combinations
3. Training, predicting, and evaluating across train/test splits
4. Returning the best found parameters and their performance metrics
Concrete implementations (e.g., AutoProfileSemiSupervisedPdMExperiment, SupervisedPdMExperiment) override the abstract execute() method to implement experiment-specific logic (e.g., semi-supervised, supervised, RUL prediction).
- experiment_name
Name of the experiment (MLflow experiment identifier).
- Type:
str
- pipeline
Pipeline defining dataset, preprocessing, method, postprocessing, and thresholding steps.
- Type:
PdMPipeline
- param_space
Parameter space for Mango optimization. Keys are parameter names (e.g., ‘method_alpha’, ‘preprocessor_scale’), values are parameter ranges.
- Type:
dict
- optimization_param
Metric to optimize (‘AD1_AUC’, ‘AD2_AUC’, ‘AD3_AUC’, etc).
- Type:
str
- initial_random
Number of initial random exploration steps before Bayesian optimization.
- Type:
int
- num_iteration
Total number of optimization iterations.
- Type:
int
- n_jobs
Number of parallel jobs for optimization.
- Type:
int
- random_state
Random seed for reproducibility.
- Type:
int
- maximize
Whether to maximize (True) or minimize (False) optimization_param.
- Type:
bool
- debug
If True, generates debug plots and logs them to MLflow.
- Type:
bool
- event_data
Event mappings from the pipeline (failures, resets, sources).
- Raises:
ValueError – If required dataset keys are missing (e.g., ‘anomaly_labels’ for supervised).
IncompatibleMethodException – If the selected method is incompatible with the experiment flavor.
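The param_space keys described above carry a component prefix (‘method_alpha’, ‘preprocessor_scale’). How pdmlabs routes those keys internally is not stated here, so the following is only a minimal sketch of one way such a flat dict could be grouped by prefix. The helper name split_params, the ‘postprocessor’ and ‘thresholder’ prefixes, and the ‘experiment’ bucket are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed component prefixes; only 'method_' and 'preprocessor_' appear in the docs above.
PREFIXES = ("method", "preprocessor", "postprocessor", "thresholder")


def split_params(params):
    """Group a flat parameter dict by component prefix (hypothetical helper)."""
    grouped = defaultdict(dict)
    for name, value in params.items():
        prefix, sep, rest = name.partition("_")
        if sep and prefix in PREFIXES:
            # 'method_contamination' -> grouped['method']['contamination']
            grouped[prefix][rest] = value
        else:
            # Keys without a known prefix (e.g. 'profile_size') stay at experiment level.
            grouped["experiment"][name] = value
    return dict(grouped)


grouped = split_params(
    {"profile_size": 20, "method_contamination": 0.05, "preprocessor_scale": True}
)
print(grouped)
```

Keeping the search space flat and splitting it only when components are built means the tuner sees a single dict, while each pipeline step receives only its own keyword arguments.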
- abstract execute() → dict
Execute the parameter optimization loop and return results.
This method must be implemented by subclasses to define experiment-specific logic (e.g., semi-supervised, supervised, RUL prediction). It typically:
1. Uses the Mango tuner to search the parameter space
2. For each parameter combination:
   - Creates pipeline components (method, preprocessor, postprocessor, thresholder)
   - Fits on historic/training data
   - Predicts on target/test data
   - Evaluates using PdM metrics
3. Returns the best parameters and their performance
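The loop described above can be sketched without any dependencies. This is not the Mango tuner: plain random search is swapped in as a stand-in, so every step is a random draw and there is no Bayesian phase. The names random_search and toy_objective are hypothetical, and the toy score replaces the fit/predict/evaluate steps that a real experiment would run.

```python
import random


def random_search(param_space, objective, num_iteration=20, maximize=True, random_state=42):
    """Stand-in for the tuner: sample param_space at random and keep the best score."""
    rng = random.Random(random_state)
    best_params, best_score = None, None
    for _ in range(num_iteration):
        # Draw one candidate value per parameter.
        params = {name: rng.choice(values) for name, values in param_space.items()}
        # In a real experiment this call would fit, predict, and evaluate.
        score = objective(params)
        if best_score is None:
            improved = True
        elif maximize:
            improved = score > best_score
        else:
            improved = score < best_score
        if improved:
            best_params, best_score = params, score
    return {"best_params": best_params, "best_objective": best_score}


def toy_objective(params):
    # Hypothetical score that peaks at profile_size=20 and method_contamination=0.05.
    return 1.0 - abs(params["profile_size"] - 20) / 100 - abs(params["method_contamination"] - 0.05)


space = {"profile_size": [10, 20, 50], "method_contamination": [0.01, 0.05]}
result = random_search(space, toy_objective, num_iteration=30)
print(result["best_params"], result["best_objective"])
```

The maximize flag only changes the comparison direction, which mirrors how the maximize attribute is described for optimization_param.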
- Returns:
- Result dictionary containing:
‘best_params’: dict of best found parameters
‘best_objective’: best optimization metric value
‘th’: best threshold value
Additional experiment-specific results (e.g., ‘per_method’ for batch flavors)
- Return type:
dict
- Raises:
NotImplementedError – This is an abstract method and must be overridden.
Examples
See subclasses like SemiSupervisedPdMExperiment.execute() for concrete examples.
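As a rough illustration of the contract (an abstract execute() that each flavor overrides and that returns a dict with ‘best_params’, ‘best_objective’, and ‘th’), here is a self-contained sketch. ExperimentSketch and FixedThresholdExperiment are hypothetical stand-ins, not pdmlabs classes, and plain accuracy is used in place of the PdM-aware metrics.

```python
from abc import ABC, abstractmethod


class ExperimentSketch(ABC):
    """Hypothetical stand-in for the abstract base; not the pdmlabs class."""

    def __init__(self, experiment_name, param_space):
        self.experiment_name = experiment_name
        self.param_space = param_space

    @abstractmethod
    def execute(self) -> dict:
        raise NotImplementedError


class FixedThresholdExperiment(ExperimentSketch):
    """Toy flavor: tries each candidate threshold against labeled anomaly scores."""

    def __init__(self, experiment_name, param_space, scores, labels):
        super().__init__(experiment_name, param_space)
        self.scores = scores
        self.labels = labels

    def execute(self) -> dict:
        best = None
        for th in self.param_space["th"]:
            # Flag a sample as anomalous when its score reaches the threshold.
            preds = [int(score >= th) for score in self.scores]
            accuracy = sum(p == y for p, y in zip(preds, self.labels)) / len(self.labels)
            if best is None or accuracy > best["best_objective"]:
                best = {"best_params": {"th": th}, "best_objective": accuracy, "th": th}
        return best


experiment = FixedThresholdExperiment(
    "sketch",
    {"th": [0.3, 0.5, 0.7]},
    scores=[0.1, 0.4, 0.6, 0.9],
    labels=[0, 0, 1, 1],
)
result = experiment.execute()
print(result)
```

Because execute() is declared with abstractmethod, instantiating the base class directly fails with a TypeError, and only subclasses that override it can be created.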
- plot_SA_of_RUL(plot_test_preds, result_labels, is_rtf)
Generate and log RUL survival analysis plots with predictions vs labels.
For each test set, overlays predicted RUL trajectories against ground-truth labels. Color indicates predicted status (red for failure, black for normal).
- Parameters:
plot_test_preds (list) – List of prediction arrays, one per target source.
result_labels (list) – Corresponding ground-truth RUL label arrays.
is_rtf (list) – Run-to-failure flags, one per array (1=RTF scenario, 0=otherwise).
Examples
>>> preds = [[[10, 20, 15], [100, 101, 102]], ...]
>>> labels = [[[5, 4, 3], [98, 97, 96]], ...]
>>> flags = [1, 0, ...]
>>> experiment.plot_SA_of_RUL(preds, labels, flags)
Modules
batch: Batch experiment classes for offline/retrospective PdM evaluation.
streaming: Streaming experiment classes for online/real-time PdM evaluation (experimental).