pdmlabs.experiment.batch
Batch experiment classes for offline/retrospective PdM evaluation.
Batch experiments are designed for comprehensive, offline evaluation of anomaly detection and remaining useful life (RUL) prediction models. They perform:
- Complete parameter space exploration (Mango optimization)
- Temporal cross-validation (train on past, test on future)
- Full dataset evaluation before deployment
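As a rough illustration of "train on past, test on future", the sketch below builds expanding-window splits in plain Python. The helper name `temporal_splits` and its arguments are hypothetical and are not part of pdmlabs; how the library actually splits data is not shown here.

```python
def temporal_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs with an expanding training window.

    Hypothetical helper: every test block lies strictly after its training block.
    """
    if n_folds < 1 or n_samples < n_folds + 1:
        raise ValueError("need at least n_folds + 1 samples")
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        # The last fold absorbs any leftover samples.
        test_end = n_samples if k == n_folds else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))


splits = list(temporal_splits(n_samples=10, n_folds=3))
for train_idx, test_idx in splits:
    print(f"train={train_idx} test={test_idx}")
```

Because each training window ends where its test window begins, no future observation leaks into training.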
Available Experiment Flavors:
- AutoProfileSemiSupervisedPdMExperiment
Auto-tunes the size of the “normal profile” (initial N timesteps). Best for: Scenarios with clear startup transients where initial behavior characterizes normal operation.
- SemiSupervisedPdMExperiment
Fits method independently on each target scenario. Best for: Multiple independent test scenarios; adapts to local patterns.
- SupervisedPdMExperiment
Uses labeled anomaly windows to train; train-once, test-many. Best for: Well-labeled historic data; consistent training across tests.
- UnsupervisedPdMExperiment
No labels; learns patterns from data alone. Best for: Early-stage PdM without failure labels; baseline comparisons.
- IncrementalSemiSupervisedPdMExperiment
Processes data incrementally with optional model retraining. Best for: Simulating online behavior; testing concept drift.
- SupervisedRULPdMExperiment
Predicts remaining useful life (continuous regression). Best for: RUL-aware maintenance scheduling; time-to-failure estimation.
- Supervised_SA_PdMExperiment
Survival analysis for failure time prediction. Best for: Complex failure dynamics; competing risks scenarios.
Choosing an Experiment Flavor:
1. Do you have labeled anomaly/failure data?
   - Yes → SupervisedPdMExperiment
   - No → Go to step 2
2. Is the initial portion of data clearly “normal”?
   - Yes → AutoProfileSemiSupervisedPdMExperiment
   - No → SemiSupervisedPdMExperiment
3. Are you predicting discrete anomalies or continuous RUL?
   - RUL → SupervisedRULPdMExperiment (with labeled RUL data)
   - Anomalies → Choose from above
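The questions above can be written as a small lookup function. This is only a sketch: `choose_flavor` is a hypothetical helper, and checking the RUL question first is one reading of the decision steps, not something the library prescribes. It returns class names as strings so it runs without pdmlabs installed.

```python
def choose_flavor(predict_rul, has_labels, initial_portion_is_normal):
    """Return the name of a batch experiment class, following the questions above."""
    if predict_rul:
        # RUL regression needs labeled RUL data.
        if not has_labels:
            raise ValueError("RUL prediction requires labeled RUL data")
        return "SupervisedRULPdMExperiment"
    if has_labels:
        return "SupervisedPdMExperiment"
    if initial_portion_is_normal:
        return "AutoProfileSemiSupervisedPdMExperiment"
    return "SemiSupervisedPdMExperiment"


flavor = choose_flavor(
    predict_rul=False, has_labels=False, initial_portion_is_normal=True
)
print(flavor)
```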
Typical Pattern:
>>> from pdmlabs.experiment.batch import AutoProfileSemiSupervisedPdMExperiment
>>> experiment = AutoProfileSemiSupervisedPdMExperiment(
...     experiment_name='my-battery-pd',
...     pipeline=pipeline,
...     param_space={...},
...     num_iteration=30,
...     n_jobs=4,
...     debug=False
... )
>>> results = experiment.execute()
>>> best_params = results['best_params']
>>> best_metric = results['best_objective']
MLflow Integration:
All batch experiments automatically log to MLflow:
- Experiment grouped by name
- Each parameter combination = one MLflow run
- Parameters, metrics, artifacts, and models logged
- Browse results in MLflow UI: mlflow ui
See also
pdmlabs.experiment.experiment: PdMExperiment base class
pdmlabs.pipeline: Define dataset and pipeline
pdmlabs.mango: Mango tuner configuration
- class pdmlabs.experiment.batch.AutoProfileSemiSupervisedPdMExperiment(*args, **kwargs)
Bases: PdMExperiment
Semi-supervised anomaly detection with automatic profile-based learning.
This experiment flavor implements an “auto-profiling” semi-supervised approach:
1. For each target scenario, uses an initial profile (first N timesteps) as normal behavior
2. Fits the anomaly detection method only on this profile
3. Applies the fitted method to detect anomalies in the rest of the scenario
4. Automatically determines the profile size via hyperparameter search
This is useful when:
- You have unlabeled data with clear patterns at the start (normal operating condition)
- You want to adapt to gradual drift without constant retraining
- You have limited labeled anomaly examples
The “auto-profiling” optimization searches over profile_size (and optionally init_profile_size) to find the size of the normal behavior window that yields best performance.
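The idea of fitting on the first N timesteps and searching over N can be sketched in plain Python. Everything here is an illustrative assumption: a z-score against the profile stands in for a real pdmlabs method, accuracy stands in for the PdM metric, and an exhaustive loop stands in for the Mango search. The function names are hypothetical.

```python
from statistics import mean, pstdev


def profile_scores(series, profile_size):
    """Fit a mean/std profile on the first `profile_size` points and score the rest.

    Stand-in detector (absolute z-score); pdmlabs methods are not used here.
    """
    profile = series[:profile_size]
    mu = mean(profile)
    sigma = pstdev(profile) or 1.0  # avoid division by zero on a flat profile
    return [abs(x - mu) / sigma for x in series[profile_size:]]


def search_profile_size(series, labels, candidates, threshold=3.0):
    """Pick the candidate profile size with the highest accuracy on the scored tail."""
    best_size, best_acc = None, -1.0
    for size in candidates:
        scores = profile_scores(series, size)
        preds = [s > threshold for s in scores]
        truth = labels[size:]
        acc = sum(p == t for p, t in zip(preds, truth)) / len(truth)
        if acc > best_acc:
            best_size, best_acc = size, acc
    return best_size, best_acc


series = [10.0, 10.1, 10.6, 9.5, 10.2, 9.9, 10.3, 9.8, 14.0, 14.5]
labels = [False] * 8 + [True] * 2  # the last two points are anomalous
best_size, best_acc = search_profile_size(series, labels, candidates=[2, 4, 6])
print(best_size, best_acc)
```

In this toy series a profile of two points is too narrow and flags normal fluctuation as anomalous, which is the kind of trade-off the profile size search is meant to resolve.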
- pipeline
Must have ‘failure’ or ‘reset’ events to define scenario boundaries.
- Type:
PdMPipeline
- param_space
Must include the 'profile_size' key. Example: {'profile_size': [10, 20, 50], 'method_alpha': [0.1, 0.5, 1.0]}
- Type:
dict
- Raises:
IncompatibleMethodException – If method does not implement SemiSupervisedMethodInterface.
ValueError – If pipeline lacks required event definitions.
Examples
>>> from pdmlabs.method.isolation_forest import IsolationForest
>>> from pdmlabs.preprocessing.no_preprocessor import NoPreprocessor
>>> # ... setup pipeline ...
>>> param_space = {
...     'profile_size': [10, 20, 50],
...     'method_alpha': [0.1, 1.0]
... }
>>> experiment = AutoProfileSemiSupervisedPdMExperiment(
...     experiment_name='auto-profile-demo',
...     pipeline=pipeline,
...     param_space=param_space,
...     num_iteration=30,
...     n_jobs=4
... )
>>> results = experiment.execute()
>>> print(f"Best profile size: {results['best_params']['profile_size']}")
Best profile size: 20
- execute() → dict
Run the auto-profile semi-supervised optimization experiment.
Searches parameter space to find the best profile size and method parameters. For each combination:
1. For each target scenario:
   a. Segments by reset/failure events
   b. Uses first N timesteps (profile_size) as normal pattern
   c. Fits method on profile
   d. Predicts on remaining data
   e. Applies postprocessor and thresholder
2. Evaluates across all scenarios using PdM metrics
3. Returns best parameters
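Step (a), segmenting by reset/failure events, can be pictured as cutting an index range at each event. The sketch below is a hypothetical helper, not pdmlabs code. It assumes the event timestep belongs to the scenario it ends, which the source does not specify.

```python
def split_by_events(n_samples, event_indices):
    """Split range(n_samples) into scenarios that end at each event index (inclusive).

    Returns half-open (start, stop) pairs. Hypothetical helper for illustration.
    """
    segments, start = [], 0
    for idx in sorted(set(event_indices)):
        if not 0 <= idx < n_samples:
            raise ValueError(f"event index {idx} out of range")
        segments.append((start, idx + 1))
        start = idx + 1
    if start < n_samples:
        segments.append((start, n_samples))  # trailing data after the last event
    return segments


segments = split_by_events(n_samples=10, event_indices=[3, 7])
print(segments)
```

Each resulting segment would then get its own profile (the first `profile_size` points) under the auto-profile scheme.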
- Returns:
Result dictionary with:
- 'best_params': Best found parameters (includes profile_size)
- 'best_objective': Best metric value achieved
- 'th': Best threshold for decision boundary
- Return type:
dict
- Raises:
IncompatibleMethodException – If method is not SemiSupervisedMethodInterface.
Exception – If pipeline setup is invalid or data processing fails.
Examples
>>> experiment = AutoProfileSemiSupervisedPdMExperiment(...)
>>> results = experiment.execute()
>>> print(results['best_params']['profile_size'])
25
- class pdmlabs.experiment.batch.IncrementalSemiSupervisedPdMExperiment(refit_new_method_object: bool = True, refit_new_preprocessor_object: bool = True, *args, **kwargs)
Bases: PdMExperiment
Incremental semi-supervised anomaly detection with model retraining.
This experiment flavor implements online/incremental learning scenarios:
1. Processes data incrementally over time (not all at once)
2. Optionally retrains method and preprocessor on newly observed data
3. Maintains model adaptation as new patterns emerge
Different from semi-supervised:
- Processes scenarios in temporal segments (not monolithic)
- Can retrain models as new data arrives
- Simulates streaming/online scenarios while using batch methods
Useful for:
- Detecting concept drift without full retraining
- Systems where data arrives in batches
- Simulating online learning behavior
- Gradual performance monitoring
- refit_new_method_object
If True, creates a fresh method instance for each increment. If False, updates the same method instance. Default: True.
- Type:
bool
- refit_new_preprocessor_object
If True, creates a fresh preprocessor instance for each increment. If False, updates the same preprocessor instance. Default: True.
- Type:
bool
- Raises:
IncompatibleMethodException – If method does not implement SemiSupervisedMethodInterface.
ShortScenarioLengthException – If scenario is too short for incremental processing.
Examples
>>> experiment = IncrementalSemiSupervisedPdMExperiment(
...     experiment_name='incremental-demo',
...     pipeline=pipeline,
...     param_space={'method_alpha': [0.1, 1.0]},
...     refit_new_method_object=True,
...     num_iteration=20
... )
>>> results = experiment.execute()
- execute() → dict
Run incremental semi-supervised experiment with model retraining.
Processes data incrementally, optionally retraining models:
1. For each parameter combination:
   a. For each target scenario’s temporal segments:
      - Fits method (optionally new instance)
      - Predicts anomaly scores
      - Applies postprocessor and thresholder
      - Evaluates current segment
   b. Aggregates across segments
2. Returns best parameters found
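The difference between creating a fresh method per increment and updating the same one can be shown with a toy model. This is a sketch under stated assumptions: `RunningMean` and `run_incremental` are hypothetical stand-ins, and fitting on one segment then scoring the next is an assumed reading of the loop above, not the library's confirmed behavior.

```python
class RunningMean:
    """Toy stand-in for a detection method: tracks the mean of everything it has seen."""

    def __init__(self):
        self.total, self.count = 0.0, 0

    def fit(self, values):
        self.total += sum(values)
        self.count += len(values)

    def score(self, values):
        mu = self.total / self.count
        return [abs(v - mu) for v in values]


def run_incremental(segments, refit_new_method_object=True):
    """Fit on each segment, then score the segment that follows it."""
    method = RunningMean()
    all_scores = []
    for current, following in zip(segments, segments[1:]):
        if refit_new_method_object:
            method = RunningMean()  # forget earlier segments
        method.fit(current)
        all_scores.append(method.score(following))
    return all_scores


segments = [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]]
fresh = run_incremental(segments, refit_new_method_object=True)
updated = run_incremental(segments, refit_new_method_object=False)
print(fresh)
print(updated)
```

With a fresh object the model tracks only the latest segment, so a steadily drifting signal keeps the same score. With an updated object the older data pulls the mean back, so the drift shows up as a growing score.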
- Returns:
Result dictionary with best_params, best_objective, and threshold.
- Return type:
dict
- Raises:
IncompatibleMethodException – If method is not SemiSupervisedMethodInterface.
ShortScenarioLengthException – If scenario is too short to process incrementally.
- class pdmlabs.experiment.batch.SemiSupervisedPdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)
Bases: PdMExperiment
Semi-supervised anomaly detection with per-scenario learning.
This experiment flavor implements adaptive, scenario-specific anomaly detection:
1. For each target scenario, treats it as a self-contained learning problem
2. Fits method on the entire target scenario (unsupervised learning)
3. No reference data or cross-scenario knowledge transfer
Different from:
- Supervised: no labels available
- AutoProfile: doesn’t use an initial profile
- Unsupervised fit: fits once globally; semi-supervised fits per-scenario
Useful when:
- You have multiple target scenarios with different characteristics
- You want scenario-specific adaptation
- Scenarios are independent or have unknown relationships
- Raises:
IncompatibleMethodException – If method does not implement SemiSupervisedMethodInterface.
Examples
>>> experiment = SemiSupervisedPdMExperiment(
...     experiment_name='semisup-demo',
...     pipeline=pipeline,
...     param_space={'method_n_neighbors': [5, 10, 20]},
...     num_iteration=25
... )
>>> results = experiment.execute()
- execute() → dict
Run semi-supervised experiment with per-scenario fitting.
Fits method independently for each target scenario, adapting to local characteristics. For each parameter combination:
1. For each target scenario (if segmented by reset/failure)
2. Aggregates evaluations across scenarios
3. Returns best parameters
- Returns:
Result dictionary with best_params, best_objective, and threshold.
- Return type:
dict
- Raises:
IncompatibleMethodException – If method is not SemiSupervisedMethodInterface.
- class pdmlabs.experiment.batch.SupervisedPdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)
Bases: PdMExperiment
Supervised anomaly detection with labeled anomaly windows.
This experiment flavor is designed for scenarios where you have:
- Training data with explicit anomaly labels (ranges or boolean arrays)
- A supervised method that can learn from these labels
- Target data to evaluate on
It implements a train-once, test-many approach:
1. Fits method, preprocessor, postprocessor on ALL historic labeled data (once)
2. Then applies to each target scenario independently
This differs from semi-supervised (which fits per-scenario) and ensures consistent model training across all test scenarios.
- pipeline
The dataset must contain an 'anomaly_labels' key: a list of arrays matching historic_data in length and dimensionality.
- Type:
PdMPipeline
- param_space
Hyperparameter search space.
- Type:
dict
- Raises:
ValueError – If dataset lacks ‘anomaly_labels’ key.
ValueError – If anomaly_labels length does not match historic_data.
IncompatibleMethodException – If method does not implement SupervisedMethodInterface.
Examples
>>> dataset = {
...     'historic_data': [df_train],
...     'target_data': [df_test],
...     'anomaly_labels': [label_array],  # 1D array of same length as df_train
...     ...
... }
>>> from pdmlabs.experiment.batch.supervised_experiment import SupervisedPdMExperiment
>>> experiment = SupervisedPdMExperiment(
...     experiment_name='supervised-demo',
...     pipeline=pipeline,
...     param_space={'method_nu': [0.05, 0.1, 0.2]},
...     num_iteration=20
... )
>>> results = experiment.execute()
- execute() → dict
Run supervised experiment with labeled anomaly training data.
Trains a supervised method once on all labeled historic data, then evaluates on each target scenario:
1. Preprocesses all historic data (fit once)
2. Fits method on all labeled historic data (single consolidated training)
3. Fits postprocessor on labeled data
4. For each target:
   a. Preprocesses using the fitted preprocessor
   b. Applies fitted method to get anomaly scores
   c. Postprocesses scores
   d. Thresholds to get binary predictions
   e. Evaluates against ground truth
5. Returns best parameters found
This approach ensures the model is trained consistently across all test scenarios, unlike semi-supervised where the model adapts per-scenario.
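The train-once, test-many pattern can be reduced to a toy example: learn a single decision rule from labeled historic data, then apply it unchanged to every target. The helpers below are hypothetical. A midpoint threshold stands in for a real supervised method, and accuracy stands in for the PdM metrics.

```python
def fit_threshold(historic_values, historic_labels):
    """Learn a cut-off halfway between the mean normal and mean anomalous value."""
    normal = [v for v, y in zip(historic_values, historic_labels) if not y]
    anomalous = [v for v, y in zip(historic_values, historic_labels) if y]
    if not normal or not anomalous:
        raise ValueError("need both normal and anomalous examples")
    return (sum(normal) / len(normal) + sum(anomalous) / len(anomalous)) / 2


def evaluate_targets(threshold, targets):
    """Apply the same fitted threshold to every target; return per-target accuracy."""
    accuracies = []
    for values, labels in targets:
        preds = [v > threshold for v in values]
        accuracies.append(sum(p == y for p, y in zip(preds, labels)) / len(labels))
    return accuracies


# Fit once on labeled historic data.
threshold = fit_threshold([1.0, 2.0, 9.0, 10.0], [False, False, True, True])

# Evaluate many targets with the same fitted model.
targets = [
    ([1.0, 8.0], [False, True]),
    ([2.0, 3.0, 9.0], [False, False, False]),
]
accuracies = evaluate_targets(threshold, targets)
print(threshold, accuracies)
```

Since nothing is refit between targets, differences in per-target scores reflect the data and not a different model.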
- Returns:
Result dictionary with:
- 'best_params': Best parameter combination found
- 'best_objective': Best metric value
- 'th': Best decision threshold
- Return type:
dict
- Raises:
ValueError – If anomaly_labels dimension mismatches data.
IncompatibleMethodException – If method is not SupervisedMethodInterface.
Examples
>>> results = experiment.execute()
>>> print(f"Best threshold: {results['th']:.3f}")
Best threshold: 0.645
- class pdmlabs.experiment.batch.SupervisedRULPdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)
Bases: PdMExperiment
Supervised Remaining Useful Life (RUL) prediction experiment.
This experiment flavor is for RUL regression where:
- Target is continuous (time to failure, cycles to failure, etc.)
- Method must implement SupervisedMethodInterface with RUL prediction
- Evaluation metrics are regression-based (MAE, MSE, etc.), not classification-based
Differs from anomaly detection experiments:
- Predicts continuous RUL rather than binary anomalies
- Evaluation based on prediction accuracy, not detection timing
- May use different postprocessing
Useful for:
- Predictive maintenance with remaining life estimates
- RUL-aware planning and maintenance scheduling
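For reference, the regression metrics named above can be computed as follows. This is a generic sketch of MAE and RMSE in plain Python. The function name `rul_errors` is hypothetical, and the way pdmlabs computes or names these metrics internally is not shown here.

```python
import math


def rul_errors(true_rul, predicted_rul):
    """Return (MAE, RMSE) between true and predicted remaining useful life."""
    if not true_rul or len(true_rul) != len(predicted_rul):
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [p - t for p, t in zip(predicted_rul, true_rul)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


# RUL in cycles: the first prediction is 10 too low, the second 5 too high.
mae, rmse = rul_errors(true_rul=[100, 50, 10], predicted_rul=[90, 55, 10])
print(f"MAE={mae:.2f} RMSE={rmse:.2f}")
```

Both are lower-is-better metrics, so an experiment optimizing one of them would presumably need `maximize=False`. That is an inference from the constructor signature, not a documented requirement.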
- Raises:
IncompatibleMethodException – If method is not SupervisedMethodInterface.
Examples
>>> experiment = SupervisedRULPdMExperiment(
...     experiment_name='rul-demo',
...     pipeline=pipeline,
...     param_space={'method_fit_intercept': [True, False]},
...     optimization_param='MAE'
... )
>>> results = experiment.execute()
- execute() → dict
Run supervised RUL prediction experiment.
Trains RUL regression methods and evaluates on test scenarios:
1. Fits method on labeled historic RUL data (once)
2. For each target scenario:
   a. Preprocesses target data
   b. Predicts RUL values
   c. Compares against ground truth
3. Returns best parameters
- Returns:
Result dictionary with best_params, best_objective, and metrics.
- Return type:
dict
- Raises:
IncompatibleMethodException – If method is not SupervisedMethodInterface.
- class pdmlabs.experiment.batch.Supervised_SA_PdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)
Bases: PdMExperiment
Supervised Survival Analysis (SA) experiment.
This experiment flavor combines supervised learning with survival analysis concepts:
- Treats PdM as a survival prediction problem
- Uses labeled data to train models that predict survival curves
- Evaluates using survival analysis metrics (Kaplan-Meier, concordance, etc.)
Useful for:
- Complex failure time modeling
- Understanding failure hazards and risks
- Competing risks scenarios
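To make "survival curve" concrete, here is a minimal Kaplan-Meier estimator in plain Python. It is a textbook sketch and not pdmlabs code. The function name and the input convention (a duration per unit plus a flag saying whether the failure was observed or censored) are assumptions made for illustration.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct observed failure time.

    durations: time until failure or censoring for each unit.
    observed:  True if the unit failed at that time, False if it was censored.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have equal length")
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        failures = sum(1 for d, o in zip(durations, observed) if d == t and o)
        leaving = sum(1 for d in durations if d == t)  # failures plus censored
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Four units: three observed failures and one unit censored at t=3.
curve = kaplan_meier(durations=[2, 3, 3, 5], observed=[True, True, False, True])
for time, prob in curve:
    print(f"t={time}: S(t)={prob:.2f}")
```

The censored unit still counts as at risk up to t=3, which is what separates this estimate from simply counting the fraction of units that have failed.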
- Raises:
IncompatibleMethodException – If method does not implement SupervisedMethodInterface.
Examples
>>> experiment = Supervised_SA_PdMExperiment(
...     experiment_name='survival-demo',
...     pipeline=pipeline,
...     param_space={'method_alpha': [0.01, 0.1, 1.0]},
...     num_iteration=20
... )
>>> results = experiment.execute()
- execute() → dict
Run supervised survival analysis experiment.
Applies survival analysis techniques to PdM:
1. Fits method on labeled historic data
2. For each target scenario:
   - Fits model to scenario-specific data
   - Generates survival predictions
   - Evaluates against ground truth
3. Returns best parameters
- Returns:
Result dictionary with best_params, best_objective, and SA metrics.
- Return type:
dict
- Raises:
IncompatibleMethodException – If method is not SupervisedMethodInterface.
- class pdmlabs.experiment.batch.UnsupervisedPdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)
Bases: PdMExperiment
Unsupervised anomaly detection without any labeled data.
This experiment flavor is designed for scenarios where:
- You have no labeled anomaly data
- You want the method to learn patterns from the entire dataset
- Each target scenario is evaluated independently
The approach fits the method on a per-scenario basis (similar to semi-supervised) but without leveraging any labels. The method must determine anomalies based on statistical or distributional properties alone.
Suitable for:
- Early-stage PdM where no failure labels exist
- Discovering unknown failure modes
- Baseline comparisons
- Raises:
IncompatibleMethodException – If method does not implement UnsupervisedMethodInterface.
Examples
>>> from pdmlabs.method.isolation_forest import IsolationForest
>>> experiment = UnsupervisedPdMExperiment(
...     experiment_name='unsupervised-demo',
...     pipeline=pipeline,
...     param_space={'method_contamination': [0.01, 0.05, 0.1]},
...     num_iteration=20
... )
>>> results = experiment.execute()
- execute() → dict
Run unsupervised experiment without labeled training data.
For each parameter combination:
1. For each target scenario (segmented by reset/failure events):
   - Fits method using only the entire scenario data (unsupervised)
   - Predicts anomaly scores
   - Applies postprocessor and thresholder
2. Evaluates using PdM metrics to find best parameters
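The thresholding step turns continuous anomaly scores into binary predictions, and the experiments report a best threshold alongside the best parameters. How pdmlabs selects that threshold is not described here. The sketch below shows one common approach, a sweep over candidate cut-offs scored by F1, with a hypothetical helper name.

```python
def best_threshold(scores, labels):
    """Try each distinct score as a cut-off; return (threshold, f1) with the highest F1."""
    best = (None, -1.0)
    for th in sorted(set(scores)):
        preds = [s >= th for s in scores]
        tp = sum(p and y for p, y in zip(preds, labels))
        fp = sum(p and not y for p, y in zip(preds, labels))
        fn = sum((not p) and y for p, y in zip(preds, labels))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best[1]:
            best = (th, f1)
    return best


scores = [0.1, 0.2, 0.8, 0.9, 0.3]
labels = [False, False, True, True, False]
th, f1 = best_threshold(scores, labels)
print(th, f1)
```

Note that this sweep needs ground-truth labels for evaluation even when the method itself is fit without them.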
- Returns:
Result dictionary with best_params, best_objective, and threshold.
- Return type:
dict
- Raises:
IncompatibleMethodException – If method is not UnsupervisedMethodInterface.
Modules