pdmlabs.experiment.batch

Batch experiment classes for offline/retrospective PdM evaluation.

Batch experiments are designed for comprehensive, offline evaluation of anomaly detection and remaining useful life (RUL) prediction models. They perform:

  • Complete parameter space exploration (Mango optimization)

  • Temporal cross-validation (train on past, test on future)

  • Full dataset evaluation before deployment
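The "train on past, test on future" idea can be illustrated with an expanding-window split. This is a minimal sketch in plain Python, not the splitting logic used by pdmlabs; the function name `temporal_splits` and its parameters are hypothetical.

```python
from typing import Iterator, List, Tuple


def temporal_splits(
    n_samples: int, n_folds: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs where training always precedes testing."""
    if n_folds < 1 or n_samples < n_folds + 1:
        raise ValueError("need at least n_folds + 1 samples")
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        # The last fold absorbs any remainder so that no sample is dropped.
        test_end = n_samples if k == n_folds else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))


# Hypothetical example: 10 timesteps, 3 folds.
splits = list(temporal_splits(n_samples=10, n_folds=3))
for train_idx, test_idx in splits:
    print(f"train={train_idx} test={test_idx}")
```

Every training index is strictly smaller than every test index, so no future information leaks into the fit.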

Available Experiment Flavors:

AutoProfileSemiSupervisedPdMExperiment

Auto-tunes the size of the “normal profile” (initial N timesteps). Best for: Scenarios with clear startup transients where initial behavior characterizes normal operation.

SemiSupervisedPdMExperiment

Fits method independently on each target scenario. Best for: Multiple independent test scenarios; adapts to local patterns.

SupervisedPdMExperiment

Uses labeled anomaly windows to train; train-once, test-many. Best for: Well-labeled historic data; consistent training across tests.

UnsupervisedPdMExperiment

No labels; learns patterns from data alone. Best for: Early-stage PdM without failure labels; baseline comparisons.

IncrementalSemiSupervisedPdMExperiment

Processes data incrementally with optional model retraining. Best for: Simulating online behavior; testing concept drift.

SupervisedRULPdMExperiment

Predicts remaining useful life (continuous regression). Best for: RUL-aware maintenance scheduling; time-to-failure estimation.

Supervised_SA_PdMExperiment

Survival analysis for failure time prediction. Best for: Complex failure dynamics; competing risks scenarios.

Choosing an Experiment Flavor:

  1. Do you have labeled anomaly/failure data?

    • Yes → SupervisedPdMExperiment

    • No → Go to step 2

  2. Is the initial portion of data clearly “normal”?

    • Yes → AutoProfileSemiSupervisedPdMExperiment

    • No → SemiSupervisedPdMExperiment

  3. Are you predicting discrete anomalies or continuous RUL?

    • RUL → SupervisedRULPdMExperiment (with labeled RUL data)

    • Anomalies → Choose from above
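The decision steps above can be written down as a small helper. This is only an illustrative sketch: `choose_flavor` is a hypothetical function, not part of pdmlabs, and it returns class names as strings so that it runs without the library installed.

```python
def choose_flavor(
    has_labels: bool,
    initial_data_is_normal: bool,
    predict_rul: bool = False,
) -> str:
    """Map the three questions above to an experiment class name."""
    if predict_rul:
        # Step 3: RUL regression needs labeled RUL data.
        if not has_labels:
            raise ValueError("RUL prediction requires labeled RUL data")
        return "SupervisedRULPdMExperiment"
    if has_labels:
        # Step 1: labeled anomaly/failure data is available.
        return "SupervisedPdMExperiment"
    if initial_data_is_normal:
        # Step 2: the start of each scenario can serve as a normal profile.
        return "AutoProfileSemiSupervisedPdMExperiment"
    return "SemiSupervisedPdMExperiment"


print(choose_flavor(has_labels=False, initial_data_is_normal=True))
```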

Typical Pattern:

>>> from pdmlabs.experiment.batch import AutoProfileSemiSupervisedPdMExperiment
>>> experiment = AutoProfileSemiSupervisedPdMExperiment(
...     experiment_name='my-battery-pd',
...     pipeline=pipeline,
...     param_space={...},
...     num_iteration=30,
...     n_jobs=4,
...     debug=False
... )
>>> results = experiment.execute()
>>> best_params = results['best_params']
>>> best_metric = results['best_objective']

MLflow Integration:

All batch experiments automatically log to MLflow:

  • Experiments grouped by name

  • Each parameter combination = one MLflow run

  • Parameters, metrics, artifacts, and models logged

  • Browse results in the MLflow UI: mlflow ui

See also

  • pdmlabs.experiment.experiment: PdMExperiment base class

  • pdmlabs.pipeline: Define dataset and pipeline

  • pdmlabs.mango: Mango tuner configuration

class pdmlabs.experiment.batch.AutoProfileSemiSupervisedPdMExperiment(*args, **kwargs)

Bases: PdMExperiment

Semi-supervised anomaly detection with automatic profile-based learning.

This experiment flavor implements an “auto-profiling” semi-supervised approach:

  1. For each target scenario, uses an initial profile (first N timesteps) as normal behavior

  2. Fits the anomaly detection method only on this profile

  3. Applies the fitted method to detect anomalies in the rest of the scenario

  4. Automatically determines the profile size via hyperparameter search

This is useful when:

  • You have unlabeled data with clear patterns at the start (normal operating condition)

  • You want to adapt to gradual drift without constant retraining

  • You have limited labeled anomaly examples

The “auto-profiling” optimization searches over profile_size (and optionally init_profile_size) to find the size of the normal behavior window that yields best performance.
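To make the profile idea concrete, the sketch below fits a mean/standard-deviation "profile" on the first N points of a series, scores the rest by absolute z-score, and picks the N that best separates a known anomalous tail. It is a toy stand-in under stated assumptions: the z-score detector, the separation objective, and the names `profile_scores`, `separation`, and `best_profile_size` are all hypothetical and do not reflect the methods, metrics, or Mango-based search that pdmlabs uses.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def profile_scores(series: Sequence[float], profile_size: int) -> List[float]:
    """Score every point after the profile by |z| relative to the profile."""
    profile = series[:profile_size]
    mu = mean(profile)
    sigma = pstdev(profile) or 1.0  # guard against a perfectly flat profile
    return [abs(x - mu) / sigma for x in series[profile_size:]]


def separation(series: Sequence[float], profile_size: int, anomaly_start: int) -> float:
    """Toy objective: mean anomalous score minus mean normal score."""
    scores = profile_scores(series, profile_size)
    split = anomaly_start - profile_size
    return mean(scores[split:]) - mean(scores[:split])


def best_profile_size(
    series: Sequence[float], candidates: Sequence[int], anomaly_start: int
) -> int:
    """Return the candidate profile size with the largest separation."""
    if not 0 < anomaly_start < len(series):
        raise ValueError("anomaly_start must fall inside the series")
    valid = [n for n in candidates if 1 <= n < anomaly_start]
    if not valid:
        raise ValueError("no candidate leaves normal data after the profile")
    return max(valid, key=lambda n: separation(series, n, anomaly_start))


# Hypothetical series: short startup transient, steady state, then a fault at index 8.
series = [14, 12, 10, 11, 10, 11, 10, 11, 20, 21]
chosen = best_profile_size(series, candidates=[2, 4, 6], anomaly_start=8)
print(chosen)
```

In this toy data the shortest profile is dominated by the startup transient, which is why a longer profile separates the fault more clearly.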

pipeline

Must have ‘failure’ or ‘reset’ events to define scenario boundaries.

Type:

PdMPipeline

param_space

Must include a 'profile_size' key. Example: {'profile_size': [10, 20, 50], 'method_alpha': [0.1, 0.5, 1.0]}

Type:

dict

Raises:
  • IncompatibleMethodException – If method does not implement SemiSupervisedMethodInterface.

  • ValueError – If pipeline lacks required event definitions.

Examples

>>> from pdmlabs.method.isolation_forest import IsolationForest
>>> from pdmlabs.preprocessing.no_preprocessor import NoPreprocessor
>>> # ... setup pipeline ...
>>> param_space = {
...     'profile_size': [10, 20, 50],
...     'method_alpha': [0.1, 1.0]
... }
>>> experiment = AutoProfileSemiSupervisedPdMExperiment(
...     experiment_name='auto-profile-demo',
...     pipeline=pipeline,
...     param_space=param_space,
...     num_iteration=30,
...     n_jobs=4
... )
>>> results = experiment.execute()
>>> print(f"Best profile size: {results['best_params']['profile_size']}")
Best profile size: 20

execute() → dict

Run the auto-profile semi-supervised optimization experiment.

Searches parameter space to find the best profile size and method parameters. For each combination:

  1. For each target scenario:

    a. Segments by reset/failure events

    b. Uses first N timesteps (profile_size) as normal pattern

    c. Fits method on profile

    d. Predicts on remaining data

    e. Applies postprocessor and thresholder

  2. Evaluates across all scenarios using PdM metrics

  3. Returns best parameters
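Steps 1a and 1b can be pictured as follows: split a series at event positions, then split each resulting scenario into a profile and a remainder. This is a sketch only. The helper names are hypothetical, and the choice to keep the event timestep at the end of the scenario it closes is an assumption, not documented pdmlabs behavior.

```python
from typing import List, Sequence, Tuple


def split_by_events(
    series: Sequence[float], event_indices: Sequence[int]
) -> List[List[float]]:
    """Split a series into scenarios; each event index closes a scenario."""
    segments: List[List[float]] = []
    start = 0
    for idx in sorted(event_indices):
        segments.append(list(series[start : idx + 1]))
        start = idx + 1
    if start < len(series):
        segments.append(list(series[start:]))  # data after the last event
    return [seg for seg in segments if seg]


def split_profile(
    segment: Sequence[float], profile_size: int
) -> Tuple[List[float], List[float]]:
    """Return (profile, remainder) for one scenario."""
    return list(segment[:profile_size]), list(segment[profile_size:])


# Hypothetical data: 10 timesteps with reset/failure events at indices 3 and 7.
scenarios = split_by_events(list(range(10)), event_indices=[3, 7])
profile, remainder = split_profile(scenarios[0], profile_size=2)
print(scenarios, profile, remainder)
```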

Returns:

Result dictionary with:
  • 'best_params': Best found parameters (includes profile_size)

  • 'best_objective': Best metric value achieved

  • 'th': Best threshold for decision boundary
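One plausible way to consume the returned threshold is to binarize anomaly scores with it. The sketch below uses a placeholder dictionary with the three documented keys; the values are made up, and the "strictly greater than" comparison is an assumption about how the threshold is meant to be applied.

```python
from typing import Dict, List, Sequence


def apply_threshold(scores: Sequence[float], th: float) -> List[bool]:
    """Flag a point as anomalous when its score is strictly above th."""
    return [score > th for score in scores]


# Placeholder result dictionary; the values are illustrative, not real output.
results: Dict[str, object] = {
    "best_params": {"profile_size": 20},
    "best_objective": 0.9,
    "th": 0.5,
}

flags = apply_threshold([0.1, 0.7, 0.5, 0.9], float(results["th"]))
print(flags)
```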

Return type:

dict

Raises:
  • IncompatibleMethodException – If method is not SemiSupervisedMethodInterface.

  • Exception – If pipeline setup is invalid or data processing fails.

Examples

>>> experiment = AutoProfileSemiSupervisedPdMExperiment(...)
>>> results = experiment.execute()
>>> print(results['best_params']['profile_size'])
25

class pdmlabs.experiment.batch.IncrementalSemiSupervisedPdMExperiment(refit_new_method_object: bool = True, refit_new_preprocessor_object: bool = True, *args, **kwargs)

Bases: PdMExperiment

Incremental semi-supervised anomaly detection with model retraining.

This experiment flavor implements online/incremental learning scenarios:

  1. Processes data incrementally over time (not all at once)

  2. Optionally retrains method and preprocessor on newly observed data

  3. Maintains model adaptation as new patterns emerge

Different from semi-supervised:

  • Processes scenarios in temporal segments (not monolithic)

  • Can retrain models as new data arrives

  • Simulates streaming/online scenarios while using batch methods

Useful for:

  • Detecting concept drift without full retraining

  • Systems where data arrives in batches

  • Simulating online learning behavior

  • Gradual performance monitoring
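The difference between creating a fresh method per increment and updating the same one can be sketched with a toy detector. Everything here is hypothetical: `MeanStdDetector` and `run_incremental` are not pdmlabs classes, and fitting on the previous batch before scoring the current one is just one possible incremental scheme.

```python
from statistics import mean, pstdev
from typing import List, Sequence


class MeanStdDetector:
    """Toy detector: |z|-score against everything it has been fitted on."""

    def __init__(self) -> None:
        self.seen: List[float] = []
        self.mu = 0.0
        self.sigma = 1.0

    def fit(self, batch: Sequence[float]) -> "MeanStdDetector":
        self.seen.extend(batch)  # an existing object accumulates history
        self.mu = mean(self.seen)
        self.sigma = pstdev(self.seen) or 1.0
        return self

    def score(self, batch: Sequence[float]) -> List[float]:
        return [abs(x - self.mu) / self.sigma for x in batch]


def run_incremental(
    batches: Sequence[Sequence[float]], refit_new_method_object: bool = True
) -> List[List[float]]:
    """Fit on the previous batch, then score the current one."""
    detector = MeanStdDetector()
    scores_per_batch = []
    for previous, current in zip(batches, batches[1:]):
        if refit_new_method_object:
            detector = MeanStdDetector()  # fresh object: forget older batches
        detector.fit(previous)
        scores_per_batch.append(detector.score(current))
    return scores_per_batch


batches = [[1, 1, 1, 1], [5, 5, 5, 5], [5, 5, 5, 9]]
fresh = run_incremental(batches, refit_new_method_object=True)
updated = run_incremental(batches, refit_new_method_object=False)
print(fresh[-1], updated[-1])
```

With a fresh object the detector adapts fully to the level shift in the second batch; with an updated object the older data still influences the scores.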

refit_new_method_object

If True, creates fresh method for increments. If False, updates same method. Default True.

Type:

bool

refit_new_preprocessor_object

If True, creates fresh preprocessor. If False, updates same preprocessor. Default True.

Type:

bool

Examples

>>> experiment = IncrementalSemiSupervisedPdMExperiment(
...     experiment_name='incremental-demo',
...     pipeline=pipeline,
...     param_space={'method_alpha': [0.1, 1.0]},
...     refit_new_method_object=True,
...     num_iteration=20
... )
>>> results = experiment.execute()

execute() → dict

Run incremental semi-supervised experiment with model retraining.

Processes data incrementally, optionally retraining models:

  1. For each parameter combination:

    a. For each target scenario’s temporal segments:

      • Fits method (optionally new instance)

      • Predicts anomaly scores

      • Applies postprocessor and thresholder

      • Evaluates current segment

    b. Aggregates across segments

  2. Returns best parameters found

Returns:

Result dictionary with best_params, best_objective, and threshold.

Return type:

dict

class pdmlabs.experiment.batch.SemiSupervisedPdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)

Bases: PdMExperiment

Semi-supervised anomaly detection with per-scenario learning.

This experiment flavor implements adaptive, scenario-specific anomaly detection:

  1. For each target scenario, treats it as a self-contained learning problem

  2. Fits method on the entire target scenario (unsupervised learning)

  3. No reference data or cross-scenario knowledge transfer

Different from:

  • Supervised: no labels available

  • AutoProfile: doesn’t use an initial profile

  • Unsupervised fit: fits once globally; semi-supervised fits per-scenario

Useful when:

  • You have multiple target scenarios with different characteristics

  • You want scenario-specific adaptation

  • Scenarios are independent or have unknown relationships

Raises:

IncompatibleMethodException – If method does not implement SemiSupervisedMethodInterface.

Examples

>>> experiment = SemiSupervisedPdMExperiment(
...     experiment_name='semisup-demo',
...     pipeline=pipeline,
...     param_space={'method_n_neighbors': [5, 10, 20]},
...     num_iteration=25
... )
>>> results = experiment.execute()

execute() → dict

Run semi-supervised experiment with per-scenario fitting.

Fits method independently for each target scenario, adapting to local characteristics. For each parameter combination:

  1. For each target scenario (if segmented by reset/failure)

  2. Aggregates evaluations across scenarios

  3. Returns best parameters

Returns:

Result dictionary with best_params, best_objective, and threshold.

Return type:

dict

Raises:

IncompatibleMethodException – If method is not SemiSupervisedMethodInterface.

class pdmlabs.experiment.batch.SupervisedPdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)

Bases: PdMExperiment

Supervised anomaly detection with labeled anomaly windows.

This experiment flavor is designed for scenarios where you have:

  • Training data with explicit anomaly labels (ranges or boolean arrays)

  • A supervised method that can learn from these labels

  • Target data to evaluate on

It implements a train-once, test-many approach:

  1. Fits method, preprocessor, postprocessor on ALL historic labeled data (once)

  2. Then applies to each target scenario independently

This differs from semi-supervised (which fits per-scenario) and ensures consistent model training across all test scenarios.
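The train-once, test-many pattern can be shown with a deliberately simple stand-in model: a single threshold learned from labeled historic values and then reused unchanged on every target. The functions `fit_threshold` and `predict` are hypothetical and far simpler than any real supervised method; the length check mirrors the documented requirement that labels match the historic data.

```python
from statistics import mean
from typing import List, Sequence


def fit_threshold(values: Sequence[float], labels: Sequence[int]) -> float:
    """Learn one threshold: midpoint between normal and anomalous means."""
    if len(values) != len(labels):
        raise ValueError("anomaly labels must match historic data in length")
    normal = [v for v, y in zip(values, labels) if not y]
    anomalous = [v for v, y in zip(values, labels) if y]
    if not normal or not anomalous:
        raise ValueError("need both normal and anomalous examples")
    return (mean(normal) + mean(anomalous)) / 2


def predict(values: Sequence[float], threshold: float) -> List[bool]:
    """Apply the already-fitted threshold; no refitting happens here."""
    return [v > threshold for v in values]


# Hypothetical labeled historic data (1 = anomalous) and two target scenarios.
historic = [1.0, 1.2, 0.9, 5.0, 5.2]
labels = [0, 0, 0, 1, 1]
targets = [[1.1, 4.9], [0.8, 1.0, 6.0]]

threshold = fit_threshold(historic, labels)              # fit once
predictions = [predict(t, threshold) for t in targets]   # test many
print(predictions)
```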

pipeline

Must have ‘anomaly_labels’ key in dataset. Lists of arrays matching historic_data in length and dimensionality.

Type:

PdMPipeline

param_space

Hyperparameter search space.

Type:

dict

Raises:
  • ValueError – If dataset lacks ‘anomaly_labels’ key.

  • ValueError – If anomaly_labels length does not match historic_data.

  • IncompatibleMethodException – If method does not implement SupervisedMethodInterface.

Examples

>>> dataset = {
...     'historic_data': [df_train],
...     'target_data': [df_test],
...     'anomaly_labels': [label_array],  # 1D array of same length as df_train
...     ...
... }
>>> from pdmlabs.experiment.batch.supervised_experiment import SupervisedPdMExperiment
>>> experiment = SupervisedPdMExperiment(
...     experiment_name='supervised-demo',
...     pipeline=pipeline,
...     param_space={'method_nu': [0.05, 0.1, 0.2]},
...     num_iteration=20
... )
>>> results = experiment.execute()

execute() → dict

Run supervised experiment with labeled anomaly training data.

Trains a supervised method once on all labeled historic data, then evaluates on each target scenario:

  1. Preprocesses all historic data (fit once)

  2. Fits method on all labeled historic data (single consolidated training)

  3. Fits postprocessor on labeled data

  4. For each target:

    a. Preprocesses using the fitted preprocessor

    b. Applies fitted method to get anomaly scores

    c. Postprocesses scores

    d. Thresholds to get binary predictions

    e. Evaluates against ground truth

  5. Returns best parameters found

This approach ensures the model is trained consistently across all test scenarios, unlike semi-supervised where the model adapts per-scenario.

Returns:

Result dictionary with:
  • 'best_params': Best parameter combination found

  • 'best_objective': Best metric value

  • 'th': Best decision threshold

Return type:

dict

Raises:
  • ValueError – If anomaly_labels dimension mismatches data.

  • IncompatibleMethodException – If method is not SupervisedMethodInterface.

Examples

>>> results = experiment.execute()
>>> print(f"Best threshold: {results['th']:.3f}")
Best threshold: 0.645

class pdmlabs.experiment.batch.SupervisedRULPdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)

Bases: PdMExperiment

Supervised Remaining Useful Life (RUL) prediction experiment.

This experiment flavor is for RUL regression where:

  • Target is continuous (time to failure, cycles to failure, etc.)

  • Method must implement SupervisedMethodInterface with RUL prediction

  • Evaluation metrics are regression-based (MAE, MSE, etc.), not classification-based

Differs from anomaly detection experiments:

  • Predicts continuous RUL rather than binary anomalies

  • Evaluation based on prediction accuracy, not detection timing

  • May use different postprocessing

Useful for:

  • Predictive maintenance with remaining life estimates

  • RUL-aware planning and maintenance scheduling
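As a small illustration of the regression framing, the sketch below derives RUL targets for a run-to-failure sequence and scores a prediction with mean absolute error. The helper names are hypothetical, and counting RUL in timesteps with the failure at the last step is an assumption about the labeling convention.

```python
from typing import List, Sequence


def rul_labels(n_steps: int) -> List[int]:
    """RUL per timestep for a run that fails at its last step (RUL = 0 there)."""
    return [n_steps - 1 - t for t in range(n_steps)]


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between true and predicted RUL."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical 5-step run-to-failure sequence and a made-up prediction.
y_true = rul_labels(5)
y_pred = [5, 3, 1, 1, 0]
mae = mean_absolute_error(y_true, y_pred)
print(y_true, mae)
```

Because lower MAE is better, optimizing an error metric probably calls for setting the `maximize` argument from the class signature to `False`; this is an inference from the signature, not confirmed behavior.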

Raises:

IncompatibleMethodException – If method is not SupervisedMethodInterface.

Examples

>>> experiment = SupervisedRULPdMExperiment(
...     experiment_name='rul-demo',
...     pipeline=pipeline,
...     param_space={'method_fit_intercept': [True, False]},
...     optimization_param='MAE'
... )
>>> results = experiment.execute()

execute() → dict

Run supervised RUL prediction experiment.

Trains RUL regression methods and evaluates on test scenarios:

  1. Fits method on labeled historic RUL data (once)

  2. For each target scenario:

    a. Preprocesses target data

    b. Predicts RUL values

    c. Compares against ground truth

  3. Returns best parameters

Returns:

Result dictionary with best_params, best_objective, and metrics.

Return type:

dict

Raises:

IncompatibleMethodException – If method is not SupervisedMethodInterface.

class pdmlabs.experiment.batch.Supervised_SA_PdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)

Bases: PdMExperiment

Supervised Survival Analysis (SA) experiment.

This experiment flavor combines supervised learning with survival analysis concepts:

  • Treats PdM as a survival prediction problem

  • Uses labeled data to train models that predict survival curves

  • Evaluates using survival analysis metrics (Kaplan-Meier, concordance, etc.)

Useful for:

  • Complex failure time modeling

  • Understanding failure hazards and risks

  • Competing risks scenarios
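For readers unfamiliar with survival curves, here is a minimal pure-Python Kaplan-Meier estimator. It is a textbook sketch, not the pdmlabs evaluation code; the function name and the example durations are hypothetical. Units with `observed = 0` are right-censored: they were still running when observation stopped.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(
    durations: Sequence[float], observed: Sequence[int]
) -> List[Tuple[float, float]]:
    """Return [(time, survival probability)] at each observed failure time."""
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have equal length")
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Units still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Failures (not censorings) occurring exactly at time t.
        failures = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - failures / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical fleet: five units, two of them censored (observed = 0).
curve = kaplan_meier(durations=[2, 3, 3, 5, 8], observed=[1, 1, 0, 1, 0])
for time, prob in curve:
    print(f"t={time}: S(t)={prob:.2f}")
```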

Raises:

IncompatibleMethodException – If method does not implement SupervisedMethodInterface.

Examples

>>> experiment = Supervised_SA_PdMExperiment(
...     experiment_name='survival-demo',
...     pipeline=pipeline,
...     param_space={'method_alpha': [0.01, 0.1, 1.0]},
...     num_iteration=20
... )
>>> results = experiment.execute()

execute() → dict

Run supervised survival analysis experiment.

Applies survival analysis techniques to PdM:

  1. Fits method on labeled historic data

  2. For each target scenario:

    a. Fits model to scenario-specific data

    b. Generates survival predictions

    c. Evaluates against ground truth

  3. Returns best parameters

Returns:

Result dictionary with best_params, best_objective, and SA metrics.

Return type:

dict

Raises:

IncompatibleMethodException – If method is not SupervisedMethodInterface.

class pdmlabs.experiment.batch.UnsupervisedPdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)

Bases: PdMExperiment

Unsupervised anomaly detection without any labeled data.

This experiment flavor is designed for scenarios where:

  • You have no labeled anomaly data

  • You want the method to learn patterns from the entire dataset

  • Each target scenario is evaluated independently

The approach fits the method on a per-scenario basis (similar to semi-supervised) but without leveraging any labels. The method must determine anomalies based on statistical or distributional properties alone.

Suitable for:

  • Early-stage PdM where no failure labels exist

  • Discovering unknown failure modes

  • Baseline comparisons
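Scoring "based on statistical or distributional properties alone" can be illustrated with a robust z-score fitted separately on each scenario. The median/MAD detector below is an assumed stand-in chosen for brevity; pdmlabs methods such as IsolationForest work differently, and `mad_scores` is a hypothetical helper.

```python
from statistics import median
from typing import List, Sequence


def mad_scores(values: Sequence[float]) -> List[float]:
    """Distance from the scenario median, in units of median absolute deviation."""
    center = median(values)
    deviations = [abs(v - center) for v in values]
    mad = median(deviations) or 1.0  # guard against a constant scenario
    return [d / mad for d in deviations]


# Hypothetical scenarios; each one is fitted and scored on its own data only.
scenarios = [
    [10, 11, 10, 12, 11, 30],
    [100, 101, 99, 100, 100, 100],
]
scores_per_scenario = [mad_scores(s) for s in scenarios]
print(scores_per_scenario[0])
```

No labels are involved at any point: the last value of the first scenario stands out purely because it is far from that scenario's own median.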

Raises:

IncompatibleMethodException – If method does not implement UnsupervisedMethodInterface.

Examples

>>> from pdmlabs.method.isolation_forest import IsolationForest
>>> experiment = UnsupervisedPdMExperiment(
...     experiment_name='unsupervised-demo',
...     pipeline=pipeline,
...     param_space={'method_contamination': [0.01, 0.05, 0.1]},
...     num_iteration=20
... )
>>> results = experiment.execute()

execute() → dict

Run unsupervised experiment without labeled training data.

For each parameter combination:

  1. For each target scenario (segmented by reset/failure events):

    a. Fits method using only the entire scenario data (unsupervised)

    b. Predicts anomaly scores

    c. Applies postprocessor and thresholder

  2. Evaluates using PdM metrics to find best parameters

Returns:

Result dictionary with best_params, best_objective, and threshold.

Return type:

dict

Raises:

IncompatibleMethodException – If method is not UnsupervisedMethodInterface.

Modules