pdmlabs.experiment.experiment#

Functions

process_data(current_data, header, data_type)

Process and normalize data inputs to a standardized format.

Classes

PdMExperiment(experiment_name, pipeline, ...)

Base abstract class for all predictive maintenance experiment flavors.

class pdmlabs.experiment.experiment.PdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)#

Bases: ABC

Base abstract class for all predictive maintenance experiment flavors.

This class orchestrates the automated execution of anomaly detection experiments using Bayesian optimization (via Mango) to search parameter spaces and MLflow for run tracking and reproducibility.

An experiment combines a PdMPipeline (which defines the processing steps) with a parameter space to search over. It performs hyperparameter optimization by:

  1. Registering an MLflow experiment

  2. Running objective evaluations with different parameter combinations

  3. Training, predicting, and evaluating across train/test splits

  4. Returning the best found parameters and their performance metrics

Concrete implementations (e.g., AutoProfileSemiSupervisedPdMExperiment, SupervisedPdMExperiment) override the abstract execute() method to implement experiment-specific logic (e.g., semi-supervised, supervised, RUL prediction).
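
The subclassing contract described above can be illustrated with a minimal, self-contained sketch. It does not import pdmlabs: `BaseExperiment` and `ToyExperiment` are hypothetical stand-ins for PdMExperiment and a concrete flavor, and the values they return are placeholders rather than real results.

```python
from abc import ABC, abstractmethod


class BaseExperiment(ABC):
    """Hypothetical stand-in for PdMExperiment (not the real class)."""

    def __init__(self, experiment_name: str, param_space: dict):
        self.experiment_name = experiment_name
        self.param_space = param_space

    @abstractmethod
    def execute(self) -> dict:
        """Run the search and return a result dictionary."""
        raise NotImplementedError


class ToyExperiment(BaseExperiment):
    """Hypothetical concrete flavor: picks the first value of every parameter."""

    def execute(self) -> dict:
        best_params = {name: values[0] for name, values in self.param_space.items()}
        # Placeholder score and threshold; a real flavor would compute these.
        return {"best_params": best_params, "best_objective": None, "th": None}


# The abstract base cannot be instantiated directly.
try:
    BaseExperiment("demo", {})
except TypeError as exc:
    print(f"cannot instantiate base: {exc}")

result = ToyExperiment("demo", {"method_alpha": [0.1, 0.5]}).execute()
print(result["best_params"])
```

In this sketch the only thing a flavor has to supply is execute(); everything else is inherited from the base.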

experiment_name#

Name of the experiment (MLflow experiment identifier).

Type:

str

pipeline#

Pipeline defining dataset, preprocessing, method, postprocessing, and thresholding steps.

Type:

PdMPipeline

param_space#

Parameter space for Mango optimization. Keys are parameter names (e.g., ‘method_alpha’, ‘preprocessor_scale’), values are parameter ranges.

Type:

dict
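
As an illustration, a parameter space might look like the dictionary below. Lists and ranges are value types that Mango accepts for search spaces. The grouping of keys by prefix (`method_`, `preprocessor_`) is an assumption inferred from the example names above, and `split_by_prefix` is a hypothetical helper, not part of pdmlabs.

```python
# Hypothetical search space: keys are parameter names, values are candidate ranges.
param_space = {
    "method_alpha": [0.01, 0.1, 0.5],
    "method_window": range(10, 110, 10),
    "preprocessor_scale": [True, False],
}


def split_by_prefix(params: dict, prefixes: tuple = ("method", "preprocessor")) -> dict:
    """Group sampled parameters by the component prefix in their name."""
    grouped = {prefix: {} for prefix in prefixes}
    for name, value in params.items():
        prefix, _, rest = name.partition("_")
        # Keys with an unknown prefix are ignored in this sketch.
        if prefix in grouped and rest:
            grouped[prefix][rest] = value
    return grouped


# One sampled combination, as an optimizer might propose it.
sample = {"method_alpha": 0.1, "method_window": 50, "preprocessor_scale": True}
grouped = split_by_prefix(sample)
print(grouped)
```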

optimization_param#

Metric to optimize (‘AD1_AUC’, ‘AD2_AUC’, ‘AD3_AUC’, etc.).

Type:

str

initial_random#

Number of initial random exploration steps before Bayesian optimization.

Type:

int

num_iteration#

Total number of optimization iterations.

Type:

int

n_jobs#

Number of parallel jobs for optimization.

Type:

int

random_state#

Random seed for reproducibility.

Type:

int

maximize#

Whether to maximize (True) or minimize (False) optimization_param.

Type:

bool

debug#

If True, generates debug plots and logs them to MLflow.

Type:

bool

event_data#

Event mappings from the pipeline (failures, resets, sources).

Raises:
  • ValueError – If required dataset keys are missing (e.g., ‘anomaly_labels’ for supervised).

  • IncompatibleMethodException – If the selected method is incompatible with the experiment flavor.

abstract execute() → dict#

Execute the parameter optimization loop and return results.

This method must be implemented by subclasses to define experiment-specific logic (e.g., semi-supervised, supervised, RUL prediction). It typically:

  1. Uses Mango tuner to search the parameter space

  2. For each parameter combination:

       • Creates pipeline components (method, preprocessor, postprocessor, thresholder)

       • Fits on historic/training data

       • Predicts on target/test data

       • Evaluates using PdM metrics

  3. Returns the best parameters and their performance

Returns:

Result dictionary containing:
  • ‘best_params’: dict of best found parameters

  • ‘best_objective’: best optimization metric value

  • ‘th’: best threshold value

  • Additional experiment-specific results (e.g., ‘per_method’ for batch flavors)

Return type:

dict

Raises:

NotImplementedError – This is an abstract method and must be overridden.

Examples

See subclasses like SemiSupervisedPdMExperiment.execute() for concrete examples.
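
The loop described above can also be sketched without pdmlabs, Mango, or MLflow. The sketch below swaps Mango's Bayesian optimizer for plain random search, and `toy_objective` is a hypothetical stand-in for the fit/predict/evaluate cycle. Treating ‘th’ as a searched parameter is likewise an assumption; only the result keys come from this page.

```python
import random


def toy_objective(params: dict) -> float:
    """Hypothetical stand-in for fit -> predict -> evaluate; peaks at alpha=0.5, th=0.7."""
    return 1.0 - abs(params["method_alpha"] - 0.5) - abs(params["th"] - 0.7)


def run_search(param_space: dict, num_iteration: int = 20,
               maximize: bool = True, random_state: int = 42) -> dict:
    """Random search over param_space (a simplification, not Bayesian optimization)."""
    rng = random.Random(random_state)
    best_params, best_score = None, None
    for _ in range(num_iteration):
        # Sample one candidate value per parameter.
        params = {name: rng.choice(list(values)) for name, values in param_space.items()}
        score = toy_objective(params)
        if best_score is None:
            improved = True
        else:
            improved = score > best_score if maximize else score < best_score
        if improved:
            best_params, best_score = params, score
    return {"best_params": best_params, "best_objective": best_score,
            "th": best_params["th"]}


space = {"method_alpha": [0.1, 0.3, 0.5, 0.9], "th": [0.5, 0.7, 0.9]}
result = run_search(space)
print(result)
```

Seeding the generator with random_state makes repeated runs return the same result, which mirrors the reproducibility role of the random_state attribute.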

plot_SA_of_RUL(plot_test_preds, result_labels, is_rtf)#

Generate and log RUL survival analysis plots with predictions vs labels.

For each test set, overlays predicted RUL trajectories against ground-truth labels. Color indicates predicted status (red for failure, black for normal).

Parameters:
  • plot_test_preds (list) – List of prediction arrays, one per target source.

  • result_labels (list) – Corresponding ground-truth RUL label arrays.

  • is_rtf (list) – Run-to-failure flags, one per array (1=RTF scenario, 0=otherwise).

Examples

>>> preds = [[[10, 20, 15], [100, 101, 102]], ...]
>>> labels = [[[5, 4, 3], [98, 97, 96]], ...]
>>> flags = [1, 0, ...]
>>> experiment.plot_SA_of_RUL(preds, labels, flags)

pdmlabs.experiment.experiment.process_data(current_data, header, data_type) → list[DataFrame]#

Process and normalize data inputs to a standardized format.

Converts various input formats (DataFrame, CSV file/directory, or list) into a list of DataFrames for consistent handling throughout the framework.

Parameters:
  • current_data – The data to process. Can be:

      • pd.DataFrame: Single DataFrame, wrapped in a list

      • str: Path to a single CSV file or to a directory containing CSV files

      • list: List of DataFrames

  • header (str or int) – Row number(s) to use as column names when reading CSV. Passed directly to pd.read_csv. Use ‘infer’ for automatic detection.

  • data_type (str) – Name of the parameter (for error messages).

Returns:

List of DataFrames ready for processing.

Return type:

list[pd.DataFrame]

Raises:

Exception – If input type is not supported or list contains non-DataFrame elements.

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
>>> result = process_data(df, 'infer', 'test_data')
>>> print(type(result)); print(len(result))
<class 'list'>
1
>>> result = process_data('/path/to/data.csv', 'infer', 'test_data')
>>> print(type(result[0]))
<class 'pandas.core.frame.DataFrame'>
>>> result = process_data([df, df], 'infer', 'test_data')
>>> print(len(result))
2