pdmlabs.experiment.experiment
Functions
process_data | Process and normalize data inputs to a standardized format.
Classes
PdMExperiment | Base abstract class for all predictive maintenance experiment flavors.
- class pdmlabs.experiment.experiment.PdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)
Bases: ABC
Base abstract class for all predictive maintenance experiment flavors.
This class orchestrates the automated execution of anomaly detection experiments using Bayesian optimization (via Mango) to search parameter spaces and MLflow for run tracking and reproducibility.
An experiment combines a PdMPipeline (which defines the processing steps) with a parameter space to search over. It performs hyperparameter optimization by:
Registering an MLflow experiment
Running objective evaluations with different parameter combinations
Training, predicting, and evaluating across train/test splits
Returning the best found parameters and their performance metrics
Concrete implementations (e.g., AutoProfileSemiSupervisedPdMExperiment, SupervisedPdMExperiment) override the abstract execute() method to implement experiment-specific logic (e.g., semi-supervised, supervised, RUL prediction).
- experiment_name
Name of the experiment (MLflow experiment identifier).
- Type:
str
- pipeline
Pipeline defining dataset, preprocessing, method, postprocessing, and thresholding steps.
- Type:
PdMPipeline
- param_space
Parameter space for Mango optimization. Keys are parameter names (e.g., ‘method_alpha’, ‘preprocessor_scale’), values are parameter ranges.
- Type:
dict
- optimization_param
Metric to optimize (‘AD1_AUC’, ‘AD2_AUC’, ‘AD3_AUC’, etc.).
- Type:
str
- initial_random
Number of initial random exploration steps before Bayesian optimization.
- Type:
int
- num_iteration
Total number of optimization iterations.
- Type:
int
- n_jobs
Number of parallel jobs for optimization.
- Type:
int
- random_state
Random seed for reproducibility.
- Type:
int
- maximize
Whether to maximize (True) or minimize (False) optimization_param.
- Type:
bool
- debug
If True, generates debug plots and logs them to MLflow.
- Type:
bool
- event_data
Event mappings from the pipeline (failures, resets, sources).
- Raises:
ValueError – If required dataset keys are missing (e.g., ‘anomaly_labels’ for supervised).
IncompatibleMethodException – If the selected method is incompatible with the experiment flavor.
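The example keys ‘method_alpha’ and ‘preprocessor_scale’ suggest that each parameter name carries a prefix naming the pipeline component it belongs to. The sketch below illustrates that reading only: `split_by_component` and `COMPONENTS` are hypothetical names, the candidate values are made up, and the library's actual routing of parameters is not shown in this page.

```python
# Hypothetical helper (not part of pdmlabs): groups flat, prefixed
# parameter names by the pipeline component they appear to target.
COMPONENTS = ("preprocessor", "method", "postprocessor", "thresholder")


def split_by_component(params, components=COMPONENTS):
    grouped = {name: {} for name in components}
    for key, value in params.items():
        # "method_alpha" -> prefix "method", rest "alpha"
        prefix, _, rest = key.partition("_")
        if prefix not in grouped or not rest:
            raise KeyError(f"unrecognized parameter name: {key!r}")
        grouped[prefix][rest] = value
    return grouped


# A parameter space with discrete candidate lists (illustrative values).
param_space = {
    "method_alpha": [0.1, 0.5, 1.0],
    "preprocessor_scale": [True, False],
}

# One sampled combination, as an optimizer might propose it.
sampled = {"method_alpha": 0.5, "preprocessor_scale": True}
grouped = split_by_component(sampled)
```

Under this assumption, `grouped["method"]` holds the keyword arguments for the method and `grouped["preprocessor"]` those for the preprocessor; components with no sampled parameters get an empty dict.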
- abstract execute() -> dict
Execute the parameter optimization loop and return results.
This method must be implemented by subclasses to define experiment-specific logic (e.g., semi-supervised, supervised, RUL prediction). It typically:
Uses Mango tuner to search the parameter space
For each parameter combination:
- Creates pipeline components (method, preprocessor, postprocessor, thresholder)
- Fits on historic/training data
- Predicts on target/test data
- Evaluates using PdM metrics
Returns the best parameters and their performance
- Returns:
- Result dictionary containing:
‘best_params’: dict of best found parameters
‘best_objective’: best optimization metric value
‘th’: best threshold value
Additional experiment-specific results (e.g., ‘per_method’ for batch flavors)
- Return type:
dict
- Raises:
NotImplementedError – This is an abstract method and must be overridden.
Examples
See subclasses like SemiSupervisedPdMExperiment.execute() for concrete examples.
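The override pattern and the shape of the result dictionary can be sketched without the library. Everything below is an assumption made for illustration: `ExperimentSketch` and `GridSearchSketch` are hypothetical stand-ins for PdMExperiment and a concrete flavor, an exhaustive grid replaces the Mango Bayesian search, the `objective` method is a toy score in place of the fit / predict / evaluate cycle, and the `thresholder_th` key is invented.

```python
import itertools
from abc import ABC, abstractmethod


class ExperimentSketch(ABC):
    """Hypothetical stand-in for PdMExperiment; not the library's class."""

    def __init__(self, param_space, maximize=True):
        self.param_space = param_space
        self.maximize = maximize

    @abstractmethod
    def execute(self) -> dict:
        """Run the search and return the result dictionary."""


class GridSearchSketch(ExperimentSketch):
    def objective(self, params):
        # Toy score in place of fit / predict / evaluate; peaks at alpha = 0.5.
        return 1.0 - abs(params["method_alpha"] - 0.5)

    def execute(self) -> dict:
        names = list(self.param_space)
        best_params, best_score = None, None
        # Exhaustive grid as a deterministic substitute for Bayesian search.
        for values in itertools.product(*(self.param_space[n] for n in names)):
            params = dict(zip(names, values))
            score = self.objective(params)
            if best_score is None:
                improved = True
            elif self.maximize:
                improved = score > best_score
            else:
                improved = score < best_score
            if improved:
                best_params, best_score = params, score
        return {
            "best_params": best_params,
            "best_objective": best_score,
            "th": best_params["thresholder_th"],  # hypothetical key name
        }


space = {"method_alpha": [0.1, 0.5, 0.9], "thresholder_th": [0.3, 0.7]}
result = GridSearchSketch(space, maximize=True).execute()
```

The `maximize` flag only flips the comparison, which mirrors how the documented attribute is described. Ties keep the first combination encountered.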
- plot_SA_of_RUL(plot_test_preds, result_labels, is_rtf)
Generate and log RUL survival analysis plots with predictions vs labels.
For each test set, overlays predicted RUL trajectories against ground-truth labels. Color indicates predicted status (red for failure, black for normal).
- Parameters:
plot_test_preds (list) – List of prediction arrays, one per target source.
result_labels (list) – Corresponding ground-truth RUL label arrays.
is_rtf (list) – Run-to-failure flags, one per array (1=RTF scenario, 0=otherwise).
Examples
>>> preds = [[[10, 20, 15], [100, 101, 102]], ...]
>>> labels = [[[5, 4, 3], [98, 97, 96]], ...]
>>> flags = [1, 0, ...]
>>> experiment.plot_SA_of_RUL(preds, labels, flags)
- pdmlabs.experiment.experiment.process_data(current_data, header, data_type) -> list[DataFrame]
Process and normalize data inputs to a standardized format.
Converts various input formats (DataFrame, CSV file/directory, or list) into a list of DataFrames for consistent handling throughout the framework.
- Parameters:
current_data – The data to process. Can be:
- pd.DataFrame: Single DataFrame, wrapped in a list
- str: Path to a single CSV file or a directory containing CSV files
- list: List of DataFrames
header (str or int) – Row number(s) to use as column names when reading CSV. Passed directly to pd.read_csv. Use ‘infer’ for automatic detection.
data_type (str) – Name of the parameter (for error messages).
- Returns:
List of DataFrames ready for processing.
- Return type:
list[pd.DataFrame]
- Raises:
Exception – If input type is not supported or list contains non-DataFrame elements.
Examples
>>> import pandas as pd
>>> from pdmlabs.experiment.experiment import process_data
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
>>> result = process_data(df, 'infer', 'test_data')
>>> print(type(result)); print(len(result))
<class 'list'>
1
>>> result = process_data('/path/to/data.csv', 'infer', 'test_data')
>>> print(type(result[0]))
<class 'pandas.core.frame.DataFrame'>
>>> result = process_data([df, df], 'infer', 'test_data')
>>> print(len(result))
2
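The dispatch described above (DataFrame, path, or list) could look roughly like the following. This is a sketch under stated assumptions, not the library's implementation: `normalize_inputs` is a hypothetical name, the `.csv` suffix filter and sorted file order for directories are guesses, and it raises TypeError where the documented function only promises a generic Exception.

```python
import os
import tempfile

import pandas as pd


def normalize_inputs(current_data, header="infer", data_type="data"):
    """Hypothetical stand-in for process_data; not the library's code."""
    if isinstance(current_data, pd.DataFrame):
        # Single DataFrame: wrap it in a list.
        return [current_data]
    if isinstance(current_data, str):
        if os.path.isdir(current_data):
            # Directory: read every CSV file, in sorted name order (assumed).
            names = sorted(n for n in os.listdir(current_data) if n.endswith(".csv"))
            return [
                pd.read_csv(os.path.join(current_data, n), header=header)
                for n in names
            ]
        # Otherwise treat the string as a path to one CSV file.
        return [pd.read_csv(current_data, header=header)]
    if isinstance(current_data, list):
        if not all(isinstance(item, pd.DataFrame) for item in current_data):
            raise TypeError(f"{data_type}: every list element must be a DataFrame")
        return list(current_data)
    raise TypeError(f"{data_type}: unsupported input type {type(current_data).__name__}")


df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
with tempfile.TemporaryDirectory() as tmp:
    df.to_csv(os.path.join(tmp, "a.csv"), index=False)
    df.to_csv(os.path.join(tmp, "b.csv"), index=False)
    from_dir = normalize_inputs(tmp, "infer", "target_data")
    from_file = normalize_inputs(os.path.join(tmp, "a.csv"), "infer", "target_data")
```

The `data_type` argument appears only in error messages, matching its documented purpose.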