Analysis of PdMLabs Experiment Flavors

This document provides a detailed architectural breakdown of the different experiment flavors in PdMLabs, highlighting their unique execution flows, data dependencies, and structural differences.

1. AutoProfileSemiSupervisedPdMExperiment

Goal: Adapt to normal behavior dynamically by profiling the initial period of each target scenario.

Data Dependencies: Relies on target_data and event logs (specifically reset and failure events). The historic_data is essentially ignored because the profile is built from each target scenario itself.
Execution Flow:
- Iterates over each target scenario.
- Slices the scenario based on reset_dates.
- For each slice, takes the first N timestamps (where N = profile_size determined by hyperparameter search) to act as the “normal profile”.
- Fits the preprocessor and method on this profile, then calls predict() on the remainder of the slice.
Structural Uniqueness: The fit() step happens inside both the target prediction loop and the reset-date slicing loop, so dataframes are split during execution rather than up front.
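The nested fit-inside-the-loop structure described above can be illustrated with a minimal sketch. It is not the PdMLabs implementation: it uses plain lists instead of dataframes to stay dependency-free, MeanDistanceMethod is a hypothetical stand-in for a real method, and leaving the profile rows unscored (None) is an assumption made only for this example.

```python
from bisect import bisect_left


class MeanDistanceMethod:
    """Hypothetical stand-in for a PdM method: scores distance from the profile mean."""

    def fit(self, values):
        self.mean = sum(values) / len(values)

    def predict(self, values):
        return [abs(v - self.mean) for v in values]


def split_on_resets(timestamps, reset_dates):
    """Return (start, end) index pairs; each reset date starts a new slice.

    Assumes timestamps are sorted in ascending order.
    """
    slices, start = [], 0
    for reset in sorted(reset_dates):
        end = bisect_left(timestamps, reset)  # first index with timestamp >= reset
        if end > start:
            slices.append((start, end))
            start = end
    if start < len(timestamps):
        slices.append((start, len(timestamps)))
    return slices


def auto_profile_scores(timestamps, values, reset_dates, profile_size):
    """Fit on the first profile_size rows of each slice, score the remainder."""
    scores = []
    for start, end in split_on_resets(timestamps, reset_dates):
        profile_end = min(start + profile_size, end)
        method = MeanDistanceMethod()           # fresh model for every slice
        method.fit(values[start:profile_end])   # fit() sits inside the slicing loop
        scores.extend([None] * (profile_end - start))  # assumption: profile rows are unscored
        scores.extend(method.predict(values[profile_end:end]))
    return scores
```

Calling auto_profile_scores(list(range(8)), [1, 1, 5, 1, 10, 10, 10, 14], reset_dates=[4], profile_size=2) builds two independent profiles, one per slice, so the second slice is scored against a mean of 10 rather than 1.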

2. IncrementalSemiSupervisedPdMExperiment

Goal: Simulate online/streaming execution by updating the model incrementally over rolling windows.

Data Dependencies: target_data, plus specific hyperparameter keys: initial_incremental_window_length, incremental_window_length, and incremental_slide.
Execution Flow:
- Iterates row by row (or chunk by chunk) using pandas iterrows().
- Maintains a current_window_buffer (for fitting) and a current_slide_buffer (for predicting).
- Calls method.fit() when the buffers fill up, and either re-instantiates or updates the method and preprocessor, depending on the refit_new_method_object flag.
Structural Uniqueness: This flavor runs a complex state-machine loop that simulates streaming data. It needs state variables such as executed_initial_fit and must track indices across iterations. It fundamentally breaks the standard “fit all, then predict all” batch paradigm.
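A rough sketch of such a buffer-driven loop is given below. The variable and hyperparameter names come from the description above, but the exact transition rules are assumptions: here the slide buffer is scored once it is full, then merged into the window buffer, which is trimmed to incremental_window_length before refitting. RunningMeanMethod is a hypothetical stand-in, and plain lists replace dataframe rows.

```python
class RunningMeanMethod:
    """Hypothetical stand-in method that counts how often it was fitted."""

    def __init__(self):
        self.mean = 0.0
        self.fit_calls = 0

    def fit(self, values):
        self.mean = sum(values) / len(values)
        self.fit_calls += 1

    def predict(self, values):
        return [abs(v - self.mean) for v in values]


def run_incremental(rows, initial_incremental_window_length,
                    incremental_window_length, incremental_slide,
                    refit_new_method_object=True):
    method = RunningMeanMethod()
    current_window_buffer, current_slide_buffer = [], []
    executed_initial_fit = False
    scores = []
    for value in rows:
        if not executed_initial_fit:
            # State 1: collect the initial window, then fit once.
            current_window_buffer.append(value)
            if len(current_window_buffer) == initial_incremental_window_length:
                method.fit(current_window_buffer)
                executed_initial_fit = True
            continue
        # State 2: collect a slide, score it, then roll the window forward.
        current_slide_buffer.append(value)
        if len(current_slide_buffer) == incremental_slide:
            scores.extend(method.predict(current_slide_buffer))
            merged = current_window_buffer + current_slide_buffer
            current_window_buffer = merged[-incremental_window_length:]
            current_slide_buffer = []
            if refit_new_method_object:
                method = RunningMeanMethod()  # discard the old model entirely
            method.fit(current_window_buffer)
    # Rows left in an incomplete slide buffer are not scored in this sketch.
    return scores, method
```

With refit_new_method_object=False the same object is refitted each time, so a method that supports incremental updates could keep its internal state between windows.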

3. SemiSupervisedPdMExperiment

Goal: Train on clean historic data, predict on new target data.

Data Dependencies: Requires both historic_data and target_data, relying on the match_sources dictionary to map target sources to the correct trained historic model.
Execution Flow:
- Fits the preprocessor and method globally on all historic_data at the start.
- Loops over target_data and calls predict() using the corresponding mapped source.
- Breaks predictions into episodes based on failure/reset dates.
Structural Uniqueness: While this is the most “standard” machine learning flow, its reliance on match_sources to map source A to source B during the prediction phase is unique compared to unsupervised or auto-profile flows.
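The role of match_sources can be sketched as a simple lookup between the fit and predict phases. This example assumes one fitted model per historic source, which the description implies but does not state outright. MeanDistanceMethod and the source names are hypothetical.

```python
class MeanDistanceMethod:
    """Hypothetical stand-in method: scores distance from the training mean."""

    def fit(self, values):
        self.mean = sum(values) / len(values)

    def predict(self, values):
        return [abs(v - self.mean) for v in values]


def fit_per_source(historic_data):
    """Fit phase: one model per historic source, all fitted before any prediction."""
    models = {}
    for source, values in historic_data.items():
        model = MeanDistanceMethod()
        model.fit(values)
        models[source] = model
    return models


def predict_targets(models, target_data, match_sources):
    """Predict phase: each target source is scored by its mapped historic model."""
    return {
        target: models[match_sources[target]].predict(values)
        for target, values in target_data.items()
    }


historic_data = {"pump_A": [10, 10, 10], "pump_B": [0, 0]}
target_data = {"pump_C": [10, 13]}
match_sources = {"pump_C": "pump_A"}  # target source -> historic source

models = fit_per_source(historic_data)
scores = predict_targets(models, target_data, match_sources)
```

A target source missing from match_sources raises a KeyError here, which makes a misconfigured mapping fail early instead of silently scoring against the wrong model.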

4. UnsupervisedPdMExperiment

Goal: Detect anomalies without any labeled data or clean historical baselines.

Data Dependencies: Purely relies on target_data.
Execution Flow:
- Completely skips the fit() phase for the method.
- Iterates over target_data and directly calls preprocessor.transform() and method.predict().
Structural Uniqueness: The complete absence of the fit() step.

5. SupervisedPdMExperiment

Goal: Standard classification workflow using explicit anomaly labels.

Data Dependencies: Hard dependency on anomaly_labels existing in the pipeline dataset, matching the length and shape of historic_data.
Execution Flow:
- Similar to Semi-Supervised, but explicitly passes anomaly_labels to the fit() methods of the preprocessor, method, and postprocessor.
Structural Uniqueness: The fit() signature requires the anomaly_labels argument. It also includes strict assertions to guarantee label array lengths match the dataframe lengths before starting the optimization objective.
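A length check of this kind might look like the following sketch. The function name check_label_alignment is hypothetical, and the data is assumed to be one sequence of rows per historic source with a parallel sequence of labels. The actual assertions in PdMLabs may differ.

```python
def check_label_alignment(historic_data, anomaly_labels):
    """Assert that every historic scenario has exactly one label per row."""
    assert len(historic_data) == len(anomaly_labels), (
        f"{len(historic_data)} scenarios but {len(anomaly_labels)} label arrays"
    )
    for index, (frame, labels) in enumerate(zip(historic_data, anomaly_labels)):
        assert len(frame) == len(labels), (
            f"scenario {index}: {len(frame)} rows but {len(labels)} labels"
        )


# Two scenarios with 3 and 2 rows, each with a matching label array.
historic_data = [[0.1, 0.2, 0.9], [0.3, 0.4]]
anomaly_labels = [[0, 0, 1], [0, 0]]
check_label_alignment(historic_data, anomaly_labels)  # passes silently
```

Running the check before the optimization objective starts means a misaligned dataset fails once, up front, rather than inside every hyperparameter trial.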

6. SupervisedRULPdMExperiment

Goal: Predict Remaining Useful Life (RUL) as a continuous regression task.

Data Dependencies: Requires anomaly_labels (which in this context act as historic RUL labels) and target_labels (ground truth RUL for evaluation). Checks for is_failure flags to identify run-to-failure scenarios.
Execution Flow:
- Fits using labels (like Supervised).
- During prediction, it extracts the predictions and bypasses the standard episode-splitting logic, grouping entire scenarios together.
- Uses rul_evaluate for evaluation (which calculates regression metrics like MAE/RMSE) instead of classification/AUC metrics.
Structural Uniqueness: Replaces the evaluation engine and bypasses episode splitting, tracking run-to-failure (rtf) metadata for plotting and metrics.
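For reference, the two regression metrics named above can be computed as in the sketch below. This shows only the standard definitions of MAE and RMSE. How rul_evaluate itself aggregates them across scenarios is not covered here, and the sample values are made up.

```python
import math


def mean_absolute_error(true_rul, predicted_rul):
    """MAE: average absolute difference between true and predicted RUL."""
    return sum(abs(t - p) for t, p in zip(true_rul, predicted_rul)) / len(true_rul)


def root_mean_squared_error(true_rul, predicted_rul):
    """RMSE: square root of the average squared difference."""
    squared = sum((t - p) ** 2 for t, p in zip(true_rul, predicted_rul))
    return math.sqrt(squared / len(true_rul))


# One whole scenario scored at once, with no episode splitting.
true_rul = [10, 8, 6, 4]
predicted_rul = [9, 8, 8, 4]

mae = mean_absolute_error(true_rul, predicted_rul)        # 0.75
rmse = root_mean_squared_error(true_rul, predicted_rul)   # sqrt(1.25)
```

RMSE penalizes the single two-step miss more heavily than MAE does, which is why the two values differ for the same predictions.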

7. Supervised_SA_PdMExperiment

Goal: Frame predictive maintenance as a Survival Analysis problem.

Data Dependencies: Same as RUL: requires target labels and run-to-failure metadata.
Execution Flow:
- Fits models with labels.
- Generates survival predictions.
- Crucially, it calls thresholder.fit() after generating all target predictions, essentially using the target set (or a validation set) to learn a mapping from survival scores to RUL predictions.
- Uses surv_evaluate for computing concordance and other survival metrics.
Structural Uniqueness: The thresholder is fitted post-prediction on the aggregated target scores, unlike anomaly detection, where thresholding is often determined per episode or inferred statically. Evaluation also uses survival metrics, a different family from the classification and regression metrics used by the other flavors.
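One way to picture a thresholder that is fitted after prediction is the sketch below. It is an illustration under several assumptions, not the PdMLabs thresholder: survival predictions are taken to be per-step survival probabilities, the RUL estimate is the first step at which the curve drops below a threshold, and GridThresholder (a hypothetical class) picks the threshold with the lowest total absolute RUL error over the aggregated predictions.

```python
def rul_from_curve(curve, threshold):
    """First step where survival probability falls below threshold.

    Returns len(curve) if the curve never drops below the threshold.
    """
    for step, probability in enumerate(curve):
        if probability < threshold:
            return step
    return len(curve)


class GridThresholder:
    """Hypothetical thresholder: grid search over candidate survival thresholds."""

    def __init__(self, candidates=(0.25, 0.5, 0.75)):
        self.candidates = candidates
        self.threshold = None

    def fit(self, curves, true_rul):
        def total_error(threshold):
            return sum(
                abs(rul_from_curve(curve, threshold) - rul)
                for curve, rul in zip(curves, true_rul)
            )

        self.threshold = min(self.candidates, key=total_error)
        return self

    def predict(self, curves):
        return [rul_from_curve(curve, self.threshold) for curve in curves]


# Survival curves are generated for all targets first, then the thresholder is fitted.
all_target_curves = [[0.9, 0.7, 0.4, 0.2], [0.8, 0.45, 0.3, 0.1]]
target_labels = [2, 1]

thresholder = GridThresholder().fit(all_target_curves, target_labels)
rul_predictions = thresholder.predict(all_target_curves)
```

Because fit() needs the complete set of curves and labels, it cannot run inside the per-scenario prediction loop, which is the ordering constraint described above.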

Conclusion on Modularity

Pushing the entire execution loop into PdMExperiment is practically impossible because:

Loop Paradigms: Incremental uses row-by-row streaming simulation, AutoProfile slices dataframes on the fly, while the rest process full batches.
Fitting Timings: Unsupervised never fits, Semi-Supervised fits globally at the start, AutoProfile fits inside the target loop, and Survival Analysis fits the thresholder at the very end.
Data Signatures: Supervised flavors require passing labels to fit(); others do not.
Evaluation Routines: Flavors branch into standard AUC evaluation, RUL evaluation, or Survival evaluation, with different data formatting requirements.