Analysis of PdMLabs Experiment Flavors
This document provides a detailed architectural breakdown of the different experiment flavors in PdMLabs, highlighting their unique execution flows, data dependencies, and structural differences.
1. AutoProfileSemiSupervisedPdMExperiment
Goal: Adapt to normal behavior dynamically by profiling the initial period of each target scenario.
Data Dependencies: Focuses heavily on target_data and event logs (specifically reset and failure events). The historic_data is essentially ignored because the profile is built on the fly.
Execution Flow:
Iterates over each target scenario.
Slices the scenario based on reset_dates.
For each slice, takes the first N timestamps (where N = profile_size, determined by hyperparameter search) to act as the "normal profile".
Fits the preprocessor and method on this profile, then calls predict() on the remainder of the slice.
Structural Uniqueness: The fit() step happens inside the target prediction loop and inside the reset-date slicing loop. It dynamically splits dataframes on the fly.
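The slicing and per-slice fitting described above can be sketched roughly as follows. This is a minimal illustration, not the PdMLabs implementation: MeanDistanceMethod and auto_profile_scores are hypothetical names, the preprocessor is omitted, and creating a fresh method object for every slice is an assumption.

```python
import pandas as pd


class MeanDistanceMethod:
    """Hypothetical stand-in for a method: scores rows by distance from the profile mean."""

    def fit(self, profile: pd.DataFrame) -> None:
        self.mean_ = profile.mean()

    def predict(self, data: pd.DataFrame) -> list:
        return (data - self.mean_).abs().sum(axis=1).tolist()


def auto_profile_scores(scenario: pd.DataFrame, reset_dates: list, profile_size: int) -> pd.Series:
    """Fit on the first `profile_size` rows of each reset-delimited slice, score the rest."""
    # Row positions where a new slice starts: scenario start, each reset date, scenario end.
    cuts = [0, *(int(scenario.index.searchsorted(d)) for d in sorted(reset_dates)), len(scenario)]
    parts = []
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        piece = scenario.iloc[lo:hi]
        if len(piece) <= profile_size:
            continue  # too short to both build a profile and predict
        profile, rest = piece.iloc[:profile_size], piece.iloc[profile_size:]
        method = MeanDistanceMethod()  # assumption: a fresh model per slice
        method.fit(profile)            # fit() happens inside the slicing loop
        parts.append(pd.Series(method.predict(rest), index=rest.index))
    return pd.concat(parts) if parts else pd.Series(dtype=float)


index = pd.date_range("2024-01-01", periods=10, freq="D")
scenario = pd.DataFrame({"x": [1.0, 1.0, 1.0, 5.0, 1.0, 10.0, 10.0, 10.0, 10.0, 20.0]}, index=index)
scores = auto_profile_scores(scenario, [pd.Timestamp("2024-01-06")], profile_size=3)
```

In this toy run the reset on 2024-01-06 produces two slices, each profiled on its own first three rows, so the value 10.0 is normal in the second slice even though it would be anomalous in the first.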
2. IncrementalSemiSupervisedPdMExperiment
Goal: Simulate online/streaming execution by updating the model incrementally over rolling windows.
Data Dependencies: target_data, plus specific hyperparameter keys: initial_incremental_window_length, incremental_window_length, and incremental_slide.
Execution Flow:
Employs a row-by-row iteration (or chunk-by-chunk) using pandas iterrows().
Maintains a current_window_buffer (for fitting) and a current_slide_buffer (for predicting).
Conditionally calls method.fit() when buffers fill up, and periodically destroys/re-instantiates or updates the method/preprocessor based on the refit_new_method_object flag.
Structural Uniqueness: This flavor has a highly complex state-machine loop simulating streaming data, requiring variables like executed_initial_fit and tracking indices dynamically. It fundamentally breaks the standard "fit all, then predict all" batch paradigm.
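A heavily simplified sketch of such a buffer-driven loop is shown below. The exact buffering rules in PdMLabs are not described here, so the following are assumptions: the first fit happens once the initial window is full, each full slide buffer is scored with the current model and then merged into the fit window, and a new method object is always created on refit (as if refit_new_method_object were true). MeanMethod and incremental_scores are hypothetical names.

```python
import pandas as pd


class MeanMethod:
    """Hypothetical method: scores each row by absolute distance from the fitted mean."""

    def fit(self, window: pd.DataFrame) -> None:
        self.mean_ = float(window["x"].mean())

    def predict(self, rows: pd.DataFrame) -> list:
        return (rows["x"] - self.mean_).abs().tolist()


def incremental_scores(data: pd.DataFrame, initial_window_length: int,
                       window_length: int, slide: int) -> list:
    method = MeanMethod()
    executed_initial_fit = False
    window_buffer, slide_buffer, scores = [], [], []
    for _, row in data.iterrows():
        if not executed_initial_fit:
            # Phase 1: accumulate rows until the initial window is full, then fit once.
            window_buffer.append(row)
            if len(window_buffer) == initial_window_length:
                method.fit(pd.DataFrame(window_buffer))
                executed_initial_fit = True
            continue
        # Phase 2: collect a slide, score it, then move the fit window forward.
        slide_buffer.append(row)
        if len(slide_buffer) == slide:
            scores.extend(method.predict(pd.DataFrame(slide_buffer)))
            window_buffer = (window_buffer + slide_buffer)[-window_length:]
            method = MeanMethod()  # assumption: always re-instantiate on refit
            method.fit(pd.DataFrame(window_buffer))
            slide_buffer = []
    return scores  # rows left in a partially filled slide buffer are not scored


data = pd.DataFrame({"x": [0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 4.0, 4.0]})
scores = incremental_scores(data, initial_window_length=4, window_length=4, slide=2)
```

Even in this reduced form, fitting and predicting interleave inside one loop, which is why the flow cannot be expressed as a single fit followed by a single predict.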
3. SemiSupervisedPdMExperiment
Goal: Train on clean historic data, predict on new target data.
Data Dependencies: Requires both historic_data and target_data, relying on the match_sources dictionary to map target sources to the correct trained historic model.
Execution Flow:
Fits the preprocessor and method globally on all historic_data at the start.
Loops over target_data and calls predict() using the corresponding mapped source.
Breaks predictions into episodes based on failure/reset dates.
Structural Uniqueness: While this is the most "standard" machine learning flow, its reliance on match_sources to map source A to source B during the prediction phase is unique compared to unsupervised or auto-profile flows.
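The role of match_sources can be illustrated with a small sketch. The data layout here is an assumption made for the example: historic_data and target_data are plain dictionaries keyed by source name, one model is kept per historic source, and match_sources maps each target source to a historic one. MeanMethod and the pump names are hypothetical.

```python
import pandas as pd


class MeanMethod:
    """Hypothetical method: scores each row by absolute distance from the fitted mean."""

    def fit(self, data: pd.DataFrame) -> None:
        self.mean_ = float(data["x"].mean())

    def predict(self, data: pd.DataFrame) -> list:
        return (data["x"] - self.mean_).abs().tolist()


historic_data = {
    "pump_A": pd.DataFrame({"x": [1.0, 1.0, 1.0]}),
    "pump_B": pd.DataFrame({"x": [5.0, 5.0, 5.0]}),
}
target_data = {
    "pump_C": pd.DataFrame({"x": [1.0, 3.0]}),
    "pump_D": pd.DataFrame({"x": [5.0, 4.0]}),
}
# Assumed direction of the mapping: target source -> historic source.
match_sources = {"pump_C": "pump_A", "pump_D": "pump_B"}

# Fit phase: everything is fitted on historic data before any prediction.
models = {}
for source, frame in historic_data.items():
    model = MeanMethod()
    model.fit(frame)
    models[source] = model

# Predict phase: each target is scored by the model of its matched historic source.
predictions = {
    target: models[match_sources[target]].predict(frame)
    for target, frame in target_data.items()
}
```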
4. UnsupervisedPdMExperiment
Goal: Detect anomalies without any labeled data or clean historical baselines.
Data Dependencies: Purely relies on target_data.
Execution Flow:
Completely skips the fit() phase for the method.
Iterates over target_data and directly calls preprocessor.transform() and method.predict().
Structural Uniqueness: The complete absence of the fit() step.
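As a rough illustration of a flow with no fit() call at all, the sketch below chains transform() and predict() directly over each target. DiffPreprocessor and AbsJumpMethod are hypothetical, stateless stand-ins chosen so that neither needs fitting, and the dictionary layout of target_data is an assumption.

```python
import pandas as pd


class DiffPreprocessor:
    """Hypothetical stateless preprocessor: first-order differences of each column."""

    def transform(self, data: pd.DataFrame) -> pd.DataFrame:
        return data.diff().fillna(0.0)


class AbsJumpMethod:
    """Hypothetical method needing no fit(): the score is the size of the jump in each row."""

    def predict(self, data: pd.DataFrame) -> list:
        return data.abs().sum(axis=1).tolist()


target_data = {"s1": pd.DataFrame({"x": [1.0, 1.0, 4.0, 4.0]})}

preprocessor, method = DiffPreprocessor(), AbsJumpMethod()
# No fit() anywhere: each target is transformed and scored directly.
scores = {
    name: method.predict(preprocessor.transform(frame))
    for name, frame in target_data.items()
}
```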
5. SupervisedPdMExperiment
Goal: Standard classification workflow using explicit anomaly labels.
Data Dependencies: Hard dependency on anomaly_labels existing in the pipeline dataset, matching the length and shape of historic_data.
Execution Flow:
Similar to Semi-Supervised, but explicitly passes anomaly_labels to the fit() methods of the preprocessor, method, and postprocessor.
Structural Uniqueness: The fit() signature requires the anomaly_labels argument. It also includes strict assertions to guarantee label array lengths match the dataframe lengths before starting the optimization objective.
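A minimal sketch of the label check and the label-aware fit() signature might look like this. The helper validate_labels, the MidpointClassifier class, and the list-of-frames layout are assumptions for illustration; only the idea of asserting matching lengths before fitting comes from the description above.

```python
import pandas as pd


def validate_labels(historic_data: list, anomaly_labels: list) -> None:
    """Assert there is one label array per dataframe and that their lengths match."""
    assert len(historic_data) == len(anomaly_labels), "one label array per dataframe expected"
    for i, (frame, labels) in enumerate(zip(historic_data, anomaly_labels)):
        assert len(labels) == len(frame), f"label length mismatch for dataframe {i}"


class MidpointClassifier:
    """Hypothetical supervised method: thresholds at the midpoint between class means."""

    def fit(self, data: pd.DataFrame, anomaly_labels: list) -> None:
        labels = pd.Series(anomaly_labels, index=data.index).astype(bool)
        normal_mean = data.loc[~labels, "x"].mean()
        anomaly_mean = data.loc[labels, "x"].mean()
        self.threshold_ = float((normal_mean + anomaly_mean) / 2)

    def predict(self, data: pd.DataFrame) -> list:
        return (data["x"] > self.threshold_).astype(int).tolist()


frame = pd.DataFrame({"x": [1.0, 1.0, 9.0, 9.0]})
labels = [0, 0, 1, 1]

validate_labels([frame], [labels])  # runs before any fitting starts
classifier = MidpointClassifier()
classifier.fit(frame, labels)       # labels are a required fit() argument
predictions = classifier.predict(pd.DataFrame({"x": [2.0, 8.0]}))
```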
6. SupervisedRULPdMExperiment
Goal: Predict Remaining Useful Life (RUL) as a continuous regression task.
Data Dependencies: Requires anomaly_labels (which in this context act as historic RUL labels) and target_labels (ground truth RUL for evaluation). Checks for is_failure flags to identify run-to-failure scenarios.
Execution Flow:
Fits using labels (like Supervised).
During prediction, it extracts the predictions and bypasses the standard episode-splitting logic, grouping entire scenarios together.
Uses rul_evaluate for evaluation (which calculates regression metrics like MAE/RMSE) instead of classification/AUC metrics.
Structural Uniqueness: Replaces the evaluation engine and bypasses episode splitting, tracking run-to-failure (rtf) metadata for plotting and metrics.
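For reference, the two regression metrics named above can be computed as in the sketch below. The function rul_regression_metrics is a hypothetical stand-alone helper written for this example; it is not the signature of rul_evaluate, which is not documented here.

```python
import math


def rul_regression_metrics(target_labels: list, predictions: list) -> dict:
    """Compute MAE and RMSE between ground-truth RUL and predicted RUL."""
    assert len(target_labels) == len(predictions), "labels and predictions must align"
    errors = [p - t for p, t in zip(predictions, target_labels)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


# Ground-truth RUL counts down towards failure; predictions are scored over the whole scenario.
metrics = rul_regression_metrics([10, 8, 6, 4], [12, 8, 3, 4])
```

With errors of 2, 0, -3 and 0, the MAE is 1.25 and the RMSE is the square root of 3.25, which shows how RMSE penalizes the single larger miss more heavily.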
7. Supervised_SA_PdMExperiment
Goal: Frame predictive maintenance as a Survival Analysis problem.
Data Dependencies: Same as RUL: requires target labels and run-to-failure metadata.
Execution Flow:
Fits models with labels.
Generates survival predictions.
Crucially, it calls thresholder.fit() after generating all target predictions, essentially using the target set (or a validation set) to learn a mapping from survival scores to RUL predictions.
Uses surv_evaluate for computing concordance and other survival metrics.
Structural Uniqueness: The thresholder is fitted post-prediction on the aggregated target scores, unlike anomaly detection where thresholding is often determined per-episode or inferred statically. Evaluates using a completely different mathematical domain (survival metrics).
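The ordering described above, where the thresholder is fitted only after all target scores exist, can be sketched as follows. LinearThresholder is a hypothetical stand-in that fits a least-squares line from survival score to RUL; the actual mapping learned by the PdMLabs thresholder is not specified here, and the scenario dictionaries are made-up toy data.

```python
import numpy as np


class LinearThresholder:
    """Hypothetical thresholder: least-squares line mapping survival score to RUL."""

    def fit(self, scores, rul) -> None:
        self.slope_, self.intercept_ = np.polyfit(scores, rul, deg=1)

    def transform(self, scores) -> np.ndarray:
        return self.slope_ * np.asarray(scores) + self.intercept_


# Step 1: survival scores for every target scenario are generated first.
scenario_scores = {"s1": [0.9, 0.5], "s2": [0.7, 0.1]}
scenario_rul = {"s1": [90.0, 50.0], "s2": [70.0, 10.0]}

# Step 2: only then is the thresholder fitted, on scores aggregated across scenarios.
all_scores = np.concatenate([scenario_scores[name] for name in scenario_scores])
all_rul = np.concatenate([scenario_rul[name] for name in scenario_scores])
thresholder = LinearThresholder()
thresholder.fit(all_scores, all_rul)

# Step 3: the fitted mapping converts each scenario's scores into RUL predictions.
rul_predictions = {name: thresholder.transform(s) for name, s in scenario_scores.items()}
```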
Conclusion on Modularity
Pushing the entire execution loop into PdMExperiment is practically impossible because:
Loop Paradigms: Incremental uses row-by-row streaming simulation, AutoProfile slices dataframes on the fly, while the rest process full batches.
Fitting Timings: Unsupervised never fits, Semi-Supervised fits globally at the start, AutoProfile fits inside the target loop, and Survival Analysis fits the thresholder at the very end.
Data Signatures: Supervised flavors require passing labels to fit(); others do not.
Evaluation Routines: Flavors branch into standard AUC evaluation, RUL evaluation, or Survival evaluation, with different data formatting requirements.