📖 User Guide
This guide focuses on practical usage of PdMLabs once you already understand the core concepts.
Use this page to decide what to run, how to configure experiments, and how to interpret outputs in a consistent way.
Who This Guide Is For
This guide is intended for users who want to:
run repeatable predictive maintenance experiments,
compare different methods and experiment flavors fairly,
tune configurations with clear constraints,
log and analyze results with MLflow,
extend the framework with custom components.
Recommended Workflow
For most projects, this sequence works well:
Prepare your dataset dictionary (including events and source metadata).
Choose the experiment flavor based on data assumptions.
Select one or more methods and parameter spaces.
Start with default pre/post/threshold components.
Run a small search budget first, then scale up.
Inspect AD/RUL metrics and qualitative plots.
Iterate on parameter spaces and component choices.
Choosing The Right Experiment Flavor
Use the following rule of thumb:
AutoProfileSemiSupervisedPdMExperiment: Use when you want profile-based learning with event-aware resets.
IncrementalSemiSupervisedPdMExperiment: Use when behavior changes over time and rolling re-training is important.
SemiSupervisedPdMExperiment: Use when you have representative historical data and a stable deployment setting.
UnsupervisedPdMExperiment: Use when fitting on historical normal data is not available or not desired.
SupervisedPdMExperiment: Use for labeled classification-style workflows.
SupervisedRULPdMExperiment: Use for remaining useful life prediction workflows.
Supervised_SA_PdMExperiment: Use for survival-analysis-oriented time-to-event workflows.
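The rule of thumb above can be written down as a small lookup, which is handy when scripting many experiments. This is a minimal sketch: the function choose_flavor and its arguments are hypothetical and not part of PdMLabs, and the precedence between event-aware profiles and drifting behavior is a judgment call, not a documented rule.

```python
def choose_flavor(
    task: str = "anomaly_detection",
    has_normal_history: bool = True,
    behavior_drifts: bool = False,
    event_aware_profiles: bool = False,
) -> str:
    """Map data assumptions to an experiment flavor name (hypothetical helper).

    task is one of "anomaly_detection", "classification", "rul", "survival".
    """
    # Labeled / time-to-event tasks map directly to a supervised flavor.
    supervised = {
        "classification": "SupervisedPdMExperiment",
        "rul": "SupervisedRULPdMExperiment",
        "survival": "Supervised_SA_PdMExperiment",
    }
    if task in supervised:
        return supervised[task]
    if task != "anomaly_detection":
        raise ValueError(f"unknown task: {task!r}")

    # Without usable normal history, there is nothing to fit on.
    if not has_normal_history:
        return "UnsupervisedPdMExperiment"
    if event_aware_profiles:
        return "AutoProfileSemiSupervisedPdMExperiment"
    if behavior_drifts:
        return "IncrementalSemiSupervisedPdMExperiment"
    return "SemiSupervisedPdMExperiment"


print(choose_flavor(has_normal_history=False))  # UnsupervisedPdMExperiment
```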
Dataset Configuration Notes
Most runtime issues come from dataset contract mismatches. Verify these early:
target_data and target_sources have matching lengths.
historic_data and historic_sources have matching lengths.
dates points to a valid timestamp column (or a pre-indexed datetime index).
event_data has the expected columns: date, type, source, description.
event_preferences correctly maps failures and resets.
For supervised flows, also verify:
Each entry in anomaly_labels has the same length as its corresponding historic scenario.
target_labels is provided where required (RUL / survival flows).
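The length and column checks above are cheap to automate before launching a long run. The sketch below is a hypothetical helper (check_dataset_contract is not a PdMLabs function); it assumes the dataset is a dict using the keys named above, that anomaly_labels holds one label sequence per historic scenario, and that event_data is either a DataFrame or a list of dict rows. It does not validate dates or event_preferences.

```python
from typing import Any, List, Mapping

REQUIRED_EVENT_COLUMNS = {"date", "type", "source", "description"}


def check_dataset_contract(
    dataset: Mapping[str, Any],
    supervised: bool = False,
    needs_target_labels: bool = False,
) -> List[str]:
    """Return human-readable contract problems; an empty list means none found."""
    problems = []

    # Each data entry must have exactly one matching source entry.
    for data_key, source_key in (("target_data", "target_sources"),
                                 ("historic_data", "historic_sources")):
        data = dataset.get(data_key, [])
        sources = dataset.get(source_key, [])
        if len(data) != len(sources):
            problems.append(
                f"{data_key} has {len(data)} entries but {source_key} has {len(sources)}"
            )

    # event_data may be a DataFrame (exposes .columns) or a list of dict rows.
    events = dataset.get("event_data")
    if events is not None and len(events) > 0:
        columns = set(events.columns) if hasattr(events, "columns") else set(events[0])
        missing = REQUIRED_EVENT_COLUMNS - columns
        if missing:
            problems.append(f"event_data is missing columns: {sorted(missing)}")

    if supervised:
        # Assumes one label sequence per historic scenario.
        historic = dataset.get("historic_data", [])
        labels = dataset.get("anomaly_labels", [])
        if len(labels) != len(historic):
            problems.append(
                f"anomaly_labels has {len(labels)} entries but historic_data has {len(historic)}"
            )
        for i, (scenario, scenario_labels) in enumerate(zip(historic, labels)):
            if len(scenario) != len(scenario_labels):
                problems.append(
                    f"historic scenario {i}: {len(scenario)} rows but {len(scenario_labels)} labels"
                )

    if needs_target_labels and dataset.get("target_labels") is None:
        problems.append("target_labels is required for RUL / survival flows but is missing")

    return problems
```

Because len() works on lists and DataFrames alike, the same helper can be pointed at real scenario DataFrames; raising on a non-empty result is a reasonable way to fail fast.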
Parameter Search Strategy
PdMLabs uses Mango for hyperparameter optimization, with optional constraints.
Practical guidance:
Start with a small MAX_RUNS to validate the setup.
Keep INITIAL_RANDOM > 0 for robust initialization.
Use constraints to eliminate invalid combinations early.
Increase the budget only after reviewing first-run logs and plots.
When available, pdmlabs.utils.automatic_parameter_generation provides good initial parameter spaces for common methods.
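To see how a constraint prunes a search space and how a small budget caps the number of evaluations, consider the sketch below. It deliberately uses plain random sampling as a stand-in for Mango's Bayesian optimizer, so there is no counterpart to INITIAL_RANDOM; max_runs plays the role of MAX_RUNS. The parameter names (window_size, slide) and the helper small_random_search are hypothetical. The batch-style constraint (a list of parameter dicts in, a list of booleans out) is modeled on the convention Mango uses for its constraint option, as far as we know; confirm the exact contract in the Mango documentation.

```python
import random

# Hypothetical parameter space for a window-based detector (names are illustrative).
PARAM_SPACE = {
    "window_size": list(range(10, 201, 10)),
    "slide": list(range(1, 101)),
}


def constraint(samples):
    """Batch constraint: a step larger than the window itself is invalid."""
    return [s["slide"] <= s["window_size"] for s in samples]


def small_random_search(objective, max_runs=10, seed=0):
    """Evaluate up to max_runs valid configurations and return the best one.

    Plain random sampling stands in for Mango here; the point is that invalid
    candidates are filtered out before any (expensive) evaluation happens.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    evaluated = 0
    while evaluated < max_runs:
        candidates = [{k: rng.choice(v) for k, v in PARAM_SPACE.items()} for _ in range(8)]
        valid = [c for c, ok in zip(candidates, constraint(candidates)) if ok]
        for params in valid[: max_runs - evaluated]:
            score = objective(params)
            evaluated += 1
            if score > best_score:
                best_params, best_score = params, score
    return best_params, best_score


# Toy objective that prefers a window of 50; replace with a real experiment score.
best, score = small_random_search(lambda p: -abs(p["window_size"] - 50), max_runs=20)
print(best, score)
```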
Pipeline Component Choices
Start simple, then specialize:
Preprocessor: begin with DefaultPreProcessor.
Postprocessor: begin with DefaultPostProcessor.
Thresholder: begin with ConstantThresholder.
Then progressively test alternatives (smoothing, dynamic thresholding, survival-to-RUL conversion) once the baseline behavior is understood.
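As an illustration of why smoothing is often the first alternative worth trying, the sketch below applies a trailing moving average to anomaly scores before a fixed threshold. It is not the implementation of DefaultPostProcessor or ConstantThresholder; moving_average and constant_threshold are hypothetical stand-ins that only show the idea.

```python
def moving_average(scores, window=3):
    """Trailing moving average; the first points average over what is available."""
    smoothed = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def constant_threshold(scores, threshold):
    """Raise an alarm wherever the score exceeds a fixed threshold."""
    return [s > threshold for s in scores]


scores = [0.1, 0.9, 0.1, 0.2, 0.8, 0.9, 0.95]
print(constant_threshold(scores, 0.6))
print(constant_threshold(moving_average(scores, 3), 0.6))
```

With the raw scores, the isolated spike at index 1 raises an alarm; after smoothing, only the sustained rise at the end crosses the threshold. The cost is a short delay, which matters when judging alarms against the predictive horizon.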
Interpreting Results
PdMLabs reports metrics designed for predictive maintenance behavior, not only generic binary classification quality.
Focus on:
whether alarms occur in useful predictive-horizon windows,
precision/recall trade-offs for operations,
stability across sources and episodes,
robustness across parameter configurations.
Use MLflow logs and plots to combine numeric and visual inspection.
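The first two points in that list can be made concrete with an event-level view: an alarm is useful only if it lands in the predictive-horizon window before a failure. The sketch below is a simplified illustration, not PdMLabs' metric definitions (which may treat episodes, resets, and lead time differently); event_level_scores is a hypothetical helper and times are plain numbers such as hours.

```python
def event_level_scores(alarm_times, failure_times, horizon):
    """Event-level precision and recall against predictive-horizon windows.

    Each failure at time f owns the window [f - horizon, f). An alarm counts
    as useful if it falls in any window; a failure counts as caught if at
    least one alarm falls in its window.
    """
    def in_window(t, f):
        return f - horizon <= t < f

    useful = sum(any(in_window(t, f) for f in failure_times) for t in alarm_times)
    caught = sum(any(in_window(t, f) for t in alarm_times) for f in failure_times)
    precision = useful / len(alarm_times) if alarm_times else 0.0
    recall = caught / len(failure_times) if failure_times else 0.0
    return precision, recall


# One early alarm at t=5 falls outside both windows, so precision drops.
print(event_level_scores([5, 40, 95, 98], [50, 100], horizon=20))  # (0.75, 1.0)
```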
Common Pitfalls
Watch for these frequent issues:
date handling problems (wrong column, unsorted timestamps),
source mapping mismatches (especially with match_sources),
label length mismatches in supervised workflows,
unrealistic parameter ranges causing invalid runs,
over-interpreting one metric without episode-level inspection.
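The first two pitfalls can often be caught with a couple of one-line checks before an experiment starts. The helpers below are hypothetical (unsorted_positions and unknown_event_sources are not PdMLabs functions), and the source check only compares names; it does not reproduce how match_sources pairs sources internally. The source names are illustrative.

```python
from datetime import datetime


def unsorted_positions(timestamps):
    """Return indices where a timestamp is earlier than the one before it."""
    return [i for i in range(1, len(timestamps)) if timestamps[i] < timestamps[i - 1]]


def unknown_event_sources(event_sources, known_sources):
    """Return event sources that do not appear among the data sources."""
    known = set(known_sources)
    return sorted({s for s in event_sources if s not in known})


ts = [datetime(2024, 1, 1), datetime(2024, 1, 3), datetime(2024, 1, 2)]
print(unsorted_positions(ts))  # [2]
print(unknown_event_sources(["pump_1", "pump_7", "pump_1"], ["pump_1", "pump_2"]))  # ['pump_7']
```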
Reproducibility Checklist
To keep experiments reproducible:
set random seeds where possible,
keep parameter spaces versioned,
log all runs to MLflow,
document dataset version and preprocessing assumptions,
compare methods under the same dataset split and evaluation settings.
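Two of these items, seeding and versioning parameter spaces, can be handled with a few lines of standard-library code. In this sketch, set_seeds and space_fingerprint are hypothetical helpers; the fingerprint is a short SHA-256 hash of the space serialized as JSON with sorted keys, so it does not depend on dict key order (list order still matters).

```python
import hashlib
import json
import random


def set_seeds(seed: int) -> None:
    """Seed Python's RNG, and NumPy's global RNG if NumPy is installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass


def space_fingerprint(param_space: dict) -> str:
    """Short hash of a JSON-serializable parameter space, independent of key order."""
    canonical = json.dumps(param_space, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


set_seeds(42)
print(space_fingerprint({"window_size": [10, 20, 50], "slide": [1, 5]}))
```

The fingerprint could be logged alongside each run (for example as an MLflow tag), so results can later be grouped by the exact search space that produced them.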
Implementing Custom Methods
Want to create your own anomaly detection method? The framework provides clear interfaces for each experiment flavor:
🔍 Implementing Unsupervised Methods — Online detection without training
📊 Implementing Semi-Supervised Methods — Learn from clean profiles
🎯 Implementing Classification Methods — Binary classification approach
⏱️ Implementing RUL Regression Methods — Predict remaining useful life
🧬 Implementing Survival Analysis Methods — Model time-to-failure with censoring
Start with 🛠️ Implementing Custom Methods for an overview.