Getting Started#
Welcome to PdMLabs (pdmlabs)! This section guides you through installing the framework, understanding core concepts, and running your first experiment.
Quick Paths#
Choose your learning style:
For the impatient: Prepare data with utils.dataset.Dataset and run each flavor with one common experiment handler.
Before experiments: Learn pipelines, experiment flavors, datasets, and evaluation semantics. Mental models first.
Going deeper: Experiment selection, parameter tuning, component choices, troubleshooting, and best practices.
Why PdMLabs exists: Problem statement, design principles, scope, and extensibility points.
Installation#
pip install pdmlabs
Or in development mode from source:
git clone <repo-url>
cd PdM-Evaluation
pip install -e .
Requirements:
- Python 3.8+
- scikit-learn, numpy, pandas
- MLflow (optional, for experiment tracking)
- PyTorch (optional, for neural network methods like TranAD, USAD)
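To confirm that the required and optional dependencies listed above are visible to your interpreter, a small check like the following can help. It is a minimal sketch using only the standard library. The import names are assumptions (a PyPI name and its import name can differ, e.g. scikit-learn is imported as `sklearn`), and `check_modules` and `report` are illustrative helpers, not part of PdMLabs.

```python
from importlib.util import find_spec

# Import names are assumptions: scikit-learn imports as "sklearn",
# PyTorch as "torch", and the package itself presumably as "pdmlabs".
REQUIRED = ["pdmlabs", "sklearn", "numpy", "pandas"]
OPTIONAL = ["mlflow", "torch"]


def check_modules(names):
    """Return {module_name: bool} telling whether each module can be found."""
    return {name: find_spec(name) is not None for name in names}


def report():
    """Print one status line per required and optional module."""
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name, found in check_modules(names).items():
            status = "ok" if found else "missing"
            print(f"{label:8} {name:10} {status}")


if __name__ == "__main__":
    report()
```

A missing optional module only matters if you plan to use the feature it backs (experiment tracking or the neural network methods).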
Next Steps#
Already familiar with ML experimentation? → Jump to Quickstart for a hands-on 5-minute walk-through.
New to predictive maintenance or anomaly detection? → Start with Our Manifesto for context, then Concepts for mental models.
Ready to build your first experiment? → Head to User Guide for decision trees and real-world guidance.
Need full API documentation? → See API Reference.
Workflow#
At a high level, every experiment follows this pipeline:
Raw Data
↓
Preprocess (optional: scaling, feature engineering, windowing)
↓
AD Method (isolation forest, neural net, statistical, etc.)
↓
Postprocess (optional: smoothing, aggregation, source fusion)
↓
Threshold (fixed, adaptive, or auto-tuned)
↓
Evaluate (PdM-aware metrics: lead time, episode-aware recall)
The framework handles this pipeline for you via the PdMPipeline and PdMExperiment abstractions. You provide:
A dataset (dict with events, sources, timesteps, preferences)
An experiment flavor (e.g., AutoProfileSemiSupervisedExperiment)
Pipeline component choices (preprocessor, method, postprocessor, thresholder)
The framework runs cross-validation, logs results to MLflow, and returns performance metrics.
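To make the stage order concrete, here is a toy version of the same flow written with the standard library only. It does not use PdMPipeline or PdMExperiment. Every function name below is a hypothetical stand-in for one stage, and the "method" is a deliberately trivial score (absolute scaled deviation) rather than any detector shipped with the framework.

```python
from statistics import mean, pstdev


def preprocess(values, ref):
    """Scale values with the mean/std of a reference (training) window."""
    mu, sigma = mean(ref), pstdev(ref) or 1.0
    return [(v - mu) / sigma for v in values]


def score(values):
    """Toy AD method: the anomaly score is the absolute scaled value."""
    return [abs(v) for v in values]


def postprocess(scores, window=3):
    """Smooth scores with a trailing moving average."""
    out = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def threshold(scores, limit=2.0):
    """Fixed threshold: flag every timestep whose score exceeds the limit."""
    return [s > limit for s in scores]


def evaluate(alarms, failure_index):
    """Lead time in timesteps from the first alarm to the failure, or None."""
    for i, alarm in enumerate(alarms[:failure_index]):
        if alarm:
            return failure_index - i
    return None


train = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0]          # healthy reference data
test = [10.0, 10.1, 9.9, 12.0, 13.0, 14.0, 15.0]    # drifts before failing
failure_index = len(test)                            # failure right after the last sample

alarms = threshold(postprocess(score(preprocess(test, train))))
lead_time = evaluate(alarms, failure_index)
print(alarms, lead_time)
```

Each stage takes the previous stage's output, which is why the optional ones (preprocess, postprocess) can be swapped or dropped without touching the rest.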
Key Concepts#
Dataset: A Python dict with time-indexed events, labeled failures, sources (sensors/subsystems), and optional preferences for event interpretation.
Experiment Flavor: Batch experiments designed for different settings (multiple methods, semi-supervised, RUL prediction, survival analysis). Streaming experiments are also available but less mature.
Pipeline: Sequence of transformers (preprocessor → method → postprocessor → thresholder) applied to each fold.
Evaluation: Predictive maintenance-focused metrics (lead time, episode-aware recall, VUS) that measure practical usefulness, not just statistical performance.
Reproducibility: Seed management, MLflow logging, and manual parameter overrides make runs repeatable and traceable.
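As a rough picture of the Dataset concept above, the sketch below builds a plain dict and sanity-checks it. Only the four top-level keys (events, sources, timesteps, preferences) come from this page. The nested layout, the field names inside each entry, the `reset_event_types` preference, and the `validate` helper are all assumptions made for illustration, so check the API Reference for the structure the framework actually expects.

```python
from datetime import datetime, timedelta

start = datetime(2024, 1, 1)
timesteps = [start + timedelta(hours=h) for h in range(6)]

# Hypothetical layout: only the four top-level keys come from the text above.
dataset = {
    "timesteps": timesteps,
    "sources": {
        "pump_a": {"temperature": [60.1, 60.4, 60.2, 63.0, 66.5, 70.2]},
    },
    "events": [
        {"time": timesteps[5], "source": "pump_a", "type": "failure"},
    ],
    "preferences": {"reset_event_types": ["failure"]},
}


def validate(ds):
    """Check the top-level keys and that every series has one value per timestep."""
    missing = {"events", "sources", "timesteps", "preferences"} - ds.keys()
    if missing:
        raise KeyError(f"missing dataset keys: {sorted(missing)}")
    n = len(ds["timesteps"])
    for source, features in ds["sources"].items():
        for name, series in features.items():
            if len(series) != n:
                raise ValueError(f"{source}.{name}: {len(series)} values, expected {n}")
    return True


print(validate(dataset))
```

Validating shapes before an experiment starts tends to surface misaligned sensor series early, rather than midway through cross-validation.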
Glossary#
- Method#
Statistical or ML model that detects anomalies (e.g., Isolation Forest, LOF, neural nets).
- Preprocessor#
Optional pipeline stage that transforms raw features (e.g., scaling, feature engineering, windowing).
- Postprocessor#
Optional pipeline stage that refines predictions (e.g., smoothing, source fusion, aggregation).
- Thresholder#
Converts anomaly scores into binary predictions via fixed/adaptive/learned decision boundaries.
- Episode#
Contiguous time window bounded by reset events; used for episode-aware evaluation.
- Lead Time#
Time from anomaly detection to actual failure; core objective for PdM systems.
- AD1_AUC#
Area under curve for recall-at-false-positive-rate trade-off, accounting for lead time penalty.
- Fold#
Train/test split used in cross-validation; PdMLabs uses temporal folding strategies.
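The Episode and Fold entries above can be illustrated with a short, framework-independent sketch. It splits a timeline into episodes at reset events and then builds expanding-window folds that always train on the past and test on the future. Both helpers are hypothetical, and expanding-window splitting is only one common temporal strategy, so the framework's own folding may differ.

```python
def split_episodes(timesteps, reset_times):
    """Split sorted timesteps into episodes; each reset closes an episode."""
    episodes, current = [], []
    resets = sorted(reset_times)
    r = 0
    for t in timesteps:
        # Close the running episode once we have moved past a reset event.
        while r < len(resets) and t > resets[r]:
            if current:
                episodes.append(current)
            current = []
            r += 1
        current.append(t)
    if current:
        episodes.append(current)
    return episodes


def expanding_folds(episodes, min_train=1):
    """Yield (train, test): train on the first k episodes, test on the next one."""
    for k in range(min_train, len(episodes)):
        train = [t for ep in episodes[:k] for t in ep]
        yield train, episodes[k]


timesteps = list(range(10))                      # integer timestamps for brevity
episodes = split_episodes(timesteps, reset_times=[3, 6])
folds = list(expanding_folds(episodes))

for train, test in folds:
    print("train:", train, "test:", test)
```

Because every test episode lies strictly after its training data, no information from the future leaks into the model, which is the point of folding temporally instead of shuffling.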
Next Up#
→ Quickstart to run your first experiment in 5 minutes.