🚀 Getting Started#

Welcome to PdMLabs (pdmlabs)! This section guides you through installing the framework, understanding core concepts, and running your first experiment.

Quick Paths#

Choose your learning style:

⚡ 5-Minute Quickstart

For the impatient: Prepare data with utils.dataset.Dataset and run each flavor with one common experiment handler.

quickstart
📚 Understand Concepts

Before experiments: Learn pipelines, experiment flavors, datasets, and evaluation semantics. Mental models first.

../concepts/index
📖 Step-by-Step Guide

Going deeper: Experiment selection, parameter tuning, component choices, troubleshooting, and best practices.

../user-guide/index
📜 Project Philosophy

Why PdMLabs exists: Problem statement, design principles, scope, and extensibility points.

../introduction

Installation#

pip install pdmlabs

Or in development mode from source:

git clone <repo-url>
cd PdM-Evaluation
pip install -e .

Requirements:

  • Python 3.8+

  • scikit-learn, numpy, pandas

  • MLflow (optional, for experiment tracking)

  • PyTorch (optional, for neural network methods like TranAD, USAD)
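To see which of the packages listed above are already present in your environment, a small check like the following can help. This is a minimal sketch using only the standard library; the helper name check_dependencies is hypothetical and not part of pdmlabs. Note that scikit-learn is imported as sklearn and PyTorch as torch.

```python
from importlib.util import find_spec

# Import names for the packages listed in the requirements.
# scikit-learn is imported as "sklearn", PyTorch as "torch".
REQUIRED = ["sklearn", "numpy", "pandas"]
OPTIONAL = ["mlflow", "torch"]


def check_dependencies(required=REQUIRED, optional=OPTIONAL):
    """Return (missing_required, missing_optional) lists of import names."""
    missing_required = [name for name in required if find_spec(name) is None]
    missing_optional = [name for name in optional if find_spec(name) is None]
    return missing_required, missing_optional


missing_required, missing_optional = check_dependencies()
if missing_required:
    print("Missing required packages:", ", ".join(missing_required))
if missing_optional:
    print("Optional packages not installed:", ", ".join(missing_optional))
```

If nothing is printed, all listed packages can be found by the current interpreter.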

Next Steps#

  1. Already familiar with ML experimentation? → Jump to Quickstart for a hands-on 5-minute walk-through.

  2. New to predictive maintenance or anomaly detection? → Start with 📜 Our Manifesto for context, then 📚 Concepts for mental models.

  3. Ready to build your first experiment? → Head to 📖 User Guide for decision trees and real-world guidance.

  4. Need full API documentation? → See 📖 API Reference.

Workflow#

At a high level, every experiment follows this pipeline:

Raw Data
   ↓
Preprocess (optional: scaling, feature engineering, windowing)
   ↓
AD Method (isolation forest, neural net, statistical, etc.)
   ↓
Postprocess (optional: smoothing, aggregation, source fusion)
   ↓
Threshold (fixed, adaptive, or auto-tuned)
   ↓
Evaluate (PdM-aware metrics: lead time, episode-aware recall)
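To make the stages above concrete, here is a minimal, framework-free sketch of the same flow in plain Python. It does not use the PdMLabs API: every function name is hypothetical, and the "AD method" is a toy statistical scorer chosen only for illustration.

```python
from statistics import mean, pstdev


def preprocess(values):
    """Preprocess: scale raw readings to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def score(scaled):
    """AD method (toy): use the distance from the mean as the anomaly score."""
    return [abs(z) for z in scaled]


def postprocess(scores, window=3):
    """Postprocess: smooth scores with a trailing moving average."""
    smoothed = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def threshold(scores, cutoff=1.0):
    """Threshold (fixed): flag every timestep whose score exceeds the cutoff."""
    return [s > cutoff for s in scores]


# Made-up sensor readings: stable at first, then drifting upward.
raw = [10.0, 10.2, 9.9, 10.1, 10.0, 14.5, 15.0, 15.2]
alarms = threshold(postprocess(score(preprocess(raw))))

# Evaluate (simplified): index of the first alarm, if any.
first_alarm = alarms.index(True) if any(alarms) else None
print(alarms, first_alarm)
```

The smoothing step delays the first alarm slightly compared with thresholding the raw scores, which illustrates why postprocessing and threshold choices interact with lead time.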

The framework handles this pipeline for you via the PdMPipeline and PdMExperiment abstractions. You provide:

  • A dataset (dict with events, sources, timesteps, preferences)

  • An experiment flavor (e.g., AutoProfileSemiSupervisedExperiment)

  • Pipeline component choices (preprocessor, method, postprocessor, thresholder)

The framework runs cross-validation, logs results to MLflow, and returns performance metrics.

Key Concepts#

Dataset: A Python dict with time-indexed events, labeled failures, sources (sensors/subsystems), and optional preferences for event interpretation.

Experiment Flavor: Batch experiments designed for different settings (multiple methods, semi-supervised, RUL prediction, survival analysis). Streaming experiments are also available but less mature.

Pipeline: Sequence of transformers (preprocessor → method → postprocessor → thresholder) applied to each fold.

Evaluation: Predictive maintenance-focused metrics (lead time, episode-aware recall, VUS) that measure practical usefulness, not just statistical performance.

Reproducibility: Seed management and manual parameter overrides make runs repeatable, and MLflow logging records the parameters and results of each run so it can be traced and re-run.

Glossary#

Method#

Statistical or ML model that detects anomalies (e.g., Isolation Forest, LOF, neural nets).

Preprocessor#

Optional pipeline stage that transforms raw features (e.g., scaling, feature engineering, windowing).

Postprocessor#

Optional pipeline stage that refines predictions (e.g., smoothing, source fusion, aggregation).

Thresholder#

Converts anomaly scores into binary predictions via fixed/adaptive/learned decision boundaries.
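As an illustration of the difference between a fixed and a learned boundary, the sketch below contrasts a constant cutoff with one derived from a percentile of reference scores. The function names are hypothetical and the percentile rule is an assumption for illustration, not necessarily how PdMLabs thresholders work.

```python
def fixed_threshold(scores, cutoff):
    """Flag scores above a cutoff chosen in advance."""
    return [s > cutoff for s in scores]


def percentile_threshold(reference_scores, scores, pct=80):
    """Learn the cutoff as a percentile of reference (e.g. healthy) scores."""
    ordered = sorted(reference_scores)
    # Integer arithmetic avoids floating-point surprises in the index.
    index = min(len(ordered) - 1, (pct * len(ordered)) // 100)
    cutoff = ordered[index]
    return cutoff, [s > cutoff for s in scores]


reference = [0.2, 0.3, 0.1, 0.4, 0.5, 0.9, 0.6, 0.8, 0.7, 1.0]
new_scores = [0.5, 0.95, 1.2]

print(fixed_threshold(new_scores, cutoff=1.0))
print(percentile_threshold(reference, new_scores, pct=80))
```

A learned cutoff adapts to the score scale of each method, whereas a fixed cutoff only makes sense when scores are already normalized.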

Episode#

Contiguous time window bounded by reset events; used for episode-aware evaluation.
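A minimal sketch of this idea, assuming integer timestamps and that a timestamp equal to a reset time starts the next episode (both are assumptions for illustration; split_into_episodes is a hypothetical helper, not a PdMLabs function):

```python
from bisect import bisect_right


def split_into_episodes(timestamps, reset_times):
    """Group timestamps into episodes separated by reset events."""
    resets = sorted(reset_times)
    episodes = {}
    for t in timestamps:
        # Number of resets at or before t identifies the episode.
        episode_id = bisect_right(resets, t)
        episodes.setdefault(episode_id, []).append(t)
    return [episodes[k] for k in sorted(episodes)]


timestamps = list(range(10))
episodes = split_into_episodes(timestamps, reset_times=[4, 7])
print(episodes)
```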

Lead Time#

Time from anomaly detection to actual failure; core objective for PdM systems.
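Under this definition, lead time can be computed from the earliest alarm that precedes a failure. The sketch below assumes all alarms passed in belong to the same episode as the failure; the helper name lead_time and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def lead_time(alarm_times, failure_time):
    """Return failure_time minus the earliest preceding alarm, or None."""
    earlier = [t for t in alarm_times if t <= failure_time]
    if not earlier:
        return None  # the failure was not detected in advance
    return failure_time - min(earlier)


failure = datetime(2024, 1, 3, 8, 0)
alarms = [datetime(2024, 1, 2, 12, 0), datetime(2024, 1, 1, 8, 0)]
print(lead_time(alarms, failure))
```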

AD1_AUC#

Area under curve for recall-at-false-positive-rate trade-off, accounting for lead time penalty.

Fold#

Train/test split used in cross-validation; PdMLabs uses temporal folding strategies.
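One common temporal strategy is an expanding window, where each fold trains on everything before its test block so that no future data leaks into training. The sketch below shows that idea only; it is similar in spirit to scikit-learn's TimeSeriesSplit, and the strategies PdMLabs actually implements may differ. The function name is hypothetical.

```python
def expanding_window_folds(n_samples, n_folds):
    """Yield (train_indices, test_indices) with training always before testing."""
    fold_size = n_samples // (n_folds + 1)
    if fold_size == 0:
        raise ValueError("Not enough samples for the requested number of folds")
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        # The last fold absorbs any leftover samples.
        test_end = n_samples if k == n_folds else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))


folds = list(expanding_window_folds(n_samples=10, n_folds=3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```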

Next Up#

→ Quickstart to run your first experiment in 5 minutes.