pdmlabs.RunExperiment
Experiment orchestration and execution for predictive maintenance anomaly detection.
This module provides the main entry point for running predictive maintenance experiments across different learning paradigms (supervised, unsupervised, semi-supervised) with automatic hyperparameter optimization.
- Key Features:
Multi-method, multi-experiment orchestration
Hyperparameter optimization using MANGO (Bayesian optimization)
Pipeline construction (preprocessing → method → postprocessing → thresholding)
MLflow integration for experiment tracking
Support for 7 experiment types with different learning strategies
Constraint-based parameter validation for different experiment types
Parallel execution for hyperparameter search
- Core Functions:
run_experiment: Main entry point to execute predictive maintenance experiments
get_method_type: Map experiment type to appropriate method interface
is_port_in_use: Check port availability for MLflow tracking server
run_mlflow_server: Start or verify MLflow tracking server
Example
>>> from pdmlabs.RunExperiment import run_experiment
>>> from pdmlabs.utils.dataset import Dataset
>>> from pdmlabs.experiment.batch.unsupervised_experiment import UnsupervisedPdMExperiment
>>> # `data` and `IsolationForest` are assumed to be defined or imported beforehand
>>> dataset = Dataset(data, datetime_column='timestamp')
>>> methods = [IsolationForest()]
>>> param_spaces = [{'n_estimators': [100, 200]}]
>>> results = run_experiment(
...     dataset=dataset.get_rul_dataset()[0],
...     methods=methods,
...     param_space_dict_per_method=param_spaces,
...     method_names=['IF'],
...     experiments=[UnsupervisedPdMExperiment],
...     experiment_names=['Baseline'],
...     MAX_RUNS=20
... )
Functions
get_method_type | Map experiment class to corresponding method interface. |
is_port_in_use | Check if a given port is in use on the specified host. |
run_experiment | Execute predictive maintenance anomaly detection experiments with hyperparameter optimization. |
run_mlflow_server | Start or verify MLflow tracking server for experiment logging. |
- pdmlabs.RunExperiment.get_method_type(experiment)
Map experiment class to corresponding method interface.
Determines which method interface (supervised, unsupervised, or semi-supervised) is required for a given experiment type. This ensures the correct method base class is used during method instantiation.
- Parameters:
experiment (type) – Experiment class (not instance). One of:
- AutoProfileSemiSupervisedPdMExperiment
- IncrementalSemiSupervisedPdMExperiment
- SemiSupervisedPdMExperiment
- UnsupervisedPdMExperiment
- SupervisedPdMExperiment
- SupervisedRULPdMExperiment
- Supervised_SA_PdMExperiment
- Returns:
Method interface class:
- SemiSupervisedMethodInterface for semi-supervised experiments
- UnsupervisedMethodInterface for unsupervised experiments
- SupervisedMethodInterface for supervised/RUL/SA experiments
- Return type:
type
- Raises:
ValueError – If experiment type is not recognized.
Examples
>>> from pdmlabs.RunExperiment import get_method_type
>>> from pdmlabs.experiment.batch.unsupervised_experiment import UnsupervisedPdMExperiment
>>> interface = get_method_type(UnsupervisedPdMExperiment)
>>> print(interface.__name__)
UnsupervisedMethodInterface
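The dispatch described above could be sketched as a simple lookup table. The classes below are empty stand-ins declared only so the snippet runs on its own; they are not the pdmlabs classes. Grouping the AutoProfile and Incremental experiments under the semi-supervised interface is an assumption based on their names, and the real function may be structured differently (for example, as an if/elif chain).

```python
# Stand-in classes so the sketch is self-contained (NOT the real pdmlabs classes).
class SupervisedMethodInterface: ...
class UnsupervisedMethodInterface: ...
class SemiSupervisedMethodInterface: ...

class AutoProfileSemiSupervisedPdMExperiment: ...
class IncrementalSemiSupervisedPdMExperiment: ...
class SemiSupervisedPdMExperiment: ...
class UnsupervisedPdMExperiment: ...
class SupervisedPdMExperiment: ...
class SupervisedRULPdMExperiment: ...
class Supervised_SA_PdMExperiment: ...

# Assumed mapping: experiment class -> method interface class.
_INTERFACE_BY_EXPERIMENT = {
    AutoProfileSemiSupervisedPdMExperiment: SemiSupervisedMethodInterface,
    IncrementalSemiSupervisedPdMExperiment: SemiSupervisedMethodInterface,
    SemiSupervisedPdMExperiment: SemiSupervisedMethodInterface,
    UnsupervisedPdMExperiment: UnsupervisedMethodInterface,
    SupervisedPdMExperiment: SupervisedMethodInterface,
    SupervisedRULPdMExperiment: SupervisedMethodInterface,
    Supervised_SA_PdMExperiment: SupervisedMethodInterface,
}


def get_method_type_sketch(experiment):
    """Return the interface class for an experiment class, or raise ValueError."""
    try:
        return _INTERFACE_BY_EXPERIMENT[experiment]
    except KeyError:
        raise ValueError(f"Unrecognized experiment type: {experiment!r}") from None
```

Note that the argument is the class itself, not an instance; passing `UnsupervisedPdMExperiment()` to this sketch would raise ValueError.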
- pdmlabs.RunExperiment.is_port_in_use(host, port)
Check if a given port is in use on the specified host.
Attempts a socket connection to verify if a port is listening. Useful for checking if a server (e.g., MLflow UI) is already running.
- Parameters:
host (str) – Host IP address or hostname (e.g., ‘127.0.0.1’, ‘localhost’, ‘0.0.0.0’).
port (int) – Port number to check (0-65535).
- Returns:
True if port is in use (connection succeeds), False if available.
- Return type:
bool
Examples
>>> is_port_in_use('127.0.0.1', 5000)  # Port 5000 is free
False
>>> is_port_in_use('127.0.0.1', 8080)  # Port 8080 is in use
True
Notes
Quick check: returns result once connection is attempted
Safe: uses context manager to ensure socket is properly closed
Non-blocking: does not hang on connection refused
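A check with the properties listed in these notes can be written with the standard library alone. The following is a minimal sketch, not necessarily the pdmlabs implementation; the `timeout` argument and the function name are additions made for illustration.

```python
import socket


def is_port_in_use_sketch(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on (host, port)."""
    # The context manager closes the socket even if an exception is raised.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Hypothetical safeguard: bound the wait for unresponsive hosts.
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value on failure
        # (e.g. connection refused) instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

Name-resolution failures (an unknown hostname, for instance) can still raise an exception from `connect_ex`, so callers passing arbitrary hostnames may want to catch `OSError`.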
- pdmlabs.RunExperiment.run_experiment(dataset, methods, param_space_dict_per_method, method_names, experiments, experiment_names, additional_parameters={}, MAX_RUNS=1, MAX_JOBS=1, INITIAL_RANDOM=1, profile_size=2, fit_size=None, postprocessor=<class 'pdmlabs.postprocessing.default.DefaultPostProcessor'>, preprocessor=<class 'pdmlabs.preprocessing.record_level.default.DefaultPreProcessor'>, thresholder=<class 'pdmlabs.thresholding.constant.ConstantThresholder'>, mlflow_port=None, debug=True, optimization_param='AD1_AUC', maximize=True, custom_evaluators=None)
Execute predictive maintenance anomaly detection experiments with hyperparameter optimization.
Orchestrates complete experiments: constructs pipelines, performs hyperparameter search, logs results to MLflow, and returns optimal parameters. Supports multiple methods and experiments with cross-product evaluation (each method × each experiment combination).
- Parameters:
dataset (dict) – Dataset configuration dictionary (typically from Dataset.get_*_dataset() methods). Must contain:
- ‘target_data’: List of test feature dataframes
- ‘target_sources’: List of test source identifiers
- Additional dataset metadata (see loadAnomalyDetectionDataset.py)
methods (list) – List of instantiated method classes. Each method should inherit from MethodInterface. Order must correspond to param_space_dict_per_method and method_names.
param_space_dict_per_method (list[dict]) – Hyperparameter search spaces for each method. Each dict maps parameter names to lists of candidate values. Example: [{‘n_estimators’: [50, 100], ‘max_samples’: [256, 512]}]
method_names (list[str]) – Human-readable names for each method (for logging). Used in MLflow experiment naming and artifact paths.
experiments (list[type]) – List of experiment class types (not instances) to execute. Supported:
- AutoProfileSemiSupervisedPdMExperiment
- IncrementalSemiSupervisedPdMExperiment
- UnsupervisedPdMExperiment
- SemiSupervisedPdMExperiment
- SupervisedPdMExperiment
- SupervisedRULPdMExperiment
- Supervised_SA_PdMExperiment
Each method will be evaluated on all experiments (cross-product).
experiment_names (list[str]) – Human-readable names for each experiment (for logging/identification).
additional_parameters (dict, default={}) – Extra hyperparameters for pipeline components (preprocessing, postprocessing, thresholding). Key format: ‘{component}_{param_name}’ (e.g., ‘postprocessor_window_length’, ‘preprocessor_features’). Values should be lists of candidate values (for grid search).
MAX_RUNS (int, default=1) – Maximum number of hyperparameter configurations to evaluate per method-experiment pair. Higher values allow more thorough exploration but increase computation.
MAX_JOBS (int, default=1) – Number of parallel processes for hyperparameter search (via MANGO). Typically 1-8 depending on system CPU cores.
INITIAL_RANDOM (int, default=1) – Number of initial random hyperparameter samples before Bayesian optimization. Provides diversity in the exploration phase.
profile_size (int or list[int], default=2) – Historical buffer size (number of samples) that online/streaming methods keep. If a list is given, each buffer size is tested as a candidate value.
fit_size (int or list[int], optional) – Initial profile size (“warm-up” buffer) before evaluation begins. If None: defaults to profile_size. Used in AutoProfile and Incremental experiments for initialization.
postprocessor (type, default=DefaultPostProcessor) – Post-processing class for score smoothing/normalization. Will be instantiated with appropriate parameters during pipeline construction. Options: DefaultPostProcessor, MovingAveragePostProcessor, MinMaxPostProcessor, etc.
preprocessor (type, default=DefaultPreProcessor) – Pre-processing class for data preparation/transformation. Will be instantiated with appropriate parameters. Options: DefaultPreProcessor, FeatureSelector, MinMaxScaler, etc.
thresholder (type, default=ConstantThresholder) – Thresholding class to convert anomaly scores to binary labels. Will be instantiated with appropriate parameters. Options: ConstantThresholder, SurvToRUL, DynamicThresholder, etc.
mlflow_port (int, optional) – Port number for MLflow tracking UI. If provided, starts/verifies MLflow server. If None: skips MLflow setup (no experiment logging).
debug (bool, default=True) – If True: enables verbose logging and debug messages during experiment execution. If False: minimal logging, only results and warnings.
optimization_param (str, default="AD1_AUC") – Metric to optimize during hyperparameter search. Options: “AD1_AUC”, “AD2_AUC”, “avg_time_to_alarm”, “false_alarm_rate”, etc. See evaluation module for complete list of available metrics.
maximize (bool, default=True) – If True: MANGO maximizes optimization_param. If False: MANGO minimizes optimization_param. Typically True for AUC, F1-score; False for error rate, false alarms.
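The ‘{component}_{param_name}’ key format used by additional_parameters can be illustrated with a small helper that groups keys by component. This is a hypothetical sketch: the helper name is invented, and the ‘thresholder’ prefix is an assumption (only the ‘preprocessor’ and ‘postprocessor’ prefixes appear in the examples above).

```python
# Assumed component prefixes; 'thresholder' is a guess by analogy with the other two.
COMPONENTS = ("preprocessor", "postprocessor", "thresholder")


def split_additional_parameters(additional_parameters):
    """Group '{component}_{param_name}' keys into one dict per component."""
    grouped = {name: {} for name in COMPONENTS}
    for key, candidates in additional_parameters.items():
        # Split on the first underscore only, so that names such as
        # 'window_length' keep their own underscores.
        component, sep, param_name = key.partition("_")
        if not sep or not param_name or component not in grouped:
            raise ValueError(f"Key {key!r} does not match '{{component}}_{{param_name}}'")
        grouped[component][param_name] = list(candidates)
    return grouped
```

For example, `{'postprocessor_window_length': [5, 10]}` would be grouped as the parameter `window_length` of the postprocessor, with candidate values 5 and 10.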
- Returns:
list[dict] – Best hyperparameters for each method-experiment combination (in execution order). Each dict maps parameter names to optimal values discovered by MANGO. Length = len(methods) × len(experiments).
Pipeline Construction – For each method-experiment pair:
1. Create PdMPipeline with: preprocessor → method → postprocessor → thresholder
2. Configure AUC resolution=30 (granularity of ROC curve)
3. Assign experiment type (Supervised/Unsupervised/SemiSupervised)
4. Build search space: pipeline params + method params + additional params
Constraint Functions – Different experiments apply parameter constraints:
- AutoProfileSemiSupervisedPdMExperiment: max_wait_time constraints
- IncrementalSemiSupervisedPdMExperiment: incremental + max_wait constraints
- UnsupervisedPdMExperiment: SAND or distance-based constraints (KNN) or max_wait constraints
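The cross-product evaluation and the search-space assembly in step 4 can be sketched in plain Python. Both helper names are hypothetical, the method-major iteration order is an assumption (the documentation only says results come back "in execution order"), and the parameter names in the usage note are illustrative.

```python
from itertools import product


def enumerate_pairs(method_names, experiment_names):
    """List every (method, experiment) pair; method-major order is assumed."""
    return list(product(method_names, experiment_names))


def build_search_space(pipeline_params, method_params, additional_params):
    """Merge the three parameter sources into one search space.

    Each argument maps a parameter name to a list of candidate values.
    Duplicate names are rejected rather than silently overwritten.
    """
    space = {}
    for part in (pipeline_params, method_params, additional_params):
        overlap = space.keys() & part.keys()
        if overlap:
            raise ValueError(f"Duplicate parameter names: {sorted(overlap)}")
        space.update(part)
    return space
```

With two methods and three experiments, `enumerate_pairs` yields six pairs, which matches the stated length of the returned list, len(methods) × len(experiments).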
Examples
>>> from pdmlabs.RunExperiment import run_experiment
>>> from pdmlabs.utils.automatic_parameter_generation import online_technique
>>> from pdmlabs.utils.dataset import Dataset
>>> from pdmlabs.experiment.batch.unsupervised_experiment import UnsupervisedPdMExperiment
>>> from pdmlabs.method.unsupervised_method import IF
>>> # Load dataset (`data` is assumed to be loaded beforehand)
>>> dataset_obj = Dataset(data, 'timestamp', failure_column='failure')
>>> train_data, val_data = dataset_obj.get_rul_dataset()
>>> # Configure experiment
>>> methods = [IF]
>>> param_spaces = [online_technique('IF', maximum_profile=500)]
>>> best_params = run_experiment(
...     dataset=train_data,
...     methods=methods,
...     param_space_dict_per_method=param_spaces,
...     method_names=['IF'],
...     experiments=[UnsupervisedPdMExperiment],
...     experiment_names=['Online'],
...     MAX_RUNS=20,
...     MAX_JOBS=4,
...     INITIAL_RANDOM=2,
...     mlflow_port=5000,
...     optimization_param='AD1_AUC'
... )
>>> print(best_params[0])  # Best parameters for IF in Online experiment
Notes
Important: fit_size defaults to profile_size if not specified
MANGO parameters (num, jobs, initial_random) are calculated from the search-space size and MAX_RUNS
Constraint functions prevent invalid hyperparameter combinations
MLflow artifacts saved to: ./artifacts/{experiment_name} artifacts/
Method-experiment pairs execute in sequence (not in parallel); parallelism happens only inside MANGO's hyperparameter search
Method-experiment cross-product: 2 methods × 3 experiments gives 6 method-experiment pairs, each optimized separately
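How the search budget might follow from the space size and MAX_RUNS is sketched below. The exact formula used by run_experiment is not given in this documentation, so the clamping rules, the function name, and the returned keys are all assumptions; the sketch only shows one plausible reading in which a small grid is never sampled more often than it has distinct configurations.

```python
from math import prod


def plan_search_budget(search_space, max_runs, max_jobs, initial_random):
    """Derive evaluation counts from the grid size (assumed rules, see lead-in)."""
    # Number of distinct configurations in a grid of candidate lists.
    space_size = prod(len(candidates) for candidates in search_space.values())
    # Never request more evaluations than there are configurations.
    num = min(max_runs, space_size)
    # Parallel jobs and random warm-up samples cannot exceed the evaluations.
    jobs = max(1, min(max_jobs, num))
    n_random = max(1, min(initial_random, num))
    return {"num": num, "jobs": jobs, "initial_random": n_random}
```

Under these assumed rules, a space with two parameters of two candidates each has four configurations, so MAX_RUNS=20 would be reduced to four evaluations.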
- pdmlabs.RunExperiment.run_mlflow_server(mlflow_port)
Start or verify MLflow tracking server for experiment logging.
Checks if MLflow UI server is already running on the specified port. If not, starts a new MLflow UI process listening on localhost.
- Parameters:
mlflow_port (int) – Port number for MLflow UI server (e.g., 5000, 8080).
- Returns:
Prints status messages but returns nothing.
- Return type:
None
Notes
Host is hardcoded to 127.0.0.1 (localhost)
Starts MLflow in non-blocking mode (subprocess)
Assumes ‘mlflow’ command is available in system PATH
Multiple calls with same port: only first actually starts server
Server can be accessed at http://127.0.0.1:{mlflow_port}
Examples
>>> run_mlflow_server(5000)
MLflow server started at http://127.0.0.1:5000.
>>> run_mlflow_server(5000)  # Second call
MLflow server is already running at http://127.0.0.1:5000.
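The start-or-verify behaviour described above can be approximated as follows. This is a sketch under assumptions, not the pdmlabs source: the `launcher` and `port_check` arguments are hypothetical hooks added so the logic can be exercised without starting a real server, and the command line relies on the `mlflow ui` subcommand with its `--host` and `--port` options.

```python
import socket
import subprocess


def _port_in_use(host, port):
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        return sock.connect_ex((host, port)) == 0


def run_mlflow_server_sketch(mlflow_port, launcher=subprocess.Popen,
                             port_check=_port_in_use):
    """Start the MLflow UI on localhost unless the port is already taken."""
    host = "127.0.0.1"  # hardcoded, as stated in the notes
    url = f"http://{host}:{mlflow_port}"
    if port_check(host, mlflow_port):
        print(f"MLflow server is already running at {url}.")
        return
    # Popen returns immediately, so the caller is not blocked by the server.
    launcher(["mlflow", "ui", "--host", host, "--port", str(mlflow_port)])
    print(f"MLflow server started at {url}.")
```

Because the check only tests whether the port accepts connections, any other service listening on that port would also be reported as an already-running MLflow server.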