pdmlabs.RunExperiment

Experiment orchestration and execution for predictive maintenance anomaly detection.

This module provides the main entry point for running predictive maintenance experiments across different learning paradigms (supervised, unsupervised, semi-supervised) with automatic hyperparameter optimization.

Key Features:
  • Multi-method, multi-experiment orchestration

  • Hyperparameter optimization using MANGO (Bayesian optimization)

  • Pipeline construction (preprocessing → method → postprocessing → thresholding)

  • MLflow integration for experiment tracking

  • Support for 7 experiment types with different learning strategies

  • Constraint-based parameter validation for different experiment types

  • Parallel execution for hyperparameter search
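The four-stage flow (preprocessing → method → postprocessing → thresholding) can be illustrated with a toy, dependency-free sketch. None of these functions are pdmlabs APIs. `preprocess`, `method_scores`, `postprocess`, `threshold`, and `run_pipeline` are hypothetical stand-ins (identity preprocessing, deviation-from-profile scoring, trailing moving average, constant threshold) chosen only to show how data passes from one stage to the next.

```python
from statistics import mean

# Hypothetical stand-ins for the four pipeline stages; these are NOT pdmlabs classes.


def preprocess(values):
    # Identity preprocessing: pass the records through unchanged.
    return list(values)


def method_scores(values, profile):
    # Toy anomaly score: absolute deviation from the mean of a reference profile.
    centre = mean(profile)
    return [abs(v - centre) for v in values]


def postprocess(scores, window=2):
    # Trailing moving average over at most `window` consecutive scores.
    smoothed = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def threshold(scores, limit):
    # Constant threshold: 1 marks an alarm, 0 marks normal behaviour.
    return [1 if s > limit else 0 for s in scores]


def run_pipeline(values, profile, window=2, limit=1.0):
    # preprocessing -> method -> postprocessing -> thresholding
    x = preprocess(values)
    return threshold(postprocess(method_scores(x, profile), window), limit)
```

For example, `run_pipeline([1, 1, 4, 5], [1, 1, 1, 1])` scores the records as `[0, 0, 3, 4]`, smooths them to `[0.0, 0.0, 1.5, 3.5]`, and thresholds them to `[0, 0, 1, 1]`.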

Core Functions:

  • run_experiment: Main entry point for executing predictive maintenance experiments

  • get_method_type: Map an experiment type to the appropriate method interface

  • is_port_in_use: Check port availability for the MLflow tracking server

  • run_mlflow_server: Start or verify the MLflow tracking server

Example

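>>> from pdmlabs.RunExperiment import run_experiment
>>> from pdmlabs.utils.dataset import Dataset
>>> from pdmlabs.experiment.batch.unsupervised_experiment import UnsupervisedPdMExperiment
>>> # `data` holds the raw records with a 'timestamp' column; IsolationForest stands
>>> # for any method implementing UnsupervisedMethodInterface (import not shown).
>>> dataset = Dataset(data, datetime_column='timestamp')
>>> methods = [IsolationForest()]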
>>> param_spaces = [{'n_estimators': [100, 200]}]
>>> results = run_experiment(
...     dataset=dataset.get_rul_dataset()[0],
...     methods=methods,
...     param_space_dict_per_method=param_spaces,
...     method_names=['IF'],
...     experiments=[UnsupervisedPdMExperiment],
...     experiment_names=['Baseline'],
...     MAX_RUNS=20
... )

Functions

get_method_type(experiment)

Map experiment class to corresponding method interface.

is_port_in_use(host, port)

Check if a given port is in use on the specified host.

run_experiment(dataset, methods, ...[, ...])

Execute predictive maintenance anomaly detection experiments with hyperparameter optimization.

run_mlflow_server(mlflow_port)

Start or verify MLflow tracking server for experiment logging.

pdmlabs.RunExperiment.get_method_type(experiment)

Map experiment class to corresponding method interface.

Determines which method interface (supervised, unsupervised, or semi-supervised) is required for a given experiment type. This ensures the correct method base class is used during method instantiation.

Parameters:

experiment (type) – Experiment class (not an instance). One of:

  • AutoProfileSemiSupervisedPdMExperiment

  • IncrementalSemiSupervisedPdMExperiment

  • SemiSupervisedPdMExperiment

  • UnsupervisedPdMExperiment

  • SupervisedPdMExperiment

  • SupervisedRULPdMExperiment

  • Supervised_SA_PdMExperiment

Returns:

Method interface class:

  • SemiSupervisedMethodInterface for semi-supervised experiments

  • UnsupervisedMethodInterface for unsupervised experiments

  • SupervisedMethodInterface for supervised, RUL, and SA experiments

Return type:

type

Raises:

ValueError – If experiment type is not recognized.
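The mapping described above can be sketched as a lookup table. The three interface classes and seven experiment classes below are empty placeholders standing in for the real pdmlabs classes; only the pairing, taken from the Parameters and Returns lists, is meant to carry over. Matching on the exact class and the wording of the error message are assumptions of this sketch.

```python
# Placeholder interface classes (hypothetical stand-ins for the pdmlabs interfaces).
class SemiSupervisedMethodInterface:
    pass


class UnsupervisedMethodInterface:
    pass


class SupervisedMethodInterface:
    pass


# Placeholder experiment classes; only their identity matters in this sketch.
AutoProfileSemiSupervisedPdMExperiment = type("AutoProfileSemiSupervisedPdMExperiment", (), {})
IncrementalSemiSupervisedPdMExperiment = type("IncrementalSemiSupervisedPdMExperiment", (), {})
SemiSupervisedPdMExperiment = type("SemiSupervisedPdMExperiment", (), {})
UnsupervisedPdMExperiment = type("UnsupervisedPdMExperiment", (), {})
SupervisedPdMExperiment = type("SupervisedPdMExperiment", (), {})
SupervisedRULPdMExperiment = type("SupervisedRULPdMExperiment", (), {})
Supervised_SA_PdMExperiment = type("Supervised_SA_PdMExperiment", (), {})

# Experiment class -> method interface class, as listed in the documentation.
_INTERFACE_BY_EXPERIMENT = {
    AutoProfileSemiSupervisedPdMExperiment: SemiSupervisedMethodInterface,
    IncrementalSemiSupervisedPdMExperiment: SemiSupervisedMethodInterface,
    SemiSupervisedPdMExperiment: SemiSupervisedMethodInterface,
    UnsupervisedPdMExperiment: UnsupervisedMethodInterface,
    SupervisedPdMExperiment: SupervisedMethodInterface,
    SupervisedRULPdMExperiment: SupervisedMethodInterface,
    Supervised_SA_PdMExperiment: SupervisedMethodInterface,
}


def get_method_type(experiment):
    """Return the method interface class required by an experiment class."""
    try:
        return _INTERFACE_BY_EXPERIMENT[experiment]
    except KeyError:
        raise ValueError(f"Unrecognized experiment type: {experiment!r}") from None
```

A lookup table keeps the mapping in one place; a variant that should also accept subclasses of these experiments would use `issubclass` checks instead of exact dictionary keys.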

Examples

>>> from pdmlabs.RunExperiment import get_method_type
>>> from pdmlabs.experiment.batch.unsupervised_experiment import UnsupervisedPdMExperiment
>>> interface = get_method_type(UnsupervisedPdMExperiment)
>>> print(interface.__name__)
UnsupervisedMethodInterface

pdmlabs.RunExperiment.is_port_in_use(host, port)

Check if a given port is in use on the specified host.

Attempts a socket connection to check whether anything is listening on the port. Useful for checking whether a server (e.g., the MLflow UI) is already running.

Parameters:
  • host (str) – Host IP address or hostname (e.g., '127.0.0.1', 'localhost', '0.0.0.0').

  • port (int) – Port number to check (0-65535).

Returns:

True if port is in use (connection succeeds), False if available.

Return type:

bool

Examples

>>> is_port_in_use('127.0.0.1', 5000)  # nothing is listening on port 5000
False
>>> is_port_in_use('127.0.0.1', 8080)  # a server is listening on port 8080
True

Notes

  • Quick: returns as soon as the connection attempt completes

  • Safe: uses a context manager to ensure the socket is properly closed

  • Does not hang on a refused connection: a free port is reported as False promptly
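A minimal sketch of such a check with Python's standard `socket` module, assuming a plain TCP connection attempt via `socket.connect_ex`. The `timeout` argument is an addition of this sketch and not part of the documented signature.

```python
import socket


def is_port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on (host, port)."""
    # The `with` block closes the socket even if an exception is raised.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Bound the wait for unreachable or filtered hosts.
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code (e.g. connection
        # refused, timed out) instead of raising; hostname lookup errors still raise.
        return sock.connect_ex((host, port)) == 0
```

Using `connect_ex` rather than `connect` turns the common "connection refused" case into a return value, so the caller gets a boolean without wrapping the call in `try`/`except`.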

pdmlabs.RunExperiment.run_experiment(dataset, methods, param_space_dict_per_method, method_names, experiments, experiment_names, additional_parameters={}, MAX_RUNS=1, MAX_JOBS=1, INITIAL_RANDOM=1, profile_size=2, fit_size=None, postprocessor=<class 'pdmlabs.postprocessing.default.DefaultPostProcessor'>, preprocessor=<class 'pdmlabs.preprocessing.record_level.default.DefaultPreProcessor'>, thresholder=<class 'pdmlabs.thresholding.constant.ConstantThresholder'>, mlflow_port=None, debug=True, optimization_param='AD1_AUC', maximize=True, custom_evaluators=None)

Execute predictive maintenance anomaly detection experiments with hyperparameter optimization.

Orchestrates complete experiments: constructs pipelines, performs hyperparameter search, logs results to MLflow, and returns optimal parameters. Supports multiple methods and experiments with cross-product evaluation (each method × each experiment combination).

Parameters:
  • dataset (dict) – Dataset configuration dictionary (typically from Dataset.get_*_dataset() methods). Must contain:
      - 'target_data': list of test feature dataframes
      - 'target_sources': list of test source identifiers
      - additional dataset metadata (see loadAnomalyDetectionDataset.py)

  • methods (list) – List of instantiated method classes. Each method should inherit from MethodInterface. Order must correspond to param_space_dict_per_method and method_names.

  • param_space_dict_per_method (list[dict]) – Hyperparameter search spaces for each method. Each dict maps parameter names to lists of candidate values. Example: [{'n_estimators': [50, 100], 'max_samples': [256, 512]}]

  • method_names (list[str]) – Human-readable names for each method (for logging). Used in MLflow experiment naming and artifact paths.

  • experiments (list[type]) – List of experiment class types (not instances) to execute. Supported: AutoProfileSemiSupervisedPdMExperiment, IncrementalSemiSupervisedPdMExperiment, UnsupervisedPdMExperiment, SemiSupervisedPdMExperiment, SupervisedPdMExperiment, SupervisedRULPdMExperiment, Supervised_SA_PdMExperiment. Each method is evaluated on every experiment (cross-product).

  • experiment_names (list[str]) – Human-readable names for each experiment (for logging/identification).

  • additional_parameters (dict, default={}) – Extra hyperparameters for pipeline components (preprocessing, postprocessing, thresholding). Key format: '{component}_{param_name}' (e.g., 'postprocessor_window_length', 'preprocessor_features'). Values should be lists of candidate values; they are added to the hyperparameter search space.

  • MAX_RUNS (int, default=1) – Maximum number of hyperparameter configurations to evaluate per method-experiment pair. Higher values allow more thorough exploration but increase computation.

  • MAX_JOBS (int, default=1) – Number of parallel processes for hyperparameter search (via MANGO). Typically 1-8 depending on system CPU cores.

  • INITIAL_RANDOM (int, default=1) – Number of initial random hyperparameter samples before Bayesian optimization. Provides diversity in the exploration phase.

  • profile_size (int or list[int], default=2) – Size of the historical buffer (number of samples) used by online methods, i.e., how much historical data is kept during online/streaming evaluation. If a list is given, multiple buffer sizes are tested.

  • fit_size (int or list[int], optional) – Initial profile size (“warm-up” buffer) before evaluation begins. If None: defaults to profile_size. Used in AutoProfile and Incremental experiments for initialization.

  • postprocessor (type, default=DefaultPostProcessor) – Post-processing class for score smoothing/normalization. Will be instantiated with appropriate parameters during pipeline construction. Options: DefaultPostProcessor, MovingAveragePostProcessor, MinMaxPostProcessor, etc.

  • preprocessor (type, default=DefaultPreProcessor) – Pre-processing class for data preparation/transformation. Will be instantiated with appropriate parameters. Options: DefaultPreProcessor, FeatureSelector, MinMaxScaler, etc.

  • thresholder (type, default=ConstantThresholder) – Thresholding class to convert anomaly scores to binary labels. Will be instantiated with appropriate parameters. Options: ConstantThresholder, SurvToRUL, DynamicThresholder, etc.

  • mlflow_port (int, optional) – Port number for MLflow tracking UI. If provided, starts/verifies MLflow server. If None: skips MLflow setup (no experiment logging).

  • debug (bool, default=True) – If True: enables verbose logging and debug messages during experiment execution. If False: minimal logging, only results and warnings.

  • optimization_param (str, default="AD1_AUC") – Metric to optimize during hyperparameter search. Options: "AD1_AUC", "AD2_AUC", "avg_time_to_alarm", "false_alarm_rate", etc. See the evaluation module for the complete list of available metrics.

  • maximize (bool, default=True) – If True: MANGO maximizes optimization_param. If False: MANGO minimizes optimization_param. Typically True for AUC, F1-score; False for error rate, false alarms.
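The '{component}_{param_name}' key format for additional_parameters implies that each sampled configuration has to be split back into per-component arguments. A minimal sketch of that routing follows; the helper name `split_configuration`, the fixed prefix tuple, and treating every unprefixed key as a method hyperparameter are assumptions, not the pdmlabs implementation.

```python
# Hypothetical component prefixes, following the documented '{component}_{param_name}' format.
COMPONENTS = ("preprocessor", "postprocessor", "thresholder")


def split_configuration(config, components=COMPONENTS):
    """Split one flat sampled configuration into keyword arguments per component.

    Keys whose text before the first underscore is a known component are routed
    to that component with the prefix removed; all other keys are treated as
    method hyperparameters and kept unchanged.
    """
    routed = {name: {} for name in components}
    routed["method"] = {}
    for key, value in config.items():
        prefix, sep, param = key.partition("_")
        if sep and prefix in components:
            routed[prefix][param] = value
        else:
            routed["method"][key] = value
    return routed
```

For example, `{'n_estimators': 100, 'max_samples': 256, 'postprocessor_window_length': 5}` yields `{'window_length': 5}` for the postprocessor and `{'n_estimators': 100, 'max_samples': 256}` for the method; `max_samples` stays with the method because `max` is not a component prefix.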

Returns:

list[dict] – Best hyperparameters for each method-experiment combination, in execution order. Each dict maps parameter names to the optimal values discovered by MANGO. Length = len(methods) × len(experiments).

Return type:

list[dict]

Pipeline Construction

For each method-experiment pair:

  1. Create a PdMPipeline: preprocessor → method → postprocessor → thresholder.

  2. Configure the AUC resolution (resolution=30, the granularity of the ROC curve).

  3. Assign the experiment type (Supervised, Unsupervised, or SemiSupervised).

  4. Build the search space: pipeline parameters + method parameters + additional parameters.

Constraint Functions

Different experiments apply different parameter constraints:

  • AutoProfileSemiSupervisedPdMExperiment: max_wait_time constraints

  • IncrementalSemiSupervisedPdMExperiment: incremental and max_wait constraints

  • UnsupervisedPdMExperiment: SAND or distance-based (KNN) constraints, or max_wait constraints
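A constraint function can be sketched as a batch filter over sampled configurations. The rule below (a KNN method's `n_neighbors` must not exceed `profile_size`) is a hypothetical example in the spirit of the distance-based constraints listed above, and the "list of configurations in, list of booleans out" shape is an assumption about how constraints are handed to the optimizer.

```python
def knn_profile_constraint(samples):
    """Return one boolean per sampled configuration (True means keep it).

    Hypothetical rule: a KNN scorer needs at least `n_neighbors` points in its
    profile, so 'n_neighbors' must not exceed 'profile_size'.
    """
    return [s["n_neighbors"] <= s["profile_size"] for s in samples]


def keep_valid(samples, constraint):
    """Drop the configurations that the constraint rejects, preserving order."""
    return [s for s, ok in zip(samples, constraint(samples)) if ok]
```

Filtering candidates before they are evaluated avoids spending the MAX_RUNS budget on configurations that could never run correctly.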

Examples

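>>> from pdmlabs.RunExperiment import run_experiment
>>> from pdmlabs.utils.dataset import Dataset
>>> from pdmlabs.utils.automatic_parameter_generation import online_technique
>>> from pdmlabs.method.unsupervised_method import IF
>>> from pdmlabs.experiment.batch.unsupervised_experiment import UnsupervisedPdMExperiment
>>> # Load dataset (`data` holds the raw records with 'timestamp' and 'failure' columns)
>>> dataset_obj = Dataset(data, 'timestamp', failure_column='failure')
>>> train_data, val_data = dataset_obj.get_rul_dataset()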
>>> # Configure experiment
>>> methods = [IF]
>>> param_spaces = [online_technique('IF', maximum_profile=500)]
>>> best_params = run_experiment(
...     dataset=train_data,
...     methods=methods,
...     param_space_dict_per_method=param_spaces,
...     method_names=['IF'],
...     experiments=[UnsupervisedPdMExperiment],
...     experiment_names=['Online'],
...     MAX_RUNS=20,
...     MAX_JOBS=4,
...     INITIAL_RANDOM=2,
...     mlflow_port=5000,
...     optimization_param='AD1_AUC'
... )
>>> print(best_params[0])  # Best parameters for IF in Online experiment

Notes

  • Important: fit_size defaults to profile_size if not specified

  • MANGO parameters (num, jobs, initial_random) are calculated from the search-space size and MAX_RUNS

  • Constraint functions prevent invalid hyperparameter combinations

  • MLflow artifacts saved to: ./artifacts/{experiment_name} artifacts/

  • Method-experiment pairs run sequentially; parallelism (MAX_JOBS) is used only inside each MANGO search

  • Method-experiment cross-product: 2 methods × 3 experiments = 6 method-experiment pairs, each searched with up to MAX_RUNS configurations
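The cross-product and a per-pair evaluation budget can be sketched as follows. The notes state that MANGO's settings depend on the search-space size and MAX_RUNS but do not give the formula, so capping the budget at the number of distinct configurations in a fully discrete space is an assumption here, as are the helper name `plan_runs` and the methods-outer, experiments-inner loop order.

```python
from math import prod


def plan_runs(method_names, spaces, experiment_names, max_runs):
    """List (method, experiment, budget) triples for every method-experiment pair.

    Hypothetical budget rule: never evaluate more configurations than a fully
    discrete search space contains (product of the candidate-list lengths).
    """
    plan = []
    for name, space in zip(method_names, spaces):
        space_size = prod(len(values) for values in space.values())
        budget = min(max_runs, space_size)
        for experiment in experiment_names:
            plan.append((name, experiment, budget))
    return plan
```

With two methods and three experiments this yields six pairs; a method whose space is `{'n_estimators': [50, 100], 'max_samples': [256, 512]}` has only 4 distinct configurations, so under this assumption MAX_RUNS=20 is capped at 4 for each of its pairs.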

pdmlabs.RunExperiment.run_mlflow_server(mlflow_port)

Start or verify MLflow tracking server for experiment logging.

Checks if MLflow UI server is already running on the specified port. If not, starts a new MLflow UI process listening on localhost.

Parameters:

mlflow_port (int) – Port number for MLflow UI server (e.g., 5000, 8080).

Returns:

Prints status messages but returns nothing.

Return type:

None

Notes

  • Host is hardcoded to 127.0.0.1 (localhost)

  • Starts MLflow in non-blocking mode (subprocess)

  • Assumes ‘mlflow’ command is available in system PATH

  • Repeated calls with the same port start the server only once; later calls detect the running server and only print a status message

  • Server can be accessed at http://127.0.0.1:{mlflow_port}
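The behaviour in these notes can be sketched with the standard `socket` and `subprocess` modules. This assumes the server is launched as `mlflow ui --host 127.0.0.1 --port <port>` and that the "already running" check is a plain TCP connection attempt; the actual pdmlabs code may differ.

```python
import socket
import subprocess

HOST = "127.0.0.1"  # hardcoded to localhost, as in the notes above


def is_port_in_use(host, port, timeout=1.0):
    # Plain TCP connection attempt; connect_ex returns 0 only if something accepts.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def run_mlflow_server(mlflow_port):
    """Start `mlflow ui` on localhost unless something already listens on the port."""
    url = f"http://{HOST}:{mlflow_port}"
    if is_port_in_use(HOST, mlflow_port):
        print(f"MLflow server is already running at {url}.")
        return
    # Popen returns immediately, so the caller is not blocked.
    # Requires the `mlflow` executable to be on the system PATH.
    subprocess.Popen(["mlflow", "ui", "--host", HOST, "--port", str(mlflow_port)])
    print(f"MLflow server started at {url}.")
```

Because the check only tests whether the port accepts connections, any process listening there is reported as a running MLflow server, and the "started" message is printed before the new server has finished starting up.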

Examples

>>> run_mlflow_server(5000)
MLflow server started at http://127.0.0.1:5000.
>>> run_mlflow_server(5000)  # Second call
MLflow server is already running at http://127.0.0.1:5000.