pdmlabs.experiment.streaming

Streaming experiment classes for online/real-time PdM evaluation (experimental).

Status: Early-Stage / Stubs

Streaming experiments are designed for real-time, online scenarios where:

  • Data arrives continuously (not all available upfront)

  • Models must adapt or update as new data is seen

  • Predictions are needed immediately (not retrospectively)

Current State: This module contains placeholder implementations. Streaming support is planned for future versions. For production use, prefer batch experiments.

Available Classes:

StreamingSemiSupervisedPdMExperiment

Placeholder for online semi-supervised anomaly detection. Status: Stub (not implemented)

StreamingUnsupervisedPdMExperiment

Placeholder for online unsupervised anomaly detection. Status: Stub (not implemented)

Future Roadmap:

Phase 1 (Future)
  • Per-sample prediction interface

  • Streaming parameter tuning

  • Automated concept drift detection

Phase 2 (Future)
  • Online model adaptation (no retraining needed)

  • Memory-efficient windowing strategies

  • Real-time MLflow integration

Phase 3 (Future)
  • Ensemble methods for streaming

  • Anomaly score confidence intervals

  • Multi-source fusion
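To make the "automated concept drift detection" roadmap item more concrete, here is a minimal sketch of one well-known detector, the Page-Hinkley test, in its one-sided form for upward shifts in the mean. It is only an illustration: nothing in this module states which drift detector pdmlabs will adopt, and the class name, the `delta` tolerance and the `threshold` value are assumptions chosen for the toy data.

```python
class PageHinkley:
    """One-sided Page-Hinkley test for an upward shift in the mean.

    Hypothetical illustration only; this class is not part of pdmlabs.
    """

    def __init__(self, delta: float = 0.005, threshold: float = 50.0) -> None:
        self.delta = delta          # tolerated drift magnitude (assumed value)
        self.threshold = threshold  # alarm level (assumed value)
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0              # cumulative deviation from the running mean
        self.min_cum = 0.0          # smallest cumulative deviation seen so far

    def update(self, x: float) -> bool:
        """Consume one sample and return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return (self.cum - self.min_cum) > self.threshold


# Toy stream: 100 stable readings followed by a sustained jump.
stream = [0.0] * 100 + [10.0] * 20
detector = PageHinkley()
drift_at = next((i for i, x in enumerate(stream) if detector.update(x)), None)
print("drift signalled at index:", drift_at)
```

The detector keeps only four numbers of state, which fits the per-sample, low-memory processing that the roadmap describes.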

Recommendation:

For now, use batch experiments (pdmlabs.experiment.batch) for all production PdM applications. Revisit streaming when fully implemented.

Alternative: Use temporal cross-validation in batch experiments to simulate streaming performance (train on early data, test on later data).
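The temporal split mentioned above can be sketched as an expanding-window scheme: each fold trains on everything up to a cut-off and tests on the block that follows it. The helper below is a standalone illustration in plain Python; `expanding_window_splits` is a hypothetical name and not a pdmlabs function, and how the resulting index ranges are passed to a batch experiment is left open.

```python
from typing import Iterator, Tuple


def expanding_window_splits(n_samples: int, n_folds: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges where the test block always follows the training data."""
    if n_folds < 1 or n_samples < n_folds + 1:
        raise ValueError("need at least n_folds + 1 samples")
    block = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * block
        # The last fold absorbs any remainder so that no sample is dropped.
        test_end = n_samples if k == n_folds else train_end + block
        yield range(0, train_end), range(train_end, test_end)


# Ten time-ordered readings, three folds.
splits = list(expanding_window_splits(10, n_folds=3))
for train, test in splits:
    print(f"train [{train.start}, {train.stop})  test [{test.start}, {test.stop})")
```

Because the rows are never shuffled, every test block contains only data that is later in time than its training data, which approximates how a streaming model would encounter it.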

See also

  • pdmlabs.experiment.batch: Production-ready batch experiments

  • pdmlabs.experiment.experiment: PdMExperiment base class

class pdmlabs.experiment.streaming.StreamingSemiSupervisedPdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)

Bases: PdMExperiment

Streaming (online) semi-supervised anomaly detection.

Status: Experimental/Stub Implementation

This experiment flavor is designed for streaming data scenarios:

  • Processes data continuously as it arrives (row-by-row or in small batches)

  • Adapts models online without batch retraining

  • Produces predictions in real-time

Current Implementation: This is an early-stage stub that iterates over target data but does not yet implement full streaming evaluation logic. Use batch experiments for production.

Future Work:

  • Streaming parameter tuning

  • Online model adaptation

  • Concept drift detection

  • Memory-efficient processing
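Since the class itself is a stub, the intended behaviour (row-by-row processing, online adaptation, immediate predictions) may be easier to picture with a toy detector. The sketch below scores each sample against a running mean and standard deviation maintained with Welford's algorithm, then folds the sample into those statistics. `RunningZScore` and its `warmup` parameter are hypothetical and unrelated to the pdmlabs API.

```python
import math


class RunningZScore:
    """Toy online anomaly scorer; hypothetical, not part of pdmlabs."""

    def __init__(self, warmup: int = 5) -> None:
        self.warmup = warmup  # samples to observe before scoring (assumed value)
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0         # sum of squared deviations (Welford)

    def score_then_update(self, x: float) -> float:
        """Score x against past samples only, then update the statistics."""
        if self.n >= self.warmup and self.m2 > 0:
            std = math.sqrt(self.m2 / (self.n - 1))
            score = abs(x - self.mean) / std
        else:
            score = 0.0
        # Welford's update: constant memory, no retraining pass over old data.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return score


stream = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 25.0, 10.0]
detector = RunningZScore(warmup=5)
scores = [detector.score_then_update(x) for x in stream]
print([round(s, 2) for s in scores])
```

Scoring before updating keeps each prediction free of information from the sample being judged, which mirrors the "predictions are needed immediately" constraint of an online setting.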

Raises:

NotImplementedError – Full streaming functionality not yet implemented.

Examples

>>> experiment = StreamingSemiSupervisedPdMExperiment(...)
>>> # Note: streaming experiments are currently stubs
>>> # Use batch experiments instead for now
execute() → None

Execute placeholder streaming experiment (not fully implemented).

Returns:

None. Streaming experiments are currently stubs.

Return type:

None

class pdmlabs.experiment.streaming.StreamingUnsupervisedPdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)

Bases: PdMExperiment

Streaming (online) unsupervised anomaly detection.

Status: Stub Implementation

This experiment flavor is designed for unsupervised streaming data:

  • Processes continuous data streams without labels

  • Adapts models in real-time

  • Produces anomaly scores online

Current Implementation: This is a placeholder stub with no execution logic. Use batch experiments for full functionality. Streaming support is planned for future versions.

Design Goals:

  • Minimal memory footprint for long-running applications

  • Per-sample or mini-batch prediction

  • Automatic concept drift handling

  • No offline/batch retraining required
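Two of these design goals, a bounded memory footprint and mini-batch processing, can be illustrated with standard-library tools. The sketch below consumes a stream in small batches and keeps only the most recent `window_size` values in a fixed-length deque. The function names and parameters are hypothetical and do not reflect any pdmlabs interface.

```python
from collections import deque
from itertools import islice
from typing import Deque, Iterable, Iterator, List


def mini_batches(stream: Iterable[float], batch_size: int) -> Iterator[List[float]]:
    """Yield consecutive batches without materialising the whole stream."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def windowed_means(stream: Iterable[float], window_size: int, batch_size: int) -> Iterator[float]:
    """After each mini-batch, yield the mean of the most recent window_size values."""
    window: Deque[float] = deque(maxlen=window_size)  # old values are evicted automatically
    for batch in mini_batches(stream, batch_size):
        window.extend(batch)
        yield sum(window) / len(window)


means = list(windowed_means(range(10), window_size=4, batch_size=3))
print(means)
```

Memory use depends only on `window_size` and `batch_size`, not on how long the stream runs, which is the property a long-running deployment needs.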

Raises:

NotImplementedError – Streaming functionality not yet implemented.

Examples

>>> # Streaming experiments are not yet implemented
>>> # Use UnsupervisedPdMExperiment (batch) instead
execute() → None

Execute placeholder unsupervised streaming experiment.

Returns:

None. Streaming execution is not implemented.

Return type:

None
