pdmlabs.experiment.streaming

Streaming experiment classes for online/real-time PdM evaluation (experimental).

Status: Early-Stage / Stubs

Streaming experiments are designed for real-time, online scenarios where:

- Data arrives continuously (not all available upfront)
- Models must adapt or update as new data is seen
- Predictions are needed immediately (not retrospectively)

Current State: This module contains placeholder implementations. Streaming support is planned for future versions. For production use, prefer batch experiments.

Available Classes:

StreamingSemiSupervisedPdMExperiment
Placeholder for online semi-supervised anomaly detection. Status: Stub (not implemented)

StreamingUnsupervisedPdMExperiment
Placeholder for online unsupervised anomaly detection. Status: Stub (not implemented)

Future Roadmap:

Phase 1 (Future)

Per-sample prediction interface

Streaming parameter tuning

Automated concept drift detection

Phase 2 (Future)

Online model adaptation (no retraining needed)

Memory-efficient windowing strategies

Real-time MLflow integration

Phase 3 (Future)

Ensemble methods for streaming

Anomaly score confidence intervals

Multi-source fusion

Recommendation:

For now, use batch experiments (pdmlabs.experiment.batch) for all production PdM applications. Revisit streaming once it is fully implemented.

Alternative: Use temporal cross-validation in batch experiments to simulate streaming performance (train on early data, test on later data).
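The temporal split described above can be sketched in plain Python. This is only an illustration of the idea, assuming an expanding training window; the helper `expanding_window_splits` and its parameters are hypothetical and not part of pdmlabs. The resulting index ranges would still need to be applied to your own data before passing it to a batch experiment.

```python
from typing import Iterator


def expanding_window_splits(
    n_samples: int, n_folds: int, min_train: int
) -> Iterator[tuple[range, range]]:
    """Yield (train, test) index ranges where each test block follows its training block.

    Hypothetical helper for illustration; not a pdmlabs function.
    """
    if n_folds < 1 or min_train < 1 or min_train >= n_samples:
        raise ValueError("need n_folds >= 1 and 1 <= min_train < n_samples")
    test_size = (n_samples - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough samples for the requested number of folds")
    for fold in range(n_folds):
        train_end = min_train + fold * test_size
        test_end = train_end + test_size
        # Train on everything seen so far, test on the next unseen block.
        yield range(0, train_end), range(train_end, test_end)


splits = list(expanding_window_splits(n_samples=10, n_folds=3, min_train=4))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Because every test index is later than every training index, no future data leaks into training, which is the property that makes this a rough stand-in for streaming evaluation.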

See also

pdmlabs.experiment.batch: Production-ready batch experiments
pdmlabs.experiment.experiment: PdMExperiment base class

class pdmlabs.experiment.streaming.StreamingSemiSupervisedPdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)

Bases: PdMExperiment

Streaming (online) semi-supervised anomaly detection.

Status: Experimental/Stub Implementation

This experiment flavor is designed for streaming data scenarios:

- Processes data continuously as it arrives (row-by-row or in small batches)
- Adapts models online without batch retraining
- Produces predictions in real-time
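To make "row-by-row or in small batches" concrete, here is a minimal sketch of consuming an iterable in small batches. It is not how pdmlabs processes data (the class is currently a stub); the function `mini_batches` and the sample readings are assumptions made for illustration only.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def mini_batches(stream: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `batch_size` items from `stream`.

    Hypothetical helper for illustration; not a pdmlabs function.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    iterator = iter(stream)
    while True:
        # islice pulls only the next batch, so the full stream is never held in memory.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


readings = iter(range(7))  # stand-in for sensor readings arriving over time
batches = list(mini_batches(readings, batch_size=3))
print(batches)
```

With `batch_size=1` the same loop degenerates to row-by-row processing.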

Current Implementation: This is an early-stage stub that iterates over target data but does not yet implement full streaming evaluation logic. Use batch experiments for production.

Future Work:

- Streaming parameter tuning
- Online model adaptation
- Concept drift detection
- Memory-efficient processing

Raises: NotImplementedError – Full streaming functionality not yet implemented.

Examples

>>> experiment = StreamingSemiSupervisedPdMExperiment(...)
>>> # Note: streaming experiments are currently stubs
>>> # Use batch experiments instead for now

execute() → None

Execute placeholder streaming experiment (not fully implemented).

Returns: Streaming experiments are currently stubs.
Return type: None

class pdmlabs.experiment.streaming.StreamingUnsupervisedPdMExperiment(experiment_name: str, pipeline: PdMPipeline, param_space: dict, constraint_function: Callable = None, target_data: list[DataFrame] = None, target_sources: list[str] = None, historic_data: list[DataFrame] = [], historic_sources: list[str] = [], optimization_param: str = 'AD1_AUC', initial_random: int = 2, num_iteration: int = 20, batch_size: int = 1, n_jobs: int = 1, random_state: int = 42, random_n_tries: int = 3, constraint_max_retries: int = 10, historic_data_header: str = 'infer', target_data_header: str = 'infer', artifacts: str = 'artifacts', debug: bool = False, delay: float = None, log_best_scores: bool = False, maximize: bool = True, custom_evaluators: list = None)

Bases: PdMExperiment

Streaming (online) unsupervised anomaly detection.

Status: Stub Implementation

This experiment flavor is designed for unsupervised streaming data:

- Processes continuous data streams without labels
- Adapts models in real-time
- Produces anomaly scores online

Current Implementation: This is a placeholder stub with no execution logic. Use batch experiments for full functionality. Streaming support is planned for future versions.

Design Goals:

- Minimal memory footprint for long-running applications
- Per-sample or mini-batch prediction
- Automatic concept drift handling
- No offline/batch retraining required
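Two of these goals, a bounded memory footprint and per-sample scoring, can be illustrated with a fixed-size sliding window. The sketch below is not the planned pdmlabs design: the class `RollingZScore`, the z-score rule, and the sample values are all assumptions chosen to keep the example small.

```python
from collections import deque
from statistics import mean, pstdev


class RollingZScore:
    """Hypothetical per-sample anomaly scorer; not part of pdmlabs."""

    def __init__(self, window: int) -> None:
        if window < 2:
            raise ValueError("window must be >= 2")
        # deque(maxlen=...) drops the oldest value automatically, so memory stays bounded.
        self._buffer: deque[float] = deque(maxlen=window)

    def score(self, value: float) -> float:
        """Score `value` against the current window, then add it to the window."""
        if len(self._buffer) < 2:
            result = 0.0  # not enough history to estimate a spread
        else:
            spread = pstdev(self._buffer)
            result = 0.0 if spread == 0 else abs(value - mean(self._buffer)) / spread
        self._buffer.append(value)
        return result


scorer = RollingZScore(window=5)
scores = [scorer.score(v) for v in [10.0, 10.2, 9.8, 10.1, 9.9, 25.0]]
print([round(s, 2) for s in scores])
```

Each value is scored before it enters the window, so an outlier does not dilute its own score. The final reading (25.0) scores far above the earlier ones because it sits well outside the recent range.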

Raises: NotImplementedError – Streaming functionality not yet implemented.

Examples

>>> # Streaming experiments are not yet implemented
>>> # Use UnsupervisedPdMExperiment (batch) instead

execute() → None

Execute placeholder unsupervised streaming experiment.

Returns: Not implemented.
Return type: None

Modules

`semi_supervised_experiment`
`unsupervised_experiment`
