pdmlabs.thresholding.thresholder


Abstract base interface for thresholders.

Thresholders convert anomaly scores to threshold values for decision-making. This differs from post-processors, which convert scores directly to binary labels.

Thresholders:
- Accept anomaly scores (float values).
- Return threshold value(s): the boundary between normal and anomalous.
- Support both batch and online modes.
- Can be adaptive (threshold varies per sample) or static (fixed threshold).

Use cases:
- Fixed threshold: a simple baseline (e.g. threshold=0.5; any score > threshold is an anomaly).
- Adaptive threshold: adjusts per time period or based on local statistics.
- Survival analysis: converts survival probabilities to RUL (Remaining Useful Life).
- Context-aware: different thresholds for different sources/times.

Typical pipeline: anomaly_scores -> thresholder -> threshold -> binary_labels
Alternatively, with post-processors: anomaly_scores -> post_processor -> binary_labels
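The last step of the thresholder pipeline (threshold -> binary_labels) can be sketched in plain Python. The helper `to_binary_labels` is hypothetical and not part of pdmlabs; it assumes the thresholder has already returned one threshold per score and applies the comparison `score > threshold` element-wise.

```python
def to_binary_labels(scores: list[float], thresholds: list[float]) -> list[int]:
    """Hypothetical helper: 1 = anomalous, 0 = normal, one label per score."""
    if len(scores) != len(thresholds):
        raise ValueError("scores and thresholds must have the same length")
    # Strict inequality, matching the convention "score > threshold is an anomaly".
    return [int(s > t) for s, t in zip(scores, thresholds)]


scores = [0.1, 0.7, 0.5, 0.9]
thresholds = [0.5] * len(scores)  # a static threshold repeated for every score
print(to_binary_labels(scores, thresholds))  # [0, 1, 0, 1]
```

Note that a score exactly equal to its threshold (0.5 above) is labeled normal under this convention.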

Classes

ThresholderInterface(event_preferences): Abstract base class for threshold determination methods.

class pdmlabs.thresholding.thresholder.ThresholderInterface(event_preferences: EventPreferences)

Bases: ABC

Abstract base class for threshold determination methods.

Thresholders determine the boundary value(s) between normal and anomalous scores. This enables converting continuous anomaly scores to binary decisions.

Two usage patterns:
1. Single threshold: apply the same threshold to all scores.
2. Adaptive threshold: a different threshold per sample/time.
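As a rough illustration of the two patterns, the sketch below produces per-score threshold lists both ways. Both functions and the day/night rule (a higher threshold, i.e. fewer alarms, between 22:00 and 06:00) are hypothetical assumptions, not pdmlabs behavior. Because `pd.Timestamp` subclasses `datetime`, the time-based function would also accept a list like `scores_dates`.

```python
from datetime import datetime


def static_thresholds(n: int, threshold: float = 0.5) -> list[float]:
    # Pattern 1: the same threshold repeated for every score.
    return [threshold] * n


def time_based_thresholds(dates: list[datetime], day: float = 0.5, night: float = 0.7) -> list[float]:
    # Pattern 2: the threshold depends on each score's timestamp.
    # Hypothetical rule: use the higher "night" threshold from 22:00 to 05:59.
    return [night if (d.hour >= 22 or d.hour < 6) else day for d in dates]


dates = [datetime(2024, 1, 1, 3), datetime(2024, 1, 1, 12), datetime(2024, 1, 1, 23)]
print(static_thresholds(len(dates)))  # [0.5, 0.5, 0.5]
print(time_based_thresholds(dates))   # [0.7, 0.5, 0.7]
```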

event_preferences

Event configuration dict.

Type: EventPreferences

abstract fit(historic_data: list, historic_sources: list[str], event_data: DataFrame, anomaly_ranges=None) → None

Fit the thresholder on training data. Fitting is optional for some thresholders, which may implement this method as a no-op.

Some thresholders are stateless (e.g., a constant threshold); others learn thresholds from training data or anomaly labels.

Parameters:

historic_data (list) – Training data DataFrames (one per source).
historic_sources (list[str]) – Source identifiers.
event_data (pd.DataFrame) – Event log with ‘date’, ‘type’, etc.
anomaly_ranges (list, optional) – Labels marking anomalous time periods. Used by supervised thresholders to learn an optimal threshold.
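One way a supervised thresholder could "learn an optimal threshold" is to pick the candidate that maximizes F1 on labeled training scores. The sketch below is a swapped-in, commonly used technique, not necessarily what pdmlabs does. It assumes the training data has already been scored and that `anomaly_ranges` has been converted to one 0/1 label per score; `best_f1_threshold` is a hypothetical helper.

```python
def best_f1_threshold(scores: list[float], labels: list[int]) -> float:
    """Hypothetical helper: return the observed score that, used as a threshold
    with the rule anomaly = score > threshold, gives the highest F1."""
    if not scores or len(scores) != len(labels):
        raise ValueError("need equally long, non-empty scores and labels")
    best_t, best_f1 = scores[0], -1.0
    for t in sorted(set(scores)):  # candidate thresholds: the observed scores
        preds = [s > t for s in scores]
        tp = sum(p and y == 1 for p, y in zip(preds, labels))
        fp = sum(p and y == 0 for p, y in zip(preds, labels))
        fn = sum((not p) and y == 1 for p, y in zip(preds, labels))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


# Scores from labeled training data (1 = inside an anomaly range).
print(best_f1_threshold([0.1, 0.2, 0.3, 0.8, 0.9], [0, 0, 0, 1, 1]))  # 0.3
```

Because of the strict `>` comparison, the learned threshold is the highest normal score that still lets every anomalous score exceed it.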

abstract get_params()

Return thresholder hyperparameters.

Returns: Configuration parameters (e.g., {‘threshold’: 0.5}).
Return type: dict
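To show how the four abstract methods fit together, here is a minimal sketch of a stateless subclass. It assumes a local stand-in for `ThresholderInterface` that mirrors the signatures documented on this page (in practice the class would come from `pdmlabs.thresholding.thresholder`). `ConstantThresholder` and its `threshold` parameter are hypothetical, and a plain dict stands in for `EventPreferences`.

```python
from abc import ABC, abstractmethod


class ThresholderInterface(ABC):
    """Local stand-in mirroring the documented abstract methods (assumption)."""

    def __init__(self, event_preferences):
        self.event_preferences = event_preferences

    @abstractmethod
    def fit(self, historic_data, historic_sources, event_data, anomaly_ranges=None): ...

    @abstractmethod
    def get_params(self): ...

    @abstractmethod
    def infer_threshold(self, scores, source, event_data, scores_dates): ...

    @abstractmethod
    def infer_threshold_one(self, score, source, event_data): ...


class ConstantThresholder(ThresholderInterface):
    """Hypothetical stateless thresholder: the same threshold for every score."""

    def __init__(self, event_preferences, threshold: float = 0.5):
        super().__init__(event_preferences)
        self.threshold = threshold

    def fit(self, historic_data, historic_sources, event_data, anomaly_ranges=None) -> None:
        # Stateless: there is nothing to learn, so fitting is a no-op.
        return None

    def get_params(self) -> dict:
        return {"threshold": self.threshold}

    def infer_threshold(self, scores, source, event_data, scores_dates) -> list[float]:
        # One identical threshold per score, so the result has len(scores) entries.
        return [self.threshold] * len(scores)

    def infer_threshold_one(self, score, source, event_data) -> float:
        return self.threshold


t = ConstantThresholder(event_preferences={}, threshold=0.5)
t.fit([], [], event_data=None)
print(t.get_params())                                     # {'threshold': 0.5}
print(t.infer_threshold([0.2, 0.8], "pump_1", None, []))  # [0.5, 0.5]
```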

abstract infer_threshold(scores: list[float], source: str, event_data: DataFrame, scores_dates: list[Timestamp]) → list[float]

Determine threshold value(s) for a batch of scores (offline mode).

Returns a threshold value for each score. The result can be:
- A single value repeated: [0.5, 0.5, 0.5, …] (static threshold)
- Varying values: [0.4, 0.45, 0.5, 0.55, …] (adaptive threshold)

Parameters:

scores (list[float]) – Anomaly scores to threshold.
source (str) – Source identifier for source-specific thresholds.
event_data (pd.DataFrame) – Event log for context-aware thresholds.
scores_dates (list[pd.Timestamp]) – Timestamps of scores (enables time-based adaptive thresholds).

Returns:

Threshold values, one per score (same length as scores). An anomaly is detected where score > threshold.

Return type:

list[float]
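An adaptive implementation of `infer_threshold` might derive each threshold from local statistics of the preceding scores. The sketch below uses a trailing-window "mean + k × standard deviation" rule, a common heuristic chosen here for illustration rather than the library's method. The function name and the `window`, `k` and `default` parameters are hypothetical; `source`, `event_data` and `scores_dates` are omitted for brevity.

```python
import statistics


def rolling_thresholds(scores: list[float], window: int = 5, k: float = 3.0,
                       default: float = 0.5) -> list[float]:
    """Hypothetical adaptive rule: threshold_i = mean + k * stdev of the
    `window` scores preceding score i. Returns one threshold per score."""
    thresholds = []
    for i in range(len(scores)):
        past = scores[max(0, i - window):i]  # excludes the current score
        if len(past) < 2:
            # Not enough history for a sample standard deviation: use a fixed fallback.
            thresholds.append(default)
        else:
            thresholds.append(statistics.mean(past) + k * statistics.stdev(past))
    return thresholds


print(rolling_thresholds([0.1, 0.1, 0.1, 0.1, 0.9]))  # [0.5, 0.5, 0.1, 0.1, 0.1]
```

Excluding the current score from its own window keeps a sudden spike (0.9 above) from inflating the threshold it is compared against.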

abstract infer_threshold_one(score: float, source: str, event_data: DataFrame) → float

Determine the threshold for a single score (online/streaming mode).

Parameters:

score (float) – Single anomaly score.
source (str) – Source identifier.
event_data (pd.DataFrame) – Event log (unused by most thresholders).

Returns:

Threshold value for this score.

Return type:

float
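In streaming mode a thresholder sees one score at a time, so any adaptive state has to be updated incrementally. The sketch below keeps per-source running statistics with Welford's online algorithm and returns "mean + k × standard deviation" of the scores seen so far. The class, its parameters and the update-after-thresholding order are hypothetical choices, and only `infer_threshold_one` (with the documented signature) is implemented; as noted above, `event_data` goes unused.

```python
import math
from collections import defaultdict


class RunningStatsThresholder:
    """Hypothetical online thresholder with separate running statistics per source."""

    def __init__(self, k: float = 3.0, default: float = 0.5):
        self.k = k
        self.default = default
        # Welford accumulators per source: [count, mean, sum of squared deviations]
        self._stats = defaultdict(lambda: [0, 0.0, 0.0])

    def infer_threshold_one(self, score: float, source: str, event_data=None) -> float:
        n, mean, m2 = self._stats[source]
        if n >= 2:
            threshold = mean + self.k * math.sqrt(m2 / (n - 1))  # sample std dev
        else:
            threshold = self.default  # too little history for this source
        # Update only after computing the threshold, so a score never
        # influences the threshold it is compared against.
        n += 1
        delta = score - mean
        mean += delta / n
        m2 += delta * (score - mean)
        self._stats[source] = [n, mean, m2]
        return threshold


t = RunningStatsThresholder(k=3.0)
print([t.infer_threshold_one(s, "pump_1") for s in [0.1, 0.1, 0.1, 0.9]])  # [0.5, 0.5, 0.1, 0.1]
```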
