pdmlabs.thresholding.thresholder
Abstract base interface for thresholders.
Thresholders convert anomaly scores to threshold values for decision-making. This differs from post-processors, which convert scores directly to binary labels.
Thresholders:
- Accept anomaly scores (float values)
- Return threshold value(s), the boundary between normal and anomalous
- Support both batch and online modes
- Can be adaptive (threshold varies per sample) or static (fixed threshold)
Use cases:
- Fixed threshold: Simple baseline (threshold=0.5, any score > threshold = anomaly)
- Adaptive threshold: Adjusts per time period or based on local statistics
- Survival analysis: Converts survival probabilities to RUL (Remaining Useful Life)
- Context-aware: Different thresholds for different sources/times
Typical pipeline:
anomaly_scores -> thresholder -> threshold -> binary_labels
Or directly with post-processors:
anomaly_scores -> post_processor -> binary_labels
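The last step of the thresholder pipeline (threshold -> binary_labels) can be sketched as below. This is a minimal illustration, not part of pdmlabs: `apply_thresholds` is a hypothetical helper, and it assumes the `score > threshold` comparison described for `infer_threshold`.

```python
def apply_thresholds(scores, thresholds):
    """Compare each score with its threshold; True marks an anomaly.

    Hypothetical helper (not a pdmlabs function).
    """
    if len(scores) != len(thresholds):
        raise ValueError("scores and thresholds must have the same length")
    # Strict comparison: a score equal to its threshold is treated as normal.
    return [score > threshold for score, threshold in zip(scores, thresholds)]


scores = [0.1, 0.7, 0.5, 0.9]
thresholds = [0.5] * len(scores)  # a static threshold repeated per score
labels = apply_thresholds(scores, thresholds)
print(labels)  # [False, True, False, True]
```

Keeping the comparison separate from threshold inference means the same labelling step works for static and adaptive thresholders alike.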
Classes
| ThresholderInterface | Abstract base class for threshold determination methods. |
- class pdmlabs.thresholding.thresholder.ThresholderInterface(event_preferences: EventPreferences)
Bases: ABC
Abstract base class for threshold determination methods.
Thresholders determine the boundary value(s) between normal and anomalous scores. This enables converting continuous anomaly scores to binary decisions.
Two usage patterns:
1. Single threshold: Apply same threshold to all scores
2. Adaptive threshold: Different threshold per sample/time
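The two patterns can be contrasted with a small sketch. Both functions are hypothetical illustrations rather than pdmlabs code, and the adaptive rule (trailing mean plus a fixed margin) is just one assumed way a threshold could follow local statistics.

```python
def static_thresholds(scores, threshold=0.5):
    """Pattern 1: the same threshold for every score."""
    return [threshold] * len(scores)


def trailing_mean_thresholds(scores, window=3, margin=0.2, fallback=0.5):
    """Pattern 2: a threshold per sample, based on the preceding scores.

    Assumed rule for illustration: mean of up to `window` previous scores
    plus `margin`; `fallback` is used while there is no history yet.
    """
    thresholds = []
    for i in range(len(scores)):
        history = scores[max(0, i - window):i]
        if history:
            thresholds.append(sum(history) / len(history) + margin)
        else:
            thresholds.append(fallback)
    return thresholds


scores = [0.1, 0.1, 0.1, 0.4]
static = static_thresholds(scores)
adaptive = trailing_mean_thresholds(scores)

static_labels = [s > t for s, t in zip(scores, static)]
adaptive_labels = [s > t for s, t in zip(scores, adaptive)]
print(static_labels)    # the jump to 0.4 stays below the fixed 0.5
print(adaptive_labels)  # the jump exceeds the locally derived threshold
```

With these inputs the static threshold never fires, while the adaptive one flags the last sample because it sits well above the recent baseline.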
- event_preferences
Event configuration dict.
- Type:
- abstract fit(historic_data: list, historic_sources: list[str], event_data: DataFrame, anomaly_ranges=None) -> None
Fit thresholder on training data (optional for some thresholders).
Some thresholders are stateless (e.g., constant threshold), others learn thresholds from training data or anomaly labels.
- Parameters:
historic_data (list) – Training data DataFrames (one per source).
historic_sources (list[str]) – Source identifiers.
event_data (pd.DataFrame) – Event log with 'date', 'type', etc.
anomaly_ranges (list, optional) – Labels marking anomalous time periods. Used by supervised thresholders to learn optimal threshold.
- abstract get_params()
Return thresholder hyperparameters.
- Returns:
Configuration parameters (e.g., {'threshold': 0.5}).
- Return type:
dict
- abstract infer_threshold(scores: list[float], source: str, event_data: DataFrame, scores_dates: list[Timestamp]) -> list[float]
Determine threshold value(s) for batch of scores (offline mode).
Returns a threshold value for each score. Can be:
- Single value repeated: [0.5, 0.5, 0.5, ...] (static threshold)
- Varying values: [0.4, 0.45, 0.5, 0.55, ...] (adaptive threshold)
- Parameters:
scores (list[float]) – Anomaly scores to threshold.
source (str) – Source identifier for source-specific thresholds.
event_data (pd.DataFrame) – Event log for context-aware thresholds.
scores_dates (list[pd.Timestamp]) – Timestamps of scores (enables time-based adaptive thresholds).
- Returns:
- Threshold value(s), one per score (same length as scores).
To label a sample, compare: anomaly_detected = (score > threshold)
- Return type:
list[float]
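As an illustration of the batch contract, here is a minimal sketch of a constant thresholder. Because pdmlabs is not imported, `ThresholderSketch` is a hypothetical stand-in that only mirrors the documented method signatures; `ConstantThresholder`, the empty `event_preferences` dict, the source name, and the use of `datetime` in place of `pd.Timestamp` are likewise assumptions made to keep the example self-contained.

```python
from abc import ABC, abstractmethod
from datetime import datetime
from typing import Any


class ThresholderSketch(ABC):
    """Hypothetical stand-in mirroring the documented ThresholderInterface."""

    def __init__(self, event_preferences: Any):
        self.event_preferences = event_preferences

    @abstractmethod
    def fit(self, historic_data, historic_sources, event_data, anomaly_ranges=None) -> None: ...

    @abstractmethod
    def get_params(self) -> dict: ...

    @abstractmethod
    def infer_threshold(self, scores, source, event_data, scores_dates) -> list: ...

    @abstractmethod
    def infer_threshold_one(self, score, source, event_data) -> float: ...


class ConstantThresholder(ThresholderSketch):
    """Stateless thresholder: one fixed value for every score."""

    def __init__(self, event_preferences: Any, threshold: float = 0.5):
        super().__init__(event_preferences)
        self.threshold = threshold

    def fit(self, historic_data, historic_sources, event_data, anomaly_ranges=None) -> None:
        return None  # nothing to learn for a constant threshold

    def get_params(self) -> dict:
        return {"threshold": self.threshold}

    def infer_threshold(self, scores, source, event_data, scores_dates) -> list:
        # Same length as scores, as the Returns section requires.
        return [self.threshold] * len(scores)

    def infer_threshold_one(self, score, source, event_data) -> float:
        return self.threshold


thresholder = ConstantThresholder(event_preferences={}, threshold=0.5)
scores = [0.2, 0.8, 0.6]
dates = [datetime(2024, 1, day) for day in (1, 2, 3)]
thresholds = thresholder.infer_threshold(scores, "source_a", None, dates)
labels = [s > t for s, t in zip(scores, thresholds)]
print(thresholds)  # [0.5, 0.5, 0.5]
print(labels)      # [False, True, True]
```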
- abstract infer_threshold_one(score: float, source: str, event_data: DataFrame) -> float
Determine threshold for single score (online/streaming mode).
- Parameters:
score (float) β Single anomaly score.
source (str) β Source identifier.
event_data (pd.DataFrame) β Event log (unused by most thresholders).
- Returns:
Threshold value for this score.
- Return type:
float
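In online mode, scores arrive one at a time and each is compared with the threshold returned for it. The sketch below assumes a hypothetical `PerSourceThresholder` (not a pdmlabs class) that only implements the single-score method and looks up a threshold by source; the source names and values are invented for illustration.

```python
class PerSourceThresholder:
    """Hypothetical thresholder with a fixed threshold per source."""

    def __init__(self, per_source, default=0.5):
        self.per_source = dict(per_source)
        self.default = default

    def infer_threshold_one(self, score, source, event_data=None):
        # The score and event log are unused here; only the source matters.
        return self.per_source.get(source, self.default)


thresholder = PerSourceThresholder({"pump_1": 0.4, "pump_2": 0.7})

# Simulated stream of (source, score) pairs arriving one by one.
stream = [("pump_1", 0.55), ("pump_2", 0.55), ("pump_3", 0.55)]

alerts = []
for source, score in stream:
    threshold = thresholder.infer_threshold_one(score, source, None)
    alerts.append((source, score > threshold))

print(alerts)  # [('pump_1', True), ('pump_2', False), ('pump_3', True)]
```

The same score yields different decisions per source, and `pump_3` falls back to the default because it has no entry of its own.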