pdmlabs.postprocessing.self_tuning
Self-tuning score normalization post-processor (adaptive z-score).
SelfTuningPostProcessor normalizes anomaly scores using an adaptive z-score transformation based on a window of historical scores:
z = (score - mean) / std_dev
The first window_length scores are used to estimate the mean and standard deviation, and that estimate is then applied to all scores. This adapts the scale to the actual score distribution.
Useful when:
- Anomaly score ranges vary across different datasets or models
- You want to normalize scores to a standard-normal-like distribution
- You threshold at 0 or at fixed values (e.g., threshold=2.0 for 2-sigma)
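The transformation described above can be sketched in a few lines of standalone Python. This is an illustration only, not the library's implementation: the function name `zscore_from_initial_window` is hypothetical, and the choice of the population standard deviation (`statistics.pstdev`) is an assumption, since this page does not say which estimator is used.

```python
from statistics import mean, pstdev


def zscore_from_initial_window(scores, window_length):
    """Normalize all scores with mean/std estimated from the first window_length scores."""
    window = scores[:window_length]
    mu = mean(window)
    sigma = pstdev(window)  # assumption: population std; the library may differ
    if sigma == 0:
        # Constant window: center only, as the class documentation describes for std=0
        return [s - mu for s in scores]
    return [(s - mu) / sigma for s in scores]


scores = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 2.0, 3.0]
normalized = zscore_from_initial_window(scores, window_length=10)

# Flag points more than 2 standard deviations above the window mean (2-sigma threshold)
flagged = [s for s, z in zip(scores, normalized) if z > 2.0]
```

Because the statistics come only from the first ten scores, the two later outliers do not inflate the standard deviation they are measured against.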
Classes
| SelfTuningPostProcessor | Normalize scores using adaptive z-score (mean and std from window). |
- class pdmlabs.postprocessing.self_tuning.SelfTuningPostProcessor(event_preferences: EventPreferences, window_length: int)
Bases: PostProcessorInterface
Normalize scores using adaptive z-score (mean and std from window).
Computes mean and standard deviation from the first window_length scores, then normalizes all scores: (score - mean) / std. Handles edge case where std=0 by returning only (score - mean).
- window_length
Number of initial scores to use for computing mean and std.
- Type:
int
- scores_buffer_per_source
Maintains recent scores per source for online/streaming mode.
- Type:
dict
Examples
>>> from pdmlabs.postprocessing.self_tuning import SelfTuningPostProcessor
>>> processor = SelfTuningPostProcessor(
...     event_preferences={'failure': [], 'reset': []},
...     window_length=10
... )
>>> processor.fit([df_train], ['bearing_1'], events_df)
>>>
>>> scores = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 2.0, 3.0]
>>> normalized = processor.transform(scores, 'bearing_1', events_df)
>>> # First 10 scores used to compute mean/std, then all normalized
- fit(historic_data: list[DataFrame], historic_sources: list[str], event_data: DataFrame, anomaly_ranges=None) -> None
No-op fit (normalization is computed from score window).
- Parameters:
historic_data (list[pd.DataFrame]) – Ignored.
historic_sources (list[str]) – Ignored.
event_data (pd.DataFrame) – Ignored.
anomaly_ranges – Ignored.
- get_params()
Return hyperparameters.
- Returns:
{'window_length': number of scores to use for computing mean/std}
- Return type:
dict
- transform(scores: list[float], source: str, event_data: DataFrame) -> list[float]
Normalize scores using z-score from initial window.
Computes mean and std from first window_length scores (removing duplicates). Then normalizes all scores: (score - mean) / std. If std=0, returns (score - mean) instead.
- Parameters:
scores (list[float]) – Anomaly scores to normalize.
source (str) – Source identifier (unused).
event_data (pd.DataFrame) – Event log (unused).
- Returns:
Normalized scores (same length as input).
- Return type:
list[float]
Examples
>>> scores = [1.0, 1.1, 1.2, 1.3, 1.4, 2.0, 3.0]  # Mean ~1.2, some outliers
>>> normalized = processor.transform(scores, 'bearing_1', events_df)
>>> # With window_length=5: normalized so the first 5 scores have mean = 0, std = 1
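The duplicate removal and the std=0 fallback described for transform can be illustrated with a small standalone sketch. It is not the library's code: `window_stats` and `normalize` are hypothetical names, "removing duplicates" is interpreted here as keeping one copy of each distinct value in the window (an assumption), and the population standard deviation is likewise assumed.

```python
from statistics import mean, pstdev


def window_stats(scores, window_length):
    """Mean and std of the distinct values among the first window_length scores."""
    # Assumption: "removing duplicates" means keeping one copy of each distinct value
    unique = list(dict.fromkeys(scores[:window_length]))
    return mean(unique), pstdev(unique)


def normalize(scores, window_length):
    mu, sigma = window_stats(scores, window_length)
    if sigma == 0:
        # Constant window: dividing by zero would fail, so only center the scores
        return [s - mu for s in scores]
    return [(s - mu) / sigma for s in scores]


# The first three scores are identical, so the window collapses to [1.0] and std is 0
constant_start = [1.0, 1.0, 1.0, 1.5]
centered = normalize(constant_start, window_length=3)
```

In the std=0 case the output stays on the original score scale (shifted by the mean), so a fixed threshold such as 2.0 no longer means "2 sigma" for that source.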
- transform_one(score_point: float, source: str, is_event: bool) -> float
Normalize single score using buffered window (online mode).
Maintains a buffer of the first window_length scores. Once buffer is full, computes mean/std from buffer and normalizes the incoming score.
- Parameters:
score_point (float) β Single anomaly score to normalize.
source (str) β Source identifier (used to maintain separate buffers).
is_event (bool) β Event flag (unused).
- Returns:
- If buffer < window_length: returns score unchanged.
- Otherwise: returns normalized score using buffered mean/std.
- Return type:
float
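The online behaviour of transform_one can be sketched as follows. `StreamingZScore` is a hypothetical stand-in, not the library class. It assumes that scores are passed through unchanged while the per-source buffer is filling, that the buffer is frozen once it holds window_length scores, and that the population standard deviation is used; none of these details are confirmed by this page.

```python
from collections import defaultdict
from statistics import mean, pstdev


class StreamingZScore:
    """Hypothetical stand-in for the online mode; not the library class."""

    def __init__(self, window_length):
        self.window_length = window_length
        self.buffers = defaultdict(list)  # one independent buffer per source

    def transform_one(self, score, source):
        buffer = self.buffers[source]
        if len(buffer) < self.window_length:
            # Still warming up: store the score and return it unchanged
            buffer.append(score)
            return score
        # Buffer is full and frozen: normalize against its statistics
        mu, sigma = mean(buffer), pstdev(buffer)
        if sigma == 0:
            return score - mu
        return (score - mu) / sigma


normalizer = StreamingZScore(window_length=3)
outputs = [normalizer.transform_one(s, "bearing_1") for s in [1.0, 2.0, 3.0, 5.0]]
# A different source starts with its own empty buffer
other = normalizer.transform_one(10.0, "bearing_2")
```

A real implementation would likely cache the mean and standard deviation once the buffer is full rather than recomputing them on every call; the sketch recomputes them for brevity.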