pdmlabs.postprocessing.dynamicth
Dynamic adaptive thresholding post-processor (NASA LSTM Anomaly Detection).
DynamicThresholder implements an advanced adaptive thresholding algorithm adapted from NASA’s “Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding” paper. It converts scores to binary labels using statistical methods combined with anomaly sequence detection and pruning.
The algorithm:
1. Finds the optimal threshold by maximizing its impact on the normal vs. anomalous distributions
2. Groups detected anomalies into sequences
3. Prunes false positives using percentage-difference criteria
4. Returns binary labels (0=normal, 1=anomaly)
Useful when:
- You need multi-pass anomaly detection
- The baseline shifts significantly over time
- You want to filter out isolated false positives (pruning)
Functions
dynamicThresholding | Adaptive thresholding with anomaly sequence detection and pruning.
Classes
DynamicThresholder | Advanced adaptive thresholding using statistical and sequence analysis.
- class pdmlabs.postprocessing.dynamicth.DynamicThresholder(event_preferences: EventPreferences, epsilon: float = 0.05, history_window=None)
Bases: PostProcessorInterface
Advanced adaptive thresholding using statistical and sequence analysis.
Implements a multi-pass thresholding algorithm that:
- Tests multiple threshold candidates (in the range mean ± [3-5]*std)
- Scores each threshold by its impact on the mean/std of the normal vs. anomaly groups
- Selects the threshold that best separates normal from anomalous scores
- Groups anomalies into sequences and evaluates their statistical significance
- Prunes anomalies with a small impact on the distribution
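The sequence-grouping step can be illustrated with a minimal sketch. The helper name `group_into_sequences` is hypothetical and not part of pdmlabs; it assumes a "sequence" means a run of consecutive flagged points, which is how the referenced paper treats them.

```python
def group_into_sequences(flags):
    """Return (start, end) index pairs, inclusive, for each run of flagged points.

    Hypothetical helper for illustration only; not the library's code.
    """
    sequences = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i                          # a new run begins here
        elif not flag and start is not None:
            sequences.append((start, i - 1))   # the run ended at the previous index
            start = None
    if start is not None:                      # close a run that reaches the end
        sequences.append((start, len(flags) - 1))
    return sequences


print(group_into_sequences([0, 1, 1, 0, 0, 1, 0, 1, 1, 1]))
# [(1, 2), (5, 5), (7, 9)]
```

Treating a run as one unit means a burst of ten flagged points counts as a single sequence rather than ten separate detections.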
- epsilon
Pruning threshold: the minimum percentage difference between consecutive anomaly impacts required to keep an anomaly. Smaller differences are treated as fluctuations and pruned.
- Type:
float
- history_window
Number of most recent scores used to compute the threshold. If None, the entire history is used (the window is set to 1 and alldata is set to True).
- Type:
int
- alldata
If True, use entire history (not just recent window).
- Type:
bool
- anomaly_scores_dict
Maintains score history per source.
- Type:
dict
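As an illustration of how a per-source score history with an optional window might be kept, here is a small sketch. The class `ScoreHistory` and its methods are hypothetical; the actual internal layout of `anomaly_scores_dict` is not documented here.

```python
from collections import defaultdict, deque


class ScoreHistory:
    """Hypothetical per-source score buffer; not the pdmlabs implementation."""

    def __init__(self, history_window=None):
        # deque(maxlen=None) is unbounded, mirroring "None = use all history".
        self._scores = defaultdict(lambda: deque(maxlen=history_window))

    def append(self, source, score):
        """Store a score for `source` and return that source's current history."""
        self._scores[source].append(score)
        return list(self._scores[source])


history = ScoreHistory(history_window=3)
for s in [0.5, 0.6, 0.55, 1.5]:
    recent = history.append("bearing_1", s)
print(recent)  # [0.6, 0.55, 1.5]
```

Keeping one buffer per source prevents scores from one asset from shifting the threshold of another.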
References
“Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding” - Provides the core algorithm and evaluation metrics.
Examples
>>> from pdmlabs.postprocessing.dynamicth import DynamicThresholder
>>> processor = DynamicThresholder(
...     event_preferences={'failure': [], 'reset': []},
...     epsilon=0.05,         # Prune if < 5% difference
...     history_window=1000,  # Use last 1000 scores
... )
>>> processor.fit([df_train], ['bearing_1'], events_df)
>>>
>>> scores = [0.5, 0.6, 0.55, 1.5, 0.7, 3.5, 0.8]
>>> labels = processor.transform(scores, 'bearing_1', events_df)
>>> # Returns [0, 0, 0, 1, 0, 1, 0] (thresholds adapt as history grows)
- fit(historic_data: list[DataFrame], historic_sources: list[str], event_data: DataFrame, anomaly_ranges=None) None
No-op fit (thresholds computed on-the-fly during transform).
- Parameters:
historic_data (list[pd.DataFrame]) – Ignored.
historic_sources (list[str]) – Ignored.
event_data (pd.DataFrame) – Ignored.
anomaly_ranges – Ignored.
- get_params()
Return hyperparameters.
- Returns:
- {'epsilon': pruning threshold, 'history_window': window size, 'All data in history': whether using entire history}
- Return type:
dict
- transform(scores: list[float], source: str, event_data: DataFrame) list[float]
Convert scores to binary labels with dynamic thresholding.
Processes scores sequentially, computing an adaptive threshold for each point from the distribution of all previous scores for that source. The threshold search and false-positive pruning are applied at every point.
- Parameters:
scores (list[float]) – Anomaly scores to threshold.
source (str) – Source identifier (used to maintain separate histories).
event_data (pd.DataFrame) – Event log (unused).
- Returns:
Binary anomaly labels (0 or 1).
- Return type:
list[float]
Examples
>>> scores = [0.5, 0.6, 0.55, 1.2, 0.7, 2.5, 0.8]
>>> labels = processor.transform(scores, 'bearing_1', events_df)
>>> # Returns adaptive binary labels accounting for distribution changes
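To make the "one threshold per point, from previous scores" idea concrete, here is a deliberately simplified stand-in. It uses a fixed mean + z*std rule instead of the full candidate search and pruning, and the function name `online_labels`, the parameter `z`, and the `min_history` warm-up are all assumptions for illustration.

```python
import statistics


def online_labels(scores, z=3.0, min_history=2):
    """Label each score against a threshold built only from earlier scores.

    Simplified stand-in (fixed z, no pruning); not the DynamicThresholder logic.
    """
    history, labels = [], []
    for score in scores:
        label = 0
        if len(history) >= min_history:
            mu = statistics.fmean(history)
            sigma = statistics.pstdev(history)
            # A degenerate history (sigma == 0) yields no threshold, so no flag.
            if sigma > 0 and score > mu + z * sigma:
                label = 1
        labels.append(label)
        history.append(score)  # the point joins the history after it is labeled
    return labels


print(online_labels([1.0, 1.1, 0.9, 1.0, 5.0, 1.0]))
# [0, 0, 0, 0, 1, 0]
```

Because the history grows with every call, the same score can receive different labels at different positions in the stream.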
- transform_one(score_point: float, source: str, is_event: bool) float
Threshold single score using dynamic thresholding (online mode).
- Parameters:
score_point (float) – Single anomaly score to threshold.
source (str) – Source identifier (used to maintain separate histories).
is_event (bool) – Event flag (unused).
- Returns:
1 if score is flagged as anomaly, 0 otherwise.
- Return type:
float
- pdmlabs.postprocessing.dynamicth.dynamicThresholding(MAerror, DesentThreshold=0.02, hscaleCount=1000, alldata=False)
Adaptive thresholding with anomaly sequence detection and pruning.
Advanced algorithm from NASA’s spacecraft anomaly detection research. Uses a multi-pass approach:
1. Test multiple threshold candidates (mean ± 3-5 stds)
2. Score each candidate by its impact on distribution separation (Δμ/μ + Δσ/σ)
3. Group detected anomalies into temporal sequences
4. Prune weak anomalies based on a percentage-change threshold
This makes detection robust to:
- Isolated false positives (pruned if impact < epsilon)
- Score distribution shifts (adaptive threshold per point)
- Clustered anomalies (treated as a sequence, not as individual points)
- Parameters:
MAerror (list[float]) – All anomaly scores observed so far.
DesentThreshold (float, optional) – Pruning parameter: the minimum percentage difference between consecutive anomaly impacts required to keep an anomaly. Range [0, 1]. Higher values prune more aggressively. Defaults to 0.02 (2%).
hscaleCount (int, optional) – History window size (recent scores to consider). Defaults to 1000. Used only if alldata=False.
alldata (bool, optional) – If True, use entire history instead of window. Defaults to False.
- Returns:
- (success_bool, threshold_value)
success_bool: True if anomaly detected and passed all filters, False if threshold couldn’t be computed or anomaly was pruned
threshold_value: Calculated threshold value
- Return type:
tuple
- Algorithm details:
z-vector: [3, 3.17, 3.33, …, 4.83] sigma multiples for threshold search
Δμ/μ: relative change in mean if anomalies excluded
Δσ/σ: relative change in std if anomalies excluded
Maximization: (Δμ/μ + Δσ/σ) / (num_anomalies + num_sequences * num_sequences)
Pruning: Sorts anomalies by impact, finds elbow point > epsilon
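The pruning rule can be sketched along the lines of the referenced paper: sort the per-sequence maxima in descending order, append the largest non-anomalous score, and keep everything above the last step whose relative drop exceeds epsilon. The function `prune_sequences` and its arguments are hypothetical, and the library's own notion of "impact" may differ from the raw maxima used here.

```python
def prune_sequences(seq_maxima, normal_max, epsilon=0.05):
    """Return the sequence maxima that survive pruning.

    Hypothetical sketch based on the paper's description; not the pdmlabs code.
    """
    ranked = sorted(seq_maxima, reverse=True) + [normal_max]
    keep = 0
    for i in range(1, len(ranked)):
        prev, cur = ranked[i - 1], ranked[i]
        # Relative drop from one ranked value to the next.
        if prev > 0 and (prev - cur) / prev > epsilon:
            keep = i  # everything ranked above this step stays anomalous
    return ranked[:keep]


print(prune_sequences([3.0, 2.9, 1.05], normal_max=1.0, epsilon=0.05))
# [3.0, 2.9]
```

In this example the sequence peaking at 1.05 is pruned because it sits less than 5% above the largest normal score, while the two larger peaks are separated from it by a clear drop.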
- Edge cases:
len(history) == 1: Returns False (need more data)
No scores above threshold: Returns False
All scores are anomalies: Returns False (can’t prune reliably)
Degenerate std (all same values): Returns False
Time complexity: O(n*m) where n=candidates tested (12), m=history_length
References
“Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding” - Provides full algorithm with spacecraft telemetry examples
Examples
>>> scores = [0.5, 0.6, 0.55, 0.7, 0.8, 2.5, 3.0]
>>> success, thresh = dynamicThresholding(scores, DesentThreshold=0.05)
>>> # Evaluates ~12 thresholds, selects best separator
>>> # Groups 2.5, 3.0 as sequence, prunes if together they have low impact