pdmlabs.utils.utils
Utility functions for time-series processing and event preference handling.
This module provides helper functions for:
- Sliding window feature generation from time-series data
- Automatic sliding window length detection using autocorrelation
- Event preference expansion with wildcard matching
- Parameter calculation for MANGO optimization
- Key Functions:
sliding_window: Convert a time-series into sliding windows
find_length: Automatically determine window length using ACF
Window: Class for rolling window feature mapping
process_event_preferences_key: Expand event preferences with wildcards
expand_event_preferences: Process all event preferences
calculate_mango_parameters: Compute MANGO optimization parameters
Example
>>> import pandas as pd
>>> from pdmlabs.utils.utils import sliding_window, find_length
>>> data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> windows = sliding_window(data, window_len=3, step=1)
>>> # Estimate window length from ACF
>>> window_len = find_length(data.values)
>>> print(f"Recommended window length: {window_len}")
Functions
calculate_mango_parameters: Calculate MANGO optimization parameters (num, jobs, initial_random) based on the current parameter space and constraints.
expand_event_preferences: Expand failure and reset event preferences by resolving wildcards.
find_length: Automatically determine optimal sliding window length using autocorrelation.
process_event_preference_with_one_dont_care_bit: Expand an event preference rule with one wildcard dimension.
process_event_preference_with_two_dont_care_bits: Expand an event preference rule with two wildcard dimensions.
process_event_preferences_key: Expand event preferences by resolving all wildcards.
sliding_window: Convert a time-series column into sliding windows.
Classes
Window: Rolling window feature mapping for time-series data.
- class pdmlabs.utils.utils.Window(window=100)
Bases: object
Rolling window feature mapping for time-series data.
Converts a time-series into a matrix of consecutive overlapping windows, where each row represents a window of consecutive timesteps. This is useful for creating features for deep learning models and time-series analysis.
The transformation creates lagged features by shifting the series by n steps to create n sequential features for each window.
- Parameters:
window (int, default=100) – The size of each rolling window (number of timesteps to include). Use window=0 for no windowing (returns original series).
- detector
Reference to an anomaly detector (for compatibility).
- Type:
object, optional
Examples
>>> import numpy as np
>>> from pdmlabs.utils.utils import Window
>>> data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> windower = Window(window=3)
>>> windowed = windower.convert(data)
>>> print(windowed)
     0    1    2
0  NaN  NaN  1.0
1  NaN  2.0  2.0
2  3.0  3.0  3.0
3  4.0  4.0  4.0
...
Notes
The first (window-1) rows will contain NaN values due to the shifting operation.
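The shifting approach described above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the library's implementation: the helper name `lagged_windows` is hypothetical, and the column order (lag 0 first) is an assumption that may differ from what `Window.convert` produces.

```python
import pandas as pd


def lagged_windows(values, window):
    """Build a lag matrix by shifting a series 0..window-1 steps (hypothetical helper)."""
    series = pd.Series(values, dtype=float)
    if window == 0:
        # No windowing requested: hand back the series unchanged.
        return series
    # Column i holds the series delayed by i steps, so each row reads newest to oldest.
    lags = [series.shift(i) for i in range(window)]
    return pd.concat(lags, axis=1, keys=list(range(window)))


frame = lagged_windows([1, 2, 3, 4, 5], window=3)
# Rows without a full history contain NaN; dropna() keeps only complete windows.
complete = frame.dropna()
```

In this sketch the first `window - 1` rows are the incomplete ones, which is where the NaN values mentioned in the note come from.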
- convert(X)
Convert a time-series into rolling windows.
- Parameters:
X (array-like) – 1D time-series data to convert.
- Returns:
DataFrame where each row is a window of consecutive values. Shape is (n-window+1, window) where n is the input length. First (window-1) rows contain NaN values.
- Return type:
pd.DataFrame
Examples
>>> windower = Window(window=3)
>>> result = windower.convert([1, 2, 3, 4, 5])
>>> # Returns dataframe with lagged features
- pdmlabs.utils.utils.calculate_mango_parameters(current_param_space_dict, MAX_JOBS, INITIAL_RANDOM, MAX_RUNS)
Calculate MANGO optimization parameters (num, jobs, initial_random) based on the current parameter space and constraints.
- pdmlabs.utils.utils.expand_event_preferences(event_data: DataFrame, event_preferences: EventPreferences) → EventPreferences
Expand failure and reset event preferences by resolving wildcards.
Convenience wrapper around process_event_preferences_key that expands both ‘failure’ and ‘reset’ event preference categories.
- Parameters:
event_data (pd.DataFrame) – Available event data with columns: [‘type’, ‘source’, ‘description’]
event_preferences (EventPreferences) – Dictionary with keys ‘failure’ and ‘reset’, each containing lists of EventPreferencesTuple objects with potential wildcards.
- Returns:
Dictionary with same structure, but all wildcards expanded into concrete rules.
- Return type:
EventPreferences
Examples
>>> event_prefs = {
...     'failure': [EventPreferencesTuple('*', 'critical', 'pump1', ['target1'])],
...     'reset': [EventPreferencesTuple('maintenance', '*', '*', ['target2'])]
... }
>>> expanded = expand_event_preferences(event_data, event_prefs)
- pdmlabs.utils.utils.find_length(data)
Automatically determine optimal sliding window length using autocorrelation.
Analyzes the autocorrelation function (ACF) to find the first major periodicity in the time-series. Uses local maxima detection to identify natural periods.
- Parameters:
data (array-like) – 1D time-series data. For multidimensional input the function returns 0.
- Returns:
Recommended window length based on ACF periodicity. Falls back to 125 if no clear period is detected or if the detected period lies outside the accepted range [3, 300].
- Return type:
int
Notes
Uses first 20,000 samples for efficiency
Analyzes up to 400 lags of ACF
Returns full period including the base offset
Default fallback is 125 samples
Examples
>>> import numpy as np
>>> # Seasonal data with period ~50
>>> data = np.sin(np.arange(1000) * 2 * np.pi / 50)
>>> window = find_length(data)
>>> print(f"Detected window length: {window}")
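The idea of reading a period off the first local maximum of the ACF can be sketched with NumPy alone. This is an illustration under stated assumptions, not the code behind `find_length`: the helper name `estimate_period` and its keyword arguments are hypothetical, the ACF is computed directly with dot products, and the limits (20,000 samples, 400 lags, range [3, 300], fallback 125) are taken from the description above.

```python
import numpy as np


def estimate_period(data, max_lag=400, lo=3, hi=300, fallback=125):
    """Return the lag of the first ACF local maximum, or `fallback` (hypothetical helper)."""
    x = np.asarray(data, dtype=float)
    if x.ndim != 1:
        return 0  # mirrors the documented behaviour for multidimensional input
    x = x[:20000]
    x = x - x.mean()
    denom = float(np.dot(x, x))
    if denom == 0.0:
        return fallback  # constant series: no periodicity to detect
    n_lags = min(max_lag, len(x) - 1)
    # Sample autocorrelation at lags 0..n_lags.
    acf = np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(n_lags + 1)])
    # Scan interior lags for the first local maximum inside the accepted range.
    for k in range(1, n_lags):
        if lo <= k <= hi and acf[k] > acf[k - 1] and acf[k] > acf[k + 1]:
            return k
    return fallback


signal = np.sin(np.arange(1000) * 2 * np.pi / 50)
period = estimate_period(signal)
```

A strict "greater than both neighbours" test is the simplest form of local-maximum detection; noisy data would usually need a wider comparison neighbourhood or smoothing before this step.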
- pdmlabs.utils.utils.process_event_preference_with_one_dont_care_bit(event_preference: EventPreferencesTuple, event_data: DataFrame, dont_care_bit_index: int) → list[EventPreferencesTuple]
Expand an event preference rule with one wildcard dimension.
Replaces one wildcard ‘*’ in an event preference with all matching values from event_data, generating multiple concrete preference rules.
- Parameters:
event_preference (EventPreferencesTuple) – Base event preference with one field set to ‘*’ (wildcard). Fields: description, type, source, target_sources.
event_data (pd.DataFrame) – Available event data with columns: [‘type’, ‘source’, ‘description’]
dont_care_bit_index (int) – Which field is the wildcard: 0=type, 1=description, 2=source
- Returns:
List of expanded concrete preferences for each matching event value.
- Return type:
list[EventPreferencesTuple]
Examples
>>> # Wildcard in 'type' field (dont_care_bit_index=0)
>>> base_pref = EventPreferencesTuple(
...     type='*', description='failure', source='pump1',
...     target_sources=['target1'])
>>> expanded = process_event_preference_with_one_dont_care_bit(base_pref, events_df, 0)
>>> # Results in preferences for each type matching ('failure', source='pump1')
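The single-wildcard expansion can be sketched as a filter on the concrete fields followed by one rule per distinct value of the wildcard field. Everything named here is an assumption made for illustration: `Pref` is a hypothetical stand-in for `EventPreferencesTuple`, its field order follows the keyword example above, and `expand_one_wildcard` is not the library function.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for EventPreferencesTuple; the field order is an assumption.
Pref = namedtuple("Pref", ["type", "description", "source", "target_sources"])
FIELDS = ["type", "description", "source"]  # index 0, 1, 2 as in dont_care_bit_index


def expand_one_wildcard(pref, events, wildcard_index):
    """Replace the single '*' field with every value seen alongside the fixed fields."""
    wildcard_field = FIELDS[wildcard_index]
    fixed = [f for f in FIELDS if f != wildcard_field]
    # Keep only the events that agree with the rule on the non-wildcard fields.
    mask = pd.Series(True, index=events.index)
    for field in fixed:
        mask &= events[field] == getattr(pref, field)
    values = events.loc[mask, wildcard_field].drop_duplicates()
    return [pref._replace(**{wildcard_field: v}) for v in values]


events_df = pd.DataFrame({
    "type": ["critical", "warning", "critical"],
    "source": ["pump1", "pump1", "pump2"],
    "description": ["failure", "failure", "failure"],
})
base_pref = Pref(type="*", description="failure", source="pump1", target_sources=["target1"])
expanded = expand_one_wildcard(base_pref, events_df, 0)
```

With this sample data, the two events recorded for `pump1` yield one concrete rule each, while the `pump2` event is ignored because its source does not match.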
- pdmlabs.utils.utils.process_event_preference_with_two_dont_care_bits(event_preference: EventPreferencesTuple, event_data: DataFrame, dont_care_bit_1_index: int, dont_care_bit_2_index: int) → list[EventPreferencesTuple]
Expand an event preference rule with two wildcard dimensions.
Replaces two wildcards ‘*’ in an event preference with all matching value combinations from event_data, generating multiple concrete preference rules.
- Parameters:
event_preference (EventPreferencesTuple) – Base event preference with two fields set to ‘*’ (wildcards).
event_data (pd.DataFrame) – Available event data with columns: [‘type’, ‘source’, ‘description’]
dont_care_bit_1_index (int) – First wildcard position: 0=type, 1=description, 2=source
dont_care_bit_2_index (int) – Second wildcard position: 0=type, 1=description, 2=source
- Returns:
List of expanded concrete preferences for each matching combination.
- Return type:
list[EventPreferencesTuple]
Examples
>>> # Wildcards in 'type' and 'source' fields
>>> base = EventPreferencesTuple(
...     type='*', description='anomaly', source='*',
...     target_sources=['pump1'])
>>> expanded = process_event_preference_with_two_dont_care_bits(base, events_df, 0, 2)
- pdmlabs.utils.utils.process_event_preferences_key(event_data: DataFrame, event_preferences: list[EventPreferencesTuple]) → list[EventPreferencesTuple]
Expand event preferences by resolving all wildcards.
Processes a list of event preference rules containing wildcards (‘*’) and expands them into concrete rules by matching against available event data. Supports wildcard patterns with 0, 1, 2, or 3 don’t-care bits.
- Parameters:
event_data (pd.DataFrame) – Available event data with columns: [‘type’, ‘source’, ‘description’]
event_preferences (list[EventPreferencesTuple]) – Event preference rules with potential wildcards (*). Rules are processed for specific patterns:
- 0 wildcards: Returned as-is (concrete rule)
- 1 wildcard: Expanded using one don’t-care dimension
- 2 wildcards: Expanded using two don’t-care dimensions
- 3 wildcards: Expands to all events (stops processing remaining rules)
- Returns:
List of fully expanded concrete event preferences with duplicates removed.
- Return type:
list[EventPreferencesTuple]
Examples
>>> events = pd.DataFrame({
...     'type': ['critical', 'warning'],
...     'source': ['pump1', 'pump2'],
...     'description': ['failure', 'anomaly']
... })
>>> prefs = [EventPreferencesTuple('*', 'failure', 'pump1', ['target1'])]
>>> concrete_prefs = process_event_preferences_key(events, prefs)
>>> # Returns preferences for all types matching ('failure', 'pump1')
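One way to picture the overall expansion is a loop that filters events on the concrete fields of each rule, emits one rule per matching event combination, and skips duplicates. The sketch below is an assumption-laden illustration rather than the library's code: `Pref` is a hypothetical stand-in for `EventPreferencesTuple` with an assumed field order, `expand_rules` is a hypothetical name, and the sketch treats every wildcard count uniformly instead of reproducing the documented early stop for three wildcards.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for EventPreferencesTuple; the field order is an assumption.
Pref = namedtuple("Pref", ["type", "description", "source", "target_sources"])
FIELDS = ("type", "description", "source")


def expand_rules(events, prefs):
    """Expand '*' fields against the rows of `events`, dropping duplicate rules."""
    expanded = []
    for pref in prefs:
        fixed = [f for f in FIELDS if getattr(pref, f) != "*"]
        # Select events that agree with every concrete (non-wildcard) field.
        matches = events
        for field in fixed:
            matches = matches[matches[field] == getattr(pref, field)]
        if len(fixed) == len(FIELDS):
            candidates = [pref]  # no wildcard: keep the rule unchanged
        else:
            unique_rows = matches[list(FIELDS)].drop_duplicates()
            candidates = [
                Pref(row.type, row.description, row.source, pref.target_sources)
                for row in unique_rows.itertuples(index=False)
            ]
        for cand in candidates:
            if cand not in expanded:  # list membership works with unhashable targets
                expanded.append(cand)
    return expanded


events = pd.DataFrame({
    "type": ["critical", "warning"],
    "source": ["pump1", "pump2"],
    "description": ["failure", "anomaly"],
})
prefs = [
    Pref("*", "failure", "pump1", ["target1"]),
    Pref("critical", "failure", "pump1", ["target1"]),  # duplicates the expansion above
    Pref("*", "anomaly", "*", ["target2"]),
]
concrete = expand_rules(events, prefs)
```

Duplicates are removed with a list membership check rather than a set because the rules carry a list of target sources, which is not hashable.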
- pdmlabs.utils.utils.sliding_window(dfcol, window_len, step)
Convert a time-series column into sliding windows.
Creates windows of window_len samples, spaced step samples apart, turning 1D data into a 2D table suitable for ML. Windows do not overlap when step >= window_len.
- Parameters:
dfcol (pd.Series) – The input time-series data.
window_len (int) – Length of each window.
step (int) – Step size between consecutive windows. If step < window_len, windows overlap.
- Returns:
DataFrame where each row is a sliding window, with columns named ‘col_1’, ‘col_2’, etc.
- Return type:
pd.DataFrame
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])
>>> windows = sliding_window(data, window_len=3, step=2)
>>> print(windows)
   col_1  col_2  col_3
0      1      2      3
1      3      4      5
2      5      6      7
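The windowing itself can be sketched in a few lines of pandas. This is a minimal illustration, not the library's implementation: the helper name `windows_from_series` is hypothetical, and dropping a trailing incomplete window is an assumption chosen to match the example output above.

```python
import pandas as pd


def windows_from_series(series, window_len, step):
    """Collect complete windows of `window_len` values taken every `step` samples."""
    values = series.to_numpy()
    # Start indices of all windows that fit entirely inside the series.
    starts = range(0, len(values) - window_len + 1, step)
    rows = [values[s : s + window_len] for s in starts]
    columns = [f"col_{i + 1}" for i in range(window_len)]
    return pd.DataFrame(rows, columns=columns)


data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])
windows = windows_from_series(data, window_len=3, step=2)
```

Here a step of 2 with a window length of 3 makes neighbouring windows share one value, and the final value 8 is left out because no complete window starts at index 6.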