pdmlabs.preprocessing.record_level.feature_selector
Feature selection preprocessor for dimensionality reduction.
FeatureSelector filters a DataFrame to keep only the selected columns (features). Useful for:
- Removing noisy or irrelevant features
- Dimensionality reduction
- Domain-specific feature sets
Classes
- FeatureSelector: Select a subset of features for anomaly detection.
- class pdmlabs.preprocessing.record_level.feature_selector.FeatureSelector(event_preferences: EventPreferences, selected_features: list[str])
Bases: RecordLevelPreProcessorInterface
Select a subset of features for anomaly detection.
This preprocessor is stateless and deterministic. Given a fixed list of feature names, it returns only those columns from the input data. Useful for:
- Domain knowledge: selecting physically meaningful sensors
- Dimensionality reduction: dropping highly correlated features
- Noise filtering: removing sensors with too much noise
- Feature importance: keeping the top-K features from a prior analysis
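One use case listed above is dropping highly correlated features. FeatureSelector does not compute its feature list itself. The sketch below shows one way such a list might be built from training data with plain pandas before it is passed as selected_features. The helper name drop_correlated_features and the 0.95 threshold are hypothetical and not part of pdmlabs.

```python
import pandas as pd


def drop_correlated_features(df: pd.DataFrame, threshold: float = 0.95) -> list[str]:
    """Hypothetical helper (not part of pdmlabs) for building selected_features.

    Walks the columns in their original order and keeps a column only if its
    absolute Pearson correlation with every already-kept column is at most
    `threshold`; later columns that duplicate earlier ones are dropped.
    """
    corr = df.corr().abs()  # pairwise |Pearson r| between numeric columns
    kept: list[str] = []
    for col in df.columns:
        if all(corr.loc[col, other] <= threshold for other in kept):
            kept.append(col)
    return kept


# Illustrative training data: temp is an exact linear function of vibration.
df_train = pd.DataFrame({
    "vibration": [1.0, 2.0, 3.0, 4.0],
    "temp": [50.0, 60.0, 70.0, 80.0],
    "pressure": [100.0, 104.0, 101.0, 103.0],
})
selected = drop_correlated_features(df_train)
print(selected)  # ['vibration', 'pressure']
```

The resulting list could then be passed as FeatureSelector(event_preferences=..., selected_features=selected). Computing it on training data only keeps the test data from influencing which features are retained. The threshold is a judgment call.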
- selected_features
Column names to keep from input DataFrames.
- Type:
list[str]
Examples
>>> from pdmlabs.preprocessing.record_level.feature_selector import FeatureSelector
>>> import pandas as pd
>>>
>>> # Data with 5 features, but we only want 3
>>> df_train = pd.DataFrame({
...     'vibration': [1, 2, 3],
...     'temp': [50, 60, 70],
...     'pressure': [100, 102, 104],
...     'noise1': [0.1, 0.2, 0.3],
...     'noise2': [0.05, 0.07, 0.09],
... })
>>> # Test data: same 5 columns, 100 rows
>>> df_test = pd.DataFrame({col: range(100) for col in df_train.columns})
>>> events_df = pd.DataFrame()  # event log is ignored by FeatureSelector
>>>
>>> selector = FeatureSelector(
...     event_preferences={'failure': [], 'reset': []},
...     selected_features=['vibration', 'temp', 'pressure'],
... )
>>> selector.fit([df_train], ['bearing_1'], events_df)
>>> df_test_selected = selector.transform(df_test, 'bearing_1', events_df)
>>> # df_test_selected now has only 3 columns: vibration, temp, pressure
- fit(historic_data: list[DataFrame], historic_sources: list[str], event_data: DataFrame, anomaly_ranges=None) → None
Fit the feature selector (a no-op placeholder).
Feature selection is stateless, so fit() does nothing. The selected features are fixed at initialization.
- Parameters:
historic_data (list[pd.DataFrame]) – Ignored.
historic_sources (list[str]) – Ignored.
event_data (pd.DataFrame) – Ignored.
anomaly_ranges – Ignored.
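Because fit() is documented as a no-op and the feature list is fixed at construction, the selector's output cannot depend on what it was fitted on. The class below is a hypothetical stand-in (StatelessColumnSelector is not the pdmlabs implementation). It reproduces only the documented contract: a no-op fit, get_params returning {'features': [...]}, an empty-list fallback, and a KeyError for missing columns. It relies on pandas label-based column selection.

```python
import pandas as pd


class StatelessColumnSelector:
    """Hypothetical stand-in mirroring the documented FeatureSelector contract."""

    def __init__(self, selected_features: list[str]):
        # The feature list is fixed here; nothing is learned afterwards.
        self.selected_features = list(selected_features)

    def fit(self, historic_data, historic_sources, event_data, anomaly_ranges=None) -> None:
        # Stateless: every argument is ignored and no attribute changes.
        return None

    def get_params(self) -> dict:
        # Same shape as the documented return value.
        return {"features": list(self.selected_features)}

    def transform(self, target_data: pd.DataFrame, source: str, event_data) -> pd.DataFrame:
        if not self.selected_features:
            # Documented fallback: an empty selection returns the input unchanged.
            return target_data
        # Label-based selection: columns come back in selected_features order,
        # and pandas raises KeyError if any requested column is missing.
        return target_data[self.selected_features]
```

Fitting on unrelated data leaves the transform output unchanged, which is what "stateless" means in practice here.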
- get_params()
Return hyperparameters.
- Returns:
{'features': list of selected feature names}
- Return type:
dict
Examples
>>> print(selector.get_params())
{'features': ['vibration', 'temp', 'pressure']}
- transform(target_data: DataFrame, source: str, event_data: DataFrame) → DataFrame
Select the subset of features from the target data.
- Parameters:
target_data (pd.DataFrame) – Test data to filter.
source (str) – Source identifier (unused).
event_data (pd.DataFrame) – Event log (unused).
- Returns:
Subset of target_data containing only the selected_features columns. If selected_features is empty, the input is returned unchanged (fallback).
- Return type:
pd.DataFrame
- Raises:
KeyError – If any selected feature is not in target_data.columns.
Examples
>>> df_test_selected = selector.transform(df_test, 'bearing_1', events_df)
>>> print(df_test_selected.columns.tolist())
['vibration', 'temp', 'pressure']
>>> print(df_test_selected.shape)  # 100 rows, 3 columns
(100, 3)
- transform_one(new_sample: Series, source: str, is_event: bool) → Series
Select features from a single sample.
- Parameters:
new_sample (pd.Series) – Single row (Series with feature names as index).
source (str) – Source identifier (unused).
is_event (bool) – Event flag (unused).
- Returns:
Subset of new_sample with only selected_features.
- Return type:
pd.Series
Examples
>>> new_row = pd.Series({'vibration': 1.5, 'temp': 65, 'pressure': 103, 'noise1': 0.15, 'noise2': 0.06})
>>> selected = selector.transform_one(new_row, 'bearing_1', False)
>>> print(selected.tolist())  # mixed int/float values are stored as float64
[1.5, 65.0, 103.0]
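transform_one() is the per-sample counterpart of transform(). If streaming and batch scoring are to agree, selecting labels from a single row should give the same values as selecting columns first and then taking that row. The check below uses plain pandas only and does not call pdmlabs. The column names and values are illustrative, and it assumes transform_one behaves like label selection on the row, as its documented return value suggests. It also shows why the example above prints floats: a row drawn from mixed int and float columns is upcast to float64.

```python
import pandas as pd

features = ["vibration", "temp", "pressure"]
df = pd.DataFrame({
    "vibration": [1.0, 1.5],
    "temp": [60, 65],
    "pressure": [102, 103],
    "noise1": [0.1, 0.15],
})

# Batch path: select the columns first, then take the second row.
batch_row = df[features].iloc[1]

# Streaming path: take the row first (as a per-sample call would receive it),
# then select the feature labels from that Series.
stream_row = df.iloc[1][features]

print(stream_row.tolist())  # [1.5, 65.0, 103.0]
```

Both paths yield the same labels, order, and float64 values.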