pdmlabs.method.lof_semi
Classes
- class pdmlabs.method.lof_semi.LocalOutlierFactor(event_preferences: EventPreferences, *args, **kwargs)
Bases:
SemiSupervisedMethodInterface
- fit(historic_data: list[DataFrame], historic_sources: list[str], event_data: DataFrame) None
Fits an anomaly detection model on (relatively) normal data. The data are passed as DataFrames, each paired with its respective source.
- Parameters:
historic_data – a list of DataFrames used to fit the semi-supervised model. Copy the list elements if the method needs to store them for later processing.
historic_sources – a list of strings naming the source of each DataFrame
event_data – event data produced by the different sources
- Returns:
None.
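As a rough illustration of what fitting on normal data per source could look like, the sketch below uses scikit-learn's `LocalOutlierFactor` in novelty mode. It is not the pdmlabs implementation: `fit_per_source` and `score` are hypothetical helpers, and the one-model-per-source layout and the synthetic data are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor


def fit_per_source(historic_data, historic_sources, n_neighbors=5):
    """Fit one novelty-mode LOF model per source (hypothetical helper)."""
    models = {}
    for df, source in zip(historic_data, historic_sources):
        model = LocalOutlierFactor(n_neighbors=n_neighbors, novelty=True)
        # Copy first so later changes to the caller's DataFrame cannot leak in.
        model.fit(df.copy().to_numpy())
        models[source] = model
    return models


def score(models, target_data, source):
    """Return one anomaly score per row, higher = more anomalous."""
    # score_samples returns the negated LOF, so negate it again.
    raw = models[source].score_samples(target_data.to_numpy())
    return (-raw).tolist()


# Synthetic "normal" training data for a single assumed source.
rng = np.random.default_rng(0)
df_train = pd.DataFrame(rng.normal(size=(200, 2)), columns=["f1", "f2"])
models = fit_per_source([df_train], ["bearing_1"])

# One point near the training cloud and one far away from it.
df_test = pd.DataFrame([[0.0, 0.0], [8.0, 8.0]], columns=["f1", "f2"])
scores = score(models, df_test, "bearing_1")
```

Novelty mode is used here because the model is fitted on data assumed to be normal and then asked to score unseen samples, which matches the semi-supervised setting described above.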
- get_all_models()
Return reference to internal model(s).
Returns model instances for inspection, export, or further processing. Structure depends on method - may return single model or dict of models.
- Returns:
- Underlying model object(s). Examples:
Single model: sklearn model instance
Multiple models: {‘bearing_1’: model1, ‘bearing_2’: model2}
None: If model is not accessible/applicable
- Return type:
model or dict
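Because the return value may be a single model, a dict of models, or None, calling code usually has to handle all three shapes. A minimal sketch of one way to do that follows; `iter_models` and `DummyModel` are hypothetical names introduced for illustration and are not part of pdmlabs.

```python
def iter_models(models, default_name="default"):
    """Yield (name, model) pairs for any of the three documented shapes."""
    if models is None:
        return  # nothing accessible: yield no pairs
    if isinstance(models, dict):
        yield from models.items()
    else:
        # A single model gets a placeholder name.
        yield default_name, models


class DummyModel:
    """Stand-in for a real model object."""


single = DummyModel()
per_source = {"bearing_1": DummyModel(), "bearing_2": DummyModel()}

names_single = [name for name, _ in iter_models(single)]
names_multi = [name for name, _ in iter_models(per_source)]
names_none = [name for name, _ in iter_models(None)]
```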
- get_library() str
Return the underlying library/framework name.
- Returns:
- Name of library used (e.g., ‘sklearn’, ‘torch’, ‘custom’).
Used for dependency tracking and method categorization.
- Return type:
str
- get_params() dict
Return hyperparameters and configuration.
- Returns:
- Dictionary of hyperparameters (e.g.,
{‘n_neighbors’: 5, ‘contamination’: 0.1}). Useful for logging, model comparison, and reproducibility.
- Return type:
dict
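For logging and reproducibility, the library name and the hyperparameter dict can be combined into a stable record. The sketch below shows one possible approach using only the standard library; `params_fingerprint` is a hypothetical helper, and the parameter values are the illustrative ones from the description above.

```python
import hashlib
import json


def params_fingerprint(library, params):
    """Serialize a run configuration and derive a short identifier from it."""
    record = {"library": library, "params": params}
    # sort_keys makes the text independent of dict insertion order.
    payload = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    return payload, digest


payload_a, id_a = params_fingerprint(
    "sklearn", {"n_neighbors": 5, "contamination": 0.1}
)
payload_b, id_b = params_fingerprint(
    "sklearn", {"contamination": 0.1, "n_neighbors": 5}
)
```

Two runs with the same settings then share an identifier even if the dicts were built in a different order, which makes comparing models across experiments simpler.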
- predict(target_data: DataFrame, source: str, event_data: DataFrame) list[float]
Predict anomaly scores for batch of samples (offline mode).
Computes anomaly score for each row in target_data independently. Higher scores indicate more anomalous behavior.
- Parameters:
target_data (pd.DataFrame) – Feature matrix with dates in index, features in columns. Must have same features as training data.
source (str) – Source identifier (e.g., ‘bearing_1’). Used to select source-specific model if method maintains multiple models.
event_data (pd.DataFrame) – Event log with columns ‘date’, ‘type’, ‘source’, ‘description’. Can be used for context-aware scoring.
- Returns:
- Anomaly scores (float) with length = target_data.shape[0].
Score range and semantics depend on method:
- Distance-based: typically [0, ∞) where higher = more anomalous
- Probability-based: typically [0, 1] or (-∞, 0] log-likelihood
- Reconstruction-based: typically [0, ∞) reconstruction error
- Return type:
list
Examples
>>> method = SomeAnomalyDetector(event_preferences={...})
>>> method.fit([df_train], ['bearing_1'], events_df)
>>> df_test = pd.DataFrame(feature_values, index=dates)
>>> scores = method.predict(df_test, 'bearing_1', events_df)
>>> print(len(scores), scores[0])  # e.g. 100 0.45
- Raises:
NotImplementedError – If method hasn’t been fit (for supervised methods).
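Since the score range depends on the method, a fixed cut-off rarely transfers between detectors. One common workaround, sketched here under the assumption that scores from a normal reference period are available, is to derive the threshold from a quantile of those scores. `quantile_threshold` and `flag` are hypothetical helpers, and the score values are made up for the example.

```python
import math


def quantile_threshold(reference_scores, q=0.99):
    """Nearest-rank quantile of scores observed on normal data."""
    ordered = sorted(reference_scores)
    rank = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[rank]


def flag(scores, threshold):
    """Mark each score that exceeds the threshold as anomalous."""
    return [s > threshold for s in scores]


# Made-up scores from a period assumed to be normal.
reference = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 1.2, 1.0, 0.98, 1.02]
threshold = quantile_threshold(reference, q=0.9)

# Made-up scores for new data; only the middle one is clearly unusual.
flags = flag([1.0, 3.4, 1.05], threshold)
```

This relies only on the ordering "higher = more anomalous", so it applies to distance-based and reconstruction-based scores alike; log-likelihood scores would need their sign flipped first.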
- predict_one(new_sample: Series, source: str, is_event: bool) float
Predict anomaly score for single sample (online/streaming mode).
Computes anomaly score for one observation at a time. Useful for:
- Real-time anomaly detection
- Online learning scenarios
- Memory-efficient processing
May maintain internal state (buffers, windows) for context-aware scoring.
- Parameters:
new_sample (pd.Series) – Single observation with feature values. Index should contain feature names matching training data.
source (str) – Source identifier for source-specific models.
is_event (bool) – Whether this sample is from an event timestamp. Can affect how method processes the sample (e.g., special handling for known events vs normal operation).
- Returns:
Anomaly score for this single sample (same scale as predict()).
- Return type:
float
Examples
>>> method = SomeAnomalyDetector(event_preferences={...})
>>> method.fit([df_train], ['bearing_1'], events_df)
>>>
>>> # Online scoring
>>> for idx, row in df_test.iterrows():
...     is_event = idx in event_timestamps
...     score = method.predict_one(row, 'bearing_1', is_event)
...     print(f'{idx}: {score}')
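To make the idea of internal state concrete, the following sketch keeps a sliding window per source and scores each new value by its distance from the window mean. It is a simplified stand-in, not LOF and not the pdmlabs implementation: `WindowedScorer` is hypothetical, it takes a scalar instead of a pd.Series, and resetting the window on an event is an assumed policy.

```python
import statistics
from collections import defaultdict, deque


class WindowedScorer:
    """Hypothetical streaming scorer with one bounded buffer per source."""

    def __init__(self, window=5):
        self.buffers = defaultdict(lambda: deque(maxlen=window))

    def predict_one(self, value, source, is_event):
        buf = self.buffers[source]
        if is_event:
            # Assumed policy: a known event invalidates the old context.
            buf.clear()
        if len(buf) < 2:
            score = 0.0  # not enough context to judge yet
        else:
            mean = statistics.fmean(buf)
            spread = statistics.stdev(buf) or 1.0  # avoid division by zero
            score = abs(value - mean) / spread
        buf.append(value)
        return score


scorer = WindowedScorer(window=5)
# (value, is_event) pairs; the fifth sample arrives at an event timestamp.
stream = [(1.0, False), (1.1, False), (0.9, False),
          (5.0, False), (1.0, True), (1.0, False)]
scores = [scorer.predict_one(v, "bearing_1", e) for v, e in stream]
```

The bounded deque keeps memory use constant per source, which is the property that makes this style of scoring suitable for streaming.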