⏱️ Implementing RUL Regression Methods
RUL (Remaining Useful Life) regression methods predict how much time, or how many cycles, remain before failure. Instead of a binary classification, they predict a continuous target: the time to failure.
When to Use: When you have training data with exact failure times/cycles. The method learns to predict RUL as a regression target. Works only with SupervisedRULPdMExperiment.
Key Requirement: Training labels are continuous values (days, cycles, or hours until failure), not binary classes.
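To make the difference concrete, here is a minimal sketch of both label types for one hypothetical run-to-failure series of six cycles. The binary "fails within `horizon` cycles" label and the `horizon` value are illustrative assumptions used only for contrast; they are not part of the RUL setup.

```python
import numpy as np

# One hypothetical run-to-failure series: 6 cycles, failure at the last cycle.
n_cycles = 6
cycles = np.arange(n_cycles)

# Regression target: cycles remaining until failure (counts down to 0).
rul = (n_cycles - 1) - cycles

# Binary target, for contrast only: 1 if failure occurs within `horizon` cycles.
horizon = 2
binary = (rul <= horizon).astype(int)

print(rul.tolist())     # [5, 4, 3, 2, 1, 0]
print(binary.tolist())  # [0, 0, 0, 1, 1, 1]
```

A classifier only learns which side of the horizon a sample falls on, whereas the regressor must estimate the exact remaining count.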
Interface Overview
RUL methods inherit from SupervisedMethodInterface (same as classification):
```python
from pdmlabs.method.supervised_method import SupervisedMethodInterface
from pdmlabs.pdm_evaluation_types.types import EventPreferences


class MyRULMethod(SupervisedMethodInterface):
    def __init__(self, event_preferences: EventPreferences, **kwargs):
        super().__init__(event_preferences=event_preferences)
        # Your initialization
```
Key Characteristics:
- Has fit() method (training on labeled data)
- Regression target: continuous RUL values (not 0/1 classification)
- Returns RUL predictions as scores
- Works with SupervisedRULPdMExperiment only
Required Methods
All RUL methods implement the same interface as classification, but:
- fit() receives continuous RUL labels, not binary labels.
- predict() returns continuous RUL values, not probabilities.
- The evaluation metrics differ (MAE and MSE instead of ROC-AUC).
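For reference, MAE and MSE are computed on the raw RUL values. The sketch below uses the standard textbook definitions; PdMLabs' own metric code may differ in details (for example, how errors are aggregated across sources), so treat it as an illustration rather than the library's implementation.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the RUL labels."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mse(y_true, y_pred):
    """Mean squared error; penalizes large RUL errors more heavily."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


true_rul = [100, 50, 10]
pred_rul = [90, 55, 10]
print(mae(true_rul, pred_rul))  # 5.0
print(mse(true_rul, pred_rul))  # about 41.67
```

Because both are errors, lower is better, which is why an RUL experiment optimizing MAE minimizes rather than maximizes.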
Example: XGBoost RUL Regression
The XGBoostRUL class is a reference implementation of RUL regression.
File Location: pdmlabs/method/xgboostRUL.py
What It Does:
- Trains an XGBoost regressor to predict remaining useful life
- Maintains a separate regressor per data source
- Returns RUL predictions as scores
Implementation Details:
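```python
import pandas as pd
import xgboost as xgb

from pdmlabs.method.supervised_method import SupervisedMethodInterface
from pdmlabs.pdm_evaluation_types.types import EventPreferences


class XGBoostRUL(SupervisedMethodInterface):
    def __init__(self, event_preferences: EventPreferences,
                 save_model=False, *args, **kwargs):
        super().__init__(event_preferences=event_preferences)
        self.model_per_source = {}    # one fitted regressor per source
        self.initial_args = args      # forwarded to xgb.XGBRegressor
        self.initial_kwargs = kwargs
        self.save_model = save_model
```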
Training Phase:
```python
def fit(self, historic_data: list[pd.DataFrame],
        historic_sources: list[str],
        event_data: pd.DataFrame,
        anomaly_ranges: list[list]) -> None:
    """
    Train XGBoost regressor on RUL data.

    Args:
        historic_data: Training features (one DataFrame per source)
        historic_sources: Source identifiers
        event_data: Event log
        anomaly_ranges: **RUL values** (not binary labels!)
            - list per source
            - each element is a continuous RUL value
    """
    for data, source, rul_labels in zip(historic_data, historic_sources, anomaly_ranges):
        # Create a REGRESSOR (not a classifier!)
        model = xgb.XGBRegressor(*self.initial_args, **self.initial_kwargs)
        model.fit(data, rul_labels)
        self.model_per_source[source] = model

        # Optional: save each source's model to disk
        if self.save_model:
            import pickle
            with open(f"model_{source}.pkl", "wb") as f:
                pickle.dump(model, f)
```
Key Difference: Uses XGBRegressor rather than XGBClassifier.
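With save_model=True, fit() writes one pickle per source using the f"model_{source}.pkl" naming, relative to the current working directory. A minimal sketch of the save/reload round trip is below; a plain dict stands in for a fitted model (hypothetical), and a temporary directory is used so nothing is left behind. With real XGBRegressor objects, unpickling requires xgboost to be installed (ideally the same version), and pickle files should only be loaded from trusted sources.

```python
import os
import pickle
import tempfile

# Hypothetical stand-ins for fitted per-source models; any picklable object works.
model_per_source = {"engine_1": {"coef": [0.5, -1.2]}, "engine_2": {"coef": [0.3, 0.8]}}

with tempfile.TemporaryDirectory() as out_dir:
    # Save one file per source, mirroring the model_{source}.pkl naming.
    for source, model in model_per_source.items():
        with open(os.path.join(out_dir, f"model_{source}.pkl"), "wb") as f:
            pickle.dump(model, f)

    # Reload them later, e.g. in a separate inference script.
    reloaded = {}
    for source in model_per_source:
        with open(os.path.join(out_dir, f"model_{source}.pkl"), "rb") as f:
            reloaded[source] = pickle.load(f)

print(reloaded == model_per_source)  # True
```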
Prediction Phase:
```python
def predict(self, target_data: pd.DataFrame, source: str, event_data: pd.DataFrame) -> list[float]:
    """Score test data as RUL predictions."""
    model = self.model_per_source[source]
    # predict() returns continuous values (not probabilities!)
    predictions = model.predict(target_data)
    return predictions.tolist()
```
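The per-source pattern (train one regressor per source, then look it up by source at prediction time) does not depend on XGBoost. To show it in isolation, the sketch below swaps XGBRegressor for a hypothetical LeastSquaresRegressor (ordinary least squares in NumPy) that exposes the same fit()/predict() shape. It is not part of pdmlabs and is not meant to replace XGBoost in practice.

```python
import numpy as np
import pandas as pd


class LeastSquaresRegressor:
    """Hypothetical stand-in for XGBRegressor: ordinary least squares with an intercept."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        design = np.column_stack([np.ones(len(X)), X])
        self.coef_, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return np.column_stack([np.ones(len(X)), X]) @ self.coef_


# Two hypothetical sources whose RUL depends on the same feature differently.
historic_data = [pd.DataFrame({"wear": [0.0, 1.0, 2.0, 3.0]}),
                 pd.DataFrame({"wear": [0.0, 2.0, 4.0, 6.0]})]
historic_sources = ["unit_a", "unit_b"]
rul_labels = [[30, 20, 10, 0], [30, 20, 10, 0]]

# Training: one model per source, keyed by source name.
model_per_source = {}
for data, source, labels in zip(historic_data, historic_sources, rul_labels):
    model_per_source[source] = LeastSquaresRegressor().fit(data, labels)

# Prediction: the same input gets a different RUL depending on the source's model.
test = pd.DataFrame({"wear": [1.5]})
print(model_per_source["unit_a"].predict(test).round(6).tolist())  # [15.0]
print(model_per_source["unit_b"].predict(test).round(6).tolist())  # [22.5]
```

Keeping separate models lets each source have its own degradation rate, at the cost of needing training data for every source you want to score.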
Creating Your Own RUL Regression Method
Follow this template:
Step 1: Create File
Create pdmlabs/method/my_rul_regressor.py:
```python
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor  # Regression algorithm

from pdmlabs.method.supervised_method import SupervisedMethodInterface
from pdmlabs.pdm_evaluation_types.types import EventPreferences


class MyRULRegressor(SupervisedMethodInterface):
    """RUL regression for predictive maintenance.

    This method learns to predict Remaining Useful Life (RUL) as a
    continuous regression target.
    """

    def __init__(self,
                 event_preferences: EventPreferences,
                 n_estimators: int = 100,
                 max_depth: int = 15,
                 *args,
                 **kwargs):
        super().__init__(event_preferences=event_preferences)
        self.n_estimators = n_estimators
        self.max_depth = max_depth
        self.initial_args = args
        self.initial_kwargs = kwargs
        self.model_per_source = {}
```
Step 2: Implement fit()
Train regressors on RUL data:
```python
def fit(self, historic_data: list[pd.DataFrame],
        historic_sources: list[str],
        event_data: pd.DataFrame,
        anomaly_ranges: list[list]) -> None:
    """
    Train a RUL regressor for each source.

    Args:
        historic_data: Training features (one DataFrame per source)
        historic_sources: Source names
        event_data: Event log (optional reference)
        anomaly_ranges: **Continuous RUL values** (e.g., [100, 200, 50, 30, ...])
            The labels represent time (days, cycles, hours) until failure.
            A sample with RUL=100 has 100 time units remaining.
    """
    for data, source, rul_values in zip(historic_data, historic_sources, anomaly_ranges):
        # Create a REGRESSOR (not a classifier).
        # Extra options are forwarded as keyword arguments only: in scikit-learn,
        # n_estimators is RandomForestRegressor's only positional parameter, so any
        # positional extras would collide with n_estimators=... and raise a TypeError.
        regressor = RandomForestRegressor(
            n_estimators=self.n_estimators,
            max_depth=self.max_depth,
            **self.initial_kwargs
        )
        # Train on RUL targets
        regressor.fit(data, rul_values)
        self.model_per_source[source] = regressor
```
Step 3: Implement predict()
Return RUL predictions:
```python
def predict(self, target_data: pd.DataFrame, source: str, event_data: pd.DataFrame) -> list[float]:
    """
    Predict RUL for test data.

    Args:
        target_data: Test features
        source: Source identifier
        event_data: Event log (optional)

    Returns:
        List of RUL predictions (continuous values)
    """
    if source not in self.model_per_source:
        raise ValueError(f"No model for source '{source}'")
    regressor = self.model_per_source[source]

    # predict() returns continuous values
    rul_predictions = regressor.predict(target_data)

    # RUL cannot be negative: clamp predictions at 0
    rul_predictions = np.maximum(rul_predictions, 0)
    return rul_predictions.tolist()
```
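Clamping at 0 is more than cosmetic: when every true RUL is non-negative, replacing a negative prediction with 0 moves it closer to the truth, so the absolute error of each sample can only shrink or stay the same. A quick numeric check, using hypothetical raw regressor outputs:

```python
import numpy as np

true_rul = np.array([0.0, 5.0, 20.0])
raw_pred = np.array([-4.0, -1.0, 18.0])  # hypothetical raw regressor outputs

clamped = np.maximum(raw_pred, 0)  # same clamp as in predict()

raw_mae = float(np.mean(np.abs(true_rul - raw_pred)))
clamped_mae = float(np.mean(np.abs(true_rul - clamped)))
print(clamped.tolist())      # [0.0, 0.0, 18.0]
print(raw_mae, clamped_mae)  # 4.0 and about 2.33
```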
Step 4: Implement predict_one()
Predict RUL for a single sample:
```python
def predict_one(self, new_sample: pd.Series, source: str, is_event: bool) -> float:
    """
    Predict RUL for a single sample.

    Args:
        new_sample: Features of a single sample, as a Series
        source: Source identifier
        is_event: Event flag (context)

    Returns:
        RUL prediction (continuous value)
    """
    if source not in self.model_per_source:
        raise ValueError(f"No model for source '{source}'")
    regressor = self.model_per_source[source]

    # A one-row DataFrame keeps the feature names seen during fit()
    # (a bare NumPy array triggers scikit-learn's feature-name warning).
    sample_frame = pd.DataFrame([new_sample])
    rul_prediction = regressor.predict(sample_frame)[0]

    # RUL cannot be negative: clamp at 0
    rul_prediction = max(rul_prediction, 0.0)
    return float(rul_prediction)
```
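predict_one() has to turn a pandas Series into a 2-D input with exactly one row. The sketch below compares two ways of doing that on a hypothetical sample. Both give shape (1, n_features), but only the DataFrame form keeps the column names. Recent scikit-learn versions warn when a model fitted on a DataFrame is given an array without feature names. In either case the Series index must list the features in the same order used during training.

```python
import pandas as pd

# A hypothetical single observation with named features.
new_sample = pd.Series({"temp": 71.2, "vibration": 0.034, "pressure": 2.1})

# One-row DataFrame: column names come from the Series index.
row = pd.DataFrame([new_sample])
print(row.shape)          # (1, 3)
print(list(row.columns))  # ['temp', 'vibration', 'pressure']

# Plain NumPy form: same shape, but the feature names are lost.
arr = new_sample.to_numpy().reshape(1, -1)
print(arr.shape)          # (1, 3)
```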
Step 5: Implement get_params() and other methods
```python
def get_params(self) -> dict:
    """Return hyperparameters."""
    params = {'n_estimators': self.n_estimators, 'max_depth': self.max_depth}
    if not self.model_per_source:
        # Not fitted yet: only the constructor arguments are known
        return params
    first_source = next(iter(self.model_per_source))
    model = self.model_per_source[first_source]
    return {**model.get_params(), **params}

def __str__(self) -> str:
    return 'MyRULRegressor'

def get_library(self) -> str:
    return 'no_save'

def get_all_models(self):
    return self.model_per_source
```
RUL Data Preparation
Computing RUL from Time-to-Failure:
If you have time-to-failure information, compute RUL for each sample:
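```python
from datetime import timedelta


def compute_rul(failure_time, sample_time):
    """
    Compute RUL as the time until failure.

    Args:
        failure_time: Time when failure occurs (datetime or numeric)
        sample_time: Time when the sample was collected

    Returns:
        RUL in time units (days, cycles, hours); a timedelta for datetime inputs
    """
    rul = failure_time - sample_time
    # RUL cannot be negative; compare against a zero of the matching type
    zero = timedelta(0) if isinstance(rul, timedelta) else 0
    return max(rul, zero)
```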
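```python
import numpy as np

# Example: compute RUL for the Beijing dataset.
# The lambda ignores the column values and counts down len(x), ..., 1 within
# each bearing, so rows must already be sorted by time.
df['rul'] = df.groupby('bearing_id')['time_to_failure'].transform(
    lambda x: np.arange(len(x), 0, -1)
)
```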
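When timestamps are available, RUL can also be computed in real time units instead of sample counts. The sketch below assumes, hypothetically, that each source fails at its last recorded timestamp and expresses RUL in hours; the column names `source` and `timestamp` are illustrative. If your data records the failure time separately, use that instead of the per-source maximum.

```python
import pandas as pd

# Hypothetical run-to-failure log: two sources, sorted by time within each source.
df = pd.DataFrame({
    "source": ["b1", "b1", "b1", "b2", "b2"],
    "timestamp": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 06:00", "2024-01-01 12:00",
        "2024-01-02 00:00", "2024-01-02 03:00",
    ]),
})

# Assumption: failure happens at the last recorded sample of each source.
failure_time = df.groupby("source")["timestamp"].transform("max")

# RUL in hours, clipped at 0 as compute_rul() does.
df["rul_hours"] = ((failure_time - df["timestamp"]).dt.total_seconds() / 3600).clip(lower=0)
print(df["rul_hours"].tolist())  # [12.0, 6.0, 0.0, 3.0, 0.0]
```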
RUL Transformation Patterns:
Linear degradation:
```python
df['rul'] = df.groupby('source').cumcount(ascending=False)
```
Power transform (exponent 1.5; stretches large RUL values, i.e. the early samples of each run, further apart):
```python
df['rul'] = df.groupby('source').cumcount(ascending=False).pow(1.5)
```
Truncated RUL (cap at a maximum):
```python
max_rul = 100  # Cap RUL at 100 units
df['rul'] = df.groupby('source').cumcount(ascending=False).clip(upper=max_rul)
```
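The three patterns can be compared side by side on a tiny hypothetical frame with two sources of 5 and 3 samples. A small `max_rul` of 2 is used here only so the cap is visible. All three rely on rows being sorted by time within each source, because cumcount(ascending=False) numbers rows from the end of each group.

```python
import pandas as pd

# Hypothetical data: two sources with 5 and 3 samples, already sorted by time.
df = pd.DataFrame({"source": ["a"] * 5 + ["b"] * 3})

# Linear: samples remaining after this one within the source (last sample -> 0).
df["rul_linear"] = df.groupby("source").cumcount(ascending=False)

# Power transform with exponent 1.5 (stretches large RUL values).
df["rul_pow"] = df["rul_linear"].pow(1.5)

# Truncated (piecewise-linear): cap at max_rul.
max_rul = 2
df["rul_capped"] = df["rul_linear"].clip(upper=max_rul)

print(df["rul_linear"].tolist())  # [4, 3, 2, 1, 0, 2, 1, 0]
print(df["rul_capped"].tolist())  # [2, 2, 2, 1, 0, 2, 1, 0]
```

Capping treats everything far from failure as "healthy", which often makes the early, low-information part of each run easier for a regressor to fit.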
Testing Your Implementation
With your RUL dataset prepared, test your custom regressor using run_experiment:
```python
import pandas as pd

from pdmlabs.utils.dataset import Dataset
from pdmlabs.experiment.batch.RUL_experiment import SupervisedRULPdMExperiment
from pdmlabs.RunExperiment import run_experiment
from pdmlabs.method.my_rul_regressor import MyRULRegressor  # file created in Step 1
from pdmlabs.pdm_evaluation_types.types import EventPreferences

# 1. Load data (must have continuous RUL labels)
df = pd.read_csv('your_rul_data.csv')
dataset_handler = Dataset(
    data=df,
    datetime_column="timestamp",
    source_column="source",
    train_sources=0.6,
    val_sources=0.2,
    test_sources=0.2
)
ds_rul, _ = dataset_handler.get_rul_dataset()

# 2. Define hyperparameters for your RUL regressor
method_param_space = {
    'n_estimators': [50, 100, 150],
    'max_depth': [10, 15, 20],
}

# 3. Define event preferences
event_prefs = EventPreferences(
    preprocess_target_events=True,
    postprocess_target_events=True,
    keep_internal_target_events=False,
    keep_internal_nontarget_events=False
)

# 4. Run experiment with run_experiment (MAE is an error, so minimize it)
best_params = run_experiment(
    dataset=ds_rul,
    methods=[MyRULRegressor(event_preferences=event_prefs)],
    param_space_dict_per_method=[method_param_space],
    method_names=['MyRULRegressor'],
    experiments=[SupervisedRULPdMExperiment],
    experiment_names=['RUL Regression'],
    MAX_RUNS=15,
    MAX_JOBS=2,
    INITIAL_RANDOM=2,
    profile_size=10,
    optimization_param='MAE',
    maximize=False
)

# 5. Check results
print(f"Best parameters: {best_params[0]}")
```
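An MAE figure is easier to judge next to a trivial baseline. The sketch below computes the MAE of a constant predictor that always outputs the mean training RUL. How to pull the RUL labels out of ds_rul depends on the structure returned by get_rul_dataset(), which is not shown here, so the sketch uses hypothetical plain lists. A tuned regressor whose MAE is not clearly below this baseline has probably learned little from the features.

```python
import numpy as np


def baseline_mae(train_rul, test_rul):
    """MAE of a constant predictor that always outputs the mean training RUL."""
    constant = float(np.mean(train_rul))
    return float(np.mean(np.abs(np.asarray(test_rul, dtype=float) - constant)))


# Hypothetical RUL labels from the train and test splits.
train_rul = [40, 30, 20, 10, 0]
test_rul = [25, 15, 5]
print(baseline_mae(train_rul, test_rul))  # about 8.33
```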
Next Steps
- Review the XGBoostRUL implementation in pdmlabs/method/xgboostRUL.py
- Check RUL transformations in pdmlabs/utils/rul_transformations.py
- Explore SupervisedRULPdMExperiment in pdmlabs/experiment/batch/
- Review dataset RUL preparation in pdmlabs/utils/dataset.py::get_rul_dataset()