API reference
This part of the project documentation takes an information-oriented approach. Use it as a
reference for the technical implementation of the rapidgbm project code.
RapidGBMTuner
Bases: ArgValidationMixin, PlottingMixin
best_params: dict
property
Best LGBM parameters found during the optimization process.
Returns:

| Name | Type | Description |
|---|---|---|
| `dict` | `dict` | Best LGBM parameters |
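A minimal sketch of reading `best_params` after tuning; the toy data and the constructor arguments below are illustrative assumptions, not documented defaults.

```python
import pandas as pd
from rapidgbm import RapidGBMTuner

# Toy binary-classification data, for illustration only.
X_train = pd.DataFrame({'f1': range(100), 'f2': range(100, 0, -1)})
y_train = pd.Series([0, 1] * 50)

tuner = RapidGBMTuner(metric='log_loss', trials=5, refit=True, verbosity=1)
tuner.fit(X_train, y_train)

# The tuned LightGBM parameters are available as a plain dict.
print(tuner.best_params)
```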
feature_importances: pd.Series
property
Feature importances of the fitted model.
Returns:

| Type | Description |
|---|---|
| `pd.Series` | Feature importances |
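Because `feature_importances` is a `pd.Series` indexed by feature name, ordinary pandas operations apply; a brief sketch, reusing the fitted `tuner` from the `best_params` example above.

```python
# Rank the stored importances and keep the strongest features.
top_features = tuner.feature_importances.sort_values(ascending=False).head(10)
print(top_features)
```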
fitted_model: lgb.Booster
property
Fitted LGBM model object.
Returns:

| Type | Description |
|---|---|
| `lgb.Booster` | Fitted LGBM model |
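`fitted_model` is a plain `lgb.Booster`, so the native LightGBM API can be used with it directly, e.g. to persist and reload the model. A hedged sketch, assuming a fitted `tuner` as above; the file name is arbitrary.

```python
import lightgbm as lgb

booster = tuner.fitted_model
# Persist and reload via the standard Booster API.
booster.save_model('rapidgbm_booster.txt')
reloaded = lgb.Booster(model_file='rapidgbm_booster.txt')
```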
init_params: dict
property
Initial LGBM parameters inferred based on data statistics and built-in logic.
Returns:

| Name | Type | Description |
|---|---|---|
| `dict` | `dict` | Initial LGBM parameters |
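Comparing `init_params` with `best_params` shows how far the optimization moved from the inferred starting point; a small sketch, again assuming the fitted `tuner` from above.

```python
# Map each changed parameter to its (initial, tuned) pair.
changed = {
    key: (tuner.init_params.get(key), value)
    for key, value in tuner.best_params.items()
    if tuner.init_params.get(key) != value
}
print(changed)
```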
study: optuna.Study
property
Optuna study object.
Returns:

| Type | Description |
|---|---|
| `optuna.Study` | Optuna study |
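Since `study` is the underlying `optuna.Study`, the usual Optuna attributes and helpers are available; a short sketch assuming the fitted `tuner` from above.

```python
study = tuner.study
print(study.best_value)                # best objective value reached during tuning
print(len(study.trials))               # number of trials that were run
trials_df = study.trials_dataframe()   # per-trial results as a pandas DataFrame
```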
fit(X, y, optuna_study_params=None)
Fits the LightGBM model by finding optimized parameters based on the training data and metric.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `DataFrame` | The training features. | required |
| `y` | `Series` | The training labels. | required |
| `optuna_study_params` | `dict` | Parameters for the Optuna study. Defaults to None. | `None` |
Returns:

| Type | Description |
|---|---|
| `None` | None |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the features or target arguments are invalid. |
Example
from rapidgbm import RapidGBMTuner
tuner = RapidGBMTuner(metric='log_loss', trials=100, refit=True, verbosity=1)
tuner.fit(X_train, y_train)
Notes
- The optimization metric is determined based on the metric specified during initialization.
- For regression, the optimization metric is selected based on the `eval_metric` parameter, except for 'r2'. If `eval_metric` is 'r2', then the optimization metric is 'mean_squared_error'.
- For classification, the optimization metric is always 'log_loss'.
- The LGB Classifier/Regressor is inferred based on the metric and target variable statistics.
- Initial LGBM parameters are inferred based on data statistics and built-in logic, and can be accessed using `self._init_params`.
- The parameter grid for hyperparameter search is inferred based on data statistics and built-in logic.
- The `optuna_study_params` parameter allows for customization of the Optuna study. Refer to the documentation for `optuna.study.create_study` for more details (see the sketch below).
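A hedged sketch of customizing the study via `optuna_study_params`, assuming the dict is forwarded to `optuna.study.create_study` as the notes state; the seeded sampler and study name are illustrative choices.

```python
import optuna
from rapidgbm import RapidGBMTuner

tuner = RapidGBMTuner(metric='log_loss', trials=100, refit=True, verbosity=1)
tuner.fit(
    X_train,
    y_train,
    optuna_study_params={
        'sampler': optuna.samplers.TPESampler(seed=42),  # reproducible search
        'study_name': 'rapidgbm_tuning',
    },
)
```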
fit_optimized(X, y)
Trains the model with the tuned parameters on the whole training data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `ndarray` | Train features. | required |
| `y` | `ndarray` | Train target. | required |

Returns:

| Type | Description |
|---|---|
| `None` | None |
Example
import numpy as np
from rapidgbm import RapidGBMTuner
tuner = RapidGBMTuner(metric='log_loss', trials=100, refit=False, verbosity=1)
tuner.fit(X_train, y_train)
tuner.fit_optimized(np.array(X_train), np.array(y_train))
plot_importances(feature_importances, n_features=15, figsize=(10, 6), display=True, dark=True, save=True)
Plots the feature importances.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `feature_importances` | `Series` | The feature importances. | required |
| `n_features` | `int` | Number of features to plot. Defaults to 15. | `15` |
| `figsize` | `tuple` | Figure size. Defaults to (10, 6). | `(10, 6)` |
| `display` | `bool` | Display the plot in a browser. If False, the plot will be saved in the current working directory. Defaults to True. | `True` |
| `dark` | `bool` | Display the dark or light version of the plot. Defaults to True. | `True` |
| `save` | `bool` | Save the plot to the current working directory. Defaults to True. | `True` |

Returns:

| Type | Description |
|---|---|
| `None` | None |
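A brief sketch of plotting the stored importances for a fitted `tuner`, keeping the plot on screen without writing a file; the argument values are illustrative.

```python
# Show the 10 most important features in the light theme, without saving.
tuner.plot_importances(tuner.feature_importances,
                       n_features=10,
                       dark=False,
                       display=True,
                       save=False)
```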
plot_intermediate_values(study, legend=False, save=False, display=True)
Plots the intermediate values of the study.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `study` | `Study` | The Optuna study containing the intermediate values. | required |
| `legend` | `bool` | Display the legend. Defaults to False. | `False` |
| `save` | `bool` | Save the plot as an image. Defaults to False. | `False` |
| `display` | `bool` | Display the plot. Defaults to True. | `True` |

Returns:

| Type | Description |
|---|---|
| `None` | None |
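A one-line sketch, assuming a fitted `tuner` whose `study` property holds the completed Optuna study.

```python
# Plot per-trial intermediate values with a legend, without saving an image.
tuner.plot_intermediate_values(tuner.study, legend=True, save=False, display=True)
```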
plot_optimization_history(study, save=False, display=True)
Plots the optimization history of the parameters in the given Optuna study.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `study` | `Study` | The Optuna study containing the optimization history. | required |
| `save` | `bool` | Whether to save the plot as a PNG file. Defaults to False. | `False` |
| `display` | `bool` | Whether to display the plot. Defaults to True. | `True` |

Returns:

| Type | Description |
|---|---|
| `None` | None |
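A short sketch showing how the objective evolved over the trials, again assuming a fitted `tuner`.

```python
# Display the optimization history of the completed study.
tuner.plot_optimization_history(tuner.study, save=False, display=True)
```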
plot_param_importances(study, save=False, display=True)
Plots the parameter importances in the given Optuna study.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `study` | `Study` | The Optuna study containing the parameter importances. | required |
| `save` | `bool` | Whether to save the plot as an image file. Defaults to False. | `False` |
| `display` | `bool` | Whether to display the plot. Defaults to True. | `True` |

Returns:

| Type | Description |
|---|---|
| `None` | None |
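A small sketch ranking the hyperparameters by their influence on the objective, assuming a fitted `tuner`; here the plot is saved rather than displayed.

```python
# Save the parameter-importance plot to the current working directory.
tuner.plot_param_importances(tuner.study, save=True, display=False)
```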
predict(test, threshold=0.5)
Predicts the target variable for the given test set using the fitted model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `test` | `DataFrame` | The test features. | required |
| `threshold` | `float` | The binary classification probability threshold. Defaults to 0.5. | `0.5` |

Returns:

| Type | Description |
|---|---|
| `np.ndarray` | The predicted values. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the model has not been fitted yet. |
| `TypeError` | If the test features are not a pandas DataFrame. |
| `ValueError` | If the threshold is not a float. |
Example
from rapidgbm import RapidGBMTuner
tuner = RapidGBMTuner(metric='log_loss', trials=100, refit=True, verbosity=1)
tuner.fit(X_train, y_train)
predictions = tuner.predict(X_test, threshold=0.5)
predict_proba(test)
Predict probabilities for classification problems.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `test` | `DataFrame` | The test features. | required |

Raises:

| Type | Description |
|---|---|
| `TypeError` | If `self._fitted_model.params['objective'] == 'regression'`, indicating that predict_proba() is only applicable for classification objectives. |

Returns:

| Type | Description |
|---|---|
| `np.ndarray` | The predicted probabilities. |
Example
from rapidgbm import RapidGBMTuner
tuner = RapidGBMTuner(metric='log_loss', trials=100, refit=True, verbosity=1)
tuner.fit(X_train, y_train)
predictions = tuner.predict_proba(X_test)
update_grid(lgb_key, params)
Change the grid for a specific LightGBM parameter.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `lgb_key` | `str` | The key of the LightGBM parameter. | required |
| `params` | `Union[list, tuple, dict]` | The new grid for the parameter. It can be passed as a list, tuple, or dict. | required |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If params is not a list, tuple, or dict. |

Returns:

| Type | Description |
|---|---|
| `None` | None |
Notes
The `params` argument can be passed as:
- a list (used for a random search over the listed values)
- a tuple (used to define a uniform grid range between min(tuple) and max(tuple))
- a dict with the keywords 'choice'/'low'/'high'
Example
from rapidgbm import RapidGBMTuner
tuner = RapidGBMTuner(metric='log_loss')
tuner.update_grid('boosting_type', ['gbdt', 'rf'])               # random search over the listed values
tuner.update_grid('learning_rate', (0.001, 0.1))                 # uniform grid range between min(tuple) and max(tuple)
tuner.update_grid('num_leaves', {'low': 0.1, 'high': 5})         # uniform grid range between the low and high values
tuner.update_grid('max_data_in_leaf', {'choice': [40, 50, 70]})  # random search over the listed choices