API Reference

This part of the project documentation focuses on an information-oriented approach. Use it as a reference for the technical implementation of the rapidgbm project code.

RapidGBMTuner

Bases: ArgValidationMixin, PlottingMixin

best_params: dict property

Best LGBM parameters found during the optimization process.

Returns:

| Name | Type | Description |
| ---- | ---- | ----------- |
| dict | dict | Best LGBM parameters |

feature_importances: pd.Series property

Feature importances of the fitted model.

Returns:

| Type | Description |
| ---- | ----------- |
| pd.Series | Feature importances |

fitted_model: lgb.Booster property

Fitted LGBM model object.

Returns:

| Type | Description |
| ---- | ----------- |
| lgb.Booster | Fitted LGBM model |

init_params: dict property

Initial LGBM parameters inferred from data statistics and built-in logic.

Returns:

| Name | Type | Description |
| ---- | ---- | ----------- |
| dict | dict | Initial LGBM parameters |

study: optuna.Study property

Optuna study object.

Returns:

| Type | Description |
| ---- | ----------- |
| optuna.Study | Optuna study |
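
Example

A short sketch of reading these properties after tuning; the data names are placeholders, and it assumes the tuner has been fitted as in the fit example below:

```python
from rapidgbm import RapidGBMTuner

tuner = RapidGBMTuner(metric='log_loss', trials=100, refit=True, verbosity=1)
tuner.fit(X_train, y_train)

params = tuner.best_params               # dict of the best LGBM parameters
importances = tuner.feature_importances  # pd.Series of feature importances
booster = tuner.fitted_model             # lgb.Booster fitted during tuning
study = tuner.study                      # optuna.Study used for the search
```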

fit(X, y, optuna_study_params=None)

Fits the LightGBM model by finding optimized parameters based on the training data and metric.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| X | DataFrame | The training features. | required |
| y | Series | The training labels. | required |
| optuna_study_params | dict | Parameters for the Optuna study. Defaults to None. | None |

Returns:

| Type | Description |
| ---- | ----------- |
| None | None |

Raises:

| Type | Description |
| ---- | ----------- |
| ValueError | If the features or target arguments are invalid. |

Example

```python
from rapidgbm import RapidGBMTuner

tuner = RapidGBMTuner(metric='log_loss', trials=100, refit=True, verbosity=1)
tuner.fit(X_train, y_train)
```

Notes
  • The optimization metric is determined by the metric specified during initialization.
  • For regression, the optimization metric is selected based on the eval_metric parameter, except for 'r2': if eval_metric is 'r2', the optimization metric is 'mean_squared_error'.
  • For classification, the optimization metric is always 'log_loss'.
  • The LGBM Classifier/Regressor is inferred from the metric and the target variable statistics.
  • Initial LGBM parameters are inferred from data statistics and built-in logic, and can be accessed via the init_params property.
  • The parameter grid for the hyperparameter search is inferred from data statistics and built-in logic.
  • The optuna_study_params argument allows customization of the Optuna study; refer to the documentation for optuna.study.create_study for details (a short sketch follows these notes).
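
A short, hedged sketch of customizing the study via optuna_study_params; the keys shown (sampler, study_name) are standard optuna.study.create_study keyword arguments, and the data names are placeholders:

```python
import optuna
from rapidgbm import RapidGBMTuner

tuner = RapidGBMTuner(metric='log_loss', trials=100, refit=True, verbosity=1)
# Keyword arguments forwarded to optuna.study.create_study:
# fix the sampler seed for reproducibility and name the study.
tuner.fit(
    X_train,
    y_train,
    optuna_study_params={'sampler': optuna.samplers.TPESampler(seed=42),
                         'study_name': 'rapidgbm_tuning'},
)
```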

fit_optimized(X, y)

Trains the model with the tuned parameters on the whole training data.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| X | ndarray | Train features. | required |
| y | ndarray | Train target. | required |

Returns:

| Type | Description |
| ---- | ----------- |
| None | None |

Example

```python
import numpy as np
from rapidgbm import RapidGBMTuner

tuner = RapidGBMTuner(metric='log_loss', trials=100, refit=False, verbosity=1)
tuner.fit(X_train, y_train)
tuner.fit_optimized(np.array(X_train), np.array(y_train))
```

plot_importances(feature_importances, n_features=15, figsize=(10, 6), display=True, dark=True, save=True)

Plots the feature importances.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| feature_importances | Series | The feature importances. | required |
| n_features | int | Number of features to plot. Defaults to 15. | 15 |
| figsize | tuple | Figure size. Defaults to (10, 6). | (10, 6) |
| display | bool | Display the plot in a browser. If False, the plot will be saved in the current working directory. Defaults to True. | True |
| dark | bool | Display the dark or light version of the plot. Defaults to True. | True |
| save | bool | Save the plot to the current working directory. Defaults to True. | True |

Returns:

| Type | Description |
| ---- | ----------- |
| None | None |
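
Example

A minimal usage sketch, assuming the tuner has already been fitted (feature_importances is the property documented above):

```python
tuner.plot_importances(tuner.feature_importances, n_features=10, dark=False, save=False)
```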

plot_intermediate_values(study, legend=False, save=False, display=True)

Plots the intermediate values of the study.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| study | Study | The Optuna study containing the intermediate values. | required |
| legend | bool | Display the legend. Defaults to False. | False |
| save | bool | Save the plot as an image. Defaults to False. | False |
| display | bool | Display the plot. Defaults to True. | True |

Returns:

| Type | Description |
| ---- | ----------- |
| None | None |
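
Example

A minimal sketch, assuming a fitted tuner whose study property holds the completed Optuna study:

```python
tuner.plot_intermediate_values(tuner.study, legend=True, save=False, display=True)
```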

plot_optimization_history(study, save=False, display=True)

Plots the optimization history of the parameters in the given Optuna study.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| study | Study | The Optuna study containing the optimization history. | required |
| save | bool | Whether to save the plot as a PNG file. Defaults to False. | False |
| display | bool | Whether to display the plot. Defaults to True. | True |

Returns:

| Type | Description |
| ---- | ----------- |
| None | None |
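
Example

A minimal sketch under the same assumption of a fitted tuner:

```python
tuner.plot_optimization_history(tuner.study, save=True, display=False)
```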

plot_param_importances(study, save=False, display=True)

Plots the parameter importances in the given Optuna study.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| study | Study | The Optuna study containing the parameter importances. | required |
| save | bool | Whether to save the plot as an image file. Defaults to False. | False |
| display | bool | Whether to display the plot. Defaults to True. | True |

Returns:

| Type | Description |
| ---- | ----------- |
| None | None |
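
Example

A minimal sketch, again assuming a fitted tuner:

```python
tuner.plot_param_importances(tuner.study, save=False, display=True)
```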

predict(test, threshold=0.5)

Predicts the target variable for the given test set using the fitted model.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| test | DataFrame | The test features. | required |
| threshold | float | The binary classification probability threshold. Defaults to 0.5. | 0.5 |

Returns:

| Type | Description |
| ---- | ----------- |
| np.ndarray | The predicted values. |

Raises:

| Type | Description |
| ---- | ----------- |
| ValueError | If the model has not been fitted yet. |
| TypeError | If the test features are not a pandas DataFrame. |
| ValueError | If the threshold is not a float. |

Example

```python
from rapidgbm import RapidGBMTuner

tuner = RapidGBMTuner(metric='log_loss', trials=100, refit=True, verbosity=1)
tuner.fit(X_train, y_train)
predictions = tuner.predict(X_test, threshold=0.5)
```

predict_proba(test)

Predict probabilities for classification problems.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| test | DataFrame | The test features. | required |

Raises:

| Type | Description |
| ---- | ----------- |
| TypeError | If the fitted model's objective is 'regression'; predict_proba() is only applicable to classification objectives. |

Returns:

| Type | Description |
| ---- | ----------- |
| np.ndarray | The predicted probabilities. |

Example

```python
from rapidgbm import RapidGBMTuner

tuner = RapidGBMTuner(metric='log_loss', trials=100, refit=True, verbosity=1)
tuner.fit(X_train, y_train)
predictions = tuner.predict_proba(X_test)
```

update_grid(lgb_key, params)

Change the grid for a specific LightGBM parameter.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| lgb_key | str | The key of the LightGBM parameter. | required |
| params | Union[list, tuple, dict] | The new grid for the parameter. It can be passed as a list, tuple, or dict. | required |

Raises:

| Type | Description |
| ---- | ----------- |
| ValueError | If params is not a list, tuple, or dict. |

Returns:

| Type | Description |
| ---- | ----------- |
| None | None |

Notes

The params argument can take one of three forms:
  • list: used for a random search over the listed values
  • tuple: defines a uniform grid range between min(tuple) and max(tuple)
  • dict: with the keywords 'choice'/'low'/'high'

Example

```python
tuner = RapidGBMTuner()
tuner.update_grid('boosting_type', ['gbdt', 'rf'])               # random search
tuner.update_grid('learning_rate', (0.001, 0.1))                 # uniform range between min(tuple) and max(tuple)
tuner.update_grid('num_leaves', {'low': 0.1, 'high': 5})         # uniform range between the low and high values
tuner.update_grid('max_data_in_leaf', {'choice': [40, 50, 70]})  # random search over the choices
```