API reference
This part of the project documentation takes an information-oriented approach. Use it as a
reference for the technical implementation of the rapidgbm project code.
RapidGBMTuner
Bases: ArgValidationMixin, PlottingMixin
best_params: dict
property
Best LGBM parameters found during the optimization process.
Returns:

| Name | Type | Description |
|---|---|---|
| `dict` | `dict` | Best LGBM parameters |
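A minimal sketch of reading `best_params` after tuning; the toy data and the constructor arguments below are illustrative assumptions, not documented defaults.

```python
import pandas as pd
from rapidgbm import RapidGBMTuner

# Toy binary-classification data, for illustration only.
X_train = pd.DataFrame({'f1': range(100), 'f2': range(100, 0, -1)})
y_train = pd.Series([0, 1] * 50)

tuner = RapidGBMTuner(metric='log_loss', trials=5, refit=True, verbosity=1)
tuner.fit(X_train, y_train)

# The tuned LightGBM parameters are available as a plain dict.
print(tuner.best_params)
```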
feature_importances: pd.Series
property
Feature importances of the fitted model.
Returns:

| Type | Description |
|---|---|
| `pd.Series` | Feature importances |
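Because `feature_importances` is a `pd.Series` indexed by feature name, ordinary pandas operations apply; a brief sketch, reusing the fitted `tuner` from the `best_params` example above.

```python
# Rank the stored importances and keep the strongest features.
top_features = tuner.feature_importances.sort_values(ascending=False).head(10)
print(top_features)
```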
fitted_model: lgb.Booster
property
Fitted LGBM model object.
Returns:

| Type | Description |
|---|---|
| `lgb.Booster` | Fitted LGBM model |
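`fitted_model` is a plain `lgb.Booster`, so the native LightGBM API can be used with it directly, e.g. to persist and reload the model. A hedged sketch, assuming a fitted `tuner` as above; the file name is arbitrary.

```python
import lightgbm as lgb

booster = tuner.fitted_model
# Persist and reload via the standard Booster API.
booster.save_model('rapidgbm_booster.txt')
reloaded = lgb.Booster(model_file='rapidgbm_booster.txt')
```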
init_params: dict
property
Initial LGBM parameters inferred based on data statistics and built-in logic.
Returns:

| Name | Type | Description |
|---|---|---|
| `dict` | `dict` | Initial LGBM parameters |
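Comparing `init_params` with `best_params` shows how far the optimization moved from the inferred starting point; a small sketch, again assuming the fitted `tuner` from above.

```python
# Map each changed parameter to its (initial, tuned) pair.
changed = {
    key: (tuner.init_params.get(key), value)
    for key, value in tuner.best_params.items()
    if tuner.init_params.get(key) != value
}
print(changed)
```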
study: optuna.Study
property
Optuna study object.
Returns:

| Type | Description |
|---|---|
| `optuna.Study` | Optuna study |
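Since `study` is the underlying `optuna.Study`, the usual Optuna attributes and helpers are available; a short sketch assuming the fitted `tuner` from above.

```python
study = tuner.study
print(study.best_value)                # best objective value reached during tuning
print(len(study.trials))               # number of trials that were run
trials_df = study.trials_dataframe()   # per-trial results as a pandas DataFrame
```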
fit(X, y, optuna_study_params=None)
Fits the LightGBM model by finding optimized parameters based on the training data and metric.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `DataFrame` | The training features. | required |
| `y` | `Series` | The training labels. | required |
| `optuna_study_params` | `dict` | Parameters for the Optuna study. Defaults to None. | `None` |
Returns:

| Type | Description |
|---|---|
| `None` | None |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the features or target arguments are invalid. |
Example
from rapidgbm import RapidGBMTuner
tuner = RapidGBMTuner(metric='log_loss', trials=100, refit=True, verbosity=1)
tuner.fit(X_train, y_train)
Notes
- The optimization metric is determined based on the metric specified during initialization.
- For regression, the optimization metric is selected based on the `eval_metric` parameter, except for 'r2'. If `eval_metric` is 'r2', then the optimization metric is 'mean_squared_error'.
- For classification, the optimization metric is always 'log_loss'.
- The LGB Classifier/Regressor is inferred based on the metric and target variable statistics.
- Initial LGBM parameters are inferred based on data statistics and built-in logic, and can be accessed using `self._init_params`.
- The parameter grid for hyperparameter search is inferred based on data statistics and built-in logic.
- The `optuna_study_params` parameter allows for customization of the Optuna study. Refer to the documentation for `optuna.study.create_study` for more details (see the sketch below).
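A hedged sketch of customizing the study via `optuna_study_params`, assuming the dict is forwarded to `optuna.study.create_study` as the notes state; the seeded sampler and study name are illustrative choices.

```python
import optuna
from rapidgbm import RapidGBMTuner

tuner = RapidGBMTuner(metric='log_loss', trials=100, refit=True, verbosity=1)
tuner.fit(
    X_train,
    y_train,
    optuna_study_params={
        'sampler': optuna.samplers.TPESampler(seed=42),  # reproducible search
        'study_name': 'rapidgbm_tuning',
    },
)
```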
fit_optimized(X, y)
Trains the model with the tuned parameters on the whole training data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `ndarray` | Train features. | required |
| `y` | `ndarray` | Train target. | required |

Returns:

| Type | Description |
|---|---|
| `None` | None |
Example
import numpy as np
from rapidgbm import RapidGBMTuner
tuner = RapidGBMTuner(metric='log_loss', trials=100, refit=False, verbosity=1)
tuner.fit(X_train, y_train)
tuner.fit_optimized(np.array(X_train), np.array(y_train))
plot_importances(feature_importances, n_features=15, figsize=(10, 6), display=True, dark=True, save=True)
Plots the feature importances.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `feature_importances` | `Series` | The feature importances. | required |
| `n_features` | `int` | Number of features to plot. Defaults to 15. | `15` |
| `figsize` | `tuple` | Figure size. Defaults to (10, 6). | `(10, 6)` |
| `display` | `bool` | Display the plot in a browser. If False, the plot will be saved in the current working directory. Defaults to True. | `True` |
| `dark` | `bool` | Display the dark or light version of the plot. Defaults to True. | `True` |
| `save` | `bool` | Save the plot to the current working directory. Defaults to True. | `True` |

Returns:

| Type | Description |
|---|---|
| `None` | None |
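A brief sketch of plotting the stored importances for a fitted `tuner`, keeping the plot on screen without writing a file; the argument values are illustrative.

```python
# Show the 10 most important features in the light theme, without saving.
tuner.plot_importances(tuner.feature_importances,
                       n_features=10,
                       dark=False,
                       display=True,
                       save=False)
```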
plot_intermediate_values(study, legend=False, save=False, display=True)
Plots the intermediate values of the study.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `study` | `Study` | The Optuna study containing the intermediate values. | required |
| `legend` | `bool` | Display the legend. Defaults to False. | `False` |
| `save` | `bool` | Save the plot as an image. Defaults to False. | `False` |
| `display` | `bool` | Display the plot. Defaults to True. | `True` |

Returns:

| Type | Description |
|---|---|
| `None` | None |
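A one-line sketch, assuming a fitted `tuner` whose `study` property holds the completed Optuna study.

```python
# Plot per-trial intermediate values with a legend, without saving an image.
tuner.plot_intermediate_values(tuner.study, legend=True, save=False, display=True)
```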
plot_optimization_history(study, save=False, display=True)
Plots the optimization history of the parameters in the given Optuna study.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `study` | `Study` | The Optuna study containing the optimization history. | required |
| `save` | `bool` | Whether to save the plot as a PNG file. Defaults to False. | `False` |
| `display` | `bool` | Whether to display the plot. Defaults to True. | `True` |

Returns:

| Type | Description |
|---|---|
| `None` | None |
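A short sketch showing how the objective evolved over the trials, again assuming a fitted `tuner`.

```python
# Display the optimization history of the completed study.
tuner.plot_optimization_history(tuner.study, save=False, display=True)
```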
plot_param_importances(study, save=False, display=True)
Plots the parameter importances in the given Optuna study.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `study` | `Study` | The Optuna study containing the parameter importances. | required |
| `save` | `bool` | Whether to save the plot as an image file. Defaults to False. | `False` |
| `display` | `bool` | Whether to display the plot. Defaults to True. | `True` |

Returns:

| Type | Description |
|---|---|
| `None` | None |
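A small sketch ranking the hyperparameters by their influence on the objective, assuming a fitted `tuner`; here the plot is saved rather than displayed.

```python
# Save the parameter-importance plot to the current working directory.
tuner.plot_param_importances(tuner.study, save=True, display=False)
```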
predict(test, threshold=0.5)
Predicts the target variable for the given test set using the fitted model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `test` | `DataFrame` | The test features. | required |
| `threshold` | `float` | The binary classification probability threshold. Defaults to 0.5. | `0.5` |

Returns:

| Type | Description |
|---|---|
| `np.ndarray` | The predicted values. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the model has not been fitted yet. |
| `TypeError` | If the test features are not a pandas DataFrame. |
| `ValueError` | If the threshold is not a float. |
Example
from rapidgbm import RapidGBMTuner
tuner = RapidGBMTuner(metric='log_loss', trials=100, refit=True, verbosity=1)
tuner.fit(X_train, y_train)
predictions = tuner.predict(X_test, threshold=0.5)
predict_proba(test)
Predict probabilities for classification problems.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `test` | `DataFrame` | The test features. | required |

Raises:

| Type | Description |
|---|---|
| `TypeError` | If `self._fitted_model.params['objective'] == 'regression'`, indicating that predict_proba() is only applicable for classification objectives. |

Returns:

| Type | Description |
|---|---|
| `np.ndarray` | The predicted probabilities. |
Example
from rapidgbm import RapidGBMTuner
tuner = RapidGBMTuner(metric='log_loss', trials=100, refit=True, verbosity=1)
tuner.fit(X_train, y_train)
predictions = tuner.predict_proba(X_test)
update_grid(lgb_key, params)
Change the grid for a specific LightGBM parameter.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `lgb_key` | `str` | The key of the LightGBM parameter. | required |
| `params` | `Union[list, tuple, dict]` | The new grid for the parameter. It can be passed as a list, tuple, or dict. | required |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If params is not a list, tuple, or dict. |

Returns:

| Type | Description |
|---|---|
| `None` | None |
Notes
The `params` argument can be passed as:
- a list (used for a random search over the listed values)
- a tuple (used to define a uniform grid range between min(tuple) and max(tuple))
- a dict with the keywords 'choice'/'low'/'high'
Example
from rapidgbm import RapidGBMTuner
tuner = RapidGBMTuner(metric='log_loss')
tuner.update_grid('boosting_type', ['gbdt', 'rf'])               # random search over the listed values
tuner.update_grid('learning_rate', (0.001, 0.1))                 # uniform grid range between min(tuple) and max(tuple)
tuner.update_grid('num_leaves', {'low': 0.1, 'high': 5})         # uniform grid range between the low and high values
tuner.update_grid('max_data_in_leaf', {'choice': [40, 50, 70]})  # random search over the listed choices