Python · Intermediate
LightGBM Feature Importance Analysis
Train a LightGBM model and analyse feature importance using split, gain, and permutation methods.
import lightgbm as lgb
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic regression data: 20 features, 10 of which are informative
X, y = make_regression(n_samples=1000, n_features=20, n_informative=10, random_state=42)
feature_names = [f'feat_{i}' for i in range(20)]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Tie the validation set to the training set so both share the same feature binning
train_data = lgb.Dataset(X_train, label=y_train, feature_name=feature_names)
valid_data = lgb.Dataset(X_test, label=y_test, reference=train_data)

# The round count is given once via num_boost_round, so it is not repeated in params
params = {'objective': 'regression', 'num_leaves': 31, 'learning_rate': 0.05, 'verbosity': -1}
model = lgb.train(params, train_data, num_boost_round=200, valid_sets=[valid_data],
                  callbacks=[lgb.early_stopping(20), lgb.log_evaluation(0)])

# 'gain' sums the loss reduction of a feature's splits; 'split' counts how often it is used
fi_gain = pd.Series(model.feature_importance(importance_type='gain'), index=feature_names)
fi_split = pd.Series(model.feature_importance(importance_type='split'), index=feature_names)
print('Top 5 by gain:', fi_gain.nlargest(5).to_dict())
print('Top 5 by split:', fi_split.nlargest(5).to_dict())

Use Cases
- feature selection
- model interpretation
- regression analysis