Register sklearn Style Models

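Assume we have sklearn-style models that were fitted outside the PiML workflow. This example shows how to register them in PiML so they can be diagnosed and compared.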

For demonstration, we fit two models with XGBoost’s sklearn API, one with max_depth=2 and one with max_depth=7.

from xgboost import XGBRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

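# Load the California housing data and hold out 20% of it for testing.
data = fetch_california_housing()
train_x, test_x, train_y, test_y = train_test_split(data.data, data.target, test_size=0.2)
# Keep the column names so PiML can label features and the target.
feature_names = data.feature_names
target_name = data.target_names[0]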

xgb2 = XGBRegressor(max_depth=2, n_estimators=100)
xgb2.fit(train_x, train_y)

xgb7 = XGBRegressor(max_depth=7, n_estimators=100)
xgb7.fit(train_x, train_y)
XGBRegressor(base_score=None, booster=None, callbacks=None,
             colsample_bylevel=None, colsample_bynode=None,
             colsample_bytree=None, early_stopping_rounds=None,
             enable_categorical=False, eval_metric=None, feature_types=None,
             gamma=None, gpu_id=None, grow_policy=None, importance_type=None,
             interaction_constraints=None, learning_rate=None, max_bin=None,
             max_cat_threshold=None, max_cat_to_onehot=None,
             max_delta_step=None, max_depth=7, max_leaves=None,
             min_child_weight=None, missing=nan, monotone_constraints=None,
             n_estimators=100, n_jobs=None, num_parallel_tree=None,
             predictor=None, random_state=None, ...)


Load PiML

from piml import Experiment
exp = Experiment(highcode_only=True)

Register the fitted models in PiML. Make sure every pipeline is built from the same training and test data, so that the registered models can be compared.

pipeline_xgb2 = exp.make_pipeline(model=xgb2,
                                  train_x=train_x,
                                  train_y=train_y.ravel(),
                                  test_x=test_x,
                                  test_y=test_y.ravel(),
                                  feature_names=feature_names,
                                  target_name=target_name)
exp.register(pipeline_xgb2, "XGB-External-2")

pipeline_xgb7 = exp.make_pipeline(model=xgb7,
                                  train_x=train_x,
                                  train_y=train_y.ravel(),
                                  test_x=test_x,
                                  test_y=test_y.ravel(),
                                  feature_names=feature_names,
                                  target_name=target_name)
exp.register(pipeline_xgb7, "XGB-External-7")
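
The registration step asks that all pipelines share the same datasets. Below is a minimal sketch of one way to check this before building the pipelines, assuming the data are NumPy arrays. The helper `same_dataset` is hypothetical and not part of PiML.

```python
import numpy as np

def same_dataset(*arrays):
    """Return True if all arrays have identical shape and values."""
    first = np.asarray(arrays[0])
    return all(np.array_equal(first, np.asarray(a)) for a in arrays[1:])

# Toy stand-ins for the train_x arrays passed to two different pipelines.
x_a = np.array([[1.0, 2.0], [3.0, 4.0]])
x_b = x_a.copy()   # identical content
x_c = x_a[::-1]    # same rows, different order

print(same_dataset(x_a, x_b))
print(same_dataset(x_a, x_c))
```

In this example the same `train_x`, `train_y`, `test_x`, and `test_y` objects are passed to both `exp.make_pipeline` calls, so the check passes trivially. It is more useful when the models were fitted in separate scripts.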

Check the performance of the depth-2 model

exp.model_diagnose(model="XGB-External-2", show="accuracy_table")
          MSE     MAE       R2
Train  0.2509  0.3531   0.8112
Test   0.2857  0.3691   0.7872
Gap    0.0348  0.0160  -0.0240

Check the performance of the depth-7 model

exp.model_diagnose(model="XGB-External-7", show="accuracy_table")
          MSE     MAE       R2
Train  0.0508  0.1594   0.9617
Test   0.2363  0.3158   0.8240
Gap    0.1855  0.1563  -0.1377
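
The Gap row in both accuracy tables is consistent with the test metric minus the train metric. The sketch below recomputes it from the reported values. It is an illustration based on the printed numbers, not PiML's internal code, and the names `tables` and `gap` are made up for this example.

```python
# Values copied from the two accuracy tables above.
tables = {
    "XGB-External-2": {
        "train": {"MSE": 0.2509, "MAE": 0.3531, "R2": 0.8112},
        "test": {"MSE": 0.2857, "MAE": 0.3691, "R2": 0.7872},
    },
    "XGB-External-7": {
        "train": {"MSE": 0.0508, "MAE": 0.1594, "R2": 0.9617},
        "test": {"MSE": 0.2363, "MAE": 0.3158, "R2": 0.8240},
    },
}

def gap(table):
    """Test metric minus train metric, rounded to four decimals."""
    return {m: round(table["test"][m] - table["train"][m], 4)
            for m in table["train"]}

gaps = {name: gap(table) for name, table in tables.items()}
for name, g in gaps.items():
    print(name, g)
```

Recomputing from the rounded table entries can differ from the printed Gap in the last digit (for example, the MAE gap of the depth-7 model), presumably because the table is derived from unrounded metrics. The depth-7 model has the lower test error but a much larger gap, which suggests it fits the training data more closely than the depth-2 model.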

Compare model robustness

exp.model_compare(models=["XGB-External-2", "XGB-External-7"], show="robustness_perf", figsize=(5, 4))
[Figure: Model Performance: Perturb on All Features]
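
To give a rough idea of what a "perturb on all features" robustness check measures, the sketch below adds Gaussian noise to every feature and recomputes the MSE. This is an assumption-laden illustration: PiML's actual perturbation scheme may differ, and `perturbed_mse` and the toy linear model are hypothetical.

```python
import numpy as np

def perturbed_mse(predict, x, y, noise_level, seed=0):
    """MSE after adding Gaussian noise to every feature.

    The noise added to each column is scaled by noise_level times
    that column's standard deviation.
    """
    rng = np.random.default_rng(seed)
    scale = noise_level * x.std(axis=0)
    x_noisy = x + rng.normal(size=x.shape) * scale
    return float(np.mean((y - predict(x_noisy)) ** 2))

# Toy data and a toy linear "model" standing in for test_x, test_y
# and a fitted model's predict method.
rng = np.random.default_rng(42)
x = rng.normal(size=(500, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = x @ true_w

def predict(data):
    return data @ true_w

for level in (0.0, 0.1, 0.2):
    print(level, perturbed_mse(predict, x, y, level))
```

With the objects from this example, the same function could be called as `perturbed_mse(xgb2.predict, test_x, test_y, 0.1)`. A model whose error grows slowly with the noise level is the more robust one under this measure.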

Compare model resilience

exp.model_compare(models=["XGB-External-7", "XGB-External-2"], show="resilience_perf", figsize=(5, 4))
[Figure: Resilience Test]
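
One common way to probe resilience is to evaluate a model only on the samples it predicts worst. The sketch below computes the MSE over the worst fraction of samples. It is a simplified illustration under that assumption, not PiML's implementation, and `worst_sample_mse` is a hypothetical helper.

```python
import numpy as np

def worst_sample_mse(y_true, y_pred, fraction):
    """MSE over the given fraction of samples with the largest squared error."""
    sq_err = (np.asarray(y_true) - np.asarray(y_pred)) ** 2
    # Always keep at least one sample.
    k = max(1, int(np.ceil(fraction * sq_err.size)))
    worst = np.sort(sq_err)[-k:]
    return float(worst.mean())

# Toy targets and predictions; squared errors are [0, 0.25, 0, 4, 0.25].
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([1.0, 2.5, 3.0, 2.0, 5.5])

for fraction in (1.0, 0.4, 0.2):
    print(fraction, worst_sample_mse(y_true, y_pred, fraction))
```

With `fraction=1.0` this reduces to the ordinary MSE. As the fraction shrinks, the metric concentrates on the hardest samples, so a model whose error rises less steeply degrades more gracefully.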

Total running time of the script: ( 0 minutes 51.249 seconds)

Estimated memory usage: 40 MB