Overfit: Classification

Experiment initialization and data preparation

from piml import Experiment
from piml.models import XGB2Classifier

Load and prepare the data.

exp = Experiment()
exp.data_loader(data="TaiwanCredit", silent=True)
exp.data_summary(feature_exclude=["LIMIT_BAL", "SEX", "EDUCATION", "MARRIAGE", "AGE"], silent=True)
exp.data_prepare(target="FlagDefault", task_type="classification", silent=True)

Train Model

exp.model_train(XGB2Classifier(), name="XGB2")

Histogram-based overfit test for a single feature

results = exp.model_diagnose(model="XGB2", show="overfit", slice_method="histogram",
                             slice_features=["BILL_AMT1"], threshold=1.05, min_samples=20,
                             original_scale=True, return_data=True, figsize=(5, 4))
results.data
Overfit Regions
   [BILL_AMT1  BILL_AMT1)  #Test  #Train  test_ACC  train_ACC      Gap
0      0.2879      0.3781     41     131    0.7561     0.8473  -0.0912
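The histogram method can be pictured as equal-width binning of the chosen feature, followed by a per-bin comparison of train and test accuracy. The sketch below is a minimal pure-Python illustration under that assumption; it is not PiML's implementation. The function name `histogram_gap`, the `bins` parameter, the toy data, and the rule that a bin needs at least `min_samples` points in both the train and the test split are hypothetical choices made for the example.

```python
def histogram_gap(x_train, ok_train, x_test, ok_test, bins=10, min_samples=20):
    """Equal-width bins over one feature; returns per-bin accuracy gaps.

    x_*  : feature values; ok_* : 1 if the prediction was correct, else 0.
    Each row is (lower, upper, n_test, n_train, test_acc, train_acc, gap).
    """
    lo = min(min(x_train), min(x_test))
    hi = max(max(x_train), max(x_test))
    if hi == lo:
        raise ValueError("feature is constant; cannot build bins")
    width = (hi - lo) / bins

    def tally(xs, oks):
        hits, counts = [0] * bins, [0] * bins
        for value, ok in zip(xs, oks):
            # Clamp the maximum value into the last bin.
            b = min(int((value - lo) / width), bins - 1)
            counts[b] += 1
            hits[b] += int(ok)
        return hits, counts

    train_hits, train_n = tally(x_train, ok_train)
    test_hits, test_n = tally(x_test, ok_test)

    rows = []
    for b in range(bins):
        # Assumption: skip bins that are too small in either split.
        if train_n[b] < min_samples or test_n[b] < min_samples:
            continue
        train_acc = train_hits[b] / train_n[b]
        test_acc = test_hits[b] / test_n[b]
        rows.append((lo + b * width, lo + (b + 1) * width,
                     test_n[b], train_n[b], test_acc, train_acc,
                     test_acc - train_acc))
    return rows


# Toy data: the model is equally accurate on both splits in the lower bin,
# but much less accurate on test than on train in the upper bin.
x_train = [0.0, 0.1, 0.2, 0.3, 0.6, 0.7, 0.8, 1.0]
ok_train = [1, 1, 1, 1, 1, 1, 1, 0]
x_test = [0.05, 0.15, 0.25, 0.35, 0.65, 0.75, 0.85, 0.95]
ok_test = [1, 1, 1, 1, 1, 0, 0, 0]

rows = histogram_gap(x_train, ok_train, x_test, ok_test, bins=2, min_samples=2)
for row in rows:
    print(row)
```

A strongly negative gap in a bin means test accuracy falls well below train accuracy there, which is the pattern the Gap column in the table above reports.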


Histogram-based overfit test for two features

results = exp.model_diagnose(model="XGB2", show="overfit", slice_method="histogram",
                             slice_features=["PAY_1", "BILL_AMT1"], threshold=1.05, min_samples=20,
                             original_scale=True, return_data=True, figsize=(5, 4))
results.data
Overfit Regions
   [PAY_1  PAY_1)  [BILL_AMT1  BILL_AMT1)  #Test  #Train  test_ACC  train_ACC      Gap
0  0.2222  0.3333      0.7167      0.8037     30     117    0.6333     0.7094  -0.0761
1  0.3333  0.4444      0.7311      0.7953     40     160    0.6000     0.7000  -0.1000
2  0.3333  0.4444      0.9237      0.9558     69     338    0.5797     0.6834  -0.1037
3  0.2222  0.3333      0.2814      0.3685     24      75    0.7083     0.8133  -0.1050
4  0.0000  0.1111      0.6213      0.6715     20      73    0.6500     0.7808  -0.1308
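The Gap column in this output is consistent with test accuracy minus train accuracy. The short check below recomputes it from the reported figures and picks out the region with the largest shortfall. It is only an arithmetic sanity check on the printed numbers, assuming that reading of the column; the variable names are illustrative.

```python
# (test_ACC, train_ACC, reported Gap) for regions 0..4, copied from the output above.
regions = [
    (0.6333, 0.7094, -0.0761),
    (0.6000, 0.7000, -0.1000),
    (0.5797, 0.6834, -0.1037),
    (0.7083, 0.8133, -0.1050),
    (0.6500, 0.7808, -0.1308),
]

# Assumption: Gap = test_ACC - train_ACC, up to four-decimal rounding.
recomputed = [round(test_acc - train_acc, 4) for test_acc, train_acc, _ in regions]
consistent = all(abs(r - gap) < 1.5e-4
                 for r, (_, _, gap) in zip(recomputed, regions))

# Index of the region where test accuracy trails train accuracy the most.
worst = min(range(len(regions)), key=lambda i: regions[i][2])

print("recomputed gaps:", recomputed)
print("matches reported Gap column:", consistent)
print("worst region index:", worst)
```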


Histogram-based overfit test for a single feature on the test set

results = exp.model_diagnose(model="XGB2", show="overfit", slice_method="histogram",
                             slice_features=["BILL_AMT1"], threshold=1.05, min_samples=20,
                             use_test=True, original_scale=True, return_data=True, figsize=(5, 4))
results.data
Overfit Regions
   [BILL_AMT1  BILL_AMT1)  #Test  #Train  test_ACC  train_ACC      Gap
0      0.2879      0.3781     41     131    0.7561     0.8473  -0.0912

