WeakSpot: Classification

Experiment initialization and data preparation

from piml import Experiment
from piml.models import XGB2Classifier

# Load the built-in TaiwanCredit dataset.
exp = Experiment()
exp.data_loader(data="TaiwanCredit", silent=True)
# Exclude the credit-limit and demographic features from modeling.
exp.data_summary(feature_exclude=["LIMIT_BAL", "SEX", "EDUCATION", "MARRIAGE", "AGE"], silent=True)
# Use FlagDefault as the classification target.
exp.data_prepare(target="FlagDefault", task_type="classification", silent=True)

Train Model

exp.model_train(XGB2Classifier(), name="XGB2")

Histogram-based weakspot for a single feature

results = exp.model_diagnose(model="XGB2", show="weakspot", slice_method="histogram",
                             slice_features=["PAY_1"], threshold=1.1, min_samples=100,
                             return_data=True, figsize=(5, 4))
results.data
Weak Regions
   [PAY_1  PAY_1)  #Test  #Train  test_ACC  train_ACC     Gap
0     0.2     0.8   1350    5440    0.7052     0.6888  0.0164
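
The idea behind histogram slicing can be illustrated without PiML. The sketch below is not PiML's implementation: it assumes a feature scaled to [0, 1], equal-width bins, and a weak-region rule of "bin accuracy below overall accuracy divided by threshold". That rule, the function names bin_accuracy and weak_bins, and their parameters are assumptions made for illustration only. Merging adjacent weak bins into one interval, as the [0.2, 0.8) row above suggests, is not shown.

```python
from collections import defaultdict


def bin_accuracy(x, y_true, y_pred, n_bins=10):
    """Return {bin_index: (accuracy, n_samples)} for equal-width bins on [0, 1]."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for xi, yt, yp in zip(x, y_true, y_pred):
        # Clamp the index so that x == 1.0 falls into the last bin.
        b = min(int(xi * n_bins), n_bins - 1)
        counts[b] += 1
        hits[b] += int(yt == yp)
    return {b: (hits[b] / counts[b], counts[b]) for b in counts}


def weak_bins(x, y_true, y_pred, n_bins=10, threshold=1.1, min_samples=1):
    """Return the sorted indices of bins flagged as weak.

    ASSUMED rule (not taken from PiML): a bin is weak when it holds at least
    min_samples points and its accuracy is below overall_accuracy / threshold.
    """
    overall = sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)
    cutoff = overall / threshold
    per_bin = bin_accuracy(x, y_true, y_pred, n_bins)
    return sorted(b for b, (acc, n) in per_bin.items()
                  if n >= min_samples and acc < cutoff)


# Toy data: the model is wrong only for samples with x around 0.25.
x = [0.05, 0.05, 0.25, 0.25, 0.95, 0.95]
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 0, 0, 0, 0, 1]
print(weak_bins(x, y_true, y_pred))
```

Running the same function on the training and test splits separately would give the two per-slice metrics that the table reports side by side.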


Histogram-based weakspot for two features

results = exp.model_diagnose(model="XGB2", show="weakspot", slice_method="histogram",
                             slice_features=["PAY_1", "PAY_2"], threshold=1.1, min_samples=100,
                             return_data=True, figsize=(5, 4))
results.data
Weak Regions
   [PAY_1  PAY_1)  [PAY_2  PAY_2)  #Test  #Train  test_ACC  train_ACC      Gap
0     0.4     0.5  0.3750  0.5625     69     244    0.8551     0.7336   0.1215
1     0.3     0.4  0.0000  0.2500    191     800    0.6963     0.6675   0.0288
2     0.2     0.3  0.1111  0.2222    268     956    0.7276     0.7207   0.0069
3     0.3     0.4  0.3750  0.6250    322    1354    0.7205     0.7164   0.0041
4     0.2     0.3  0.3333  0.5556    351    1430    0.6154     0.6119   0.0035
5     0.0     0.1  0.3750  0.6250     77     357    0.6234     0.6303  -0.0069
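
The Gap column can be checked directly against the two accuracy columns. The short sketch below copies the (test_ACC, train_ACC, Gap) values from the output above and recomputes the gap as test accuracy minus train accuracy, rounded to four decimals. The helper name gap is hypothetical, and the observation that rows are listed in decreasing order of Gap is read off this output rather than taken from the PiML documentation.

```python
# Rows copied from the two-feature output: (test_ACC, train_ACC, reported Gap).
rows = [
    (0.8551, 0.7336, 0.1215),
    (0.6963, 0.6675, 0.0288),
    (0.7276, 0.7207, 0.0069),
    (0.7205, 0.7164, 0.0041),
    (0.6154, 0.6119, 0.0035),
    (0.6234, 0.6303, -0.0069),
]


def gap(test_metric, train_metric):
    """Gap as it appears in the table: test metric minus train metric."""
    return round(test_metric - train_metric, 4)


computed = [gap(test, train) for test, train, _ in rows]
reported = [g for _, _, g in rows]

# Check that the recomputed gaps agree with the reported column.
print(all(abs(c - r) < 1e-9 for c, r in zip(computed, reported)))
# Check that the rows are ordered from largest to smallest gap.
print(computed == sorted(computed, reverse=True))
```

A positive gap means the slice scores higher on the test split than on the training split; a negative gap means the opposite.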


Histogram-based weakspot for a single feature on the test set

results = exp.model_diagnose(model="XGB2", show="weakspot", slice_method="histogram",
                             slice_features=["PAY_1"], threshold=1.1, min_samples=100,
                             use_test=True, return_data=True, figsize=(5, 4))
results.data
Weak Regions
   [PAY_1  PAY_1)  #Test  #Train  test_ACC  train_ACC     Gap
0  0.2222  0.4444   1265    5090    0.6964     0.6876  0.0088


Histogram-based weakspot for a single feature using the AUC metric

results = exp.model_diagnose(model="XGB2", show="weakspot", slice_method="histogram",
                             slice_features=["PAY_1"], threshold=1.1, min_samples=100,
                             metric="AUC", return_data=True, figsize=(5, 4))
results.data
Weak Regions
   [PAY_1  PAY_1)  #Test  #Train  test_AUC  train_AUC      Gap
0     0.2     0.3    752    2936    0.6478     0.7071  -0.0593
1     0.1     0.2   3521   13975    0.6350     0.6970  -0.0620
2     0.3     0.4    513    2154    0.5481     0.6325  -0.0844
3     0.5     0.6     14      62    0.5250     0.7179  -0.1929
4     0.6     0.7      1      25       NaN     0.6314      NaN
5     0.7     0.8      1      10       NaN     0.4400      NaN
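
The NaN entries in rows 4 and 5 coincide with test slices that contain a single sample. AUC compares the scores of positive samples against those of negative samples, so it is undefined when a slice holds only one class, which is always the case with one sample. The sketch below is a plain pairwise AUC written for illustration, not PiML's or scikit-learn's implementation; the function name auc and the choice to return NaN for a one-class slice are assumptions that mirror the output above.

```python
import math


def auc(y_true, scores):
    """Pairwise AUC: the fraction of (positive, negative) pairs in which the
    positive sample has the higher score, with ties counted as half a win.

    Returns NaN when the slice contains only one class.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        return math.nan
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # slice with both classes
print(auc([1], [0.7]))                           # single-sample slice
```

Slices this small carry little information under any metric, so their rows are best read with the #Test and #Train counts in mind.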


Tree-based weakspot for a single feature using the ACC metric

results = exp.model_diagnose(model="XGB2", show="weakspot", slice_method="tree",
                             slice_features=["PAY_1"], threshold=1.1, min_samples=100,
                             metric="ACC", return_data=True, figsize=(5, 4))
results.data
Weak Regions
   [PAY_1  PAY_1)  #Test  #Train  test_ACC  train_ACC     Gap
0  0.2778  1.0000    602    2528    0.7276     0.6994  0.0282
1  0.1667  0.2778    752    2936    0.6862     0.6798  0.0063
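
Unlike the histogram method, the tree-based output above has slice boundaries (0.1667, 0.2778) that do not sit on a regular grid, which suggests data-driven cut points. How PiML builds its tree is not described in this example, so the sketch below is a stand-in technique rather than the library's algorithm: a single-split stump that picks the cut point on one feature minimizing the weighted Gini impurity of a 0/1 misclassification indicator. The names gini and best_split and the min_samples handling are hypothetical.

```python
def gini(errors):
    """Gini impurity of a list of 0/1 error indicators."""
    if not errors:
        return 0.0
    p = sum(errors) / len(errors)
    return 2.0 * p * (1.0 - p)


def best_split(x, errors, min_samples=1):
    """Return the cut point on x that minimizes the weighted Gini impurity of
    the error indicator, or None when no valid split exists."""
    pairs = sorted(zip(x, errors))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        left_x, right_x = pairs[i - 1][0], pairs[i][0]
        if left_x == right_x:
            continue  # cannot cut between identical feature values
        if i < min_samples or n - i < min_samples:
            continue  # each side must keep at least min_samples points
        left = [e for _, e in pairs[:i]]
        right = [e for _, e in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            # Place the cut midway between the two neighbouring values.
            best_cut, best_score = (left_x + right_x) / 2.0, score
    return best_cut


# Toy data: the model is wrong (error = 1) only for large feature values.
x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
errors = [0, 0, 0, 1, 1, 1]
print(best_split(x, errors))
```

Applying such a split recursively to each side would yield several intervals, each of which could then be scored on the training and test splits as in the table.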


