WeakSpot: Classification

Experiment initialization and data preparation

from piml import Experiment
from piml.models import XGB2Classifier

exp = Experiment()
exp.data_loader(data="TaiwanCredit", silent=True)
exp.data_summary(feature_exclude=["LIMIT_BAL", "SEX", "EDUCATION", "MARRIAGE", "AGE"], silent=True)
exp.data_prepare(target="FlagDefault", task_type="classification", silent=True)

Train the model

exp.model_train(XGB2Classifier(), name="XGB2")

Histogram-based weakspot for a single feature

results = exp.model_diagnose(model="XGB2", show="weakspot", slice_method="histogram",
                             slice_features=["PAY_1"], threshold=1.1, min_samples=100,
                             return_data=True, figsize=(5, 4))
results.data

	[PAY_1	PAY_1)	#Test	#Train	test_ACC	train_ACC	Gap
0	0.2	0.8	1350	5440	0.7052	0.6888	0.0164
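
To make the idea behind histogram slicing concrete, here is a minimal standard-library sketch. It is not PiML's implementation. It assumes the feature is scaled to [0, 1], that bins are equal-width, and that a bin counts as weak when its accuracy falls below the overall accuracy divided by `threshold`; that flagging rule and the helper names `bin_accuracy` and `weak_bins` are hypothetical.

```python
def bin_accuracy(values, correct, n_bins=10):
    """Group samples into equal-width bins on [0, 1] and compute per-bin accuracy."""
    counts = [0] * n_bins
    hits = [0] * n_bins
    for v, c in zip(values, correct):
        idx = min(int(v * n_bins), n_bins - 1)  # v == 1.0 goes into the last bin
        counts[idx] += 1
        hits[idx] += int(c)
    return [
        {"lower": i / n_bins, "upper": (i + 1) / n_bins,
         "n": counts[i], "acc": hits[i] / counts[i]}
        for i in range(n_bins) if counts[i] > 0
    ]


def weak_bins(rows, overall_acc, threshold=1.1, min_samples=100):
    """Assumed rule: a bin is weak if it is large enough and clearly below average."""
    cutoff = overall_acc / threshold
    return [r for r in rows if r["n"] >= min_samples and r["acc"] < cutoff]


# Toy data: three populated bins with different accuracies and sizes.
values = [0.05] * 200 + [0.25] * 200 + [0.95] * 50
correct = [1] * 180 + [0] * 20 + [1] * 120 + [0] * 80 + [1] * 10 + [0] * 40

rows = bin_accuracy(values, correct)
overall = sum(correct) / len(correct)
weak = weak_bins(rows, overall)
print(weak)
```

In this toy data the sparsest bin has the lowest accuracy but is skipped because it holds fewer than `min_samples` points, which is the usual reason for having a minimum-size filter.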

Histogram-based weakspot for two features

results = exp.model_diagnose(model="XGB2", show="weakspot", slice_method="histogram",
                             slice_features=["PAY_1", "PAY_2"], threshold=1.1, min_samples=100,
                             return_data=True, figsize=(5, 4))
results.data

	[PAY_1	PAY_1)	[PAY_2	PAY_2)	#Test	#Train	test_ACC	train_ACC	Gap
0	0.4	0.5	0.3750	0.5625	69	244	0.8551	0.7336	0.1215
1	0.3	0.4	0.0000	0.2500	191	800	0.6963	0.6675	0.0288
2	0.2	0.3	0.1111	0.2222	268	956	0.7276	0.7207	0.0069
3	0.3	0.4	0.3750	0.6250	322	1354	0.7205	0.7164	0.0041
4	0.2	0.3	0.3333	0.5556	351	1430	0.6154	0.6119	0.0035
5	0.0	0.1	0.3750	0.6250	77	357	0.6234	0.6303	-0.0069
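
In the table above, each row is a rectangular cell over the two features, and the Gap column equals test_ACC minus train_ACC (for example, 0.8551 - 0.7336 = 0.1215 in row 0). The following sketch reproduces that bookkeeping on toy data. It is an illustration only: the equal-width grid and the helper names `cell_accuracy` and `cell_gaps` are assumptions, not PiML internals.

```python
from collections import defaultdict


def cell_accuracy(x1, x2, correct, n_bins=4):
    """Map each sample to a 2-D grid cell and return {cell: (count, accuracy)}."""
    stats = defaultdict(lambda: [0, 0])  # cell -> [count, hits]
    for a, b, c in zip(x1, x2, correct):
        cell = (min(int(a * n_bins), n_bins - 1),
                min(int(b * n_bins), n_bins - 1))
        stats[cell][0] += 1
        stats[cell][1] += int(c)
    return {cell: (n, hits / n) for cell, (n, hits) in stats.items()}


def cell_gaps(train_stats, test_stats):
    """Gap per cell, defined as test accuracy minus train accuracy."""
    return {
        cell: test_stats[cell][1] - acc_train
        for cell, (_, acc_train) in train_stats.items()
        if cell in test_stats
    }


# Toy train and test splits occupying two cells of a 4 x 4 grid.
train_stats = cell_accuracy([0.1] * 4 + [0.6] * 4,
                            [0.1] * 4 + [0.9] * 4,
                            [1, 1, 1, 0, 1, 0, 0, 0])
test_stats = cell_accuracy([0.1, 0.1, 0.6, 0.6],
                           [0.1, 0.1, 0.9, 0.9],
                           [1, 0, 1, 1])
gaps = cell_gaps(train_stats, test_stats)
print(gaps)
```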

Histogram-based weakspot for a single feature on the test set

results = exp.model_diagnose(model="XGB2", show="weakspot", slice_method="histogram",
                             slice_features=["PAY_1"], threshold=1.1, min_samples=100,
                             use_test=True, return_data=True, figsize=(5, 4))
results.data

	[PAY_1	PAY_1)	#Test	#Train	test_ACC	train_ACC	Gap
0	0.2222	0.4444	1265	5090	0.6964	0.6876	0.0088

Histogram-based weakspot for a single feature using the AUC metric

results = exp.model_diagnose(model="XGB2", show="weakspot", slice_method="histogram",
                             slice_features=["PAY_1"], threshold=1.1, min_samples=100,
                             metric="AUC", return_data=True, figsize=(5, 4))
results.data

	[PAY_1	PAY_1)	#Test	#Train	test_AUC	train_AUC	Gap
0	0.2	0.3	752	2936	0.6478	0.7071	-0.0593
1	0.1	0.2	3521	13975	0.6350	0.6970	-0.0620
2	0.3	0.4	513	2154	0.5481	0.6325	-0.0844
3	0.5	0.6	14	62	0.5250	0.7179	-0.1929
4	0.6	0.7	1	25	NaN	0.6314	NaN
5	0.7	0.8	1	10	NaN	0.4400	NaN
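
The NaN entries in rows 4 and 5 coincide with slices that hold a single test sample. AUC compares scores of positive samples against scores of negative samples, so it is undefined when a slice contains only one class, which is always the case for one sample. That this is the reason PiML reports NaN here is an assumption; the sketch below only demonstrates the property with a hypothetical pairwise `auc` helper.

```python
import math


def auc(y_true, y_score):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. Returns NaN when either class is missing.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        return math.nan
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


mixed = auc([0, 1, 1, 0], [0.1, 0.8, 0.4, 0.5])  # both classes present
single = auc([1], [0.9])                         # one sample, one class
print(mixed, single)
```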

Tree-based weakspot for a single feature using the ACC metric

results = exp.model_diagnose(model="XGB2", show="weakspot", slice_method="tree",
                             slice_features=["PAY_1"], threshold=1.1, min_samples=100,
                             metric="ACC", return_data=True, figsize=(5, 4))
results.data

	[PAY_1	PAY_1)	#Test	#Train	test_ACC	train_ACC	Gap
0	0.2778	1.0000	602	2528	0.7276	0.6994	0.0282
1	0.1667	0.2778	752	2936	0.6862	0.6798	0.0063
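
Unlike the histogram results, the slice boundaries here (0.1667, 0.2778) do not sit on a regular grid, which suggests they are chosen from the data. One common way to do that is to fit a shallow tree on per-sample correctness. The sketch below finds a single split (a depth-1 regression tree) on one feature. It is an assumption about the general technique, not PiML's algorithm, and `best_split` is a hypothetical helper.

```python
def best_split(values, correct, min_samples=2):
    """Return the cut point that minimizes the summed squared error of
    per-sample correctness (0/1) on both sides, or None if no split is valid."""
    pairs = sorted(zip(values, correct))
    n = len(pairs)
    total = sum(c for _, c in pairs)
    best = None
    left_sum = 0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot cut between identical feature values
        if i < min_samples or n - i < min_samples:
            continue  # each side must keep enough samples
        right_sum = total - left_sum
        # For 0/1 targets, the squared error of a side is s - s^2 / n.
        sse = (left_sum - left_sum ** 2 / i) + (right_sum - right_sum ** 2 / (n - i))
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[0]:
            best = (sse, cut)
    return None if best is None else best[1]


# Toy data: predictions are all correct for low values, mostly wrong for high ones.
values = [0.1, 0.1, 0.2, 0.2, 0.6, 0.6, 0.8, 0.8]
correct = [1, 1, 1, 1, 0, 0, 1, 0]
cut = best_split(values, correct)
print(cut)
```

Applying the same search recursively to each side would yield several slices with data-driven boundaries.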

Total running time of the script: ( 0 minutes 44.136 seconds)

Estimated memory usage: 43 MB
