
Segmented Diagnose (Regression)

Experiment initialization and data preparation

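from piml import Experiment
from piml.models import XGB2Regressor

exp = Experiment()
# Load the built-in BikeSharing dataset
exp.data_loader(data="BikeSharing", silent=True)
# Exclude the "yr", "mnth" and "temp" columns from the modeling features
exp.data_summary(feature_exclude=["yr", "mnth", "temp"], silent=True)
# Use "cnt" (the rental count) as the regression target
exp.data_prepare(target="cnt", task_type="regression", silent=True)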

Train Model

# Train an XGB2 model (XGBoost with depth-2 trees) and register it as "XGB2"
exp.model_train(XGB2Regressor(), name="XGB2")

Summary of all segments across all features (the 10 segments with the highest MSE)

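# Split each feature into 5 equal-width ("uniform") bins and rank all segments by MSE;
# return_data=True returns the table so it can be inspected
result = exp.segmented_diagnose(model="XGB2", show="segment_table",
                                segment_method="uniform", segment_bins=5, return_data=True)
result.data.head(10)  # the 10 worst segments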


	Segment ID	Feature	Segment	Size	MSE
0	0	hr	[0.6, 0.8)	735	0.018380
1	1	hr	[0.2, 0.4)	712	0.014116
2	2	hum	[0.0, 0.2)	15	0.013335
3	3	hum	[0.2, 0.4)	441	0.013161
4	4	atemp	[0.5909, 0.7878)	1026	0.012643
5	5	season	2.0	912	0.011245
6	6	windspeed	[0.7298, 0.9123]	12	0.011149
7	7	workingday	0.0	1131	0.010764
8	8	windspeed	[0.1825, 0.3649)	1598	0.010634
9	9	season	3.0	874	0.010363
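For intuition, a ranking like the one above can be approximated with a short standard-library sketch: cut each numeric feature into equal-width bins, compute the MSE of the samples in each bin, and sort all (feature, bin) segments from worst to best. This is a hypothetical illustration, not PiML's implementation. The helper names uniform_edges, bin_index and segment_table are made up here, and categorical features such as season and workingday (which the table appears to segment by individual value) are not handled.

```python
# Hypothetical sketch of uniform segmentation + per-segment MSE ranking.
# Not PiML's implementation; handles numeric features only.
from bisect import bisect_right


def uniform_edges(values, bins):
    """bins + 1 equal-width edges spanning min(values) to max(values)."""
    lo, hi = min(values), max(values)
    return [lo + (hi - lo) * i / bins for i in range(bins)] + [hi]


def bin_index(x, edges):
    """Bin containing x; bins are [a, b) except the last, which is [a, b]."""
    bins = len(edges) - 1
    return min(bisect_right(edges, x) - 1, bins - 1)


def segment_table(X, y_true, y_pred, bins=5):
    """Rank every (feature, bin) segment by MSE, worst first.

    X maps a feature name to its list of values (one per sample).
    """
    rows = []
    for feature, values in X.items():
        edges = uniform_edges(values, bins)
        groups = {}
        for x, yt, yp in zip(values, y_true, y_pred):
            groups.setdefault(bin_index(x, edges), []).append((yt - yp) ** 2)
        for b, sq_errs in groups.items():
            rows.append({
                "Feature": feature,
                "Segment": (edges[b], edges[b + 1]),
                "Size": len(sq_errs),
                "MSE": sum(sq_errs) / len(sq_errs),
            })
    # Highest MSE first, mirroring the segment table's ordering
    rows.sort(key=lambda r: r["MSE"], reverse=True)
    return rows
```

Slicing the result, e.g. segment_table(X, y_true, y_pred, bins=5)[:10], gives a top-10 view comparable in spirit to result.data.head(10).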

Summary of the segments of a single feature, hr (with segment_bins=5 there are only five segments, so the full table is shown)

# Same table, restricted to the segments of the single feature hr
result = exp.segmented_diagnose(model="XGB2", show="segment_table",
                                segment_method="uniform", segment_feature="hr", segment_bins=5, return_data=True)
result.data


	Segment ID	Feature	Segment	Size	MSE
0	0	hr	[0.6, 0.8)	735	0.018380
1	1	hr	[0.2, 0.4)	712	0.014116
2	2	hr	[0.4, 0.6)	585	0.007795
3	3	hr	[0.8, 1.0]	732	0.005194
4	4	hr	[0.0, 0.2)	712	0.001655
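Because the five uniform bins of hr cover every evaluated sample exactly once, the overall MSE on those samples equals the size-weighted average of the segment MSEs. The arithmetic below uses the rounded Size and MSE values from the table above, so the result is only approximate; which data split the segments are computed on is not stated on this page.

```python
# Size and MSE of the five uniform "hr" segments, copied from the table above
segments = [
    (735, 0.018380),  # [0.6, 0.8)
    (712, 0.014116),  # [0.2, 0.4)
    (585, 0.007795),  # [0.4, 0.6)
    (732, 0.005194),  # [0.8, 1.0]
    (712, 0.001655),  # [0.0, 0.2)
]

total = sum(size for size, _ in segments)
# Each segment contributes size * MSE to the total squared error
overall_mse = sum(size * mse for size, mse in segments) / total

print(total)                  # 3476
print(round(overall_mse, 4))  # 0.0095
```

By this estimate the worst segment, [0.6, 0.8), has roughly twice the overall MSE, which is why it is the one examined next.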

Residual analysis of the samples in segment 0 of hr, i.e. hr in [0.6, 0.8), the segment with the highest MSE in the table above

# Residuals of the samples in segment 0 of hr ([0.6, 0.8)), shown against atemp
exp.segmented_diagnose(model="XGB2", show="accuracy_residual",
                       segment_method="uniform", segment_feature="hr", segment_bins=5, segment_id=0,
                       show_feature="atemp", figsize=(5, 4))
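A minimal sketch of what a residual view of one segment involves, assuming residuals are defined as y - y_pred and laid out against the show_feature values; PiML's exact residual definition and plot layout may differ. The helper segment_residuals and the row keys y and y_pred are hypothetical.

```python
# Hypothetical sketch: residuals of the samples inside one segment.
# Residual = y - y_pred is an assumed convention.

def segment_residuals(rows, seg_feature, lo, hi, show_feature, closed_right=False):
    """Return (show_feature value, residual) pairs for samples in the segment.

    The segment is [lo, hi), or [lo, hi] when closed_right=True (last bin).
    Each row is a dict holding feature values plus "y" and "y_pred".
    """
    pairs = []
    for row in rows:
        x = row[seg_feature]
        if lo <= x < hi or (closed_right and x == hi):
            pairs.append((row[show_feature], row["y"] - row["y_pred"]))
    pairs.sort()  # order by show_feature, as along the x-axis of a residual plot
    return pairs


rows = [
    {"hr": 0.65, "atemp": 0.5, "y": 1.0, "y_pred": 0.5},
    {"hr": 0.90, "atemp": 0.3, "y": 0.2, "y_pred": 0.2},  # outside [0.6, 0.8)
    {"hr": 0.70, "atemp": 0.2, "y": 0.25, "y_pred": 0.75},
]
print(segment_residuals(rows, "hr", 0.6, 0.8, "atemp"))
# [(0.2, -0.5), (0.5, 0.5)]
```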

Weakspot analysis of the samples in the same segment, sliced by atemp with MSE as the metric

# Slice the segment's samples by atemp and look for regions with unusually high MSE
exp.segmented_diagnose(model="XGB2", show="weakspot",
                       segment_method="uniform", segment_feature="hr", segment_bins=5, segment_id=0,
                       slice_features=["atemp"], metric="MSE", figsize=(5, 4))
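Inside a segment, a weakspot search slices a second feature (here atemp) and flags slices whose metric is noticeably worse than the segment as a whole. The sketch below is an illustration under stated assumptions: it uses equal-width slices and flags a slice when its MSE exceeds ratio times the segment MSE, with ratio=1.1 chosen only for illustration. PiML's own slicing rules and threshold may differ, and weak_slices is a hypothetical name.

```python
# Hypothetical weakspot sketch inside one segment. Equal-width slices and the
# ratio=1.1 threshold are illustrative assumptions, not PiML's documented rules.
from bisect import bisect_right


def weak_slices(xs, sq_errors, bins=5, ratio=1.1):
    """Flag slices of xs whose MSE exceeds ratio * (MSE of the whole segment).

    xs: slicing-feature values (e.g. atemp) of the in-segment samples.
    sq_errors: the matching squared errors (y - y_pred) ** 2.
    Returns (segment MSE, list of (left, right, size, slice MSE)).
    """
    lo, hi = min(xs), max(xs)
    edges = [lo + (hi - lo) * i / bins for i in range(bins)] + [hi]
    overall = sum(sq_errors) / len(sq_errors)
    buckets = [[] for _ in range(bins)]
    for x, err in zip(xs, sq_errors):
        # Equal-width bins; the right-most bin also includes hi
        buckets[min(bisect_right(edges, x) - 1, bins - 1)].append(err)
    weak = []
    for i, errs in enumerate(buckets):
        if errs:
            slice_mse = sum(errs) / len(errs)
            if slice_mse > ratio * overall:
                weak.append((edges[i], edges[i + 1], len(errs), slice_mse))
    return overall, weak


overall, weak = weak_slices([0.0, 0.1, 0.9, 1.0], [0.0, 0.0, 1.0, 3.0], bins=2)
print(overall, weak)  # 1.0 [(0.5, 1.0, 2, 2.0)]
```

Comparing each slice to the segment's own MSE, rather than to the global MSE, keeps the search focused on where the model is weak within an already weak region.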

Distributional distance between the specified segment and the remaining samples, computed feature by feature

# Compare each feature's distribution inside segment 0 with the rest of the data
res = exp.segmented_diagnose(model="XGB2", show="distribution_shift",
                             segment_method="uniform", segment_feature="hr", segment_bins=5, segment_id=0,
                             figsize=(5, 4), return_data=True)

Figure: Data distance (in segment vs. out of segment)
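One common way to score a feature-by-feature distribution shift is the population stability index (PSI), computed on binned proportions of the in-segment and out-of-segment samples. Which distance PiML reports by default is not shown on this page, so PSI here is a stand-in; psi and feature_distances are hypothetical helper names.

```python
# Hypothetical sketch: feature-by-feature distance via the population
# stability index (PSI). PSI is a stand-in; PiML's default metric may differ.
import math
from bisect import bisect_right


def psi(expected, actual, bins=10, eps=1e-6):
    """PSI between two samples of one feature, on shared equal-width bins."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    edges = [lo + (hi - lo) * i / bins for i in range(bins)] + [hi]

    def proportions(values):
        counts = [0] * bins
        for v in values:
            counts[min(bisect_right(edges, v) - 1, bins - 1)] += 1
        # eps keeps the logarithm finite when a bin is empty in one sample
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def feature_distances(in_seg, out_seg, bins=10):
    """PSI for every feature, largest shift first.

    in_seg / out_seg map a feature name to its values inside / outside the segment.
    """
    scores = {f: psi(out_seg[f], in_seg[f], bins) for f in in_seg}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


in_seg = {"same": [0.0, 1.0, 2.0, 3.0], "shifted": [0.0, 0.0, 0.0, 0.0]}
out_seg = {"same": [0.0, 1.0, 2.0, 3.0], "shifted": [3.0, 3.0, 3.0, 3.0]}
for feature, score in feature_distances(in_seg, out_seg):
    print(feature, round(score, 3))
```

PSI is symmetric in its two arguments, so the choice of which sample plays "expected" does not change the score.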

Distributional distance between the specified segment and the remaining samples, shown as the density of one selected feature (hum)

# Compare the density of hum inside segment 0 with the rest of the data
res = exp.segmented_diagnose(model="XGB2", show="distribution_shift",
                             segment_method="uniform", segment_feature="hr", segment_bins=5,
                             segment_id=0, show_feature="hum", figsize=(5, 4), return_data=True)
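The density view compares the in-segment and out-of-segment samples of a single feature. A minimal sketch of the numbers behind such a plot, assuming equal-width bins shared by both groups and densities normalized so each histogram integrates to 1; PiML may instead draw a smoothed estimate such as a kernel density. The helpers shared_edges, density and compare_densities are hypothetical.

```python
# Hypothetical sketch: histogram densities of one feature (e.g. hum) inside and
# outside a segment, on shared equal-width bins. A plot may instead use a KDE.
from bisect import bisect_right


def shared_edges(a, b, bins):
    """Equal-width edges covering both samples."""
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    if hi == lo:  # constant feature: widen the range to avoid zero-width bins
        hi = lo + 1.0
    return [lo + (hi - lo) * i / bins for i in range(bins)] + [hi]


def density(values, edges):
    """count / (n * bin width) per bin, so the histogram integrates to 1."""
    bins = len(edges) - 1
    counts = [0] * bins
    for v in values:
        counts[min(bisect_right(edges, v) - 1, bins - 1)] += 1
    n = len(values)
    return [c / (n * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]


def compare_densities(in_seg, out_seg, bins=10):
    """Shared edges plus the in-segment and out-of-segment densities."""
    edges = shared_edges(in_seg, out_seg, bins)
    return edges, density(in_seg, edges), density(out_seg, edges)


edges, d_in, d_out = compare_densities([0.0, 0.25, 0.5, 1.0], [0.0, 0.0, 1.0, 1.0], bins=4)
print(d_in)   # [1.0, 1.0, 1.0, 1.0]
print(d_out)  # [2.0, 0.0, 0.0, 2.0]
```

Using the same edges for both groups is what makes the two curves directly comparable bin by bin.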

Total running time of the script: (0 minutes 57.184 seconds)

Estimated memory usage: 27 MB
