Segmented Diagnose (Regression)

Experiment initialization and data preparation

from piml import Experiment
from piml.models import XGB2Regressor

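exp = Experiment()
# Load the built-in BikeSharing dataset
exp.data_loader(data="BikeSharing", silent=True)
# Drop yr, mnth, and temp from the feature set
exp.data_summary(feature_exclude=["yr", "mnth", "temp"], silent=True)
# Use cnt as the regression target
exp.data_prepare(target="cnt", task_type="regression", silent=True)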

Train Model

exp.model_train(XGB2Regressor(), name="XGB2")

Summary of all segments (top 10 with the worst performance)

result = exp.segmented_diagnose(model="XGB2", show="segment_table",
                                segment_method="uniform", segment_bins=5, return_data=True)
result.data.head(10)
   Segment ID     Feature           Segment  Size       MSE
0           0          hr        [0.6, 0.8)   735  0.018380
1           1          hr        [0.2, 0.4)   712  0.014116
2           2         hum        [0.0, 0.2)    15  0.013335
3           3         hum        [0.2, 0.4)   441  0.013161
4           4       atemp  [0.5909, 0.7878)  1026  0.012643
5           5      season               2.0   912  0.011245
6           6   windspeed  [0.7298, 0.9123]    12  0.011149
7           7  workingday               0.0  1131  0.010764
8           8   windspeed  [0.1825, 0.3649)  1598  0.010634
9           9      season               3.0   874  0.010363
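
To make the segment table easier to read, the following is a minimal pure-Python sketch of what uniform segmentation amounts to: one feature is cut into equal-width bins and the MSE is computed per bin, worst first. The function name uniform_segment_mse and the toy data are hypothetical and only illustrate the idea; this is not PiML's internal implementation.

```python
from collections import defaultdict


def uniform_segment_mse(x, y_true, y_pred, n_bins=5):
    """Return (bin_index, size, mse) per equal-width bin of x, sorted worst first.

    Hypothetical helper for illustration; not part of PiML.
    """
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    sq_err = defaultdict(list)
    for xi, yt, yp in zip(x, y_true, y_pred):
        # The last bin is closed on the right, so max(x) lands in bin n_bins - 1.
        b = min(int((xi - lo) / width), n_bins - 1) if width > 0 else 0
        sq_err[b].append((yt - yp) ** 2)
    rows = [(b, len(e), sum(e) / len(e)) for b, e in sq_err.items()]
    return sorted(rows, key=lambda r: r[2], reverse=True)


# Toy data, made up for illustration (not taken from BikeSharing).
x = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]
y_true = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 3.0]
y_pred = [1.0, 1.1, 2.5, 2.0, 3.0, 2.0, 3.0]
table = uniform_segment_mse(x, y_true, y_pred, n_bins=5)
for bin_index, size, mse in table:
    print(bin_index, size, round(mse, 6))
```

The first row of the result is the worst-performing bin, which is how the table above is ordered.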


Summary of all segments of a given feature (sorted from worst to best performance)

result = exp.segmented_diagnose(model="XGB2", show="segment_table",
                                segment_method="uniform", segment_feature="hr", segment_bins=5, return_data=True)
result.data
   Segment ID  Feature     Segment  Size       MSE
0           0       hr  [0.6, 0.8)   735  0.018380
1           1       hr  [0.2, 0.4)   712  0.014116
2           2       hr  [0.4, 0.6)   585  0.007795
3           3       hr  [0.8, 1.0]   732  0.005194
4           4       hr  [0.0, 0.2)   712  0.001655


Residual analysis of the samples in that segment

exp.segmented_diagnose(model="XGB2", show="accuracy_residual",
                       segment_method="uniform", segment_feature="hr", segment_bins=5, segment_id=0,
                       show_feature="atemp", figsize=(5, 4))
Residual Plot

Weakspot analysis of the samples in that segment

exp.segmented_diagnose(model="XGB2", show="weakspot",
                       segment_method="uniform", segment_feature="hr", segment_bins=5, segment_id=0,
                       slice_features=["atemp"], metric="MSE", figsize=(5, 4))
Weak Regions
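
As a rough illustration of the weakspot idea, the sketch below bins one slicing feature inside a segment and flags the bins whose MSE exceeds a multiple of the segment's overall MSE. The helper weak_regions, its threshold and min_samples parameters, and the toy numbers are all assumptions made for this example; PiML's own slicing and thresholding rules may differ.

```python
def weak_regions(feature, sq_errors, n_bins=4, threshold=1.1, min_samples=2):
    """Return (left, right, size, mse) for bins whose MSE > threshold * overall MSE.

    Hypothetical helper for illustration; not part of PiML.
    """
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature
    overall = sum(sq_errors) / len(sq_errors)
    bins = {}
    for f, e in zip(feature, sq_errors):
        b = min(int((f - lo) / width), n_bins - 1)
        bins.setdefault(b, []).append(e)
    flagged = []
    for b in sorted(bins):
        errs = bins[b]
        mse = sum(errs) / len(errs)
        # Skip tiny bins so a single outlier does not create a weak region.
        if len(errs) >= min_samples and mse > threshold * overall:
            flagged.append((lo + b * width, lo + (b + 1) * width, len(errs), mse))
    return flagged


# Toy squared errors for samples in one segment (made up for illustration).
atemp = [0.0, 0.1, 0.3, 0.35, 0.6, 0.7, 0.9, 1.0]
sq_errors = [0.01, 0.01, 0.02, 0.02, 0.01, 0.01, 0.5, 0.3]
flagged = weak_regions(atemp, sq_errors)
print(flagged)
```

In this toy example only the highest atemp bin is flagged, because its errors dominate the segment's average.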

Distributional distance comparison between the specified segment and the remaining samples (feature-by-feature)

res = exp.segmented_diagnose(model="XGB2", show="distribution_shift",
                             segment_method="uniform", segment_feature="hr", segment_bins=5, segment_id=0,
                             figsize=(5, 4), return_data=True)
Data distance (In segment vs. out of segment)
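
One common way to quantify such a distance for a single feature is the Population Stability Index (PSI). The sketch below is a minimal pure-Python version under the assumption of equal-width bins and a small smoothing constant eps; the function name psi and the toy samples are hypothetical, and the metric and binning that PiML actually uses for this plot may differ.

```python
import math


def psi(sample_a, sample_b, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of one feature.

    Hypothetical helper for illustration; not part of PiML.
    """
    lo = min(min(sample_a), min(sample_b))
    hi = max(max(sample_a), max(sample_b))
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal

    def proportions(sample):
        counts = [0] * n_bins
        for v in sample:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays finite.
        return [max(c / len(sample), eps) for c in counts]

    p, q = proportions(sample_a), proportions(sample_b)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


# Toy feature values inside and outside a segment (made up for illustration).
in_segment = [0.1, 0.2, 0.2, 0.3]
out_of_segment = [0.6, 0.7, 0.8, 0.9, 0.9]
distance = psi(in_segment, out_of_segment)
same = psi(in_segment, in_segment)
print(distance, same)
```

Identical samples give a distance of zero, and the value grows as the two histograms diverge; repeating the calculation per feature gives a feature-by-feature comparison.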

Distributional distance comparison between the specified segment and the remaining samples (density of one selected feature)

res = exp.segmented_diagnose(model="XGB2", show="distribution_shift",
                             segment_method="uniform", segment_feature="hr", segment_bins=5,
                             segment_id=0, show_feature="hum", figsize=(5, 4), return_data=True)
Distribution plot
