8.2. CaliforniaHousing Data¶
This example notebook demonstrates how to use PiML in its low-code mode to develop machine learning models for the CaliforniaHousing data, which consists of 20,640 samples and 9 features fetched via sklearn.datasets (see details here). PiML can load three versions of this data: _raw, _trim1 (with only AveOccup trimmed), and _trim2 (with AveRooms, AveBedrms, Population, and AveOccup trimmed). The _trim2 version is used in this example.
The response MedHouseVal (median house value per block, in log scale) is continuous, so this is a regression problem.
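The exact trimming thresholds behind _trim1 and _trim2 are not reproduced here, but the idea can be sketched with NumPy: cap a heavy-tailed feature such as AveOccup at an upper quantile before modeling. The synthetic data and the 99th-percentile cutoff below are illustrative assumptions, not PiML's actual preprocessing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a heavy-tailed feature such as AveOccup
ave_occup = rng.lognormal(mean=1.0, sigma=0.8, size=20_640)

# "Trim" by clipping at an illustrative 99th-percentile cutoff
upper = np.quantile(ave_occup, 0.99)
trimmed = np.clip(ave_occup, None, upper)
```

Clipping (rather than dropping rows) keeps the sample size at 20,640 while removing the extreme tail that would otherwise dominate scale-sensitive models.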
Click the ipynb links to run examples in Google Colab.
8.2.1. Load and Prepare Data¶
[1]:
from piml import Experiment
exp = Experiment()
[3]:
# Choose CaliforniaHousing_trim2
exp.data_loader()
[4]:
exp.data_summary()
[5]:
# Prepare dataset with default settings
exp.data_prepare()
[6]:
exp.feature_select()
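feature_select() ranks features by the strength of their dependence on the response. As a rough stand-in for the idea (not PiML's actual selection criterion), one can rank features by absolute Pearson correlation with the target:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000

# Three synthetic features: two informative, one pure noise
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Absolute Pearson correlation of each feature with the response
scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(3)])
ranking = np.argsort(scores)[::-1]  # strongest feature first
```

Features whose score falls below a chosen threshold would be dropped before training.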
[7]:
# Exploratory data analysis, check distribution and correlation
exp.eda()
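The distribution and correlation views that eda() renders interactively boil down to per-feature summary statistics and a pairwise correlation matrix, which can be sketched on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(loc=[0.0, 5.0], scale=[1.0, 2.0], size=(10_000, 2))

# Per-feature distribution summary: mean, std, and quartiles
means = X.mean(axis=0)
stds = X.std(axis=0)
quartiles = np.quantile(X, [0.25, 0.5, 0.75], axis=0)

# Pairwise Pearson correlation matrix (columns = variables)
corr = np.corrcoef(X, rowvar=False)
```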
8.2.2. Train Interpretable Models¶
[8]:
# First, choose GLM and ReLU-DNN with default settings, click run;
# Then, choose only ReLU-DNN and customize it with L1=0.0005; register the three models
exp.model_train()
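The L1=0.0005 customization adds an L1 penalty on the network weights, which drives small weights toward zero and yields a sparser, more interpretable ReLU-DNN. A minimal sketch of the penalized objective, with illustrative weights and residuals that are not taken from the trained model:

```python
import numpy as np

# Illustrative weights and residuals, not from the actual ReLU-DNN
weights = np.array([0.8, -0.3, 0.0, 1.2])
residuals = np.array([0.1, -0.2, 0.05])

l1 = 0.0005
mse = np.mean(residuals ** 2)            # data-fit term
penalty = l1 * np.sum(np.abs(weights))   # L1 penalty on weights
loss = mse + penalty                     # the penalized objective
```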
8.2.3. Interpretability and Explainability¶
[9]:
# Model-specific inherent interpretation including feature importance, main effects and pairwise interactions.
exp.model_interpret()
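For a GLM, inherent interpretation is direct: the fitted coefficients are the main effects, and importance can be read from coefficient magnitudes on standardized inputs. A sketch with an ordinary least-squares fit via NumPy, on synthetic data rather than the CaliforniaHousing fit:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000

# Standardized synthetic features and a linear response
X = rng.normal(size=(n, 3))
y = 1.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=n)

# Least-squares coefficients; |coef| serves as a feature-importance score
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
importance = np.abs(coef) / np.abs(coef).sum()
```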
[10]:
# Model-agnostic post-hoc explanation by Permutation Feature Importance, PDP (1D and 2D) vs. ALE (1D and 2D), LIME vs. SHAP
exp.model_explain()
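Permutation feature importance, one of the post-hoc tools listed above, measures how much a model's error grows when one feature's values are shuffled, breaking its relationship with the response. A self-contained sketch against a known linear rule standing in for a trained model (not the PiML pipeline):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2_000
X = rng.normal(size=(n, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=n)  # only feature 0 matters

def predict(X):
    # Stand-in "model": the true linear rule
    return 3.0 * X[:, 0]

base_mse = np.mean((y - predict(X)) ** 2)

pfi = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])  # break feature j only
    pfi.append(np.mean((y - predict(Xp)) ** 2) - base_mse)
```

Permuting the informative feature inflates the error sharply, while permuting the unused feature leaves it unchanged.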
8.2.4. Model Diagnostics and Outcome Testing¶
[11]:
exp.model_diagnose()
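Among the diagnostics model_diagnose() runs, the most basic is accuracy on held-out data and its gap from training accuracy, which flags overfitting. That check can be sketched as a train/test error comparison on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 1_000
X = rng.normal(size=(n, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=n)

# 80/20 split, fit on the training portion only
split = int(0.8 * n)
Xtr, Xte, ytr, yte = X[:split], X[split:], y[:split], y[split:]
coef, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)

train_mse = np.mean((ytr - Xtr @ coef) ** 2)
test_mse = np.mean((yte - Xte @ coef) ** 2)
gap = test_mse - train_mse  # a large gap would flag overfitting
```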
8.2.5. Model Comparison and Benchmarking¶
[12]:
exp.model_compare()
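model_compare() benchmarks the registered models side by side on held-out performance. The essence is a per-model test-set metric table, sketched here as a toy comparison between a fitted line and a constant-mean baseline using only NumPy:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 1_000
X = rng.normal(size=(n, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.3, size=n)

split = int(0.8 * n)
Xtr, Xte, ytr, yte = X[:split], X[split:], y[:split], y[split:]

# Model A: least squares with intercept
A = np.column_stack([np.ones(split), Xtr[:, 0]])
coef, *_ = np.linalg.lstsq(A, ytr, rcond=None)
pred_a = coef[0] + coef[1] * Xte[:, 0]

# Model B: predict the training mean (baseline)
pred_b = np.full(n - split, ytr.mean())

results = {"linear": np.mean((yte - pred_a) ** 2),
           "mean_baseline": np.mean((yte - pred_b) ** 2)}
```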