piml.Experiment
- class piml.Experiment(mode='external', highcode_only=False)¶
The workflow management class.
- Parameters:
- modestr, default=’external’
The path of built-in demo datasets.
- highcode_onlybool, default=False
Whether to use high code mode only. If True, CSS ingestion is disabled and the low-code interface may not render properly.
Methods
data_loader
([data, silent, spark, ...])Load data for experimentation.
data_prepare
([target, sample_weight, ...])Prepare data for model fitting.
data_quality
([method, dataset, show, ...])Check the data quality and remove outliers.
data_summary
([feature_type, ...])Summarize basic data statistics.
eda
([show, uni_feature, bi_features, ...])Run exploratory data analysis.
feature_select
([method, threshold, ...])Select features that are important for modeling.
get_data
([x, y, sample_weight, train, test])Get the preprocessed train-test data.
get_feature_names
()Get the input feature names.
get_feature_types
()Get the data type of each input feature.
get_interpretable_model_list
()Get the list of names of all registered interpretable models.
get_leaderboard
([metric])Show the performance comparison table of all trained models.
get_leaderboard_registered
([metric])Show the performance comparison table of all registered models.
get_model
(model)Get a registered pipeline.
get_model_config
(model)Get the configuration of a model.
get_model_list
()Get the list of names of all registered models.
get_raw_data
()Get the raw train-test data.
get_target_name
()Get the target feature name.
make_pipeline
([model, task_type, ...])Customize a pipeline.
model_compare
([models, show, metric, ...])Compare the diagnostic results of multiple models.
model_diagnose
([model, show, metric, ...])Test model performance using various diagnostic tools.
model_explain
([model, show, uni_feature, ...])Explain an arbitrary fitted model using post-hoc explanation tools.
model_fairness
([model, show, metric, ...])Test model fairness.
model_fairness_compare
([models, show, ...])Compare the fairness results of multiple models.
model_fairness_solas
([model, show, metric, ...])Test model fairness based on solas-ai.
model_interpret
([model, show, uni_feature, ...])Interpret inherently interpretable models.
model_save
(model[, path])Save a PiML-trained model as a pickle file.
model_train
([model, name])Fit interpretable models.
model_tune
([model, method, parameters, ...])Refit a model with new parameters.
register
(pipeline, name)Register a pipeline.
segmented_diagnose
([show, model, ...])Test model performance using various diagnostic tools after bucketing.
- data_loader(data: Union[str, DataFrame] = None, silent: bool = False, spark: bool = False, spark_sample_size: int = 100000, spark_sample_by_feature: unicode = None, spark_sample_fractions: dict = None, spark_random_state: int = 0)¶
Load data for experimentation.
If no parameter is passed into the method, the program will run in low code mode, otherwise high code mode.
Note that loading new data will reset all data, model and experimental results to None.
- Parameters:
- data{‘CoCircles’, ‘Friedman’, ‘BikeSharing’, ‘TaiwanCredit’, ‘CaliforniaHousing_raw’, ‘CaliforniaHousing_trim1’, ‘CaliforniaHousing_trim2’, ‘SimuCredit’, ‘SolasSimu1’, ‘SolasHMDA’, pd.DataFrame}, default=None
The supported inputs:
‘CoCircles’: Gaussian data with a spherical decision boundary for binary classification, generated via Scikit-Learn.
‘Friedman’: ‘Friedman #1’ regression problem, generated via Scikit-Learn.
‘BikeSharing’: Refer to https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset.
‘TaiwanCredit’: Refer to https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients.
‘CaliforniaHousing_raw’: Refer to https://developers.google.com/machine-learning/crash-course/california-housing-data-description.
‘CaliforniaHousing_trim1’: ‘CaliforniaHousing_raw’ dataset with the feature ‘AveOccup’ trimmed by upper threshold 5.
‘CaliforniaHousing_trim2’: ‘CaliforniaHousing_raw’ dataset with the features ‘AveRooms’, ‘AveBedrms’, ‘Population’, ‘AveOccup’ trimmed by upper threshold quantile (0.98).
‘SimuCredit’: A simulated credit dataset for fairness testing.
‘SolasSimu1’: A simulated dataset, modified from the ‘Friedman #1’ regression problem. The covariates used for modeling are ‘Segment’, ‘x1’, ‘x2’, …, ‘x5’; the response ‘Label’ is binary, so it is a classification problem. The remaining variables are demographic variables used for testing fairness. The data is contributed by Solas-AI (https://github.com/SolasAI/solas-ai-disparity).
‘SolasHMDA’: A preprocessed sample of the 2018 Home Mortgage Disclosure Act (HMDA) data. The HMDA dataset includes information about nearly every home mortgage application in the United States.
pandas DataFrame: A DataFrame with rows as data points and columns as variables.
- silentbool, default=False
Whether to display data preview or not.
- sparkbool, default=False
Whether to load data using spark backend.
- spark_sample_sizeint, default=100000
The number of samples to load, used when spark=True.
Note that we would convert this value to the frac parameter in spark, and the actual number of samples retrieved from spark may be different from this value. If spark_sample_size=None, then the whole data would be used, and no subsampling will be performed.
- spark_sample_by_featurestr, default=None
The column name to be used for stratified sampling, used when spark=True. If None, no stratified sampling will be performed.
- spark_sample_fractionsdict, default=None
The ratios of each category in spark_sample_by_feature, used when spark=True and spark_sample_by_feature is not None.
For instance, if spark_sample_by_feature has two categories (0 and 1), then spark_sample_fractions={0: 1.0, 1: 2.0} means that the ratio of category 0 and category 1 in the subsample is 1.0:2.0. If it is None, or mis-specified, we would keep the ratios of each category to be 1:1.
Note that the actual ratios between the categories can be different from the given ratios, as the total counts of some categories may be less than the desired number.
- spark_random_stateint, default=0
The random seed for spark subsampling, used when spark=True and spark_sample_size is not None.
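The note on spark_sample_fractions above can be illustrated with a minimal sketch. This is not PiML's implementation: the helper name stratified_fractions, the capping rule, and the example counts are assumptions made for illustration, and plain Python is used instead of Spark.

```python
def stratified_fractions(counts, ratios, sample_size):
    """Per-category sampling fractions for a target size and category ratios.

    counts: {category: number of rows available}   (hypothetical input)
    ratios: {category: relative weight in the subsample}
    """
    total_ratio = sum(ratios.values())
    fractions = {}
    for category, available in counts.items():
        desired = sample_size * ratios[category] / total_ratio
        # A fraction cannot exceed 1.0: a category with too few rows is used in full.
        fractions[category] = min(1.0, desired / available)
    return fractions


counts = {0: 900_000, 1: 50_000}
fractions = stratified_fractions(counts, {0: 1.0, 1: 2.0}, sample_size=90_000)
expected_rows = {c: counts[c] * f for c, f in fractions.items()}
print(fractions)
print(expected_rows)
```

In this example category 1 has fewer rows than the requested share, so its fraction is capped at 1.0 and the achieved ratio between the categories falls short of the requested 1.0:2.0.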
- data_prepare(target: unicode = None, sample_weight: unicode = None, task_type: unicode = None, test_ratio: Union[float, list] = None, random_state: int = None, split_method: unicode = None, train_idx=None, test_idx=None, return_data: bool = None, silent: bool = None)¶
Prepare data for model fitting.
This step will set the target response, task type (classification or regression), and train / test split. If no parameter is passed into the method, the program will run in low code mode, otherwise high code mode.
- Parameters:
- targetstr, default=None
Target variable name. If None, the last column in the data will be selected.
- sample_weight: str, default=None
Sample weight column name.
- task_type{‘classification’, ‘regression’}, default=None
Task type, if None, it will be automatically determined according to the target variable.
- split_method: {‘random’, ‘outer-sample’, ‘kmeans’}, default=None
The method of splitting train test samples.
random: random split.
outer-sample: split samples based on euclidean distance of each sample to the data center.
kmeans: Run kmeans, and the test samples are randomly picked from each cluster, with the ratios determined by test_ratio.
- test_ratiofloat or list of float, default=None
Test sample ratio. If None, it will be 0.2.
When split_method=’kmeans’, it is expected to be a list of float values, and its length determines the number of clusters in KMeans. Each element is a float between 0 and 1 that gives the test sample ratio for the corresponding cluster.
- random_stateint, default=None
Random seed for train / test split. If None, it will be 0.
- train_idxarray-like of shape (n_samples_train,), default=None
If train_idx and test_idx are not None, it will be ignored.
- test_idxarray-like of shape (n_samples_test,), default=None
If train_idx and test_idx are not None, it will be ignored.
- return_databool, default=None
Whether to return data.
- silentbool, default=None
Whether to display data prepare summary or not. If None, it will be False.
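The ‘outer-sample’ split method can be pictured with the following sketch. It assumes that the samples farthest from the data center form the test set and that no feature scaling is applied; neither detail is stated above, and the function name outer_sample_split is hypothetical rather than part of PiML.

```python
import math


def outer_sample_split(rows, test_ratio=0.2):
    """Return (train_idx, test_idx); the rows farthest from the center form the test set."""
    n_features = len(rows[0])
    # Data center: the per-feature mean over all rows.
    center = [sum(row[j] for row in rows) / len(rows) for j in range(n_features)]
    distances = [math.dist(row, center) for row in rows]
    # Sort row indices from the closest to the farthest from the center.
    order = sorted(range(len(rows)), key=lambda i: distances[i])
    n_test = round(len(rows) * test_ratio)
    n_train = len(rows) - n_test
    return sorted(order[:n_train]), sorted(order[n_train:])


rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [10.0, 10.0]]
train_idx, test_idx = outer_sample_split(rows, test_ratio=0.2)
print(train_idx, test_idx)
```

A split of this kind deliberately makes the test set differ from the training set, which is useful for checking how a model behaves away from the bulk of the data.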
- data_quality(method=None, dataset=None, show: unicode = None, threshold: Union[float, list] = None, remove_outliers: bool = None, distance_metric: unicode = None, psi_buckets: unicode = None, show_feature: unicode = None, return_data: bool = None, figsize: tuple = None)¶
Check the data quality and remove outliers.
Note that this method requires data_prepare to be executed first.
For the data integrity check, it uses the data after excluding the columns in data_summary as inputs. Hence, its result is independent of feature_select. However, some of the checks involve the train and test sets, and hence the results would be impacted by data_prepare.
For outlier detection, it uses the data after feature selection as inputs. Therefore, data_quality and feature_select can be used in turn, and the result would be different if the selected features are different.
- Parameters:
- methodlist of object or a single object
The following outlier detection method objects are supported:
‘IsolationForest’: piml.data.outlier_detection.IsolationForest
‘CBLOF’: piml.data.outlier_detection.CBLOF
‘PCA’: piml.data.outlier_detection.PCA
‘KMeansTree’: piml.data.outlier_detection.KMeansTree
‘OneClassSVM’: piml.data.outlier_detection.OneClassSVM
‘KNN’: piml.data.outlier_detection.KNN
‘HBOS’: piml.data.outlier_detection.HBOS
‘ECOD’: piml.data.outlier_detection.ECOD
- dataset{‘all’, ‘train’, ‘test’}, default=None
Specify the dataset for data quality check.
‘all’: Use all samples to check data quality (default choice).
‘train’: Only use training samples to check data quality, available after data_prepare is executed.
‘test’: Only use testing samples to check data quality, available after data_prepare is executed.
- show{‘od_score_distribution’, ‘od_marginal_outlier_distribution’, ‘od_tsne_comparison’,
‘integrity_single_column_check’, ‘integrity_duplicated_samples’, ‘integrity_highly_correlated_features’, ‘drift_test_info’, ‘drift_test_distance’}, default=None
- Data integrity method:
‘integrity_single_column_check’: Get overall single column integrity check result.
‘integrity_duplicated_samples’: Detect duplicated samples.
‘integrity_highly_correlated_features’: Detect highly correlated features.
- Drift test method:
‘drift_test_info’: Get train test size difference and energy distance value.
‘drift_test_distance’: Get train test data distance of each feature.
- Outlier detection method:
‘od_score_distribution’: Show the distribution of outlier scores.
‘od_marginal_outlier_distribution’: Show the outliers marginal distribution.
‘od_tsne_comparison’: Compare outliers detected by different methods under 2d t-SNE space.
- thresholdfloat or list, default=None
The threshold of outlier scores. It decides whether a sample is an outlier. Samples with outlier scores larger than the threshold are classified as outliers.
- remove_outliersbool, default=None
If True, and method is a single outlier detection model, then the detected outliers will be removed. It should only be used when show=’od_score_distribution’ or ‘od_marginal_outlier_distribution’.
- distance_metric{‘PSI’, ‘WD1’, ‘KS’}, default=None
The distance metric used when show = ‘drift_test_distance’. If None, it will be ‘PSI’.
‘PSI’: Population stability index.
‘WD1’: Wasserstein distance (1D).
‘KS’: Kolmogorov-Smirnov test.
- psi_buckets{‘uniform’, ‘quantile’}, default=None
Bucketing strategy for PSI metric in data drift test. If None, it will be ‘uniform’.
- show_featurestr, default=None
Feature for distribution plot. If None, will show distance metric scores of each feature.
- return_databool, default=None
Whether to return data.
- figsizetuple, default=None
Figure size.
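To make the ‘PSI’ distance metric with ‘uniform’ buckets more concrete, the sketch below computes a population stability index between two one-dimensional samples. It is an illustration only: the function name psi_uniform, the choice of bucket edges over the joint range of both samples, and the small floor eps for empty buckets are assumptions, and PiML may compute the metric differently.

```python
import math


def psi_uniform(expected, actual, n_buckets=10, eps=1e-6):
    """PSI between two 1-D samples using equal-width buckets over their joint range."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / n_buckets
    if width == 0:
        width = 1.0  # constant data: every value falls in the first bucket

    def proportions(values):
        counts = [0] * n_buckets
        for v in values:
            idx = min(int((v - lo) / width), n_buckets - 1)
            counts[idx] += 1
        # Floor the proportions so that the logarithm stays finite for empty buckets.
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))


train = [i / 100 for i in range(100)]
test_shifted = [0.5 + i / 200 for i in range(100)]
psi_same = psi_uniform(train, train)
psi_shift = psi_uniform(train, test_shifted)
print(psi_same, psi_shift)
```

Identical samples give a PSI of zero, while a test sample concentrated in the upper half of the training range gives a clearly positive value.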
- data_summary(feature_type: Optional[Dict] = None, feature_exclude: Optional[List] = None, silent: Optional[bool] = None, return_data: Optional[bool] = None)¶
Summarize basic data statistics.
If no parameter is passed into the method, the program will run in low code mode, otherwise high code mode.
- Parameters:
- feature_typedict, default=None
Feature type for each feature. Available types include ‘categorical’ and ‘numerical’. For example, {‘X0’: ‘numerical’, ‘X1’: ‘categorical’, ‘X2’: ‘categorical’}.
- feature_excludelist, default=None
Features to exclude in training and diagnostics.
- silentbool, default=None
Whether to display data summary or not. If None, it will be False.
- return_databool, default=None
Whether to return data. If None, it will be False.
- eda(show: unicode = None, uni_feature: unicode = None, bi_features: List = None, multi_features: List = None, multi_type: unicode = None, return_data: bool = None, figsize: Tuple = None)¶
Run exploratory data analysis.
If no parameter is passed into the method, the program will run in low code mode, otherwise high code mode.
It uses all the raw data (after excluding the columns in data_summary) as input, and the results would not be impacted by data_prepare, data_quality, and feature_select.
- Parameters:
- show{‘univariate’, ‘bivariate’, ‘multivariate’}, default=None
The plot method.
- uni_featurestr, default=None
Feature name for ‘univariate’ plot, used when show = ‘univariate’.
- bi_featureslist, default=None
Feature names for ‘bivariate’ plot, used when show = ‘bivariate’.
- multi_featureslist, default=None
Feature names for ‘multivariate’ plot, used when show = ‘multivariate’. If None, then all the features will be used in the correlation plot.
- multi_type{‘correlation_heatmap’, ‘correlation_graph’}, default=None
Plot type of ‘multivariate’ correlation plot, used when show = ‘multivariate’. If None, it will be ‘correlation_heatmap’.
- return_data: bool, default=None
Whether to return the data object. If None, it will be False.
- figsizetuple, default=None
Figure size of the plot. If None, it will be (8, 6).
- feature_select(method: unicode = None, threshold: float = None, corr_algorithm: unicode = None, preset: list = None, kernel_size: int = None, n_forward_phase: int = None, return_data: bool = None, figsize: Tuple = None)¶
Select features that are important for modeling.
If no parameter is passed into the method, the program will run in low code mode, otherwise high code mode.
Note that this method requires data_prepare to be executed first. It uses the training data after outlier removal as inputs. Therefore, data_quality and feature_select can be used in turn, and the result would be different if the selected features are different.
- Parameters:
- method{‘cor’, ‘dcor’, ‘pfi’, ‘rcit’}, default=None
The method for feature selection. If None, it will be ‘cor’.
‘cor’: Use Pearson correlation coefficient to select features.
‘dcor’: Use distance correlation coefficient to select features.
‘pfi’: Use permutation feature importance of a surrogate model (XGB) to select features.
‘rcit’: Use randomized conditional independence test to select features.
- thresholdfloat, default=None
The threshold for feature selection, which has a different meaning for each method.
- ‘cor’: The absolute threshold of Pearson correlation coefficient.
Features with absolute Pearson correlation coefficient larger than the threshold will be chosen. If None, it will be 0.1.
- ‘dcor’: The threshold for the value of distance correlation coefficient.
Features with distance correlation coefficient larger than the threshold will be chosen. If None, it will be 0.1.
- ‘pfi’: The threshold for accumulated value of normalized permutation feature importance of XGB.
Features with accumulated importance larger than the threshold will be chosen. If None, it will be 0.99.
- ‘rcit’: The threshold for the p-value of RCIT test.
Features are selected in an iterative manner. In a forward iteration, a feature is selected if the p-value of the RCIT test is smaller than or equal to the threshold, conditional on the already selected features. In a backward iteration, a selected feature is removed if the corresponding p-value is greater than the threshold. If None, it will be 1e-6.
- corr_algorithm: str, default=None
The algorithm of correlation method.
‘pearson’: Use Pearson correlation.
‘spearman’: Use Spearman rank correlation.
- presetlist, default=None
The initialization of selected feature names; used when method = ‘rcit’. If None, it will be an empty list.
- kernel_sizeint, default=None
The number of Random Fourier Features used in the conditioning set, used when method = ‘rcit’. If None, it will be 100.
- n_forward_phaseint, default=None
The number of forward phase repetitions in the forward-backward feature selection with early dropping (FBEDk) algorithm, used when method = ‘rcit’. If None, it will be 2.
- return_data: bool, default=None
Whether to return the data object. If None, it will be False.
- figsizetuple, default=None
Figure size of the plot. If None, it will be (8, 6).
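The ‘cor’ rule described above (keep the features whose absolute Pearson correlation with the target exceeds the threshold) can be sketched in plain Python. The helper names pearson and select_by_correlation and the toy data are hypothetical; this is not PiML's code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def select_by_correlation(features, target, threshold=0.1):
    """Keep features whose absolute correlation with the target exceeds the threshold."""
    return [name for name, values in features.items()
            if abs(pearson(values, target)) > threshold]


target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = {
    "x_pos": [2.0, 4.1, 5.9, 8.2, 9.8, 12.1],    # strongly positively correlated
    "x_neg": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],     # perfectly negatively correlated
    "x_noise": [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]  # zero linear correlation
}
selected = select_by_correlation(features, target, threshold=0.1)
print(selected)
```

Because the rule uses the absolute value, a negatively correlated feature such as x_neg is kept, while x_noise is dropped.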
- get_data(x=None, y=None, sample_weight=None, train=False, test=False)¶
Get the preprocessed train-test data.
The train_x and test_x only include selected features.
- Parameters:
- xndarray of shape (n_samples, n_features), default=None
Selected features. If None, will use the default data in the workflow.
- yndarray of shape (n_samples, ), default=None
Response. If None, will use the default data in the workflow.
- sample_weightndarray of shape (n_samples, ), default=None
The sample weights. If None, will use the default data in the workflow.
- trainbool, default=False
Whether to return training data only. Not available if x, y, or sample_weight is specified.
- testbool, default=False
Whether to return testing data only. Not available if x, y, or sample_weight is specified.
- get_feature_names()¶
Get the input feature names.
- Returns:
- feature_names: list of str
Feature names.
- get_feature_types()¶
Get the data type of each input feature.
- Returns:
- feature_types: list of str
Feature types.
- get_interpretable_model_list()¶
Get the list of names of all registered interpretable models.
- Returns:
- List of str
The list of all registered pipeline names.
- get_leaderboard(metric=None)¶
Show the performance comparison table of all trained models.
Note that only models trained using the exp.model_train API are considered here. For models trained outside the piml workflow, use, e.g., exp.model_diagnose(model="Model_Name", show="accuracy_table") instead.
- Parameters:
- metric{‘MSE’, ‘MAE’, ‘R2’, ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’}, default=None
For classification tasks: ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’.
For regression tasks: ‘MSE’, ‘MAE’, ‘R2’.
- get_leaderboard_registered(metric=None)¶
Show the performance comparison table of all registered models.
Note that only models trained using the exp.model_train API are considered here. For models trained outside the piml workflow, use exp.model_diagnose(model="Model_Name", show="accuracy_table") instead.
- Parameters:
- metric{‘MSE’, ‘MAE’, ‘R2’, ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’}, default=None
For classification tasks: ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’.
For regression tasks: ‘MSE’, ‘MAE’, ‘R2’.
- get_model(model)¶
Get a registered pipeline.
- Parameters:
- modelstr
The registered pipeline name.
- Returns:
- ModelPipeline
A pipeline containing raw data, data preprocessing, and an estimator.
- get_model_config(model)¶
Get the configuration of a model.
- Parameters:
- modelstr
Model’s registered name.
- get_model_list()¶
Get the list of names of all registered models.
- Returns:
- List of str
The list of all registered pipeline names.
- get_raw_data()¶
Get the raw train-test data.
The train_x and test_x only include selected features.
- Returns:
- dataDataTuple
It includes (train_x, train_y, train_sample_weight, test_x, test_y, test_sample_weight, feature_names, target_name).
- get_target_name()¶
Get the target feature name.
- Returns:
- target_names: str
Target name.
- make_pipeline(model=None, task_type=None, predict_func=None, predict_proba_func=None, fit_func=None, train_x=None, train_y=None, test_x=None, test_y=None, train_sample_weight=None, test_sample_weight=None, excluded_features=None, feature_names=None, feature_types=None, target_name=None, normalize_strategy=None, encode_strategy=None)¶
Customize a pipeline.
- Parameters:
- modelEstimator Object or pickle file path
An estimator which should follow the sklearn style.
- task_type{‘regression’, ‘classification’}, default=None
The task type. If None, it will be automatically determined using the data type of train_y and test_y. Would raise an error if the task type does not match the model type.
- predict_funccallable function, default=None
The predict function that receives X (numpy array) as input and outputs the predictions (numpy array). Only used when model is None.
- predict_proba_funccallable function, default=None
The predict_proba function that receives X (numpy array) as input and outputs the predicted probability of each class (numpy array of shape n * 2). Only used when model is None and task_type is ‘classification’.
- fit_funccallable function, default=None
The fit function that receives X, y, sample_weight as inputs and refits the model. Optional; only used when model is None.
- task_type{‘regression’, ‘classification’}, default=None
The task type, used as the model does not have the attribute “_estimator_type”.
- train_xndarray of shape (n_samples_train, n_features), default=None
The training data for the estimator. Use the default data in Experiment if None.
- train_yndarray of shape (n_samples_train, ), default=None
The training target for the estimator. Use the default data in Experiment if None.
- test_xndarray of shape (n_samples_test, n_features), default=None
The testing data for the estimator. Use the default data in Experiment if None.
- test_yndarray of shape (n_samples_test, ), default=None
The testing target for the estimator. Use the default data in Experiment if None.
- train_sample_weightndarray of shape (n_samples_train, ), default=None
The training sample_weight for the estimator. Use the default data in Experiment if None.
- test_sample_weightndarray of shape (n_samples_test, ), default=None
The testing sample_weight for the estimator. Use the default data in Experiment if None.
- excluded_featureslist, default=None
Feature names to exclude for the model.
- feature_nameslist, default=None
Feature names.
- feature_types: list, default=None
Feature types, which can be ‘numerical’ or ‘categorical’. If None or empty, the type of each feature will be determined by several samples.
- target_namestr, default=None
Target name.
- encode_strategy{‘ordinal’, ‘one_hot’}, default=None
The encoding strategy names. If None, no encoding is performed.
- normalize_strategy{‘minmax’, ‘unit_norm’}, default=None
The normalization strategy names. If None, no normalization is performed.
- Returns:
- Pipeline object
A pipeline containing raw data, data preprocessing, and an estimator.
- model_compare(models: List[str] = None, show: unicode = None, metric: unicode = None, immu_feature: unicode = None, perturb_features: Union[str, List[str]] = None, perturb_method: unicode = None, resilience_method: unicode = None, perturb_size: float = None, psi_buckets: unicode = None, distance_metric: unicode = None, min_samples: int = None, alpha: float = None, n_clusters: int = None, bins: int = None, slice_feature: unicode = None, slice_method: unicode = None, original_scale: bool = None, return_data: bool = None, figsize: Tuple[int, int] = None)¶
Compare the diagnostic results of multiple models.
If no parameter is passed into the method, the program will run in low code mode, otherwise high code mode.
- Parameters:
- modelslist, default=None
Model names; up to 3 models.
- showstr, default=None
Supported diagnostic methods for model comparison:
Accuracy
‘accuracy_plot’: evaluate the model performance. (params: metric)
Overfit
‘overfit’: compare overfit performance between models. (params: slice_method, slice_feature, bins, metric)
Reliability
‘reliability_bandwidth’: compare reliability bandwidth. (params: alpha)
‘reliability_coverage’: only supports Regressors; compare reliability coverage. (params: alpha)
Robustness
‘robustness_perf’: compare robustness performance. (params: perturb_method, perturb_features, perturb_size, metric)
‘robustness_perf_worst’: compare robustness performance between models, based on worst sample. (params: perturb_method, perturb_features, perturb_size, metric, alpha)
Resilience
‘resilience_perf’: compare resilience performance. (params: resilience_method, alpha, n_clusters, immu_feature, metric)
‘resilience_distance’: compare resilience distribution distance. (params: resilience_method, alpha, n_clusters, immu_feature, distance_metric)
- metric{‘MSE’, ‘MAE’, ‘R2’, ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’}, default=None
Performance metric, used when show = ‘accuracy_plot’, ‘overfit’, ‘robustness_perf’, ‘robustness_perf_worst’, or ‘resilience_perf’.
For classification tasks: ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’.
For regression tasks: ‘MSE’, ‘MAE’, ‘R2’.
- immu_featurestr, default=None
The name of immutable feature, used when show = ‘resilience_perf’ or ‘resilience_distance’. If None, it will be an empty list.
- perturb_featuresstr or list of str, default=None
The feature or features list to perturb in Robustness test, used when show = ‘robustness_perf’ or ‘robustness_perf_worst’. If None, it will be the list of all X.
- perturb_method{‘raw’, ‘quantile’}, default=None
Perturbation method for Robustness test, used when show = ‘robustness_perf’ or ‘robustness_perf_worst’. If None, it will be ‘raw’.
‘raw’: add normal noise directly on X for perturbation;
‘quantile’: add uniform noise on quantile space of X.
- perturb_sizefloat, default=None
The perturbation strength in Robustness test, used when show = ‘robustness_perf’ or ‘robustness_perf_worst’. If None, it will be 0.1.
For numeric features:
When perturb_method = ‘raw’: the standard deviation of the noise, i.e., noise std = perturb_size * std(X).
When perturb_method = ‘quantile’: the range of the uniform distribution of noise, i.e., noise range = [-0.5 * perturb_size, 0.5 * perturb_size].
For categorical features: the perturbation probability of the categorical feature.
- resilience_method{‘worst-sample’, ‘hard-sample’, ‘outer-sample’, ‘worst-cluster’}, default=None
The method used for selecting worst samples, used when show = ‘resilience_perf’ or ‘resilience_distance’. If None, it will be ‘worst-sample’.
‘worst-sample’: Select the worst samples according to the loss of each sample;
‘hard-sample’: Use a deep XGB model to distinguish hard and easy samples;
‘outer-sample’: Select the worst samples according to the distance of each sample to the center;
‘worst-cluster’: Fit a K-means model, and select the worst performing cluster as the worst sample.
- distance_metric{‘PSI’, ‘WD1’, ‘KS’}, default=None
The distance metric used when show = ‘resilience_distance’. If None, it will be ‘PSI’.
‘PSI’: Population Stability Index.
‘WD1’: Wasserstein distance-1D.
‘KS’: Kolmogorov-Smirnov.
- psi_buckets{‘uniform’, ‘quantile’}, default=None
Bucketing strategy for PSI metric. If None, it will be ‘uniform’.
- min_samplesint, default=None
The minimum number of samples allowed in each overfit region.
- alphafloat, default=None
This parameter has different meanings for different diagnostics, used when show = ‘reliability_bandwidth’, ‘reliability_coverage’, ‘robustness_perf_worst’, ‘resilience_perf’, or ‘resilience_distance’. If None, it will be 0.1.
For Resilience test: decides the ratio for worst sampling, can only be 0.1, 0.2, …, 1;
For Reliability test: the error rate in Split Conformal Prediction (only for regression tasks).
- n_clustersint, default=None
The number of clusters in K-means used when resilience_method = ‘worst-cluster’ and show = ‘resilience_perf’ or ‘resilience_distance’. If None, it will be 10.
- binsint, default=None
The number of bins, used when show = ‘overfit’. If None, it will be 10.
- slice_featurestr, default=None
Decide the feature of interest in the plot (usually the x-axis), used when show = ‘overfit’.
- slice_method{‘histogram’, ‘tree’}, default=None
The slicing method for Overfit test, used when show = ‘overfit’. If None, it will be ‘histogram’.
‘histogram’: default, use equal-space binning;
‘tree’: fit a decision tree to generate regions, not applicable when show = ‘overfit’.
- original_scale: bool, default=None
To use the original scale of X in the plots. If None, it will be False.
- return_data: bool, default=None
Whether to return the data object. If None, it will be False.
- figsizetuple, default=None
Figure size of the plot. If None, it will be (8, 6).
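The two perturb_method options for numeric features can be illustrated with a small sketch. It is not PiML's implementation: the function names, the use of the population standard deviation, and the way a perturbed quantile is mapped back to an observed value are all assumptions made for illustration. Categorical features are not covered.

```python
import bisect
import random
import statistics


def perturb_raw(values, perturb_size=0.1, seed=0):
    """'raw': add normal noise with std = perturb_size * std(values)."""
    rng = random.Random(seed)
    noise_std = perturb_size * statistics.pstdev(values)
    return [v + rng.gauss(0.0, noise_std) for v in values]


def perturb_quantile(values, perturb_size=0.1, seed=0):
    """'quantile': add uniform noise in quantile space, then map back to a value."""
    rng = random.Random(seed)
    ordered = sorted(values)
    n = len(ordered)
    perturbed = []
    for v in values:
        # Empirical quantile of v, in (0, 1].
        q = bisect.bisect_right(ordered, v) / n
        # Uniform noise in [-0.5 * perturb_size, 0.5 * perturb_size].
        q += rng.uniform(-0.5 * perturb_size, 0.5 * perturb_size)
        q = min(max(q, 0.0), 1.0)
        # Map the perturbed quantile back to the nearest-rank observed value.
        idx = min(max(round(q * n) - 1, 0), n - 1)
        perturbed.append(ordered[idx])
    return perturbed


x = [float(i) for i in range(20)]
raw = perturb_raw(x, perturb_size=0.1)
quant = perturb_quantile(x, perturb_size=0.1)
print(raw[:3], quant[:3])
```

In this sketch the ‘quantile’ variant never leaves the observed range of the feature, whereas the ‘raw’ variant can produce values outside it.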
- model_diagnose(model: unicode = None, show: unicode = None, metric: unicode = None, perturb_size: float = None, perturb_features: Union[str, List[str]] = None, perturb_method: unicode = None, bins: int = None, resilience_method: unicode = None, alpha: float = None, n_clusters: int = None, slice_features: Union[str, List[str]] = None, slice_method: unicode = None, threshold: float = None, min_samples: int = None, use_test: bool = None, psi_buckets: unicode = None, immu_feature: unicode = None, show_feature: unicode = None, distance_metric: unicode = None, original_scale: bool = None, return_data: bool = None, figsize: Tuple[int, int] = None)¶
Test model performance using various diagnostic tools.
If no parameter is passed into the method, the program will run in low code mode, otherwise high code mode.
- Parameters:
- modelstr, default=None
Model’s registered name.
- show{‘accuracy_residual’, ‘accuracy_plot’, ‘accuracy_table’, ‘overfit’, ‘weakspot’, ‘robustness_perf’, ‘robustness_perf_worst’, ‘resilience_perf’, ‘resilience_distance’, ‘resilience_shift_density’, ‘resilience_shift_histogram’, ‘reliability_distance’, ‘reliability_marginal’, ‘reliability_table’, ‘reliability_perf’, ‘reliability_calibration’}, default=None
All the supported model diagnostics methods.
Accuracy
‘accuracy_residual’: plot residual with respect to a variable. (params: show_feature)
‘accuracy_plot’: only supports Classifiers; plot confusion matrix, ROC and Recall-Precision.
‘accuracy_table’: show train and test performance via a table.
WeakSpot
‘weakspot’: show detected weakspot regions. (params: slice_method, slice_features, bins, metric, threshold, min_samples, use_test)
Overfit
‘overfit’: identify overfitting regions with high testing - training error gaps. (params: slice_method, slice_features, bins, metric, threshold, min_samples)
Reliability
‘reliability_table’: show empirical coverage and average bandwidth for regression or Brier Loss for classification. (params: alpha)
‘reliability_distance’: calculate distribution shift distance of features between reliable and unreliable data. (params: alpha, threshold, distance_metric)
‘reliability_marginal’: plot the histogram of bandwidth against a given feature. (params: alpha, show_feature, bins, threshold)
‘reliability_perf’: only for classifiers; reliability diagram.
‘reliability_calibration’: only for classifiers; show the calibrated predicted probability vs. original predicted probability.
Robustness
‘robustness_perf’: performance against perturbation size. (params: perturb_method, perturb_features, perturb_size, metric)
‘robustness_perf_worst’: performance against perturbation size based on worst sample. (params: perturb_method, perturb_features, perturb_size, metric, alpha)
Resilience
‘resilience_perf’: performance against worst sample ratio. (params: resilience_method, alpha, metric, immu_feature)
‘resilience_distance’: calculate distribution shift distance of features between worst sample and full dataset. (params: resilience_method, alpha, immu_feature, distance_metric)
‘resilience_shift_density’: compare distribution between worst sample and full dataset with density plot. (params: resilience_method, alpha, immu_feature, show_feature)
‘resilience_shift_histogram’: compare distribution between worst sample and full dataset with shift plot. (params: resilience_method, alpha, immu_feature, show_feature)
- metric{‘MSE’, ‘MAE’, ‘R2’, ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’}, default=None
Performance metric, used when show = ‘weakspot’, ‘overfit’, ‘robustness_perf’, ‘robustness_perf_worst’, ‘resilience_perf’, ‘resilience_distance’, ‘resilience_shift_density’, or ‘resilience_shift_histogram’.
For classification tasks: ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’.
For regression tasks: ‘MSE’, ‘MAE’, ‘R2’.
- slice_featureslist, default=None
List of slicing features (at most 2) for WeakSpot and Overfit tests, used when show = ‘weakspot’ or ‘overfit’.
- slice_method{‘histogram’, ‘tree’}, default=None
The slicing method for WeakSpot and Overfit tests, used when show = ‘weakspot’ or ‘overfit’. If None, it will be ‘histogram’.
‘histogram’: default, use equal-space binning;
‘tree’: fit a decision tree to generate regions; not applicable when show = ‘overfit’.
- perturb_featuresstr or list of str, default=None
The feature or list of features to perturb in Robustness test, used when show = ‘robustness_perf’ or ‘robustness_perf_worst’. If None, all features in X will be perturbed.
- perturb_method{‘raw’, ‘quantile’}, default=None
Perturbation method for Robustness test, used when show = ‘robustness_perf’ or ‘robustness_perf_worst’. If None, it will be ‘raw’.
‘raw’: add normal noise directly on X for perturbation;
‘quantile’: add uniform noise on quantile space of X.
- perturb_sizefloat, default=None
The perturbation strength in Robustness test, used when show = ‘robustness_perf’ or ‘robustness_perf_worst’. If None, it will be 0.1.
For numeric features:
When perturb_method = ‘raw’: the standard deviation of the noise, i.e., noise std = perturb_size * std(X).
When perturb_method = ‘quantile’: the range of the uniform distribution of noise, i.e., noise range = [-0.5 * perturb_size, 0.5 * perturb_size].
For categorical features: the perturbation probability of the categorical feature.
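The two numeric perturbation rules above can be illustrated with a minimal standard-library sketch. This is not PiML's implementation: the function names, the rank-based quantile mapping, and the clipping to [0, 1] are assumptions made for illustration only.

```python
import bisect
import random
import statistics


def perturb_raw(x, perturb_size=0.1, seed=0):
    """'raw' rule: add Gaussian noise with std = perturb_size * std(x)."""
    rng = random.Random(seed)
    noise_std = perturb_size * statistics.pstdev(x)
    return [v + rng.gauss(0.0, noise_std) for v in x]


def perturb_quantile(x, perturb_size=0.1, seed=0):
    """'quantile' rule: add uniform noise in quantile space, then map back."""
    rng = random.Random(seed)
    ordered = sorted(x)
    n = len(ordered)
    out = []
    for v in x:
        # Empirical quantile of v, in (0, 1].
        q = bisect.bisect_right(ordered, v) / n
        # Uniform noise in [-0.5 * perturb_size, 0.5 * perturb_size].
        q += rng.uniform(-0.5 * perturb_size, 0.5 * perturb_size)
        q = min(max(q, 0.0), 1.0)  # assumption: clip to a valid quantile
        # Map back by picking the order statistic at the perturbed quantile.
        idx = min(max(int(round(q * n)) - 1, 0), n - 1)
        out.append(ordered[idx])
    return out
```

With the quantile rule, perturbed values always stay inside the observed range of the feature, whereas the raw rule can push values beyond it.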
- resilience_method{‘worst-sample’, ‘hard-sample’, ‘outer-sample’, ‘worst-cluster’}, default=None
The method used for selecting worst samples, used when show = ‘resilience_perf’, ‘resilience_distance’, ‘resilience_shift_density’, or ‘resilience_shift_histogram’. If None, it will be ‘worst-sample’.
‘worst-sample’: Select the worst samples according to the loss of each sample;
‘hard-sample’: Use a deep XGB model to distinguish hard and easy samples;
‘outer-sample’: Select the worst samples according to the distance of each sample to the center;
‘worst-cluster’: Fit a K-means model, and select the worst performing cluster as the worst sample.
- alphafloat, default=None
This parameter has different meanings for different diagnostics, used when show = ‘resilience_perf’, ‘resilience_distance’, ‘resilience_shift_density’, ‘resilience_shift_histogram’, ‘reliability_distance’, ‘reliability_marginal’, or ‘reliability_table’. If None, it will be 0.1.
For Resilience test: determines the ratio for worst sampling, which can only be 0.1, 0.2, …, 0.9;
For Reliability test: the error rate in Split Conformal Prediction (only for regression tasks).
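For the Reliability case, the role of alpha as an error rate can be seen in a generic split conformal prediction sketch. It is a textbook version under stated assumptions (absolute residuals as the conformity score, a symmetric interval), not PiML's code, and both function names are hypothetical.

```python
import math


def conformal_halfwidth(calib_y, calib_pred, alpha=0.1):
    """Half-width q so that pred +/- q targets (1 - alpha) coverage."""
    scores = sorted(abs(y - p) for y, p in zip(calib_y, calib_pred))
    n = len(scores)
    # Finite-sample corrected rank ceil((n + 1) * (1 - alpha)),
    # clipped to n in this sketch (strictly, the interval is unbounded then).
    rank = min(math.ceil((n + 1) * (1 - alpha)), n)
    return scores[rank - 1]


def coverage_and_bandwidth(test_y, test_pred, q):
    """Empirical coverage and average interval width on held-out data."""
    covered = [abs(y - p) <= q for y, p in zip(test_y, test_pred)]
    return sum(covered) / len(covered), 2.0 * q
```

A smaller alpha demands higher coverage and therefore yields a wider band; these two quantities correspond to the empirical coverage and average bandwidth that ‘reliability_table’ reports for regression.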
- n_clustersint, default=None
The number of clusters in K-means, used when resilience_method = ‘worst-cluster’ and show = ‘resilience_shift_density’, ‘resilience_shift_histogram’, or ‘resilience_distance’. If None, it will be 10.
- binsint, default=None
The number of bins, used when show = ‘weakspot’, ‘overfit’, ‘reliability_perf’, or ‘reliability_marginal’. If None, it will be 10.
- thresholdfloat, default=None
This parameter has different meanings for different diagnostics, used when show = ‘weakspot’, ‘overfit’, ‘reliability_distance’, or ‘reliability_marginal’. If None, it will be 1.1.
For Overfit test: decides minimal error gap for an overfit region;
For Weakspot test: decides minimal error gap for a weak region;
For Reliability test: the threshold to determine whether data are inside or outside coverage.
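To make the interplay of bins, threshold, and min_samples concrete, here is a rough sketch of histogram slicing on one feature. It assumes the threshold acts as a multiplier on the overall error, which is an interpretation (the text above only calls it a minimal error gap); the function name and the use of mean absolute error are also assumptions, and the sketch expects a non-constant feature.

```python
def find_weak_bins(feature, abs_errors, bins=10, threshold=1.1, min_samples=20):
    """Return (bin_low, bin_high, mean_error, count) for each flagged bin."""
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / bins  # equal-space binning
    overall = sum(abs_errors) / len(abs_errors)
    groups = [[] for _ in range(bins)]
    for x, e in zip(feature, abs_errors):
        idx = min(int((x - lo) / width), bins - 1)  # right edge goes to last bin
        groups[idx].append(e)
    weak = []
    for i, errs in enumerate(groups):
        if len(errs) < min_samples:
            continue  # too few samples to call the region weak
        mean_err = sum(errs) / len(errs)
        # Assumed rule: weak if the bin error exceeds threshold * overall error.
        if mean_err > threshold * overall:
            weak.append((lo + i * width, lo + (i + 1) * width, mean_err, len(errs)))
    return weak
```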
- min_samplesint, default=None
The minimal sample size for selected regions, used when show = ‘weakspot’ or ‘overfit’. If None, it will be 20.
- use_testbool, default=None
Whether to use test data or not, used when show = ‘weakspot’. If None, it will be False.
- immu_featurestr, default=None
The name of the immutable feature, used when show = ‘resilience_perf’, ‘resilience_distance’, ‘resilience_shift_density’, or ‘resilience_shift_histogram’. If None, it will be an empty list.
- show_featurestr, default=None
The feature of interest in the plot (usually the x-axis), used when show = ‘accuracy_residual’, ‘reliability_marginal’, ‘resilience_shift_density’, or ‘resilience_shift_histogram’.
- distance_metric{‘PSI’, ‘WD1’, ‘KS’}, default=None
The distance metric, used when show = ‘reliability_distance’ or ‘resilience_distance’. If None, it will be ‘PSI’.
‘PSI’: Population stability index.
‘WD1’: Wasserstein distance (1D).
‘KS’: Kolmogorov-Smirnov test.
- psi_buckets{‘uniform’, ‘quantile’}, default=None
Bucketing strategy for PSI metric. If None, it will be ‘uniform’.
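As an illustration of the PSI metric and the two bucketing strategies, the following sketch computes PSI between a reference sample and a comparison sample. It uses the standard formula sum((p - q) * ln(p / q)); the function names, the eps floor for empty buckets, and the way edges are taken from the reference sample are assumptions rather than PiML's implementation.

```python
import math


def bucket_edges(reference, n_buckets=10, strategy="uniform"):
    """Interior cut points computed from the reference sample."""
    if strategy == "uniform":
        lo, hi = min(reference), max(reference)
        step = (hi - lo) / n_buckets
        return [lo + step * i for i in range(1, n_buckets)]
    ordered = sorted(reference)  # 'quantile': equal-count buckets
    return [ordered[len(ordered) * i // n_buckets] for i in range(1, n_buckets)]


def psi(reference, other, n_buckets=10, strategy="uniform", eps=1e-6):
    """Population stability index between two samples of one feature."""
    edges = bucket_edges(reference, n_buckets, strategy)

    def shares(sample):
        counts = [0] * n_buckets
        for v in sample:
            counts[sum(v >= e for e in edges)] += 1
        # eps avoids log(0) when a bucket is empty
        return [max(c / len(sample), eps) for c in counts]

    p, q = shares(reference), shares(other)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))
```

Identical samples give a PSI of zero, and the value grows as the two distributions drift apart.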
- original_scale: bool, default=None
To use the original scale of X in the plots. If None, it will be False.
- return_data: bool, default=None
Whether to return the data object. If None, it will be False.
- figsizetuple, default=None
Figure size of the plot. If None, it will be (8, 6).
- Returns:
- TestResult
Diagnostic result, only available when return_data = True.
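A high-code call might look like the sketch below. It only uses parameter names documented above; `exp` is assumed to be a `piml.Experiment` with data prepared and a model already registered, and the helper name, model name, and feature names are hypothetical.

```python
def diagnose_weakspot(exp, model_name, features):
    """Run the WeakSpot diagnostic in high-code mode and return its result."""
    return exp.model_diagnose(
        model=model_name,          # registered model name (hypothetical here)
        show="weakspot",
        slice_method="histogram",
        slice_features=features,   # at most two slicing features
        bins=10,
        metric="MSE",
        threshold=1.1,
        min_samples=20,
        use_test=True,
        return_data=True,          # needed to get a result object back
    )
```

Passing return_data=True matters here: without it the method only renders the output and no result object is returned.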
- model_explain(model: unicode = None, show: unicode = None, uni_feature: unicode = None, bi_features: List[str] = None, sample_id: int = None, sample_size: int = None, n_repeats: int = None, return_data: bool = None, grid_size: int = None, response_method: unicode = None, sliced_line: bool = None, centered: bool = None, use_test: bool = None, original_scale: bool = None, figsize: Tuple[int, int] = None)¶
Explain an arbitrary fitted model using post-hoc explanation tools.
If no parameter is passed into the method, the program will run in low code mode, otherwise high code mode.
- Parameters:
- modelstr, default=None
Model’s registered name.
- show{‘pfi’, ‘pdp’, ‘ice’, ‘hstats’, ‘ale’, ‘shap_summary’, ‘shap_fi’, ‘shap_scatter’, ‘shap_waterfall’, ‘lime’}, default=None
Supported global and local explanation methods:
‘pfi’: Permutation feature importance (PFI).
‘pdp’: Partial dependence plot (PDP).
‘ice’: Individual conditional expectation (ICE).
‘hstats’: H-statistic.
‘ale’: Accumulated local effects (ALE).
‘shap_waterfall’: SHAP for single data point.
‘lime’: LIME for single data point.
‘shap_summary’: SHAP summary.
‘shap_fi’: SHAP feature importance.
‘shap_scatter’: SHAP scatter plot.
- uni_featurestr, default=None
Feature name of the univariate plot, used when show = ‘pdp’, ‘ice’, ‘ale’, or ‘shap_scatter’.
- bi_featureslist of str, default=None
Feature names of the bivariate plot, used when show = ‘pdp’ or ‘ale’.
- sample_idint, default=None
The index of the single data point for local explanation, used when show = ‘lime’ or ‘shap_waterfall’. If None, it will be 0.
- sample_sizeint, default=None
The subsample size for generating computationally expensive explanations, used when show = ‘pdp’, ‘hstats’, ‘ice’, ‘shap_fi’, ‘shap_summary’, or ‘shap_scatter’. If None, it will be 500 for ‘shap_fi’, ‘shap_summary’, and ‘shap_scatter’; 2000 for ‘pdp’, ‘ice’, and ‘hstats’.
- grid_sizeint, default=None
The grid size for generating plots, used when show = ‘pdp’, ‘hstats’, or ‘ale’. If None, it will be 100 for 1D PDP, ICE, and ALE; 10 for 2D PDP, ALE and H-statistic.
- response_method{‘auto’, ‘predict_proba’, ‘decision_function’}, default=’auto’
Specifies whether to use predict_proba or decision_function as the target response when show = ‘pdp’, ‘ice’, ‘hstats’, or ‘ale’. For regressors this parameter is ignored and the response is always the output of predict. By default, predict_proba is tried first, falling back to decision_function if it does not exist.
- n_repeatsint, default=None
The number of repetitions in the permutation feature importance test, used when show = ‘pfi’. If None, it will be 1.
- return_data: bool, default=None
Whether to return the data object. If None, it will be False.
- sliced_linebool, default=None
Whether to show sliced lines instead of a heatmap for 2D explanations, used when show = ‘pdp’ or ‘ale’. If None, it will be False.
- centeredbool, default=None
Whether to center X (subtract the average) for generating local explanations, used when show = ‘lime’ or ‘local_fi’. If None, it will be True.
- use_testbool, default=None
Whether to use the test set for the explanation. If True, the test set will be used; otherwise, the training set will be used. If None, it will be False.
- original_scale: bool, default=None
Whether to use the original scale of X, used when show = ‘pdp’, ‘ice’, or ‘ale’. If None, it will be False.
- figsizetuple, default=None
Figure size of the plot. If None, it will be (8, 6).
- Returns:
- TestResult
Explanation result, only available when return_data = True.
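To show what n_repeats controls when show = ‘pfi’, here is a generic permutation feature importance sketch. It is a plain-Python illustration, not PiML's implementation; the function name, the MSE metric, and the list-of-lists data layout are assumptions.

```python
import random


def permutation_importance(predict, X, y, n_repeats=1, seed=0):
    """Mean increase in MSE after shuffling each column of X."""
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(shuffled) - baseline)
        # Averaging over repeats reduces the noise of a single shuffle.
        importances.append(sum(increases) / n_repeats)
    return importances
```

A feature the model never uses gets an importance of zero, while shuffling an influential feature increases the error.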
- model_fairness(model: unicode = None, show: unicode = None, metric: unicode = None, metric_threshold: float = None, favorable_class: int = None, favorable_threshold: float = None, group_category: List = None, reference_group: List = None, protected_group: List = None, thresholding_bins: int = None, segment_feature: unicode = None, segment_bins: int = None, performance_metric: unicode = None, distance_metric: unicode = None, binning_dict: Dict = None, by_weights: List = None, return_data: bool = None, figsize: Tuple[int, int] = None)¶
Test model fairness.
- Parameters:
- modelstr, default=None
Model’s registered name.
- show{‘metrics’, ‘segmented’, ‘thresholding’, ‘binning’}, default=None
The supported methods in the fairness module.
‘metrics’: show fairness metric results.
‘segmented’: show segmented fairness metric results.
‘thresholding’: show fairness-accuracy trade-off results plot with different outcome cut-offs (classification).
‘binning’: show fairness-accuracy trade-off results plot with different feature binning.
- metric{‘AIR’, ‘SMD’, ‘Precision’, ‘Recall’}, default=None
The metric for fairness testing.
‘AIR’: Adverse impact ratio.
‘SMD’: Standardized mean difference.
‘Precision’: Positive predictive value disparity ratio.
‘Recall’: True positive rate disparity ratio.
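The first two metrics can be sketched as follows, using common definitions: AIR as the ratio of favorable-outcome rates, and SMD as the difference in group means divided by the overall standard deviation. The function names and the choice of population standard deviation are assumptions, and PiML's exact formulas may differ.

```python
import statistics


def adverse_impact_ratio(favorable, group, protected, reference):
    """Favorable-outcome rate of the protected group over the reference group."""
    def rate(name):
        flags = [f for f, g in zip(favorable, group) if g == name]
        return sum(flags) / len(flags)
    return rate(protected) / rate(reference)


def standardized_mean_difference(scores, group, protected, reference):
    """Difference in mean score between groups, in units of the overall std."""
    def mean(name):
        return statistics.mean([s for s, g in zip(scores, group) if g == name])
    return (mean(protected) - mean(reference)) / statistics.pstdev(scores)
```

An AIR of 1 means both groups receive the favorable outcome at the same rate; values well below 1 indicate that the protected group is favored less often.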
- favorable_classint, default=None
Favorable target class for AIR.
- favorable_thresholdfloat, default=None
Favorable threshold to binarize the predicted outcomes (predict_proba for classifiers). Not available for regressor.
- group_categorylist, default=None
Feature list of each group.
- reference_grouplist, default=None
Reference group list of category names.
- protected_grouplist, default=None
Protected group list of category names.
- metric_thresholdfloat, default=None
Threshold for the fairness metric. (the dotted line in the plots)
- thresholding_binsint, default=None
The number of bins of numerical features when show = ‘binning’.
- segment_featurestr, default=None
The segment feature’s name when show = ‘segmented’.
- segment_binsint, default=None
The number of bins of numerical features when show = ‘segmented’.
- performance_metric{‘MSE’, ‘MAE’, ‘R2’, ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’}, default=None
For classification tasks: ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’.
For regression tasks: ‘MSE’, ‘MAE’, ‘R2’.
- distance_metric{‘PSI’, ‘WD1’, ‘KS’}, default=None
The distance metric name for data comparing.
‘PSI’: Population stability index.
‘WD1’: Wasserstein distance (1D).
‘KS’: Kolmogorov-Smirnov test.
- binning_dictdict, default=None
The binning dictionary for each binning config.
For example: the following dictionary defines two configurations.
binning_dict = {'Balance': {'type': 'uniform', 'value': (1, 5)}, 'Mortgage': {'type': 'quantile', 'value': (2, 5)}, 'Utilization': {'type': 'custom', 'value': (0.1, 0.2)}}
- by_weights: list, default=None
The index list of weighted groups.
- return_data: bool, default=None
Whether to return the data object. If None, it will be False.
- figsizetuple, default=None
Figure size of the plot. If None, it will be (8, 6).
- model_fairness_compare(models: list = None, show: unicode = None, metric: unicode = None, favorable_class: int = None, favorable_threshold: float = None, group_category: List = None, reference_group: List = None, protected_group: List = None, metric_threshold: float = None, segment_feature: unicode = None, segment_bins: int = None, by_weights: List = None, return_data: bool = None, figsize: Tuple[int, int] = None)¶
Compare the fairness results of multiple models.
- Parameters:
- modelslist, default=None
List of model’s registered names.
- show{‘metrics’, ‘segmented’}, default=None
The fairness result method.
‘metrics’: show fairness metric results.
‘segmented’: show segmented fairness metric results.
- metric{‘AIR’, ‘SMD’, ‘Precision’, ‘Recall’}, default=None
The metric for fairness testing.
‘AIR’: Adverse impact ratio.
‘SMD’: Standardized mean difference.
‘Precision’: Positive predictive value disparity ratio.
‘Recall’: True positive rate disparity ratio.
- favorable_classint, default=None
Favorable target class for AIR.
- favorable_thresholdfloat, default=None
Favorable threshold for binarizing the predicted outcomes (predict_proba for classifiers).
- group_categorylist, default=None
Feature list of each group.
- reference_grouplist, default=None
Reference group list of category names.
- protected_grouplist, default=None
Protected group list of category names.
- metric_thresholdfloat, default=None
Threshold for the fairness metric. (the dotted line in the plots)
- segment_featurestr, default=None
The segment feature’s name when show = ‘segmented’.
- segment_binsint, default=None
The number of bins of numerical features when show = ‘segmented’.
- by_weights: list, default=None
The index list of weighted groups.
- return_data: bool, default=None
Whether to return the data object. If None, it will be False.
- figsizetuple, default=None
Figure size of the plot. If None, it will be (8, 6).
- model_fairness_solas(model: unicode = None, show: unicode = None, metric: unicode = None, favorable_class: int = None, favorable_threshold: float = None, group_category: List = None, reference_group: List = None, protected_group: List = None, metric_threshold: float = None, segment_feature: unicode = None, segment_bins: int = None, by_weights: List = None, figsize: tuple = None)¶
Test model fairness based on solas-ai.
- Parameters:
- modelstr, default=None
Model’s registered name.
- show{‘metrics’, ‘segmented’}, default=None
The supported methods in the fairness module.
‘metrics’: show fairness metric result.
‘segmented’: show segmented fairness metric result.
- metric{‘AIR’, ‘SMD’, ‘RSMD’, ‘Odds Ratio’}, default=None
The metric for fairness testing.
‘AIR’: Adverse impact ratio.
‘SMD’: Standardized mean difference.
‘RSMD’: Residual standardized mean difference.
‘Odds Ratio’: Odds ratio.
- favorable_classint, default=None
Favorable target class for AIR.
- favorable_thresholdfloat, default=None
Favorable threshold for binarizing the predicted outcomes (predict_proba for classifiers).
- segment_featurestr, default=None
Segmented feature name for segmented AIR.
- segment_binsint, default=None
The number of bins of numerical features when show = ‘segmented’.
- group_categorylist, default=None
Category list of group.
- reference_grouplist, default=None
Reference group list.
- protected_grouplist, default=None
Protected group list.
- metric_thresholdfloat, default=None
Threshold for the fairness metric. (the dotted line in the plots)
- metric{‘MSE’, ‘MAE’, ‘R2’, ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’}, default=None
For classification tasks: ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’.
For regression tasks: ‘MSE’, ‘MAE’, ‘R2’.
- by_weights: list, default=None
The index list of weighted groups.
- figsizetuple, default=None
Figure size of the plot. If None, it will be (8, 6).
- model_interpret(model: unicode = None, show: unicode = None, uni_feature: unicode = None, bi_features: List[str] = None, sample_id: int = None, tree_idx: int = None, root: int = None, depth: int = None, sliced_line: bool = None, centered: bool = None, use_test: bool = None, original_scale: bool = None, return_data: bool = None, figsize: Tuple[int, int] = None)¶
Interpret inherently interpretable models.
If no parameter is passed into the method, the program will run in low code mode, otherwise high code mode.
- Parameters:
- modelstr, default=None
Model’s registered name.
- show{‘global_fi’, ‘local_fi’, ‘global_ei’, ‘local_ei’, ‘global_effect_plot’, ‘glm_coef_plot’, ‘glm_coef_table’, ‘tree_global’, ‘tree_local’, ‘figs_heatmap’, ‘llm_summary’, ‘llm_violin’, ‘llm_pc’}, default=None
Each inherently interpretable model supports different interpretation methods (corresponding to different show parameter values):
| Interpret Method     | GLM | GAM | GAMI-NET | Tree | FIGS | XGB1 | XGB2 | EBM | ReLU-DNN | Description                                       |
| -------------------- | --- | --- | -------- | ---- | ---- | ---- | ---- | --- | -------- | ------------------------------------------------- |
| "global_fi"          | X   | X   | X        |      |      | X    | X    | X   | X        | Global Feature Importance                         |
| "local_fi"           | X   | X   | X        |      |      | X    | X    | X   | X        | Local Feature Importance                          |
| "global_ei"          |     |     | X        |      |      |      | X    | X   |          | Global Effect Importance                          |
| "local_ei"           |     |     | X        |      |      |      | X    | X   |          | Local Effect Importance                           |
| "global_effect_plot" |     | X   | X        |      |      | X    | X    | X   | X        | Global Effect Plot                                |
| "glm_coef_plot"      | X   |     |          |      |      |      |      |     |          | Global Model Coefficients Plot for GLM models     |
| "glm_coef_table"     | X   |     |          |      |      |      |      |     |          | Global Model Coefficients Table for GLM models    |
| "tree_global"        |     |     |          | X    | X    |      |      |     |          | Global Interpretation Plot for Tree models        |
| "tree_local"         |     |     |          | X    | X    |      |      |     |          | Local Interpretation Plot for Tree models         |
| "figs_heatmap"       |     |     |          |      | X    |      |      |     |          | Feature Importance Heatmap Plot for FIGS models   |
| "llm_summary"        |     |     |          |      |      |      |      |     | X        | Local Linear Model (LLM) summary plot             |
| "llm_violin"         |     |     |          |      |      |      |      |     | X        | Local Linear Model (LLM) violin plot              |
| "llm_pc"             |     |     |          |      |      |      |      |     | X        | Local Linear Model (LLM) parallel coordinate plot |
| "xgb1_iv"            |     |     |          |      |      | X    |      |     | X        | Optimal binning Information Value plot            |
| "xgb1_woe"           |     |     |          |      |      | X    |      |     | X        | Optimal binning Weight of Evidence plot           |
- uni_featurestr, default=None
Feature name of the univariate plot, used when show = ‘global_effect_plot’ or ‘glm_coef_plot’, and model is within {GAM, EBM, XGB2, GAMI-Net, ReLU-DNN}.
When show = ‘glm_coef_plot’, uni_feature is used in the following way:
if None: plot for all the numeric features;
if “[CATEGORICAL_FEATURE_NAME]”: plot for the specific categorical feature named “[CATEGORICAL_FEATURE_NAME]”.
- bi_featureslist of str, default=None
Feature names of the bivariate plot, used when show = ‘global_effect_plot’, and model is within {EBM, XGB2, GAMI-Net, ReLU-DNN}.
- sample_idint, default=None
The index of the single data point for local interpretation, used when show = ‘local_fi’, ‘local_ei’, or ‘tree_local’. If None, it will be 0.
- tree_idxint, default=None
Not used by this method.
- rootint, default=None
The node from which to start drawing the tree diagram, used when show = ‘tree_global’. If None, it will be 0.
- depthint, default=None
The max depth of the tree diagram to show (counting from the root node), used when show = ‘tree_global’. If None, it will be 3.
- sliced_linebool, default=None
Whether to show sliced lines instead of a heatmap for 2D interpretation, used when show = ‘global_effect_plot’. If None, it will be False.
- centeredbool, default=None
Whether to center X (subtract the average) for generating local explanations, used when show = ‘local_fi’ or ‘local_ei’. If None, it will be True.
- use_testbool, default=None
Whether to use the test set for the interpretation. If True, the test set will be used; otherwise, the training set will be used. If None, it will be False.
- original_scale: bool, default=None
Whether to use the original scale of X, used when show = ‘local_fi’, ‘local_ei’, ‘global_effect_plot’, ‘tree_global’, or ‘tree_local’. If None, it will be False.
- return_data: bool, default=None
Whether to return the data object. If None, it will be False.
- figsizetuple, default=None
Figure size of the plot. If None, it will be (8, 6).
- Returns:
- TestResult
Interpretation result, only available when return_data = True.
- model_save(model: unicode, path: unicode = 'SavedModel.pkl')¶
Save a PiML-trained model as a pickle file.
- Parameters:
- modelstr
Model’s registered name.
- pathstr
File path to save.
- model_train(model=None, name: unicode = None)¶
Fit interpretable models.
If no parameter is passed into the method, the program will run in low code mode, otherwise high code mode.
- Parameters:
- modelestimator object, default=None
Estimator to be fitted, which should follow the sklearn style. Below is the list of built-in inherently interpretable models.
| Model                                      | PiML Regressor Class                       | PiML Classifier Class                       |
| ------------------------------------------ | ------------------------------------------ | ------------------------------------------- |
| Generalized Linear Model (GLM)             | `piml.models.GLMRegressor`                 | `piml.models.GLMClassifier`                 |
| Generalized Additive Model (GAM)           | `piml.models.GAMRegressor`                 | `piml.models.GAMClassifier`                 |
| Decision Tree (Tree)                       | `piml.models.TreeRegressor`                | `piml.models.TreeClassifier`                |
| Fast Interpretable Greedy-Tree Sums (FIGS) | `piml.models.FIGSRegressor`                | `piml.models.FIGSClassifier`                |
| Explainable Boosting Machine (EBM)         | `piml.models.ExplainableBoostingRegressor` | `piml.models.ExplainableBoostingClassifier` |
| Xgboost Depth 1 (XGB1)                     | `piml.models.XGB1Regressor`                | `piml.models.XGB1Classifier`                |
| Xgboost Depth 2 (XGB2)                     | `piml.models.XGB2Regressor`                | `piml.models.XGB2Classifier`                |
| GAMI-Net                                   | `piml.models.GAMINetRegressor`             | `piml.models.GAMINetClassifier`             |
| ReLU-DNN                                   | `piml.models.ReluDNNRegressor`             | `piml.models.ReluDNNClassifier`             |
- namestr, default=None
Model’s name for registration.
Notes
You can list all directly supported models in PiML with:
from piml.models import get_all_supported_models
get_all_supported_models()
- model_tune(model=None, method=None, parameters=None, n_runs=None, cv=None, test_ratio=None, metric=None, n_jobs=None, random_state=None)¶
Refit a model with new parameters.
- Parameters:
- modelstr
Model’s registered name to tune.
- method: {‘grid’, ‘randomized’}
Tuning method. Two methods are supported:
grid: Exhaustive search over specified parameter values for an estimator.
randomized: Randomized search on hyperparameters.
- parametersdict
Parameter search space for tuning. For example,
grid: {‘n_estimators’: [100, 300, 500], ‘max_depth’: [3, 4, 5]}
randomized: {‘n_estimators’: scipy.stats.randint(100, 1000), ‘max_depth’: [3, 4, 5]}
- n_runsint, optional
Number of parameter settings that are sampled, by default 10. Only used when method == ‘randomized’. n_runs trades off runtime against the quality of the solution.
- metric{‘MSE’, ‘MAE’, ‘R2’, ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’}, default=None
For classification tasks: ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’.
For regression tasks: ‘MSE’, ‘MAE’, ‘R2’.
- test_ratiofloat, optional
Ratio of test samples used for hyperparameter optimization (HPO) testing. Only used when cv is None.
- cvint, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy, by default None. Possible inputs for cv are:
None, to use the default 20 percent of the data as validation data;
an integer, to specify the number of folds in a (Stratified) KFold;
a CV splitter;
an iterable yielding (train, test) splits as arrays of indices.
For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used. These splitters are instantiated with shuffle=False so the splits will be the same across calls.
- n_jobsint, optional
Number of jobs to run in parallel, by default None. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors.
- random_stateint, optional
Pseudo random number generator state used for random uniform sampling from lists of possible values instead of scipy.stats distributions. Pass an int for reproducible output across multiple function calls, by default None
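The difference between the two search spaces shown for the parameters argument can be illustrated without any tuning library. In this sketch the helper names are hypothetical, and a small lambda stands in for scipy.stats.randint(100, 1000) so the example has no third-party dependency; it only enumerates candidate settings and does not fit anything.

```python
import itertools
import random


def grid_candidates(space):
    """'grid': every combination of the listed values."""
    names = list(space)
    combos = itertools.product(*(space[n] for n in names))
    return [dict(zip(names, combo)) for combo in combos]


def randomized_candidates(space, n_runs=10, seed=None):
    """'randomized': n_runs settings; lists are sampled uniformly."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_runs):
        draws.append({
            name: spec(rng) if callable(spec) else rng.choice(spec)
            for name, spec in space.items()
        })
    return draws


grid_space = {"n_estimators": [100, 300, 500], "max_depth": [3, 4, 5]}
random_space = {
    # Stand-in for scipy.stats.randint(100, 1000); upper bound excluded.
    "n_estimators": lambda rng: rng.randrange(100, 1000),
    "max_depth": [3, 4, 5],
}
```

The grid space always yields 3 x 3 = 9 candidates, while the randomized space yields exactly n_runs candidates regardless of how large the underlying space is.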
- register(pipeline, name)¶
Register a pipeline.
- Parameters:
- pipelinePipeline Object
The pipeline object to register.
- namestr
The name of registered pipeline.
- segmented_diagnose(show=None, model=None, segment_id=None, segment_method=None, segment_feature=None, segment_bins=None, slice_method=None, slice_features=None, slice_bins=None, metric=None, threshold=None, min_samples=None, show_feature=None, distance_metric=None, psi_buckets=None, use_test=None, original_scale=None, return_data=None, figsize=None)¶
Test model performance using various diagnostic tools after bucketing.
If no parameter is passed into the method, the program will run in low code mode, otherwise high code mode.
- Parameters:
- modelstr, default=None
Model’s registered name.
- show{‘segment_table’, ‘accuracy_residual’, ‘accuracy_table’, ‘weakspot’, ‘distribution_shift’}, default=None
All the supported model diagnostics methods.
‘segment_table’: show the accuracy of each segment, as produced by the given segmentation method and feature. If segment_feature = None, the table for all features will be output.
Accuracy
‘accuracy_residual’: plot residual with respect to a variable. (params: show_feature)
‘accuracy_table’: show accuracy results with a table.
WeakSpot
‘weakspot’: show detected weakspot regions. (params: slice_method, slice_features, slice_bins, metric, threshold, min_samples, use_test)
‘distribution_shift’: plot distribution shift between samples in a specific segment and the others.
- segment_idint, default=None
The id of the segment to diagnose, required when show = ‘accuracy_residual’, ‘accuracy_table’, ‘weakspot’, or ‘distribution_shift’.
- segment_featurestr, default=None
Feature for bucketing.
- segment_method{‘uniform’, ‘quantile’, ‘auto’}
Method for bucketing.
‘uniform’: all bins have identical widths.
‘quantile’: all bins have the same number of samples.
‘auto’: bins are obtained from an XGB1 model trained on the residuals.
- segment_binsint, default=None
The number of bins of bucketing when segment_method == ‘uniform’ or ‘quantile’.
If None, it will be 5.
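The contrast between ‘uniform’ and ‘quantile’ bucketing can be sketched in a few lines. This is only an illustration with hypothetical helper names; the ‘auto’ method, which relies on a fitted XGB1 model, is deliberately left out.

```python
def segment_edges(values, segment_bins=5, segment_method="uniform"):
    """Return segment_bins + 1 bin edges for one feature."""
    ordered = sorted(values)
    lo, hi = ordered[0], ordered[-1]
    if segment_method == "uniform":
        step = (hi - lo) / segment_bins  # identical widths
        return [lo + step * i for i in range(segment_bins)] + [hi]
    if segment_method == "quantile":
        n = len(ordered)  # roughly the same number of samples per bin
        inner = [ordered[n * i // segment_bins] for i in range(1, segment_bins)]
        return [lo] + inner + [hi]
    raise ValueError("this sketch covers only 'uniform' and 'quantile'")


def segment_ids(values, edges):
    """Segment index for each value; the last bin includes its right edge."""
    last = len(edges) - 2
    return [min(sum(v >= e for e in edges[1:]), last) for v in values]
```

On skewed data, uniform bins can leave most samples in the first segment, whereas quantile bins keep segment sizes balanced at the cost of unequal widths.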
- metric{‘MSE’, ‘MAE’, ‘R2’, ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’}, default=None
Performance metric, used when show = ‘weakspot’.
For classification tasks: ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’.
For regression tasks: ‘MSE’, ‘MAE’, ‘R2’.
- slice_method{‘histogram’, ‘tree’}, default=None
The slicing method for WeakSpot tests, used when show = ‘weakspot’. If None, it will be ‘histogram’.
‘histogram’: default, use equal-space binning;
‘tree’: fit a decision tree to generate regions.
- slice_featureslist, default=None
List of slicing features (at most 2) for WeakSpot tests, used when show = ‘weakspot’.
- slice_binsint, default=None
The number of bins for slicing, used when show = ‘weakspot’. If None, it will be 10.
- thresholdfloat, default=None
This parameter determines the threshold for a weak region, used when show = ‘weakspot’. If None, it will be 1.1.
- min_samplesint, default=None
The minimal sample size for selected regions, used when show = ‘weakspot’. If None, it will be 20.
- use_testbool, default=None
Whether to use test data or not, used when show = ‘weakspot’ or ‘accuracy_residual’. If None, it will be False.
- show_featurestr, default=None
The feature of interest in the plot (usually the x-axis), used when show = ‘accuracy_residual’ or ‘distribution_shift’.
- distance_metric{‘PSI’, ‘WD1’, ‘KS’}, default=None
The distance metric, used when show = ‘distribution_shift’. If None, it will be ‘PSI’.
‘PSI’: Population stability index.
‘WD1’: Wasserstein distance (1D).
‘KS’: Kolmogorov-Smirnov test.
- psi_buckets{‘uniform’, ‘quantile’}, default=None
Bucketing strategy for PSI metric. If None, it will be ‘uniform’.
- original_scale: bool, default=None
To use the original scale of X in the plots. If None, it will be False.
- return_data: bool, default=None
Whether to return the data object. If None, it will be False.
- figsizetuple, default=None
Figure size of the plot. If None, it will be (8, 6).
- Returns:
- TestResult
Diagnostic result, only available when return_data = True.
Examples using piml.Experiment
Build Robust Models with Monotonicity Constraints