piml.Experiment

class piml.Experiment(mode='external', highcode_only=False)

The workflow management class.

Parameters:
mode : str, default=’external’

The running mode of the experiment.

highcode_only : bool, default=False

Whether to use high code mode only. If True, CSS injection is disabled, and the low-code interface may not render properly.
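
A minimal instantiation sketch (assuming PiML is installed; the per-method examples below continue from this exp object):

from piml import Experiment

exp = Experiment()                      # default: low-code widgets enabled
# exp = Experiment(highcode_only=True)  # high code mode only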

Methods

data_loader([data, silent, spark, ...])

Load data for experimentation.

data_prepare([target, sample_weight, ...])

Prepare data for model fitting.

data_quality([method, dataset, show, ...])

Check the data quality and remove outliers.

data_summary([feature_type, ...])

Summarize basic data statistics.

eda([show, uni_feature, bi_features, ...])

Run exploratory data analysis.

feature_select([method, threshold, ...])

Select features that are important for modeling.

get_data([x, y, sample_weight, train, test])

Get the preprocessed train-test data.

get_feature_names()

Get the input feature names.

get_feature_types()

Get the data type of each input feature.

get_interpretable_model_list()

Get the list of names of all registered interpretable models.

get_leaderboard([metric])

Show the performance comparison table of all trained models.

get_leaderboard_registered([metric])

Show the performance comparison table of all registered models.

get_model(model)

Get a registered pipeline.

get_model_config(model)

Get the configuration of a model.

get_model_list()

Get the list of names of all registered models.

get_raw_data()

Get the raw train-test data.

get_target_name()

Get the target feature name.

make_pipeline([model, task_type, ...])

Customize a pipeline.

model_compare([models, show, metric, ...])

Compare the diagnostic results of multiple models.

model_diagnose([model, show, metric, ...])

Test model performance using various diagnostic tools.

model_explain([model, show, uni_feature, ...])

Explain an arbitrary fitted model using post-hoc explanation tools.

model_fairness([model, show, metric, ...])

Test model fairness.

model_fairness_compare([models, show, ...])

Compare the fairness results of multiple models.

model_fairness_solas([model, show, metric, ...])

Test model fairness based on solas-ai.

model_interpret([model, show, uni_feature, ...])

Interpret inherently interpretable models.

model_save(model[, path])

Save a PiML-trained model as a pickle file.

model_train([model, name])

Fit interpretable models.

model_tune([model, method, parameters, ...])

Refit a model with new parameters.

register(pipeline, name)

Register a pipeline.

segmented_diagnose([show, model, ...])

Test model performance using various diagnostic tools after bucketing.

data_loader(data: Union[str, DataFrame] = None, silent: bool = False, spark: bool = False, spark_sample_size: int = 100000, spark_sample_by_feature: unicode = None, spark_sample_fractions: dict = None, spark_random_state: int = 0)

Load data for experimentation.

If no parameter is passed into the method, the program will run in low code mode; otherwise, it runs in high code mode.

Note that loading new data will reset all data, models, and experimental results to None.

Parameters:
data : {‘CoCircles’, ‘Friedman’, ‘BikeSharing’, ‘TaiwanCredit’, ‘CaliforniaHousing_raw’, ‘CaliforniaHousing_trim1’, ‘CaliforniaHousing_trim2’, ‘SimuCredit’, ‘SolasSimu1’, ‘SolasHMDA’} or pd.DataFrame, default=None

The supported inputs:

  • ‘CoCircles’: Gaussian data with a spherical decision boundary for binary classification, generated via Scikit-Learn.

  • ‘Friedman’: ‘Friedman #1’ regression problem, generated via Scikit-Learn.

  • ‘BikeSharing’: Refer to https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset.

  • ‘TaiwanCredit’: Refer to https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients.

  • ‘CaliforniaHousing_raw’: Refer to https://developers.google.com/machine-learning/crash-course/california-housing-data-description.

  • ‘CaliforniaHousing_trim1’: ‘CaliforniaHousing_raw’ dataset with the feature ‘AveOccup’ trimmed by upper threshold 5.

  • ‘CaliforniaHousing_trim2’: ‘CaliforniaHousing_raw’ dataset with the features ‘AveRooms’, ‘AveBedrms’, ‘Population’, and ‘AveOccup’ trimmed by the 0.98 upper quantile.

  • ‘SimuCredit’: A credit simulation dataset for fairness testing.

  • ‘SolasSimu1’: A simulated dataset, modified from the ‘Friedman #1’ regression problem. The covariates used for modeling are ‘Segment’, ‘x1’, ‘x2’, …, ‘x5’; the response ‘Label’ is binary, making it a classification problem. The remaining variables are demographic variables used for fairness testing. The data is contributed by Solas-AI (https://github.com/SolasAI/solas-ai-disparity).

  • ‘SolasHMDA’: A preprocessed sample of the 2018 Home Mortgage Disclosure Act (HMDA) data. The HMDA dataset includes information about nearly every home mortgage application in the United States.

  • pandas DataFrame: DataFrame with rows as data points and columns as variables.

silent : bool, default=False

Whether to display the data preview or not.

spark : bool, default=False

Whether to load data using the spark backend.

spark_sample_size : int, default=100000

The number of samples to load, used when spark=True.

Note that this value is converted to the frac parameter in spark, so the actual number of samples retrieved may differ from this value. If spark_sample_size=None, the whole dataset is used and no subsampling is performed.

spark_sample_by_feature : str, default=None

The column name to be used for stratified sampling, used when spark=True. If None, no stratified sampling is performed.

spark_sample_fractions : dict, default=None

The ratios of each category in spark_sample_by_feature, used when spark=True and spark_sample_by_feature is not None.

For instance, if spark_sample_by_feature has two categories (0 and 1), then spark_sample_fractions={0: 1.0, 1: 2.0} means that the ratio of category 0 to category 1 in the subsample is 1.0:2.0. If it is None or mis-specified, the ratio of each category is kept at 1:1.

Note that the actual ratios between the categories can differ from the given ratios, as the total counts of some categories may be less than the desired number.

spark_random_state : int, default=0

The random seed for spark subsampling, used when spark=True and spark_sample_size is not None.
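
A loading sketch for both supported input styles, continuing from the exp above (the CSV path is a hypothetical placeholder):

import pandas as pd

exp.data_loader(data='BikeSharing', silent=True)   # built-in demo dataset
# df = pd.read_csv('my_data.csv')                  # hypothetical file
# exp.data_loader(data=df, silent=True)            # arbitrary pandas DataFrame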

data_prepare(target: unicode = None, sample_weight: unicode = None, task_type: unicode = None, test_ratio: Union[float, list] = None, random_state: int = None, split_method: unicode = None, train_idx=None, test_idx=None, return_data: bool = None, silent: bool = None)

Prepare data for model fitting.

This step sets the target response, the task type (classification or regression), and the train / test split. If no parameter is passed into the method, the program will run in low code mode; otherwise, it runs in high code mode.

Parameters:
target : str, default=None

Target variable name. If None, the last column in the data will be selected.

sample_weight : str, default=None

Sample weight column name.

task_type : {‘classification’, ‘regression’}, default=None

Task type. If None, it will be automatically determined according to the target variable.

split_method : {‘random’, ‘outer-sample’, ‘kmeans’}, default=None

The method for splitting train and test samples.

  • ‘random’: random split.

  • ‘outer-sample’: split samples based on the Euclidean distance of each sample to the data center.

  • ‘kmeans’: run K-means, and randomly pick the test samples from each cluster, with the ratios determined by test_ratio.

test_ratio : float or list of float, default=None

Test sample ratio. If None, it will be 0.2.

When split_method=’kmeans’, it is expected to be a list of floats, and its length determines the number of clusters in K-means. Each element ranges from 0 to 1 and corresponds to the test sample ratio of that cluster.

random_state : int, default=None

Random seed for the train / test split. If None, it will be 0.

train_idx : array-like of shape (n_samples_train,), default=None

Indices of training samples for a custom split. If both train_idx and test_idx are given, split_method and test_ratio will be ignored.

test_idx : array-like of shape (n_samples_test,), default=None

Indices of testing samples for a custom split. If both train_idx and test_idx are given, split_method and test_ratio will be ignored.

return_data : bool, default=None

Whether to return data.

silent : bool, default=None

Whether to display the data prepare summary or not. If None, it will be False.
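
A preparation sketch (the target name 'cnt' is the BikeSharing response in the PiML demo data, used here as an assumption):

exp.data_prepare(target='cnt', task_type='regression', test_ratio=0.2, random_state=0, silent=True)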

data_quality(method=None, dataset=None, show: unicode = None, threshold: Union[float, list] = None, remove_outliers: bool = None, distance_metric: unicode = None, psi_buckets: unicode = None, show_feature: unicode = None, return_data: bool = None, figsize: tuple = None)

Check the data quality and remove outliers.

Note that this method requires data_prepare to be executed first.

  • For the data integrity check, it uses the data after excluding the columns specified in data_summary as inputs. Hence, its result is independent of feature_select. However, some of the checks involve the train and test sets, and hence the results would be impacted by data_prepare.

  • For outlier detection, it uses the data after feature selection as inputs. Therefore, data_quality and feature_select can be used in turn, and the result would differ if the selected features are different.

Parameters:
method : list of objects or a single object

The following outlier detection method objects are supported:

  • ‘IsolationForest’: piml.data.outlier_detection.IsolationForest

  • ‘CBLOF’: piml.data.outlier_detection.CBLOF

  • ‘PCA’: piml.data.outlier_detection.PCA

  • ‘KMeansTree’: piml.data.outlier_detection.KMeansTree

  • ‘OneClassSVM’: piml.data.outlier_detection.OneClassSVM

  • ‘KNN’: piml.data.outlier_detection.KNN

  • ‘HBOS’: piml.data.outlier_detection.HBOS

  • ‘ECOD’: piml.data.outlier_detection.ECOD

dataset : {‘all’, ‘train’, ‘test’}, default=None

Specify the dataset for data quality check.

  • ‘all’: Use all samples to check data quality (default choice).

  • ‘train’: Only use training samples to check data quality, available after data_prepare is executed.

  • ‘test’: Only use testing samples to check data quality, available after data_prepare is executed.

show : {‘od_score_distribution’, ‘od_marginal_outlier_distribution’, ‘od_tsne_comparison’, ‘integrity_single_column_check’, ‘integrity_duplicated_samples’, ‘integrity_highly_correlated_features’, ‘drift_test_info’, ‘drift_test_distance’}, default=None

Data integrity methods:
  • ‘integrity_single_column_check’: Get overall single column integrity check result.

  • ‘integrity_duplicated_samples’: Detect duplicated samples.

  • ‘integrity_highly_correlated_features’: Detect highly correlated features.

Drift test methods:
  • ‘drift_test_info’: Get train test size difference and energy distance value.

  • ‘drift_test_distance’: Get train test data distance of each feature.

Outlier detection methods:
  • ‘od_score_distribution’: Show the distribution of outlier scores.

  • ‘od_marginal_outlier_distribution’: Show the outliers marginal distribution.

  • ‘od_tsne_comparison’: Compare outliers detected by different methods under 2d t-SNE space.

threshold : float or list, default=None

The threshold on outlier scores, which decides whether a sample is an outlier: samples with outlier scores larger than the threshold are classified as outliers.

remove_outliers : bool, default=None

If True, and method is a single outlier detection model, the detected outliers will be removed. It should only be used when show=’od_score_distribution’ or ‘od_marginal_outlier_distribution’.

distance_metric : {‘PSI’, ‘WD1’, ‘KS’}, default=None

The distance metric used when show = ‘drift_test_distance’. If None, it will be ‘PSI’.

  • ‘PSI’: Population stability index.

  • ‘WD1’: Wasserstein distance (1D).

  • ‘KS’: Kolmogorov-Smirnov test.

psi_buckets : {‘uniform’, ‘quantile’}, default=None

Bucketing strategy for PSI metric in data drift test. If None, it will be ‘uniform’.

show_feature : str, default=None

Feature for the distribution plot. If None, the distance metric scores of each feature will be shown.

return_data : bool, default=None

Whether to return data.

figsize : tuple, default=None

Figure size.
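
Two sketch calls, continuing from the exp above (the IsolationForest import path follows the method list above; constructor defaults are assumed):

from piml.data.outlier_detection import IsolationForest

exp.data_quality(show='integrity_single_column_check')                    # integrity check
exp.data_quality(method=IsolationForest(), show='od_score_distribution')  # outlier scores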

data_summary(feature_type: Optional[Dict] = None, feature_exclude: Optional[List] = None, silent: Optional[bool] = None, return_data: Optional[bool] = None)

Summarize basic data statistics.

If no parameter is passed into the method, the program will run in low code mode; otherwise, it runs in high code mode.

Parameters:
feature_type : dict, default=None

Feature type of each feature. Available types include ‘categorical’ and ‘numerical’. For example, {‘X0’: ‘numerical’, ‘X1’: ‘categorical’, ‘X2’: ‘categorical’}.

feature_exclude : list, default=None

Features to exclude from training and diagnostics.

silent : bool, default=None

Whether to display the data summary or not. If None, it will be False.

return_data : bool, default=None

Whether to return data. If None, it will be False.
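
A sketch excluding a column and overriding a feature type (the column names are hypothetical placeholders):

exp.data_summary(feature_type={'X1': 'categorical'}, feature_exclude=['X2'], silent=True)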

eda(show: unicode = None, uni_feature: unicode = None, bi_features: List = None, multi_features: List = None, multi_type: unicode = None, return_data: bool = None, figsize: Tuple = None)

Run exploratory data analysis.

If no parameter is passed into the method, the program will run in low code mode; otherwise, it runs in high code mode.

It uses all the raw data (after excluding the columns in data_summary) as input, and the results are not impacted by data_prepare, data_quality, or feature_select.

Parameters:
show : {‘univariate’, ‘bivariate’, ‘multivariate’}, default=None

The plot method.

uni_feature : str, default=None

Feature name for the ‘univariate’ plot, used when show = ‘univariate’.

bi_features : list, default=None

Feature names for the ‘bivariate’ plot, used when show = ‘bivariate’.

multi_features : list, default=None

Feature names for the ‘multivariate’ plot, used when show = ‘multivariate’. If None, all the features will be used in the correlation plot.

multi_type : {‘correlation_heatmap’, ‘correlation_graph’}, default=None

Plot type of the ‘multivariate’ correlation plot, used when show = ‘multivariate’. If None, it will be ‘correlation_heatmap’.

return_data : bool, default=None

Whether to return the data object. If None, it will be False.

figsize : tuple, default=None

Figure size of the plot. If None, it will be (8, 6).
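
A sketch of the three plot families ('X0' and 'X1' are placeholder column names):

exp.eda(show='multivariate', multi_type='correlation_heatmap')
# exp.eda(show='univariate', uni_feature='X0')
# exp.eda(show='bivariate', bi_features=['X0', 'X1'])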

feature_select(method: unicode = None, threshold: float = None, corr_algorithm: unicode = None, preset: list = None, kernel_size: int = None, n_forward_phase: int = None, return_data: bool = None, figsize: Tuple = None)

Select features that are important for modeling.

If no parameter is passed into the method, the program will run in low code mode; otherwise, it runs in high code mode.

Note that this method requires data_prepare to be executed first. It uses the training data after outlier removal as inputs. Therefore, data_quality and feature_select can be used in turn, and the result would differ if the selected features are different.

Parameters:
method : {‘cor’, ‘dcor’, ‘pfi’, ‘rcit’}, default=None

The method for feature selection. If None, it will be ‘cor’.

  • ‘cor’: Use the Pearson correlation coefficient to select features.

  • ‘dcor’: Use the distance correlation coefficient to select features.

  • ‘pfi’: Use the permutation feature importance of a surrogate model (XGB) to select features.

  • ‘rcit’: Use the randomized conditional independence test to select features.

threshold : float, default=None

The threshold for feature selection, which has a different meaning under each method.

  • ‘cor’: The absolute threshold of the Pearson correlation coefficient.

    Features with an absolute Pearson correlation coefficient larger than the threshold will be chosen. If None, it will be 0.1.

  • ‘dcor’: The threshold for the value of the distance correlation coefficient.

    Features with a distance correlation coefficient larger than the threshold will be chosen. If None, it will be 0.1.

  • ‘pfi’: The threshold for the accumulated value of the normalized permutation feature importance of XGB.

    Features with accumulated importance larger than the threshold will be chosen. If None, it will be 0.99.

  • ‘rcit’: The threshold for the p-value of the RCIT test.

    Features are selected in an iterative manner. In a forward iteration, a feature is selected when the p-value of the RCIT test is smaller than or equal to the threshold, conditional on the already selected features. In a backward iteration, a selected feature is removed if the corresponding p-value is greater than the threshold. If None, it will be 1e-6.

corr_algorithm : str, default=None

The algorithm of the correlation method.

  • ‘pearson’: Use Pearson correlation.

  • ‘spearman’: Use Spearman rank correlation.

preset : list, default=None

The initial set of selected feature names, used when method = ‘rcit’. If None, it will be an empty list.

kernel_size : int, default=None

The number of Random Fourier Features used in the conditioning set, used when method = ‘rcit’. If None, it will be 100.

n_forward_phase : int, default=None

The number of forward repetition iterations of the forward-backward feature selection with early stopping (FBEDk) algorithm, used when method = ‘rcit’. If None, it will be 2.

return_data : bool, default=None

Whether to return the data object. If None, it will be False.

figsize : tuple, default=None

Figure size of the plot. If None, it will be (8, 6).
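
A selection sketch using the surrogate-model importance method (the threshold value follows the default documented above):

exp.feature_select(method='pfi', threshold=0.99)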

get_data(x=None, y=None, sample_weight=None, train=False, test=False)

Get the preprocessed train-test data.

The train_x and test_x only include selected features.

Parameters:
x : ndarray of shape (n_samples, n_features), default=None

Selected features. If None, the default data in the workflow will be used.

y : ndarray of shape (n_samples, ), default=None

Response. If None, the default data in the workflow will be used.

sample_weight : ndarray of shape (n_samples, ), default=None

The sample weights. If None, the default data in the workflow will be used.

train : bool, default=False

Whether to return training data only. Not available if x, y, or sample_weight is specified.

test : bool, default=False

Whether to return testing data only. Not available if x, y, or sample_weight is specified.
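
A fetch sketch; the structure of the returned object is not detailed here, so it is left unpacked (see the DataTuple layout under get_raw_data for the analogous fields):

train_data = exp.get_data(train=True)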

get_feature_names()

Get the input feature names.

Returns:
feature_names: list of str

Feature names.

get_feature_types()

Get the data type of each input feature.

Returns:
feature_types: list of str

Feature types.

get_interpretable_model_list()

Get the list of names of all registered interpretable models.

Returns:
List of str

The list of all registered interpretable pipeline names.

get_leaderboard(metric=None)

Show the performance comparison table of all trained models.

Note that only models trained using the exp.model_train API are considered here. For models trained outside the piml workflow, use e.g., exp.model_diagnose(model="Model_Name", show="accuracy_table") instead.

Parameters:
metric : {‘MSE’, ‘MAE’, ‘R2’, ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’}, default=None
  • For classification tasks: ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’.

  • For regression tasks: ‘MSE’, ‘MAE’, ‘R2’.

get_leaderboard_registered(metric=None)

Show the performance comparison table of all registered models.

Note that only models registered using the exp.register API are considered here. For models trained outside the piml workflow, use exp.model_diagnose(model="Model_Name", show="accuracy_table") instead.

Parameters:
metric : {‘MSE’, ‘MAE’, ‘R2’, ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’}, default=None
  • For classification tasks: ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’.

  • For regression tasks: ‘MSE’, ‘MAE’, ‘R2’.

get_model(model)

Get a registered pipeline.

Parameters:
model : str

The registered pipeline name.

Returns:
ModelPipeline

A pipeline containing raw data, data preprocessing, and an estimator.

get_model_config(model)

Get the configuration of a model.

Parameters:
model : str

Model’s registered name.

get_model_list()

Get the list of names of all registered models.

Returns:
List of str

The list of all registered pipeline names.

get_raw_data()

Get the raw train-test data.

The train_x and test_x only include selected features.

Returns:
data : DataTuple

It includes (train_x, train_y, train_sample_weight, test_x, test_y, test_sample_weight, feature_names, target_name).

get_target_name()

Get the target feature name.

Returns:
target_name : str

Target name.

make_pipeline(model=None, task_type=None, predict_func=None, predict_proba_func=None, fit_func=None, train_x=None, train_y=None, test_x=None, test_y=None, train_sample_weight=None, test_sample_weight=None, excluded_features=None, feature_names=None, feature_types=None, target_name=None, normalize_strategy=None, encode_strategy=None)

Customize a pipeline.

Parameters:
model : estimator object or pickle file path

An estimator, which should follow the sklearn style.

task_type : {‘regression’, ‘classification’}, default=None

The task type, used when the model does not have the attribute “_estimator_type”. If None, it will be automatically determined using the data types of train_y and test_y. An error is raised if the task type does not match the model type.

predict_func : callable, default=None

The predict function that receives X (numpy array) as input and outputs the predictions (numpy array). Only used when model is None.

predict_proba_func : callable, default=None

The predict_proba function that receives X (numpy array) as input and outputs the predicted probability of each class (numpy array of shape n * 2). Only used when model is None and task_type is ‘classification’.

fit_func : callable, default=None

The fit function that receives X, y, and sample_weight as inputs and refits the model. Only used when model is None, and it is optional.

train_x : ndarray of shape (n_samples_train, n_features), default=None

The training data for the estimator. The default data in Experiment is used if None.

train_y : ndarray of shape (n_samples_train, ), default=None

The training target for the estimator. The default data in Experiment is used if None.

test_x : ndarray of shape (n_samples_test, n_features), default=None

The testing data for the estimator. The default data in Experiment is used if None.

test_y : ndarray of shape (n_samples_test, ), default=None

The testing target for the estimator. The default data in Experiment is used if None.

train_sample_weight : ndarray of shape (n_samples_train, ), default=None

The training sample_weight for the estimator. The default data in Experiment is used if None.

test_sample_weight : ndarray of shape (n_samples_test, ), default=None

The testing sample_weight for the estimator. The default data in Experiment is used if None.

excluded_features : list, default=None

Feature names to exclude from the model.

feature_names : list, default=None

Feature names.

feature_types : list, default=None

Feature types, each being ‘numerical’ or ‘categorical’. If None or empty, the type of each feature will be inferred from several samples.

target_name : str, default=None

Target name.

encode_strategy : {‘ordinal’, ‘one_hot’}, default=None

The encoding strategy name. If None, no encoding is performed.

normalize_strategy : {‘minmax’, ‘unit_norm’}, default=None

The normalization strategy name. If None, no normalization is performed.

Returns:
Pipeline object

A pipeline containing raw data, data preprocessing, and an estimator.
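
A wrap-and-register sketch for an external sklearn-style model (the estimator and the registered name are illustrative; data defaults come from the Experiment):

from sklearn.ensemble import RandomForestRegressor

pipeline = exp.make_pipeline(model=RandomForestRegressor(max_depth=5), task_type='regression')
exp.register(pipeline, name='RF')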

model_compare(models: List[str] = None, show: unicode = None, metric: unicode = None, immu_feature: unicode = None, perturb_features: Union[str, List[str]] = None, perturb_method: unicode = None, resilience_method: unicode = None, perturb_size: float = None, psi_buckets: unicode = None, distance_metric: unicode = None, min_samples: int = None, alpha: float = None, n_clusters: int = None, bins: int = None, slice_feature: unicode = None, slice_method: unicode = None, original_scale: bool = None, return_data: bool = None, figsize: Tuple[int, int] = None)

Compare the diagnostic results of multiple models.

If no parameter is passed into the method, the program will run in low code mode; otherwise, it runs in high code mode.

Parameters:
models : list, default=None

Model names, up to 3 models.

show : str, default=None

Supported diagnostic methods for model comparison:

  • Accuracy

    • ‘accuracy_plot’: evaluate the model performance. (params: metric)

  • Overfit

    • ‘overfit’: compare overfit performance between models. (params: slice_method, slice_feature, bins, metric)

  • Reliability

    • ‘reliability_bandwidth’: compare reliability bandwidth. (params: alpha)

    • ‘reliability_coverage’: only supports regressors; compare reliability coverage. (params: alpha)

  • Robustness

    • ‘robustness_perf’: compare robustness performance. (params: perturb_method, perturb_features, perturb_size, metric)

    • ‘robustness_perf_worst’: compare robustness performance between models, based on the worst sample. (params: perturb_method, perturb_features, perturb_size, metric, alpha)

  • Resilience

    • ‘resilience_perf’: compare resilience performance. (params: resilience_method, alpha, n_clusters, immu_feature, metric)

    • ‘resilience_distance’: compare resilience distribution distance. (params: resilience_method, alpha, n_clusters, immu_feature, distance_metric)

metric : {‘MSE’, ‘MAE’, ‘R2’, ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’}, default=None

Performance metric, used when show = ‘accuracy_plot’, ‘overfit’, ‘robustness_perf’, ‘robustness_perf_worst’, or ‘resilience_perf’.

  • For classification tasks: ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’.

  • For regression tasks: ‘MSE’, ‘MAE’, ‘R2’.

immu_feature : str, default=None

The name of the immutable feature, used when show = ‘resilience_perf’ or ‘resilience_distance’. If None, it will be an empty list.

perturb_features : str or list of str, default=None

The feature or list of features to perturb in the Robustness test, used when show = ‘robustness_perf’ or ‘robustness_perf_worst’. If None, all features will be perturbed.

perturb_method : {‘raw’, ‘quantile’}, default=None

Perturbation method for the Robustness test, used when show = ‘robustness_perf’ or ‘robustness_perf_worst’. If None, it will be ‘raw’.

  • ‘raw’: add normal noise directly on X for perturbation;

  • ‘quantile’: add uniform noise on the quantile space of X.

perturb_size : float, default=None

The perturbation strength in the Robustness test, used when show = ‘robustness_perf’ or ‘robustness_perf_worst’. If None, it will be 0.1.

  • For numeric features:

    • When perturb_method = ‘raw’: the standard deviation of the noise, i.e., noise std = perturb_size * std(X).

    • When perturb_method = ‘quantile’: the range of the uniform noise distribution, i.e., noise range = [-0.5 * perturb_size, 0.5 * perturb_size].

  • For categorical features: the perturbation probability of the categorical feature.

resilience_method : {‘worst-sample’, ‘hard-sample’, ‘outer-sample’, ‘worst-cluster’}, default=None

The method used for selecting worst samples, used when show = ‘resilience_perf’ or ‘resilience_distance’. If None, it will be ‘worst-sample’.

  • ‘worst-sample’: select the worst samples according to the loss of each sample;

  • ‘hard-sample’: use a deep XGB model to distinguish hard and easy samples;

  • ‘outer-sample’: select the worst samples according to the distance of each sample to the center;

  • ‘worst-cluster’: fit a K-means model, and select the worst-performing cluster as the worst sample.

distance_metric : {‘PSI’, ‘WD1’, ‘KS’}, default=None

The distance metric used when show = ‘resilience_distance’. If None, it will be ‘PSI’.

  • ‘PSI’: Population stability index.

  • ‘WD1’: Wasserstein distance (1D).

  • ‘KS’: Kolmogorov-Smirnov test.

psi_buckets : {‘uniform’, ‘quantile’}, default=None

Bucketing strategy for the PSI metric. If None, it will be ‘uniform’.

min_samples : int, default=None

The minimum number of samples allowed in each overfit region.

alpha : float, default=None

This parameter has different meanings for different diagnostics, used when show = ‘reliability_bandwidth’, ‘reliability_coverage’, ‘robustness_perf_worst’, ‘resilience_perf’, or ‘resilience_distance’. If None, it will be 0.1.

  • For the Resilience test: decides the ratio for worst sampling; it can only be 0.1, 0.2, …, 1;

  • For the Reliability test: the error rate in Split Conformal Prediction (only for regression tasks).

n_clusters : int, default=None

The number of clusters in K-means, used when resilience_method = ‘worst-cluster’ and show = ‘resilience_perf’ or ‘resilience_distance’. If None, it will be 10.

bins : int, default=None

The number of bins, used when show = ‘overfit’. If None, it will be 10.

slice_feature : str, default=None

The feature of interest in the plot (usually the x-axis), used when show = ‘overfit’.

slice_method : {‘histogram’, ‘tree’}, default=None

The slicing method for the Overfit test, used when show = ‘overfit’. If None, it will be ‘histogram’.

  • ‘histogram’: default; use equal-space binning;

  • ‘tree’: fit a decision tree to generate regions, not applicable when show = ‘overfit’.

original_scale : bool, default=None

Whether to use the original scale of X in the plots. If None, it will be False.

return_data : bool, default=None

Whether to return the data object. If None, it will be False.

figsize : tuple, default=None

Figure size of the plot. If None, it will be (8, 6).
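
A comparison sketch; the model names assume models trained or registered under those names (see the model_train example below):

exp.model_compare(models=['GLM', 'XGB2'], show='robustness_perf',
                  perturb_method='quantile', perturb_size=0.1)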

model_diagnose(model: unicode = None, show: unicode = None, metric: unicode = None, perturb_size: float = None, perturb_features: Union[str, List[str]] = None, perturb_method: unicode = None, bins: int = None, resilience_method: unicode = None, alpha: float = None, n_clusters: int = None, slice_features: Union[str, List[str]] = None, slice_method: unicode = None, threshold: float = None, min_samples: int = None, use_test: bool = None, psi_buckets: unicode = None, immu_feature: unicode = None, show_feature: unicode = None, distance_metric: unicode = None, original_scale: bool = None, return_data: bool = None, figsize: Tuple[int, int] = None)

Test model performance using various diagnostic tools.

If no parameter is passed into the method, the program will run in low code mode; otherwise, it runs in high code mode.

Parameters:
model : str, default=None

Model’s registered name.

show : {‘accuracy_residual’, ‘accuracy_plot’, ‘accuracy_table’, ‘overfit’, ‘weakspot’, ‘robustness_perf’, ‘robustness_perf_worst’, ‘resilience_perf’, ‘resilience_distance’, ‘resilience_shift_density’, ‘resilience_shift_histogram’, ‘reliability_distance’, ‘reliability_marginal’, ‘reliability_table’, ‘reliability_perf’, ‘reliability_calibration’}, default=None

All the supported model diagnostics methods.

  • Accuracy

    • ‘accuracy_residual’: plot residuals with respect to a variable. (params: show_feature)

    • ‘accuracy_plot’: only supports classifiers; plot the confusion matrix, ROC, and Recall-Precision curves.

    • ‘accuracy_table’: show train and test performance via a table.

  • WeakSpot

    • ‘weakspot’: show detected weakspot regions. (params: slice_method, slice_features, bins, metric, threshold, min_samples, use_test)

  • Overfit

    • ‘overfit’: identify overfitting regions with high testing - training error gaps. (params: slice_method, slice_features, bins, metric, threshold, min_samples)

  • Reliability

    • ‘reliability_table’: show empirical coverage and average bandwidth for regression, or Brier Loss for classification. (params: alpha)

    • ‘reliability_distance’: calculate the distribution shift distance of features between reliable and unreliable data. (params: alpha, threshold, distance_metric)

    • ‘reliability_marginal’: plot the histogram of bandwidth against a given feature. (params: alpha, show_feature, bins, threshold)

    • ‘reliability_perf’: only for classifiers; reliability diagram.

    • ‘reliability_calibration’: only for classifiers; show the calibrated predicted probability vs. the original predicted probability.

  • Robustness

    • ‘robustness_perf’: performance against perturbation size. (params: perturb_method, perturb_features, perturb_size, metric)

    • ‘robustness_perf_worst’: performance against perturbation size based on the worst sample. (params: perturb_method, perturb_features, perturb_size, metric, alpha)

  • Resilience

    • ‘resilience_perf’: performance against worst sample ratio. (params: resilience_method, alpha, metric, immu_feature)

    • ‘resilience_distance’: calculate the distribution shift distance of features between the worst sample and the full dataset. (params: resilience_method, alpha, immu_feature, distance_metric)

    • ‘resilience_shift_density’: compare the distribution between the worst sample and the full dataset with a density plot. (params: resilience_method, alpha, immu_feature, show_feature)

    • ‘resilience_shift_histogram’: compare the distribution between the worst sample and the full dataset with a histogram plot. (params: resilience_method, alpha, immu_feature, show_feature)

metric : {‘MSE’, ‘MAE’, ‘R2’, ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’}, default=None

Performance metric, used when show = ‘weakspot’, ‘overfit’, ‘robustness_perf’, ‘robustness_perf_worst’, ‘resilience_perf’, ‘resilience_distance’, ‘resilience_shift_density’, or ‘resilience_shift_histogram’.

  • For classification tasks: ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’.

  • For regression tasks: ‘MSE’, ‘MAE’, ‘R2’.

slice_features : list, default=None

List of slicing features (at most 2) for the Weakspot and Overfit tests, used when show = ‘weakspot’ or ‘overfit’.

slice_method : {‘histogram’, ‘tree’}, default=None

The slicing method for the WeakSpot and Overfit tests, used when show = ‘weakspot’ or ‘overfit’. If None, it will be ‘histogram’.

  • ‘histogram’: default; use equal-space binning;

  • ‘tree’: fit a decision tree to generate regions, not applicable when show = ‘overfit’.

perturb_features : str or list of str, default=None

The feature or list of features to perturb in the Robustness test, used when show = ‘robustness_perf’ or ‘robustness_perf_worst’. If None, all features will be perturbed.

perturb_method : {‘raw’, ‘quantile’}, default=None

Perturbation method for the Robustness test, used when show = ‘robustness_perf’ or ‘robustness_perf_worst’. If None, it will be ‘raw’.

  • ‘raw’: add normal noise directly on X for perturbation;

  • ‘quantile’: add uniform noise on the quantile space of X.

perturb_size : float, default=None

The perturbation strength in the Robustness test, used when show = ‘robustness_perf’ or ‘robustness_perf_worst’. If None, it will be 0.1.

  • For numeric features:

    • When perturb_method = ‘raw’: the standard deviation of the noise, i.e., noise std = perturb_size * std(X).

    • When perturb_method = ‘quantile’: the range of the uniform noise distribution, i.e., noise range = [-0.5 * perturb_size, 0.5 * perturb_size].

  • For categorical features: the perturbation probability of the categorical feature.

resilience_method : {‘worst-sample’, ‘hard-sample’, ‘outer-sample’, ‘worst-cluster’}, default=None

The method used for selecting worst samples, used when show = ‘resilience_perf’, ‘resilience_distance’, ‘resilience_shift_density’, or ‘resilience_shift_histogram’. If None, it will be ‘worst-sample’.

  • ‘worst-sample’: select the worst samples according to the loss of each sample;

  • ‘hard-sample’: use a deep XGB model to distinguish hard and easy samples;

  • ‘outer-sample’: select the worst samples according to the distance of each sample to the center;

  • ‘worst-cluster’: fit a K-means model, and select the worst-performing cluster as the worst sample.

alpha : float, default=None

This parameter has different meanings for different diagnostics, used when show = ‘resilience_perf’, ‘resilience_distance’, ‘resilience_shift_density’, ‘resilience_shift_histogram’, ‘reliability_distance’, ‘reliability_marginal’, or ‘reliability_table’. If None, it will be 0.1.

  • For the Resilience test: determines the ratio for worst sampling, which can only be 0.1, 0.2, …, 0.9;

  • For the Reliability test: the error rate in Split Conformal Prediction (only for regression tasks).

n_clusters : int, default=None

The number of clusters in K-means, used when resilience_method = ‘worst-cluster’ and show = ‘resilience_shift_density’, ‘resilience_shift_histogram’, or ‘resilience_distance’. If None, it will be 10.

bins : int, default=None

The number of bins, used when show = ‘weakspot’, ‘overfit’, ‘reliability_perf’, or ‘reliability_marginal’. If None, it will be 10.

threshold : float, default=None

This parameter has different meanings for different diagnostics, used when show = ‘weakspot’, ‘overfit’, ‘reliability_distance’, or ‘reliability_marginal’. If None, it will be 1.1.

  • For the Overfit test: decides the minimal error gap for an overfit region;

  • For the Weakspot test: decides the minimal error gap for a weak region;

  • For the Reliability test: the threshold to determine whether data are inside or outside coverage.

min_samples : int, default=None

The minimal sample size for selected regions, used when show = ‘weakspot’ or ‘overfit’. If None, it will be 20.

use_test : bool, default=None

Whether to use test data or not, used when show = ‘weakspot’. If None, it will be False.

immu_feature : str, default=None

The name of the immutable feature, used when show = ‘resilience_perf’, ‘resilience_distance’, ‘resilience_shift_density’, or ‘resilience_shift_histogram’. If None, it will be an empty list.

show_feature : str, default=None

The feature of interest in the plot (usually the x-axis), used when show = ‘accuracy_residual’, ‘reliability_marginal’, ‘resilience_shift_density’, or ‘resilience_shift_histogram’.

distance_metric : {‘PSI’, ‘WD1’, ‘KS’}, default=None

The distance metric used when show = ‘reliability_distance’ or ‘resilience_distance’. If None, it will be ‘PSI’.

  • ‘PSI’: Population stability index.

  • ‘WD1’: Wasserstein distance (1D).

  • ‘KS’: Kolmogorov-Smirnov test.

psi_buckets : {‘uniform’, ‘quantile’}, default=None

Bucketing strategy for the PSI metric. If None, it will be ‘uniform’.

original_scale : bool, default=None

Whether to use the original scale of X in the plots. If None, it will be False.

return_data : bool, default=None

Whether to return the data object. If None, it will be False.

figsize : tuple, default=None

Figure size of the plot. If None, it will be (8, 6).

Returns:
TestResult

Diagnose result, only available when return_data = True.
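
A diagnostic sketch, assuming a model trained under the name 'XGB2' (see the model_train example below; 'X0' is a placeholder feature name):

exp.model_diagnose(model='XGB2', show='accuracy_table')
exp.model_diagnose(model='XGB2', show='weakspot', slice_method='histogram', slice_features=['X0'])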

model_explain(model: unicode = None, show: unicode = None, uni_feature: unicode = None, bi_features: List[str] = None, sample_id: int = None, sample_size: int = None, n_repeats: int = None, return_data: bool = None, grid_size: int = None, response_method: unicode = None, sliced_line: bool = None, centered: bool = None, use_test: bool = None, original_scale: bool = None, figsize: Tuple[int, int] = None)

Explain an arbitrary fitted model using post-hoc explanation tools.

If no parameter is passed into the method, the program will run in low code mode; otherwise, it runs in high code mode.

Parameters:
model : str, default=None

Model’s registered name.

show : {‘pfi’, ‘pdp’, ‘ice’, ‘hstats’, ‘ale’, ‘shap_summary’, ‘shap_fi’, ‘shap_scatter’, ‘shap_waterfall’, ‘lime’}, default=None

Supported global and local explanation methods:

  • ‘pfi’: Permutation feature importance (PFI).

  • ‘pdp’: Partial dependence plot (PDP).

  • ‘ice’: Individual conditional effect (ICE).

  • ‘hstats’: H-statistic.

  • ‘ale’: Accumulated local effects (ALE).

  • ‘shap_waterfall’: SHAP for single data point.

  • ‘lime’: LIME for single data point.

  • ‘shap_summary’: SHAP summary.

  • ‘shap_fi’: SHAP feature importance.

  • ‘shap_scatter’: SHAP scatter plot.

uni_feature : str, default=None

Feature name of the univariate plot, used when show = ‘pdp’, ‘ice’, ‘ale’, or ‘shap_scatter’.

bi_features : list of str, default=None

Feature names of the bivariate plot, used when show = ‘pdp’ or ‘ale’.

sample_id : int, default=None

The index of the single data point for local explanation, used when show = ‘lime’ or ‘shap_waterfall’. If None, it will be 0.

sample_size : int, default=None

The subsample size for generating some computationally expensive explanations, used when show = ‘pdp’, ‘hstats’, ‘ice’, ‘shap_fi’, ‘shap_summary’, or ‘shap_scatter’. If None, it will be 500 for ‘shap_fi’, ‘shap_summary’, and ‘shap_scatter’; 2000 for ‘pdp’, ‘ice’, and ‘hstats’.

grid_size : int, default=None

The grid size for generating plots, used when show = ‘pdp’, ‘hstats’, or ‘ale’. If None, it will be 100 for 1D PDP, ICE, and ALE; 10 for 2D PDP, ALE, and H-statistic.

response_method : {‘auto’, ‘predict_proba’, ‘decision_function’}, default=’auto’

Specifies whether to use predict_proba or decision_function as the target response when show = ‘pdp’, ‘ice’, ‘hstats’, or ‘ale’. For regressors this parameter is ignored and the response is always the output of predict. By default, predict_proba is tried first, and decision_function is used as a fallback if it does not exist.

n_repeats : int, default=None

The number of repetitions in the permutation feature importance test, used when show = ‘pfi’. If None, it will be 1.

return_data : bool, default=None

Whether to return the data object. If None, it will be False.

sliced_line : bool, default=None

Whether to show sliced lines for 2D explanations instead of a heatmap, used when show = ‘pdp’ or ‘ale’. If None, it will be False.

centered : bool, default=None

Whether to center X (subtract the average) for generating local explanations, used when show = ‘lime’ or ‘local_fi’. If None, it will be True.

use_test : bool, default=None

Whether to use the test set to do the explanation. If True, the test set will be used; otherwise, the training set will be used. If None, it will be False.

original_scale : bool, default=None

Whether to use the original scale of X, used when show = ‘pdp’, ‘ice’, or ‘ale’. If None, it will be False.

figsize : tuple, default=None

Figure size of the plot. If None, it will be (8, 6).

Returns:
TestResult

Diagnose result, only available when return_data = True.
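
A post-hoc explanation sketch ('X0' is a placeholder feature name; the model name follows the model_train example below):

exp.model_explain(model='XGB2', show='pfi', n_repeats=5)
exp.model_explain(model='XGB2', show='pdp', uni_feature='X0')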

model_fairness(model: unicode = None, show: unicode = None, metric: unicode = None, metric_threshold: float = None, favorable_class: int = None, favorable_threshold: float = None, group_category: List = None, reference_group: List = None, protected_group: List = None, thresholding_bins: int = None, segment_feature: unicode = None, segment_bins: int = None, performance_metric: unicode = None, distance_metric: unicode = None, binning_dict: Dict = None, by_weights: List = None, return_data: bool = None, figsize: Tuple[int, int] = None)

Test model fairness.

Parameters:
modelstr, default=None

Model’s registered name.

show{‘metrics’, ‘segmented’, ‘thresholding’, ‘binning’}, default=None

The supported methods in the fairness module.

  • ‘metrics’: show fairness metric results.

  • ‘segmented’: show segmented fairness metric results.

  • ‘thresholding’: show fairness-accuracy trade-off results plot with different outcome cut-offs

(classification). - ‘binning’: show fairness-accuracy trade-off results plot with different feature binning.

metric : {‘AIR’, ‘SMD’, ‘Precision’, ‘Recall’}, default=None

The metric for fairness testing.

  • ‘AIR’: Adverse impact ratio.

  • ‘SMD’: Standardized mean difference.

  • ‘Precision’: Positive predictive value disparity ratio.

  • ‘Recall’: True positive rate disparity ratio.

favorable_class : int, default=None

Favorable target class for AIR.

favorable_threshold : float, default=None

Favorable threshold to binarize the predicted outcomes (predict_proba for classifiers). Not available for regressors.

group_category : list, default=None

Feature list of each group.

reference_group : list, default=None

Reference group list of category names.

protected_group : list, default=None

Protected group list of category names.

metric_threshold : float, default=None

Threshold for the fairness metric (the dotted line in the plots).

thresholding_bins : int, default=None

The number of bins for numerical features when show = ‘thresholding’.

segment_feature : str, default=None

The segment feature’s name when show = ‘segmented’.

segment_bins : int, default=None

The number of bins for numerical features when show = ‘segmented’.

performance_metric : {‘MSE’, ‘MAE’, ‘R2’, ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’}, default=None
  • For classification tasks: ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’.

  • For regression tasks: ‘MSE’, ‘MAE’, ‘R2’.

distance_metric : {‘PSI’, ‘WD1’, ‘KS’}, default=None

The distance metric name for data comparison.

  • ‘PSI’: Population stability index.

  • ‘WD1’: Wasserstein distance (1D).

  • ‘KS’: Kolmogorov-Smirnov test.

binning_dict : dict, default=None

The binning configuration for each feature. For example, the following dict defines three configurations:

binning_dict = {'Balance': {'type': 'uniform', 'value': (1, 5)},
                'Mortgage': {'type': 'quantile', 'value': (2, 5)},
                'Utilization': {'type': 'custom', 'value': (0.1, 0.2)}}
by_weights: list, default=None

The index list of weighted groups.

return_data: bool, default=None

Whether to return the data object. If None, it will be False.

figsize : tuple, default=None

Figure size of the plot. If None, it will be (8, 6).
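
A fairness sketch; the group settings are illustrative placeholders that must match categories present in your data:

exp.model_fairness(model='XGB2', show='metrics', metric='AIR',
                   group_category=['Gender', 'Race'],
                   reference_group=['1.0', '1.0'],
                   protected_group=['0.0', '0.0'],
                   favorable_class=1, favorable_threshold=0.5,
                   metric_threshold=0.8)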

model_fairness_compare(models: list = None, show: unicode = None, metric: unicode = None, favorable_class: int = None, favorable_threshold: float = None, group_category: List = None, reference_group: List = None, protected_group: List = None, metric_threshold: float = None, segment_feature: unicode = None, segment_bins: int = None, by_weights: List = None, return_data: bool = None, figsize: Tuple[int, int] = None)

Compare the fairness results of multiple models.

Parameters:
models : list, default=None

List of models’ registered names.

show : {‘metrics’, ‘segmented’}, default=None

The fairness result method.

  • ‘metrics’: show fairness metric results.

  • ‘segmented’: show segmented fairness metric results.

metric : {‘AIR’, ‘SMD’, ‘Precision’, ‘Recall’}, default=None

The metric for fairness testing.

  • ‘AIR’: Adverse impact ratio.

  • ‘SMD’: Standardized mean difference.

  • ‘Precision’: Positive predictive value disparity ratio.

  • ‘Recall’: True positive rate disparity ratio.

favorable_class : int, default=None

Favorable target class for AIR.

favorable_threshold : float, default=None

Favorable threshold for binarizing the predicted outcomes (predict_proba for classifiers).

group_category : list, default=None

Feature list of each group.

reference_group : list, default=None

Reference group list of category names.

protected_group : list, default=None

Protected group list of category names.

metric_threshold : float, default=None

Threshold for the fairness metric (the dotted line in the plots).

segment_feature : str, default=None

The segment feature’s name when show = ‘segmented’.

segment_bins : int, default=None

The number of bins for numerical features when show = ‘segmented’.

by_weights : list, default=None

The index list of weighted groups.

return_data : bool, default=None

Whether to return the data object. If None, it will be False.

figsize : tuple, default=None

Figure size of the plot. If None, it will be (8, 6).

model_fairness_solas(model: unicode = None, show: unicode = None, metric: unicode = None, favorable_class: int = None, favorable_threshold: float = None, group_category: List = None, reference_group: List = None, protected_group: List = None, metric_threshold: float = None, segment_feature: unicode = None, segment_bins: int = None, by_weights: List = None, figsize: tuple = None)

Test model fairness based on solas-ai.

Parameters:
model : str, default=None

Model’s registered name.

show : {‘metrics’, ‘segmented’}, default=None

The supported methods in the fairness module.

  • ‘metrics’: show fairness metric results.

  • ‘segmented’: show segmented fairness metric results.

metric : {‘AIR’, ‘SMD’, ‘RSMD’, ‘Odds Ratio’}, default=None

The metric for fairness testing.

  • ‘AIR’: Adverse impact ratio.

  • ‘SMD’: Standardized mean difference.

  • ‘RSMD’: Residual standardized mean difference.

  • ‘Odds Ratio’: Odds ratio.

favorable_class : int, default=None

Favorable target class for AIR.

favorable_threshold : float, default=None

Favorable threshold for binarizing the predicted outcomes (predict_proba for classifiers).

segment_feature : str, default=None

Segment feature name for segmented AIR.

segment_bins : int, default=None

The number of bins for numerical features when show = ‘segmented’.

group_category : list, default=None

Category list of each group.

reference_group : list, default=None

Reference group list.

protected_group : list, default=None

Protected group list.

metric_threshold : float, default=None

Threshold for the fairness metric (the dotted line in the plots).

by_weights : list, default=None

The index list of weighted groups.

figsize : tuple, default=None

Figure size of the plot. If None, it will be (8, 6).

model_interpret(model: unicode = None, show: unicode = None, uni_feature: unicode = None, bi_features: List[str] = None, sample_id: int = None, tree_idx: int = None, root: int = None, depth: int = None, sliced_line: bool = None, centered: bool = None, use_test: bool = None, original_scale: bool = None, return_data: bool = None, figsize: Tuple[int, int] = None)

Interpret inherently interpretable models.

If no parameter is passed into the method, the program will run in low code mode; otherwise, it runs in high code mode.

Parameters:
model : str, default=None

Model’s registered name.

show : {‘global_fi’, ‘local_fi’, ‘global_ei’, ‘local_ei’, ‘global_effect_plot’, ‘glm_coef_plot’, ‘glm_coef_table’, ‘tree_global’, ‘tree_local’, ‘figs_heatmap’, ‘llm_summary’, ‘llm_violin’, ‘llm_pc’, ‘xgb1_iv’, ‘xgb1_woe’}, default=None

Each inherently interpretable model supports different interpretation methods (corresponding to different show parameter values):

| Interpret Method     | GLM | GAM | GAMI-NET | Tree | FIGS | XGB1 | XGB2 | EBM | ReLU-DNN | Description                                          |
| -------------------- | --- | --- | -------- | ---- | ---- | ---- | ---  | --- | -------- | -----------------------------------------------------|
| "global_fi"          |  X  |  X  |    X     |      |      |  X   |  X   |  X  |    X     | Global Feature Importance                            |
| "local_fi"           |  X  |  X  |    X     |      |      |  X   |  X   |  X  |    X     | Local Feature Importance                             |
| "global_ei"          |     |     |    X     |      |      |      |  X   |  X  |          | Global Effect Importance                             |
| "local_ei"           |     |     |    X     |      |      |      |  X   |  X  |          | Local Effect Importance                              |
| "global_effect_plot" |     |  X  |    X     |      |      |  X   |  X   |  X  |    X     | Global Effect Plot                                   |
| "glm_coef_plot"      |  X  |     |          |      |      |      |      |     |          | Global Model Coefficients Plot for GLM models        |
| "glm_coef_table"     |  X  |     |          |      |      |      |      |     |          | Global Model Coefficients Table for GLM models       |
| "tree_global"        |     |     |          |  X   |  X   |      |      |     |          | Global Interpretation Plot for Tree models           |
| "tree_local"         |     |     |          |  X   |  X   |      |      |     |          | Local Interpretation Plot for Tree models            |
| "figs_heatmap"       |     |     |          |      |  X   |      |      |     |          | Feature Importance Heatmap Plot for FIGS models      |
| "llm_summary"        |     |     |          |      |      |      |      |     |    X     | Local Linear Model (LLM) summary plot                |
| "llm_violin"         |     |     |          |      |      |      |      |     |    X     | Local Linear Model (LLM) violin plot                 |
| "llm_pc"             |     |     |          |      |      |      |      |     |    X     | Local Linear Model (LLM) parallel coordinate plot    |
| "xgb1_iv"            |     |     |          |      |      |  X   |      |     |    X     | Optimal binning Information Value plot               |
| "xgb1_woe"           |     |     |          |      |      |  X   |      |     |    X     | Optimal binning Weight of Evidence plot              |
uni_feature : str, default=None

Feature name of the univariate plot, used when show = ‘global_effect_plot’ or ‘glm_coef_plot’, and the model is within {GAM, EBM, XGB2, GAMI-Net, ReLU-DNN}.

When show = ‘glm_coef_plot’, uni_feature is used in the following way:

  • if None: plot for all the numeric features;

  • if “[CATEGORICAL_FEATURE_NAME]”: plot for the specific categorical feature named “[CATEGORICAL_FEATURE_NAME]”.

bi_features : list of str, default=None

Feature names of the bivariate plot, used when show = ‘global_effect_plot’, and the model is within {EBM, XGB2, GAMI-Net, ReLU-DNN}.

sample_id : int, default=None

The index of the single data point for local interpretation, used when show = ‘local_fi’, ‘local_ei’, or ‘tree_local’. If None, it will be 0.

tree_idx : int, default=None

Not used by this method.

root : int, default=None

The node from which to start drawing the tree diagram, used when show = ‘tree_global’. If None, it will be 0.

depth : int, default=None

The maximum depth of the tree diagram to show (counting from the root node), used when show = ‘tree_global’. If None, it will be 3.

sliced_line : bool, default=None

Whether to show sliced lines for 2D interpretation instead of a heatmap, used when show = ‘global_effect_plot’. If None, it will be False.

centered : bool, default=None

Whether to center X (subtract the average) for generating local explanations, used when show = ‘local_fi’ or ‘local_ei’. If None, it will be True.

use_test : bool, default=None

Whether to use the test set to do the interpretation. If True, the test set will be used; otherwise, the training set will be used. If None, it will be False.

original_scale : bool, default=None

Whether to use the original scale of X, used when show = ‘local_fi’, ‘local_ei’, ‘global_effect_plot’, ‘tree_global’, or ‘tree_local’. If None, it will be False.

return_data : bool, default=None

Whether to return the data object. If None, it will be False.

figsize : tuple, default=None

Figure size of the plot. If None, it will be (8, 6).

Returns:
TestResult

Diagnose result, only available when return_data = True.
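
An interpretation sketch for an inherently interpretable model (the name follows the model_train example below):

exp.model_interpret(model='XGB2', show='global_fi')
exp.model_interpret(model='XGB2', show='local_fi', sample_id=0)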

model_save(model: unicode, path: unicode = 'SavedModel.pkl')

Save a PiML-trained model as a pickle file.

Parameters:
model : str

Model’s registered name.

path : str

File path to save to.

model_train(model=None, name: unicode = None)

Fit interpretable models.

If no parameter is passed into the method, the program will run in low code mode; otherwise, it runs in high code mode.

Parameters:
model : estimator object, default=None

The estimator to be fitted, which should follow the sklearn style. Below is the list of built-in inherently interpretable models.

| Model                                      | PiML Regressor Class                       | PiML Classifier Class                       |
| ------------------------------------------ | ------------------------------------------ | ------------------------------------------- |
| Generalized Linear Model (GLM)             | `piml.models.GLMRegressor`                 | `piml.models.GLMClassifier`                 |
| Generalized Additive Model (GAM)           | `piml.models.GAMRegressor`                 | `piml.models.GAMClassifier`                 |
| Decision Tree (Tree)                       | `piml.models.TreeRegressor`                | `piml.models.TreeClassifier`                |
| Fast Interpretable Greedy-Tree Sums (FIGS) | `piml.models.FIGSRegressor`                | `piml.models.FIGSClassifier`                |
| Explainable Boosting Machine (EBM)         | `piml.models.ExplainableBoostingRegressor` | `piml.models.ExplainableBoostingClassifier` |
| Xgboost Depth 1 (XGB1)                     | `piml.models.XGB1Regressor`                | `piml.models.XGB1Classifier`                |
| Xgboost Depth 2 (XGB2)                     | `piml.models.XGB2Regressor`                | `piml.models.XGB2Classifier`                |
| GAMI-Net                                   | `piml.models.GAMINetRegressor`             | `piml.models.GAMINetClassifier`             |
| ReLU-DNN                                   | `piml.models.ReluDNNRegressor`             | `piml.models.ReluDNNClassifier`             |
name : str, default=None

Model’s name for registration.
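
A training sketch using one of the built-in models above; the registered name 'XGB2' is the one reused by the diagnostic examples on this page:

from piml.models import XGB2Regressor
exp.model_train(model=XGB2Regressor(), name='XGB2')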

Notes

You can get the list of all directly supported models in PiML with:

from piml.models import get_all_supported_models
get_all_supported_models()
model_tune(model=None, method=None, parameters=None, n_runs=None, cv=None, test_ratio=None, metric=None, n_jobs=None, random_state=None)

Refit a model with new parameters.

Parameters:
model : str

Model’s registered name to tune.

method : {‘grid’, ‘randomized’}

Tuning method; two kinds of methods are supported:

  • ‘grid’: exhaustive search over specified parameter values for an estimator.

  • ‘randomized’: randomized search on hyperparameters.

parameters : dict

Parameter search space for tuning. For example,

  • grid: {‘n_estimators’: [100, 300, 500], ‘max_depth’: [3, 4, 5]}

  • randomized: {‘n_estimators’: scipy.stats.randint(100, 1000), ‘max_depth’: [3, 4, 5]}

n_runs : int, optional

The number of parameter settings that are sampled; only works when method = ‘randomized’. n_runs trades off runtime vs. quality of the solution. By default 10.

metric : {‘MSE’, ‘MAE’, ‘R2’, ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’}, default=None
  • For classification tasks: ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’.

  • For regression tasks: ‘MSE’, ‘MAE’, ‘R2’.

test_ratio : float, optional

Test sample ratio for HPO testing; only works when cv is None.

cv : int, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy, by default None. Possible inputs for cv are:

  • None, to use the default 20 percent of the data as validation data;

  • an integer, to specify the number of folds in a (Stratified) KFold;

  • a CV splitter;

  • an iterable yielding (train, test) splits as arrays of indices.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used. These splitters are instantiated with shuffle=False, so the splits will be the same across calls.

n_jobs : int, optional

Number of jobs to run in parallel, by default None. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors. See the Glossary for more details.

random_state : int, optional

Pseudo random number generator state used for random uniform sampling from lists of possible values instead of scipy.stats distributions. Pass an int for reproducible output across multiple function calls. By default None.
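
A tuning sketch over a small grid (parameter names must match the registered model’s constructor; these assume an XGB-style model):

exp.model_tune(model='XGB2', method='grid',
               parameters={'n_estimators': [100, 300, 500]},
               metric='MSE')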

register(pipeline, name)

Register a pipeline.

Parameters:
pipeline : Pipeline object

The pipeline object to register.

name : str

The name of the registered pipeline.

segmented_diagnose(show=None, model=None, segment_id=None, segment_method=None, segment_feature=None, segment_bins=None, slice_method=None, slice_features=None, slice_bins=None, metric=None, threshold=None, min_samples=None, show_feature=None, distance_metric=None, psi_buckets=None, use_test=None, original_scale=None, return_data=None, figsize=None)

Test model performance using various diagnostic tools after bucketing.

If no parameter is passed into the method, the program will run in low code mode; otherwise, it runs in high code mode.

Parameters:
model : str, default=None

Model’s registered name.

show : {‘segment_table’, ‘accuracy_residual’, ‘accuracy_table’, ‘weakspot’, ‘distribution_shift’}, default=None

All the supported model diagnostics methods.

  • ‘segment_table’: show the accuracy of each segment, segmented by the given segment method and feature. If segment_feature = None, the table for all features will be output.

  • Accuracy

    • ‘accuracy_residual’: plot residuals with respect to a variable. (params: show_feature)

    • ‘accuracy_table’: show accuracy results with a table.

  • WeakSpot

    • ‘weakspot’: show detected weakspot regions. (params: slice_method, slice_features, bins, metric, threshold, min_samples, use_test)

  • ‘distribution_shift’: plot the distribution shift between samples in a specific segment and the others.

segment_id : int, default=None

The id of the segment to diagnose, required when show = ‘accuracy_residual’, ‘accuracy_table’, ‘weakspot’, or ‘distribution_shift’.

segment_feature : str, default=None

Feature for bucketing.

segment_method : {‘uniform’, ‘quantile’, ‘auto’}

Method for bucketing.

  • ‘uniform’: all bins have identical widths.

  • ‘quantile’: all bins have the same number of samples.

  • ‘auto’: bins are obtained from an XGB1 model trained on the residuals.

segment_bins : int, default=None

The number of bins for bucketing when segment_method = ‘uniform’ or ‘quantile’. If None, it will be 5.

metric : {‘MSE’, ‘MAE’, ‘R2’, ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’}, default=None

Performance metric, used when show = ‘weakspot’.

  • For classification tasks: ‘ACC’, ‘AUC’, ‘F1’, ‘LogLoss’, ‘Brier’.

  • For regression tasks: ‘MSE’, ‘MAE’, ‘R2’.

slice_method : {‘histogram’, ‘tree’}, default=None

The slicing method for the WeakSpot test, used when show = ‘weakspot’. If None, it will be ‘histogram’.

  • ‘histogram’: default; use equal-space binning;

  • ‘tree’: fit a decision tree to generate regions.

slice_features : list, default=None

List of slicing features (at most 2) for the Weakspot test, used when show = ‘weakspot’.

slice_bins : int, default=None

The number of bins for slicing, used when show = ‘weakspot’. If None, it will be 10.

threshold : float, default=None

The threshold that determines a weak region, used when show = ‘weakspot’. If None, it will be 1.1.

min_samples : int, default=None

The minimal sample size for selected regions, used when show = ‘weakspot’. If None, it will be 20.

use_test : bool, default=None

Whether to use test data or not, used when show = ‘weakspot’ or ‘accuracy_residual’. If None, it will be False.

show_feature : str, default=None

The feature of interest in the plot (usually the x-axis), used when show = ‘accuracy_residual’ or ‘distribution_shift’.

distance_metric : {‘PSI’, ‘WD1’, ‘KS’}, default=None

The distance metric used when show = ‘distribution_shift’. If None, it will be ‘PSI’.

  • ‘PSI’: Population stability index.

  • ‘WD1’: Wasserstein distance (1D).

  • ‘KS’: Kolmogorov-Smirnov test.

psi_buckets : {‘uniform’, ‘quantile’}, default=None

Bucketing strategy for the PSI metric. If None, it will be ‘uniform’.

original_scale : bool, default=None

Whether to use the original scale of X in the plots. If None, it will be False.

return_data : bool, default=None

Whether to return the data object. If None, it will be False.

figsize : tuple, default=None

Figure size of the plot. If None, it will be (8, 6).

Returns:
TestResult

Diagnose result, only available when return_data = True.
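
A segmented diagnostic sketch (the segment feature name 'X0' is a placeholder; the model name follows the model_train example above):

exp.segmented_diagnose(model='XGB2', show='segment_table',
                       segment_method='uniform', segment_feature='X0', segment_bins=5)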

Examples using piml.Experiment

Build Robust Models with Monotonicity Constraints
