API Reference

This is the class and function reference for PiML. Refer to the user guide for further details, since the raw class and function specifications may not be enough to explain fully how to use them.

Data Pipeline

Functions

Experiment.data_loader([data, silent, ...])

Load data for experimentation.

Experiment.data_summary([feature_type, ...])

Summarize basic data statistics.

Experiment.data_prepare([target, ...])

Prepare data for model fitting.

Experiment.data_quality([method, dataset, ...])

Check the data quality and remove outliers.

Experiment.eda([show, uni_feature, ...])

Run exploratory data analysis.

Experiment.feature_select([method, ...])

Select features that are important for modeling.

Experiment.get_data([x, y, sample_weight, ...])

Get the preprocessed train-test data.

Experiment.get_raw_data()

Get the raw train-test data.

Experiment.get_feature_names()

Get the input feature names.

Experiment.get_feature_types()

Get the data type of each input feature.

Experiment.get_target_name()

Get the target feature name.

Outlier Detection Algorithms

data.outlier_detection.PCA

A wrapper of sklearn's PCA for outlier detection.

data.outlier_detection.CBLOF

Cluster-based local outlier factor for outlier detection.

data.outlier_detection.KMeansTree

Recursive unsupervised splitting tree via KMeans (K=2).

data.outlier_detection.IsolationForest

A wrapper of sklearn's Isolation Forest for outlier detection.

data.outlier_detection.OneClassSVM

A wrapper of sklearn's OneClassSVM for outlier detection.

data.outlier_detection.KNN

A wrapper of sklearn's K-Nearest Neighbors for outlier detection.

data.outlier_detection.HBOS

A wrapper of PyOD's histogram-based outlier detection (HBOS).

data.outlier_detection.ECOD

A wrapper of PyOD's ECOD: unsupervised outlier detection using empirical cumulative distribution functions.
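
As a rough illustration of the idea behind ECDF-based detectors such as ECOD, the sketch below scores each sample by how far it lies in the tails of each feature's empirical cumulative distribution function. This is a simplified version written in plain Python under stated assumptions: it is neither PyOD's nor PiML's implementation, it omits the refinements of the published method, and the function names and sample data are hypothetical.

```python
import math


def ecdf_tail_scores(column):
    """Score each value in one feature by its most extreme ECDF tail probability."""
    n = len(column)
    scores = []
    for v in column:
        left = sum(1 for u in column if u <= v) / n   # empirical P(X <= v)
        right = sum(1 for u in column if u >= v) / n  # empirical P(X >= v)
        # A small tail probability gives a large negative log, i.e. a high score.
        scores.append(max(-math.log(left), -math.log(right)))
    return scores


def outlier_scores(rows):
    """Sum the per-feature tail scores to get one score per sample."""
    columns = list(zip(*rows))
    per_feature = [ecdf_tail_scores(col) for col in columns]
    return [sum(sample_scores) for sample_scores in zip(*per_feature)]


# Hypothetical data: the last row is extreme in both features.
rows = [(1.0, 10.5), (1.1, 9.8), (0.9, 10.0), (1.2, 10.1), (8.0, 30.0)]
scores = outlier_scores(rows)
most_anomalous = max(range(len(rows)), key=scores.__getitem__)
print(most_anomalous, round(scores[most_anomalous], 3))
```

Because the score depends only on ranks within each feature, a sample must be extreme in several features at once to stand out clearly, which is why the last row receives the highest score here.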

Model Training

Experiment.model_train([model, name])

Fit interpretable models.

Experiment.model_tune([model, method, ...])

Refit a model with new parameters.

Experiment.model_save(model[, path])

Save a PiML-trained model as a pickle file.

Experiment.model_interpret([model, show, ...])

Interpret inherently interpretable models.

Experiment.make_pipeline([model, task_type, ...])

Customize a pipeline.

Experiment.register(pipeline, name)

Register a pipeline.

Experiment.get_model_list()

Get the list of names of all registered models.

Experiment.get_interpretable_model_list()

Get the list of names of all registered interpretable models.

Experiment.get_model(model)

Get a registered pipeline.

Experiment.get_model_config(model)

Get the configuration of a model.

Experiment.get_leaderboard([metric])

Show the performance comparison table of all trained models.

Experiment.get_leaderboard_registered([metric])

Show the performance comparison table of all registered models.

Post-hoc Explainability

Experiment.model_explain([model, show, ...])

Explain an arbitrary fitted model using post-hoc explanation tools.

Interpretable Models

models.GLMRegressor

A wrapper of generalized linear model regressor in scikit-learn.

models.GLMClassifier

A wrapper of generalized linear model classifier in scikit-learn.

models.GAMRegressor

A wrapper of generalized additive model regressor in pygam.

models.GAMClassifier

A wrapper of generalized additive model classifier in pygam.

models.TreeRegressor

A wrapper of the decision tree regressor in scikit-learn.

models.TreeClassifier

A wrapper of the decision tree classifier in scikit-learn.

models.FIGSRegressor

Fast interpretable greedy-tree sums regressor.

models.FIGSClassifier

Fast interpretable greedy-tree sums classifier.

models.XGB1Classifier

Depth-1 XGBoostClassifier with optimal binning.

models.XGB1Regressor

Depth-1 XGBoostRegressor with optimal binning.

models.XGB2Classifier

Depth-2 XGBoostClassifier.

models.XGB2Regressor

Depth-2 XGBoostRegressor.

models.ExplainableBoostingRegressor

An Explainable Boosting Regressor based on interpret==0.4.2.

models.ExplainableBoostingClassifier

An Explainable Boosting Classifier based on interpret==0.4.2.

models.GAMINetRegressor

Generalized additive model with pairwise interaction regressor.

models.GAMINetClassifier

Generalized additive model with pairwise interaction classifier.

models.ReluDNNRegressor

Multi-layer perceptron regressor with ReLU activation function.

models.ReluDNNClassifier

Multi-layer perceptron classifier with ReLU activation function.
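
To make the ReluDNN entries above more concrete, here is a minimal sketch of the forward pass of a multi-layer perceptron with ReLU activations, written in plain Python. It is an illustration only and not PiML's implementation; the helper names (`relu`, `dense`, `mlp_forward`) and the hand-picked weights are hypothetical. The example network happens to compute the absolute difference of its two inputs, which shows how ReLU units yield a piecewise-linear function.

```python
def relu(values):
    """Element-wise ReLU: negative values are clipped to zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer; weights[j][i] connects input i to unit j."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def mlp_forward(x, layers):
    """Apply ReLU after every hidden layer and keep the output layer linear."""
    hidden = x
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    out_weights, out_biases = layers[-1]
    return dense(hidden, out_weights, out_biases)


# Hypothetical weights: 2 inputs -> 2 hidden ReLU units -> 1 linear output.
layers = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),  # hidden units: x1 - x2 and x2 - x1
    ([[1.0, 1.0]], [0.0]),                     # output: sum of the hidden units
]
print(mlp_forward([3.0, 1.0], layers))
print(mlp_forward([1.0, 4.0], layers))
```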

Outcome Testing

Integrated Functions

Experiment.model_diagnose([model, show, ...])

Test model performance using various diagnostic tools.

Experiment.model_compare([models, show, ...])

Compare the diagnostic results of multiple models.

Experiment.model_fairness([model, show, ...])

Test model fairness.

Experiment.model_fairness_compare([models, ...])

Compare the fairness results of multiple models.

Experiment.model_fairness_solas([model, ...])

Test model fairness based on solas-ai.

Experiment.segmented_diagnose([show, model, ...])

Test model performance using various diagnostic tools after bucketing.

Scored Test Functions

test_accuracy_table

Get accuracy result.

test_accuracy_residual

Get marginal residual plot based on a given feature.

test_accuracy_plot

Plot the confusion matrix, ROC curve, and precision-recall curve (classifiers only).

test_weakspot

Get marginal weakspot result based on a given feature.

test_overfit

Get marginal overfit result based on a given feature.

test_reliability_table

Get empirical coverage and average bandwidth for regression or Brier Loss for classification.

test_reliability_distance

Compare data distance between reliable and unreliable samples.

test_reliability_marginal

Get marginal slicing reliability result based on a given feature.

test_reliability_perf

Get the reliability diagram (classifiers only).

test_reliability_calibration

Get the calibrated predicted probability vs.

test_resilience_perf

Get resilience test result in each step.

test_resilience_distance

Compare data distance between samples in the worst region and the remaining region.

test_resilience_shift_histogram

Compare marginal distribution histogram between the worst region and remaining region.

test_resilience_shift_density

Compare marginal distribution density between the worst region and remaining region.
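
The quantities named in the test_reliability_table entry above (empirical coverage and average bandwidth for regression, Brier loss for classification) have simple textbook definitions. The sketch below computes them in plain Python as an illustration only. It is not PiML's implementation, the function names `brier_loss` and `coverage_and_bandwidth` and the sample numbers are hypothetical, and it assumes the prediction intervals are already available as lower and upper bounds.

```python
def brier_loss(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def coverage_and_bandwidth(y_true, lower, upper):
    """Share of targets inside their interval, and the mean interval width."""
    n = len(y_true)
    covered = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    avg_width = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return covered / n, avg_width


# Hypothetical classification example: labels and predicted probabilities.
loss = brier_loss([1, 0, 1, 1], [0.9, 0.2, 0.6, 0.4])

# Hypothetical regression example: targets with lower and upper interval bounds.
coverage, bandwidth = coverage_and_bandwidth(
    [3.0, 5.0, 7.0, 9.0],
    [2.0, 5.5, 6.0, 8.0],
    [4.0, 6.5, 8.0, 10.0],
)
print(round(loss, 4), coverage, bandwidth)
```

A lower Brier loss indicates better-calibrated probabilities. Coverage should be read together with bandwidth, since very wide intervals reach high coverage trivially.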