6.9. Scored Test

The scored test is a distinctive part of PiML diagnostic testing, designed for cases where the model object is unavailable and only its predictions are provided. It requires only the input features, the target response, and the corresponding model predictions.

In PiML, the scored test is a set of standalone functions outside the Experiment workflow. Their names follow the same convention as the other tests in the “model diagnose” module and generally share the prefix “test”, for example “test_accuracy_table” and “test_accuracy_residual”.

6.9.1. Usage

The scored test covers all the tests in exp.model_diagnose except the robustness test. The robustness test needs the model object to predict on perturbed samples, so it cannot be run from stored predictions alone. Here is a list of the supported scored tests:

All the scored tests share the same data inputs, as shown below:

  • x: Input data as a NumPy array, covering both train and test samples

  • y: Target data as a NumPy array, covering both train and test samples

  • prediction: Predictions of the model to be tested

  • prediction_proba: Predicted probabilities of the model to be tested (classifiers only)

  • feature_names: List of feature names of the input data, e.g., ['temperature', 'season']

  • feature_types: List of feature types of the input data, e.g., ['numerical', 'categorical']

  • target_name: Name of the target feature

  • task_type: Task type, either 'regression' or 'classification'

  • train_idx: Indices of the training samples

  • test_idx: Indices of the test samples

  • random_state: Random seed
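Because x and y hold all samples while train_idx and test_idx mark the split, data that is already split has to be stacked back together first. The following is a minimal sketch of one way to do that with NumPy. It assumes the index parameters are integer positions into the stacked arrays (the list above does not say whether boolean masks are also accepted), and the array names and sizes are hypothetical.

```python
import numpy as np

# Hypothetical pre-split data: 6 training rows and 4 test rows, 2 features.
rng = np.random.default_rng(0)
x_train, x_test = rng.normal(size=(6, 2)), rng.normal(size=(4, 2))
y_train, y_test = rng.normal(size=6), rng.normal(size=4)

# Stack the splits so that x and y contain every sample, training rows first.
x = np.vstack([x_train, x_test])
y = np.concatenate([y_train, y_test])

# Integer positions of each split inside the stacked arrays
# (assumption: the scored tests expect positional indices).
train_idx = np.arange(len(x_train))
test_idx = np.arange(len(x_train), len(x))

print(x.shape, y.shape, train_idx, test_idx)
```

Predictions from the external model would need to be stacked in the same row order so that prediction[i] corresponds to x[i].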

To simplify the input parameters for each scored test function, we can consolidate all data-related parameters into a single dictionary. Then, we can pass the data information and additional parameters to execute different tests. For example, the test_accuracy_residual test shows the residual plot against one feature of interest.

from piml.scored_test import test_accuracy_residual

# data_dict holds the shared data inputs listed above (x, y, prediction, ...)
result = test_accuracy_residual(**data_dict, show_feature='MedInc', figsize=(5, 4))
[Figure: residual plot against the MedInc feature]

Here, data_dict is unpacked into the function's keyword arguments. This test additionally requires show_feature, the feature against which the residuals are plotted, and figsize sets the size of the figure.
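The data_dict passed in the call above is not constructed in this section. The sketch below shows one plausible way to assemble it, with keys matching the parameter names listed earlier. Everything else is an assumption made for illustration: the data are synthetic, the names 'MedInc', 'HouseAge', and 'MedHouseVal' are placeholders attached to random columns, an ordinary least squares fit stands in for the external model, and prediction_proba is left out on the assumption that it can be omitted for regression.

```python
import numpy as np

# Synthetic regression data standing in for a real dataset (assumption).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 2))
y = 3.0 * x[:, 0] - x[:, 1] + rng.normal(scale=0.1, size=n)

# Random 80/20 split expressed as index arrays into the full data.
idx = rng.permutation(n)
train_idx, test_idx = idx[:160], idx[160:]

# Stand-in for the external model: least squares fitted on the training rows,
# then scored on every row so that prediction aligns with x and y.
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design[train_idx], y[train_idx], rcond=None)
prediction = design @ coef

# Keys follow the shared data inputs listed in this section.
data_dict = {
    "x": x,
    "y": y,
    "prediction": prediction,
    "feature_names": ["MedInc", "HouseAge"],  # placeholder names
    "feature_types": ["numerical", "numerical"],
    "target_name": "MedHouseVal",  # placeholder name
    "task_type": "regression",
    "train_idx": train_idx,
    "test_idx": test_idx,
    "random_state": 0,
}

# With PiML installed, the dictionary would then be unpacked into a test:
# from piml.scored_test import test_accuracy_residual
# result = test_accuracy_residual(**data_dict, show_feature="MedInc", figsize=(5, 4))
```

Keeping the data inputs in one dictionary means each further scored test only needs its own test-specific arguments, such as show_feature here.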

6.9.2. Examples