Register Arbitrary Models ================================= In additon to sklearn and H2O models, we also support to test models in arbitrary format, as long as it can provide a predict function. Train and Register Models ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Below, we simulate a simple binary classification dataset and then fit a GLM model using `statsmodels` package. As this model is neither sklearn nor H2O models, we use it here to demonstrate how to register arbitrary machine learning models. .. jupyter-input:: import numpy as np import statsmodels.api as sm x = np.random.uniform(-1, 1, size=(1000, 2)) y = (np.sum(x, axis=1) + np.random.normal(0, 0.1, size=(1000,))) > 0.0 glm_binom = sm.GLM(y, x, family=sm.families.Binomial()) glm_results = glm_binom.fit() Define Wrapper Predict Functions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Next, we need to write a wrapper function for making predictions using the model. As this is a binary classification task, we need both `predict_proba` and `predict` functions. Both of them takes covariates `X` as input, which is expected to be a numpy array of size (n, p). The output of `predict_proba` should be a numpy array of size (n, 2), which is the predicted probability of each sample. The `predict` function outputs the final predicted label, which is of shape (n, ). .. jupyter-input:: def predict_proba_func(X): proba = glm_binom.predict(glm_results.params, exog=X) return np.vstack([1 - proba, proba]).T def predict_func(X): proba = glm_binom.predict(glm_results.params, exog=X) return proba > 0.5 Register the predict Functions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ As the predict functions have been prepared, the next step is to call the `make_pipeline` function and PiML would further wrap it as a sklearn style model estimator, as shown below. Finally, the pipeline can be registered and all the tests in PiML can be used. .. jupyter-input:: from piml import Experiment exp = Experiment(highcode_only=True) pipeline = exp.make_pipeline(predict_func=predict_func, predict_proba_func=predict_proba_func, task_type="classification", train_x=x[:800], train_y=y[:800], test_x=x[800:], test_y=y[800:], feature_names=["X0", "X1"], target_name="Y") exp.register(pipeline, "Statsmodels-GLM") Run Diagnostic Tests ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ As a model is registered, then all the tests and explanation tools in PiML can be used. For example, .. jupyter-input:: exp.model_explain(model="Statsmodels-GLM", show="ale", uni_feature="X0", figsize=(5, 4)) .. figure:: ../../auto_examples/1_train/images/sphx_glr_plot_2_register_2_arbitrary_002.png :target: ../../auto_examples/1_train/plot_2_register_2_arbitrary.html :align: left Examples ^^^^^^^^^^^^^^^^^^ .. topic:: Example 2: * :ref:`sphx_glr_auto_examples_1_train_plot_2_register_2_arbitrary.py`