piml.models.ExplainableBoostingClassifier
- class piml.models.ExplainableBoostingClassifier(feature_names: Optional[Sequence[Union[None, str]]] = None, feature_types: Optional[Sequence[Union[None, str, Sequence[str], Sequence[float]]]] = None, max_bins: int = 256, max_interaction_bins: int = 32, interactions: Optional[Union[int, float, Sequence[Union[int, str, Sequence[Union[int, str]]]]]] = 10, exclude: Optional[Sequence[Union[int, str, Sequence[Union[int, str]]]]] = [], validation_size: Optional[Union[int, float]] = 0.15, outer_bags: int = 8, inner_bags: Optional[int] = 0, learning_rate: float = 0.01, greediness: Optional[float] = 0.0, smoothing_rounds: Optional[int] = 0, max_rounds: Optional[int] = 5000, early_stopping_rounds: Optional[int] = 50, early_stopping_tolerance: Optional[float] = 0.0001, min_samples_leaf: Optional[int] = 2, max_leaves: int = 3, objective: unicode = 'log_loss', n_jobs: Optional[int] = -2, random_state: Optional[int] = 42)
An Explainable Boosting Classifier based on interpret==0.4.2.
- Parameters:
- feature_names : list of str, default=None
List of feature names.
- feature_types : list of FeatureType, default=None
List of feature types. FeatureType can be ‘numerical’ or ‘categorical’.
- max_bins : int, default=256
Max number of bins per feature for the main effects stage.
- max_interaction_bins : int, default=32
Max number of bins per feature for interaction terms.
- interactions : int, float, or list of tuples of feature indices, default=10
Interaction terms to be included in the model (see the usage sketch after this parameter list). Options are:
Integer (1 <= interactions): Count of interactions to be automatically selected.
Percentage (interactions < 1.0): Determine the integer count of interactions by multiplying the number of features by this percentage.
List of tuples: The tuples contain the indices of the features within each additive term.
- exclude : ‘mains’ or list of tuples of feature indices or names, default=[]
Features or terms to be excluded.
- validation_size : int or float, default=0.15
Validation set size. Used for early stopping during boosting, and is needed to create outer bags.
Integer (1 <= validation_size): Count of samples to put in the validation sets.
Percentage (validation_size < 1.0): Percentage of the data to put in the validation sets.
0: Turns off early stopping; outer bags have no utility, and error bounds will be eliminated.
- outer_bags : int, default=8
Number of outer bags. Outer bags are used to generate error bounds and help with smoothing the graphs.
- inner_bags : int, default=0
Number of inner bags. 0 turns off inner bagging.
- learning_rate : float, default=0.01
Learning rate for boosting.
- greediness : float, default=0.0
Percentage of rounds where boosting is greedy instead of round-robin. Greedy rounds are intermixed with cyclic rounds.
- smoothing_rounds : int, default=0
Number of initial highly regularized rounds to set the basic shape of the main effect feature graphs.
- max_rounds : int, default=5000
Total number of boosting rounds, with n_terms boosting steps per round.
- early_stopping_rounds : int, default=50
Number of rounds with no improvement to trigger early stopping. 0 turns off early stopping, and boosting will occur for exactly max_rounds.
- early_stopping_tolerance : float, default=1e-4
Tolerance that dictates the smallest delta required to be considered an improvement.
- min_samples_leaf : int, default=2
Minimum number of samples allowed in the leaves.
- max_leaves : int, default=3
Maximum number of leaves allowed in each tree.
- objective : str, default='log_loss'
The objective to optimize.
- n_jobs : int, default=-2
Number of jobs to run in parallel. Negative integers are interpreted following joblib's formula (n_cpus + 1 + n_jobs), just like scikit-learn; e.g., -2 means using all threads except 1.
- random_state : int or None, default=42
Random state. None uses device_random and generates non-repeatable sequences.
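Below is a minimal usage sketch, not part of the original reference: the synthetic data, the train/test split, and the chosen parameter values are illustrative assumptions; only the constructor parameters and methods documented on this page are used.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from piml.models import ExplainableBoostingClassifier

# Illustrative binary-classification data (any (X, y) with a binary
# target would work the same way).
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = ExplainableBoostingClassifier(
    interactions=5,       # select up to 5 interaction terms automatically
    learning_rate=0.01,   # documented default
    max_rounds=2000,      # illustrative; the default is 5000
    random_state=42,
)
clf.fit(X_train, y_train)
print(clf.predict(X_test)[:5])   # hard class labels
```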
- Attributes:
- classes_ : array of bool, int, or unicode with shape (n_classes,)
The class labels.
- n_features_in_ : int
Number of features.
- feature_names_in_ : List of str
Resolved feature names. Names can come from feature_names, X, or be auto-generated.
- feature_types_in_ : List of str
Resolved feature types. Can be: ‘continuous’ or ‘nominal’.
- bins_ : List[Union[List[Dict[str, int]], List[array of float with shape (n_cuts,)]]]
Per-feature list that defines how to bin each feature. Each feature in the list contains a list of binning resolutions. The first item in the binning resolution list is for binning main effect features. If there are more items in the binning resolution list, they define the binning for successive levels of resolutions. The item at index 1, if it exists, defines the binning for pairs. The last binning resolution defines the bins for all successive interaction levels. If the binning resolution list contains dictionaries, then the feature is either a ‘nominal’ or ‘ordinal’ categorical. If the binning resolution list contains arrays, then the feature is ‘continuous’ and the arrays will contain float cut points that separate continuous values into bins.
- feature_bounds_ : array of float with shape (n_features, 2)
Min/max bounds for each feature. feature_bounds_[feature_index, 0] is the min value of the feature and feature_bounds_[feature_index, 1] is the max value of the feature. Categoricals have min & max values of NaN.
- histogram_edges_ : List of None or array of float with shape (n_hist_edges,)
Per-feature list of the histogram edges. Categorical features contain None within the List at their feature index.
- histogram_weights_ : List of array of float with shape (n_hist_bins,)
Per-feature list of the total sample weights within each feature’s histogram bins.
- unique_val_counts_ : array of int with shape (n_features,)
Per-feature count of the number of unique feature values.
- term_features_ : List of tuples of feature indices
Additive terms used in the model and their component feature indices.
- term_names_ : List of str
List of term names.
- bin_weights_ : List of array of float with shape (n_feature0_bins, ..., n_featureN_bins)
Per-term list of the total sample weights in each term’s tensor bins.
- bagged_scores_ : List of array of float with shape (n_outer_bags, n_feature0_bins, ..., n_featureN_bins, n_classes) or (n_outer_bags, n_feature0_bins, ..., n_featureN_bins)
Per-term list of the bagged model scores. The last dimension of length n_classes is dropped for binary classification.
- term_scores_ : List of array of float with shape (n_feature0_bins, ..., n_featureN_bins, n_classes) or (n_feature0_bins, ..., n_featureN_bins)
Per-term list of the model scores. The last dimension of length n_classes is dropped for binary classification.
- standard_deviations_ : List of array of float with shape (n_feature0_bins, ..., n_featureN_bins, n_classes) or (n_feature0_bins, ..., n_featureN_bins)
Per-term list of the standard deviations of the bagged model scores. The last dimension of length n_classes is dropped for binary classification.
- link_ : str
Link function used to convert the predictions or targets into linear space additive scores and vice versa via the inverse link. Possible values include: “custom_classification”, “logit”, “probit”, “cloglog”, “loglog”, “cauchit”.
- link_param_ : float
Float value that can be used by the link function. For classification it is only used by “custom_classification”.
- bag_weights_ : array of float with shape (n_outer_bags,)
Per-bag record of the total weight within each bag.
- breakpoint_iteration_ : array of int with shape (n_stages, n_outer_bags)
The number of boosting rounds performed within each stage until either early stopping, or the max_rounds was reached. Normally, the count of main effects boosting rounds will be in breakpoint_iteration_[0], and the count of interaction boosting rounds will be in breakpoint_iteration_[1].
- intercept_ : array of float with shape (n_classes,) or (1,)
Intercept of the model. Binary classification is shape (1,), and multiclass is shape (n_classes,).
Methods
decision_function(X[, init_score]): Returns numpy array of raw predicted values before softmax.
fit(X, y[, sample_weight, bags, init_score]): Fits model to provided samples.
get_metadata_routing(): Get metadata routing of this object.
get_params([deep]): Get parameters for this estimator.
parse_model(): Interpret the model using functional ANOVA.
partial_dependence(fidx, X): Partial dependence of given effect index.
predict(X[, init_score]): Predict function.
predict_proba(X[, init_score]): Predict probability function.
score(X, y[, sample_weight]): Return the mean accuracy on the given test data and labels.
set_params(**params): Set the parameters of this estimator.
set_score_request(*[, sample_weight]): Request metadata passed to the score method.
- decision_function(X, init_score=None)
Returns numpy array of raw predicted values before softmax.
- Parameters:
- X : np.ndarray of shape (n_samples, n_features)
Data features.
- init_score : np.ndarray of shape (n_samples, ), default=None
Either a model that can generate scores or a per-sample initialization score. If per-sample scores, it should be the same length as X.
- Returns:
- pred : np.ndarray of shape (n_samples, )
Numpy array of raw predicted values (before softmax).
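A hedged illustration, reusing clf and X_test from the constructor sketch above: for a binary task the raw scores have shape (n_samples,), and, assuming the default ‘logit’ link, the sigmoid of the raw scores should match the positive-class probability.

```python
import numpy as np

raw = clf.decision_function(X_test)   # shape (n_samples,) for binary classification
# Assumption: the model uses the default 'logit' link, so the inverse link
# (the sigmoid) recovers the positive-class probability.
proba_pos = 1.0 / (1.0 + np.exp(-raw))
print(np.abs(proba_pos - clf.predict_proba(X_test)[:, 1]).max())  # ~0 if the link is logit
```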
- fit(X, y, sample_weight=None, bags=None, init_score=None)
Fits model to provided samples.
- Parameters:
- X : np.ndarray of shape (n_samples, n_features)
Data features.
- y : np.ndarray of shape (n_samples, )
Target response.
- sample_weight : np.ndarray of shape (n_samples, ), default=None
Sample weight.
- bags : list of int, default=None
The first dimension should have length equal to the number of outer_bags. The second dimension should have length equal to the number of samples. The contents should be +1 for training, -1 for validation, and 0 if not included in the bag. Numbers other than 1 indicate how many times to include the sample in the training or validation sets.
- init_score : np.ndarray of shape (n_samples, ), default=None
Either a model that can generate scores or a per-sample initialization score. If per-sample scores, it should be the same length as X.
- Returns:
- self
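An illustrative bags construction, not from the original docs: the docstring describes a list-like of ints, and a 2D integer array with one row per outer bag is one natural way to supply it (an assumption). A roughly 85/15 train/validation split per bag mirrors the default validation_size of 0.15.

```python
import numpy as np

rng = np.random.default_rng(0)
outer_bags, n_samples = 8, len(X_train)   # 8 matches the default outer_bags
# +1 marks a training sample, -1 a validation sample (0 would exclude it).
bags = np.where(rng.random((outer_bags, n_samples)) < 0.85, 1, -1)
clf.fit(X_train, y_train, bags=bags)
```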
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- parse_model()
Interpret the model using functional ANOVA.
- Returns:
- an instance of FANOVAInterpreter
The interpretation results.
- partial_dependence(fidx, X)
Partial dependence of given effect index.
- Parameters:
- fidx : tuple of int
The main effect or pairwise interaction feature index.
- X : np.ndarray of shape (n_samples, n_features)
Data features.
- Returns:
- pred : np.ndarray of shape (n_samples, )
Numpy array of partial dependence values.
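An illustrative call, reusing clf and X_test from the constructor sketch: a length-1 tuple selects a main effect and a length-2 tuple a pairwise interaction, per the fidx description above (assuming the pair (0, 1) is a valid effect index for the fitted model).

```python
pd_main = clf.partial_dependence((0,), X_test)    # main effect of feature 0
pd_pair = clf.partial_dependence((0, 1), X_test)  # interaction of features 0 and 1
print(pd_main.shape, pd_pair.shape)               # (n_samples,) each
```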
- predict(X, init_score=None)
Predict function.
- Parameters:
- X : np.ndarray
Data features.
- init_score : np.ndarray of shape (n_samples, ), default=None
Either a model that can generate scores or a per-sample initialization score. If per-sample scores, it should be the same length as X.
- Returns:
- np.ndarray
The predicted labels.
- predict_proba(X, init_score=None)
Predict probability function.
- Parameters:
- X : np.ndarray
Data features.
- init_score : np.ndarray of shape (n_samples, ), default=None
Either a model that can generate scores or a per-sample initialization score. If per-sample scores, it should be the same length as X.
- Returns:
- np.ndarray
The predicted probabilities.
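A brief sketch of the two prediction methods on the fitted clf from the constructor example; the column order of the returned probabilities is assumed to follow classes_, as in scikit-learn.

```python
labels = clf.predict(X_test)        # hard class labels
proba = clf.predict_proba(X_test)   # shape (n_samples, n_classes)
print(proba.sum(axis=1)[:3])        # each row sums to 1
```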
- score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy, which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
Mean accuracy of self.predict(X) w.r.t. y.
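An illustrative weighted call (the weights are hypothetical), continuing the fitted clf from the constructor sketch.

```python
import numpy as np

acc = clf.score(X_test, y_test)       # unweighted mean accuracy
w = np.where(y_test == 1, 2.0, 1.0)   # hypothetical: double-weight the positive class
weighted_acc = clf.score(X_test, y_test, sample_weight=w)
print(acc, weighted_acc)
```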
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
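A short sketch of the flat and nested forms; the pipeline step name 'ebm' is a hypothetical label chosen for the example.

```python
from sklearn.pipeline import Pipeline
from piml.models import ExplainableBoostingClassifier

clf.set_params(learning_rate=0.005, max_rounds=1000)   # flat form

# Nested form: <component>__<parameter>, with step name 'ebm' hypothetical.
pipe = Pipeline([("ebm", ExplainableBoostingClassifier())])
pipe.set_params(ebm__learning_rate=0.005)
```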
- set_score_request(*, sample_weight: Union[bool, None, str] = '$UNCHANGED$') -> ExplainableBoostingClassifier
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
- Returns:
- self : object
The updated object.
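A hedged routing sketch: enable_metadata_routing and the request flag are standard scikit-learn (>= 1.3) mechanisms, but whether a given meta-estimator actually forwards the weights also depends on how it is called.

```python
import sklearn
from piml.models import ExplainableBoostingClassifier

sklearn.set_config(enable_metadata_routing=True)
# Ask meta-estimators (e.g. a Pipeline or a cross-validation helper) to pass
# any provided sample_weight through to this estimator's score method.
clf = ExplainableBoostingClassifier().set_score_request(sample_weight=True)
```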
Examples using piml.models.ExplainableBoostingClassifier
EBM Classification (Taiwan Credit)