piml.data.outlier_detection.OneClassSVM

class piml.data.outlier_detection.OneClassSVM(kernel='rbf', degree=3, gamma='scale', coef0=0.0, tol=0.001, nu=0.5, shrinking=True, cache_size=200, verbose=False, max_iter=-1, standardization=True)

A wrapper of sklearn’s OneClassSVM for outlier detection.

Estimate the support of a high-dimensional distribution.

The implementation is based on libsvm.

Parameters:
kernel : {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’} or callable, default=‘rbf’

Specifies the kernel type to be used in the algorithm. If none is given, ‘rbf’ will be used. If a callable is given it is used to precompute the kernel matrix.

degree : int, default=3

Degree of the polynomial kernel function (‘poly’). Must be non-negative. Ignored by all other kernels.

gamma : {‘scale’, ‘auto’} or float, default=‘scale’

Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.

  • if ‘scale’ (default), uses 1 / (n_features * X.var()) as the value of gamma,

  • if ‘auto’, uses 1 / n_features,

  • if float, must be non-negative.
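The two string options can be reproduced by hand. The following is a minimal sketch in plain Python, assuming X.var() denotes the population variance over all entries of X (as NumPy's ndarray.var computes by default). The toy matrix X and the helper name resolve_gamma are illustrative only and are not part of the library.

```python
from statistics import pvariance

def resolve_gamma(gamma, X):
    """Turn a gamma setting into a numeric kernel coefficient (illustrative helper)."""
    n_features = len(X[0])
    if gamma == "scale":
        # Population variance over every entry of X, not per column.
        flat = [v for row in X for v in row]
        return 1.0 / (n_features * pvariance(flat))
    if gamma == "auto":
        return 1.0 / n_features
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    return float(gamma)

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy data: 3 samples, 2 features
gamma_scale = resolve_gamma("scale", X)
gamma_auto = resolve_gamma("auto", X)
print(gamma_scale, gamma_auto)
```

Because ‘scale’ depends on the variance of the data, rescaling the features changes the resulting gamma, whereas ‘auto’ depends only on the number of features.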

coef0 : float, default=0.0

Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.
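To make the roles of degree, gamma, and coef0 concrete, here is a sketch of the four built-in kernels in the standard form used by libsvm-based estimators. The function names are illustrative, and the code is plain Python rather than the library's implementation.

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def poly_kernel(x, y, gamma, coef0=0.0, degree=3):
    # degree and coef0 both matter here.
    return (gamma * dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma):
    # Only gamma matters; degree and coef0 are ignored.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, y, gamma, coef0=0.0):
    # coef0 shifts the argument of tanh; degree is ignored.
    return math.tanh(gamma * dot(x, y) + coef0)

x, y = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, y))
print(poly_kernel(x, y, gamma=0.5, coef0=1.0, degree=3))
print(rbf_kernel(x, y, gamma=0.5))
print(sigmoid_kernel(x, y, gamma=0.5, coef0=1.0))
```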

tol : float, default=1e-3

Tolerance for stopping criterion.

nu : float, default=0.5

An upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. Should be in the interval (0, 1].

shrinking : bool, default=True

Whether to use the shrinking heuristic. See the User Guide.

cache_size : float, default=200

Specify the size of the kernel cache (in MB).

verbose : bool, default=False

Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.

max_iter : int, default=-1

Hard limit on the number of iterations within the solver, or -1 for no limit.

standardization : bool, default=True

Whether to standardize covariates before running the algorithm.
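As an illustration of what standardizing covariates typically means, the sketch below rescales each column to zero mean and unit variance. This assumes z-score scaling; the source does not state which scaler the wrapper uses, and the helper name standardize is hypothetical.

```python
from statistics import fmean, pstdev

def standardize(X):
    """Z-score each column of X (a list of rows); an assumed form of standardization."""
    cols = list(zip(*X))
    means = [fmean(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

# Toy data: the second feature is 100 times larger than the first.
X = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
Z = standardize(X)
print(Z)
```

After scaling, both columns carry the same values, so a distance-based kernel such as ‘rbf’ is no longer dominated by the feature with the larger range.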

Attributes:
class_weight_ : ndarray of shape (n_classes,)

Multipliers of parameter C for each class. Computed based on the class_weight parameter.

coef_ : ndarray of shape (1, n_features)

Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.

coef_ is a read-only property derived from dual_coef_ and support_vectors_.

dual_coef_ : ndarray of shape (1, n_SV)

Coefficients of the support vectors in the decision function.

fit_status_ : int

0 if correctly fitted, 1 otherwise (a warning is raised).

intercept_ : ndarray of shape (1,)

Constant in the decision function.

n_features_in_ : int

Number of features seen during fit.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

n_iter_ : int

Number of iterations run by the optimization routine to fit the model.

n_support_ : ndarray of shape (n_classes,), dtype=int32

Number of support vectors for each class.

offset_ : float

Offset used to define the decision function from the raw scores. We have the relation: decision_function = score_samples - offset_. The offset is the opposite of intercept_ and is provided for consistency with other outlier detection algorithms.
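The relation between these quantities is simple arithmetic. The sketch below uses made-up numbers (the raw scores and the intercept are hypothetical values, not output from a fitted model) to show how the offset converts raw scores into decision values.

```python
# Hypothetical raw scores, standing in for what score_samples would return.
raw_scores = [0.8, 0.5, 0.1]

# Hypothetical fitted intercept; offset_ is defined as its opposite.
intercept = -0.4
offset = -intercept

# decision_function = score_samples - offset_
decision = [s - offset for s in raw_scores]
print(decision)
```

Subtracting the offset recenters the raw scores so that zero marks the decision boundary of the underlying estimator.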

shape_fit_ : tuple of int of shape (n_dimensions_of_X,)

Array dimensions of training vector X.

support_ : ndarray of shape (n_SV,)

Indices of support vectors.

support_vectors_ : ndarray of shape (n_SV, n_features)

Support vectors.

Methods

decision_function(X[, scale])

Predict the raw outlier scores of X using the fitted detector.

fit(X[, y, sample_weight])

Fit the model.

predict([X, scale, threshold])

Predict the raw outlier indicator.

decision_function(X, scale=True)

Predict the raw outlier scores of X using the fitted detector.

For consistency, outliers are assigned larger anomaly scores.

Parameters:
X : numpy array of shape (n_samples, n_features)

The input samples to score. Sparse matrices are accepted only if they are supported by the base estimator.

scale : bool, default=True

If True, scale X before calculating the outlier score.

Returns:
outlier_scores : numpy array of shape (n_samples,)

The anomaly scores of the input samples.

fit(X, y=None, sample_weight=None)

Fit the model.

Parameters:
X : np.ndarray of shape (n_samples, n_features)

Data features.

y : np.ndarray of shape (n_samples,), default=None

Data response.

sample_weight : np.ndarray of shape (n_samples,), default=None

Sample weight.

predict(X=None, scale=True, threshold=0.9)

Predict the raw outlier indicator.

Normal samples are classified as 1 and outliers are classified as -1.

Parameters:
X : numpy array of shape (n_samples, n_features), default=None

The input samples to classify. Sparse matrices are accepted only if they are supported by the base estimator.

scale : bool, default=True

If True, scale X before calculating the outlier score.

threshold : float, default=0.9

The quantile of the outlier scores used as the cutoff. For example, with threshold=0.9, samples whose outlier scores exceed the 90% quantile of the scores over the whole sample are classified as outliers.

Returns:
outlier_indicator : numpy array of shape (n_samples,)

The array indicating whether each sample is an outlier (-1) or a normal sample (1).
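The quantile rule described for threshold can be sketched in plain Python. This is only an illustration under stated assumptions: the quantile is computed with linear interpolation, the scores are made-up values, and the helper names quantile and label_outliers are hypothetical rather than part of the library.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, for 0 <= q <= 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def label_outliers(scores, threshold=0.9):
    """Return -1 for scores above the threshold quantile, 1 otherwise."""
    cutoff = quantile(scores, threshold)
    return [-1 if s > cutoff else 1 for s in scores]

# Made-up outlier scores; larger means more anomalous.
scores = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.8, 0.7, 0.6, 5.0]
labels = label_outliers(scores, threshold=0.9)
print(labels)
```

With ten scores and threshold=0.9, only the single largest score lies above the cutoff, so roughly the top 10% of samples are flagged. Raising the threshold flags fewer samples.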

Examples using piml.data.outlier_detection.OneClassSVM

Data Quality Check
