piml.data.outlier_detection.OneClassSVM
- class piml.data.outlier_detection.OneClassSVM(kernel='rbf', degree=3, gamma='scale', coef0=0.0, tol=0.001, nu=0.5, shrinking=True, cache_size=200, verbose=False, max_iter=-1, standardization=True)
A wrapper of sklearn’s OneClassSVM for outlier detection.
Estimate the support of a high-dimensional distribution.
The implementation is based on libsvm.
- Parameters:
- kernel : {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’} or callable, default=‘rbf’
Specifies the kernel type to be used in the algorithm. If none is given, ‘rbf’ will be used. If a callable is given, it is used to precompute the kernel matrix.
- degree : int, default=3
Degree of the polynomial kernel function (‘poly’). Must be non-negative. Ignored by all other kernels.
- gamma : {‘scale’, ‘auto’} or float, default=‘scale’
Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.
If gamma='scale' (default) is passed, 1 / (n_features * X.var()) is used as the value of gamma.
If ‘auto’, 1 / n_features is used.
If float, it must be non-negative.
- coef0 : float, default=0.0
Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.
- tol : float, default=1e-3
Tolerance for stopping criterion.
- nu : float, default=0.5
An upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. Should be in the interval (0, 1]. By default 0.5 will be taken.
- shrinking : bool, default=True
Whether to use the shrinking heuristic. See the User Guide.
- cache_size : float, default=200
Specify the size of the kernel cache (in MB).
- verbose : bool, default=False
Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.
- max_iter : int, default=-1
Hard limit on iterations within solver, or -1 for no limit.
- standardization : bool, default=True
Whether to standardize covariates before running the algorithm.
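The two string settings of gamma described above reduce to simple formulas. The sketch below works them out with NumPy on a toy array; `resolve_gamma` is a hypothetical helper written for illustration and is not part of PiML or scikit-learn. If standardization=True, the variance is presumably taken on the standardized data, which this page does not state.

```python
import numpy as np


def resolve_gamma(X, gamma="scale"):
    """Turn a `gamma` setting into a numeric kernel coefficient (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    if isinstance(gamma, str):
        if gamma == "scale":
            # X.var() is the variance over all entries of X, not a per-feature variance.
            # Constant data (zero variance) is not handled in this sketch.
            return 1.0 / (n_features * X.var())
        if gamma == "auto":
            return 1.0 / n_features
        raise ValueError(f"unknown gamma setting: {gamma!r}")
    if gamma < 0:
        raise ValueError("a float gamma must be non-negative")
    return float(gamma)


# Toy data: 3 samples, 2 features.
X = np.array([[0.0, 2.0], [2.0, 4.0], [4.0, 6.0]])
gamma_scale = resolve_gamma(X, "scale")
gamma_auto = resolve_gamma(X, "auto")
gamma_fixed = resolve_gamma(X, 0.25)
print(gamma_scale, gamma_auto, gamma_fixed)
```

With ‘scale’, gamma shrinks as the overall spread of the data grows, whereas ‘auto’ depends only on the number of features.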
- Attributes:
- class_weight_ : ndarray of shape (n_classes,)
Multipliers of parameter C for each class. Computed based on the class_weight parameter.
- coef_ : ndarray of shape (1, n_features)
Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel. coef_ is a read-only property derived from dual_coef_ and support_vectors_.
- dual_coef_ : ndarray of shape (1, n_SV)
Coefficients of the support vectors in the decision function.
- fit_status_ : int
0 if correctly fitted, 1 otherwise (will raise a warning).
- intercept_ : ndarray of shape (1,)
Constant in the decision function.
- n_features_in_ : int
Number of features seen during fit.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
- n_iter_ : int
Number of iterations run by the optimization routine to fit the model.
- n_support_ : ndarray of shape (n_classes,), dtype=int32
Number of support vectors for each class.
- offset_ : float
Offset used to define the decision function from the raw scores. We have the relation: decision_function = score_samples - offset_. The offset is the opposite of intercept_ and is provided for consistency with other outlier detection algorithms.
- shape_fit_ : tuple of int of shape (n_dimensions_of_X,)
Array dimensions of training vector X.
- support_ : ndarray of shape (n_SV,)
Indices of support vectors.
- support_vectors_ : ndarray of shape (n_SV, n_features)
Support vectors.
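The attribute shapes and the offset relation listed above can be checked on the underlying estimator. The sketch below uses scikit-learn's own `sklearn.svm.OneClassSVM` directly, not the PiML wrapper, on the assumption that the wrapper exposes the same fitted attributes. The data is synthetic.

```python
import numpy as np
from sklearn.svm import OneClassSVM  # the wrapped sklearn estimator, not the PiML class

# Synthetic training data: 200 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.5).fit(X)

n_sv = clf.support_vectors_.shape[0]
print("support_:", clf.support_.shape)                  # (n_SV,)
print("support_vectors_:", clf.support_vectors_.shape)  # (n_SV, n_features)
print("dual_coef_:", clf.dual_coef_.shape)              # (1, n_SV)
print("fit_status_:", clf.fit_status_)

# decision_function = score_samples - offset_
relation_holds = np.allclose(
    clf.decision_function(X), clf.score_samples(X) - clf.offset_
)
# offset_ is the opposite of intercept_
offset_is_neg_intercept = np.allclose(clf.offset_, -clf.intercept_)
print(relation_holds, offset_is_neg_intercept)
```

Because nu is a lower bound on the fraction of support vectors, nu=0.5 on 200 samples yields at least 100 support vectors here.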
Methods
- decision_function(X[, scale]): Predict the raw outlier score of X using the fitted detector.
- fit(X[, y, sample_weight]): Fit the model.
- predict([X, scale, threshold]): Predict the raw outlier indicator.
- decision_function(X, scale=True)
Predict the raw outlier score of X using the fitted detector.
For consistency, outliers are assigned larger anomaly scores.
- Parameters:
- X : numpy array of shape (n_samples, n_features)
The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- scale : bool, default=True
If True, scale X before calculating the outlier score.
- Returns:
- outlier_scores : numpy array of shape (n_samples,)
The anomaly score of the input samples.
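The scoring convention above (larger score means more anomalous) is the reverse of scikit-learn's, where `decision_function` is positive for inliers. The sketch below shows one plausible way to obtain such a score: standardize with the training statistics, then negate the sklearn decision value. Both steps are assumptions about the wrapper's internals that this page does not confirm, and `anomaly_score` is a hypothetical helper.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Synthetic training data centred at (5, 5).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(200, 2))

# Assumption: standardization statistics are learned at fit time and reused later.
scaler = StandardScaler().fit(X_train)
svm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.5).fit(scaler.transform(X_train))


def anomaly_score(X, scale=True):
    """Return scores where larger means more anomalous (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    if scale:
        X = scaler.transform(X)
    # sklearn's decision_function is positive for inliers, so negate it.
    return -svm.decision_function(X)


# One point near the centre of the training data, one far away from it.
X_new = np.array([[5.0, 5.0], [25.0, 25.0]])
scores = anomaly_score(X_new)
print(scores)
```

The distant point receives the larger score, which matches the convention stated for this method.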
- fit(X, y=None, sample_weight=None)
Fit the model.
- Parameters:
- X : np.ndarray of shape (n_samples, n_features)
Data features.
- y : np.ndarray of shape (n_samples,), default=None
Data response.
- sample_weight : np.ndarray of shape (n_samples,), default=None
Sample weight.
- predict(X=None, scale=True, threshold=0.9)
Predict the raw outlier indicator.
Normal samples are classified as 1 and outliers are classified as -1.
- Parameters:
- X : numpy array of shape (n_samples, n_features)
The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- scale : bool, default=True
If True, scale X before calculating the outlier score.
- threshold : float, default=0.9
The quantile threshold for outliers. For example, with threshold=0.9, samples whose outlier scores exceed the 90% quantile of the scores over the whole sample are classified as outliers.
- Returns:
- outlier_indicator : numpy array of shape (n_samples,)
The binary array indicating whether each sample is an outlier (-1) or a normal sample (1).
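The quantile rule described for threshold can be sketched with NumPy alone. The helper below, `label_by_quantile`, is hypothetical and assumes the quantile is taken over the scores being labelled. The page does not say whether "the whole sample" means those scores or the training scores.

```python
import numpy as np


def label_by_quantile(scores, threshold=0.9):
    """Label scores above the `threshold` quantile as -1 (outlier), the rest as 1."""
    scores = np.asarray(scores, dtype=float)
    cutoff = np.quantile(scores, threshold)
    return np.where(scores > cutoff, -1, 1)


# Ten anomaly scores, where larger means more anomalous.
scores = np.arange(10, dtype=float)
labels = label_by_quantile(scores, threshold=0.9)
print(labels)
```

With threshold=0.9, roughly the top 10% of scores are flagged. In this toy case only the largest score is flagged.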