piml.data.outlier_detection
.CBLOF¶
- class piml.data.outlier_detection.CBLOF(n_clusters=10, clustering_method='kmeans', clustering_threshold=0.1, use_weights=False, standardization=True, random_state=0)¶
Cluster-based local outlier factor for outlier detection.
- Parameters:
- n_clustersint, default=10
The number of clusters.
- clustering_method{‘kmeans’, ‘gmm’}, default=’kmeans’
The base clustering algorithm for performing data clustering.
‘kmeans’: K-Means
‘gmm’: Gaussian mixture model
- clustering_thresholdfloat, default=0.1
The threshold of large cluster size.
- use_weightsbool, default=False
If set to True, the size of clusters are used as weights in outlier score calculation.
- standardizationbool, default=True
Whether to standardize covariates before running the algorithm.
- random_stateint, default=0
The random seed.
- Attributes:
- base_estimator_sklearn estimator object
The KMeans model or Gaussian mixture model.
- is_fitted_bool
Indicator of whether the model is fitted.
- cluster_centers_np.ndarray
The centers of each cluster.
- cluster_sizes_np.ndarray
The number of samples in each cluster.
- small_cluster_labels_np.ndarray
The indices of small clusters.
- large_cluster_labels_np.ndarray
The indices of large clusters.
Methods
decision_function
(X[, scale])Predict raw outliers score of X using the fitted detector.
fit
(X)Fit the outlier detection algorithm.
predict
([X, scale, threshold])Predict raw outlier indicator.
- decision_function(X, scale=True)¶
- Predict raw outliers score of X using the fitted detector.
For consistency, outliers are assigned with larger anomaly scores.
- Parameters:
- Xnumpy array of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- scalebool, default=True
If True, scale X before calculating the outlier score.
- Returns:
- outlier_scoresnumpy array of shape (n_samples,)
The anomaly score of the input samples.
- fit(X)¶
Fit the outlier detection algorithm.
- Parameters:
- Xnumpy array of shape (n_samples, n_features)
The input samples.
- predict(X=None, scale=True, threshold=0.9)¶
Predict raw outlier indicator.
Normal samples are classified as 1 and outliers are classified as -1.
- Parameters:
- Xnumpy array of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- scalebool, default=True
If True, scale X before calculating the outlier score.
- thresholdfloat, default=0.9
The quantile threshold of outliers. For example, the samples with outlier scores greater than 90% quantile of the whole sample will be classified as outliers.
- Returns:
- outlier_indicatornumpy array of shape (n_samples,)
The binary array indicating whether each sample is outlier.