piml.data.outlier_detection.CBLOF

class piml.data.outlier_detection.CBLOF(n_clusters=10, clustering_method='kmeans', clustering_threshold=0.1, use_weights=False, standardization=True, random_state=0)

Cluster-based local outlier factor for outlier detection.

Parameters:
n_clusters : int, default=10

The number of clusters.

clustering_method : {‘kmeans’, ‘gmm’}, default=‘kmeans’

The base clustering algorithm for performing data clustering.

  • ‘kmeans’: K-Means

  • ‘gmm’: Gaussian mixture model

clustering_threshold : float, default=0.1

The threshold used to decide whether a cluster counts as a large cluster.

use_weights : bool, default=False

If set to True, the cluster sizes are used as weights in the outlier score calculation.

standardization : bool, default=True

Whether to standardize covariates before running the algorithm.

random_state : int, default=0

The random seed.

Attributes:
base_estimator_ : sklearn estimator object

The KMeans model or Gaussian mixture model.

is_fitted_ : bool

Indicates whether the model has been fitted.

cluster_centers_ : np.ndarray

The center of each cluster.

cluster_sizes_ : np.ndarray

The number of samples in each cluster.

small_cluster_labels_ : np.ndarray

The indices of small clusters.

large_cluster_labels_ : np.ndarray

The indices of large clusters.
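
The relationship between the cluster-size attributes can be illustrated with a small NumPy sketch. It does not call PiML. The rule used here (a cluster is "large" when its share of the samples is at least `clustering_threshold`) is an assumption made for illustration, and the variable names are hypothetical stand-ins for the attributes listed above.

```python
import numpy as np

# Hypothetical cluster assignments for 20 samples in 4 clusters.
labels = np.array([0] * 12 + [1] * 6 + [2] * 1 + [3] * 1)
clustering_threshold = 0.1  # same value as the documented default

# Number of samples per cluster (the role played by cluster_sizes_).
cluster_sizes = np.bincount(labels, minlength=4)

# ASSUMPTION: a cluster is "large" when it holds at least this
# fraction of all samples. The library's actual rule may differ.
is_large = cluster_sizes / cluster_sizes.sum() >= clustering_threshold

# Index arrays analogous to large_cluster_labels_ and small_cluster_labels_.
large_cluster_labels = np.flatnonzero(is_large)
small_cluster_labels = np.flatnonzero(~is_large)
```

Under this assumed rule, the two single-sample clusters end up in the small group and the other two in the large group.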

Methods

decision_function(X[, scale])

Predict the raw outlier scores of X using the fitted detector.

fit(X)

Fit the outlier detection algorithm.

predict([X, scale, threshold])

Predict raw outlier indicator.

decision_function(X, scale=True)
Predict the raw outlier scores of X using the fitted detector.

For consistency, outliers are assigned larger anomaly scores.

Parameters:
X : numpy array of shape (n_samples, n_features)

The input samples. Sparse matrices are accepted only if they are supported by the base estimator.

scale : bool, default=True

If True, scale X before calculating the outlier score.

Returns:
outlier_scores : numpy array of shape (n_samples,)

The anomaly scores of the input samples.
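
As a rough illustration of what a cluster-based outlier score looks like, the sketch below follows a commonly described CBLOF formulation: members of large clusters are scored by the distance to their own cluster center, and members of small clusters by the distance to the nearest large-cluster center, optionally multiplied by cluster size. The function `cblof_scores` and its arguments are hypothetical, and nothing here is confirmed to match PiML's internal computation.

```python
import numpy as np

def cblof_scores(X, labels, centers, large_labels, use_weights=False):
    """Illustrative CBLOF-style score (hypothetical helper, not PiML's code)."""
    X = np.asarray(X, dtype=float)
    sizes = np.bincount(labels, minlength=len(centers))
    # Distance from every sample to every large-cluster center.
    diff = X[:, None, :] - centers[large_labels][None, :, :]
    dist_to_large = np.linalg.norm(diff, axis=2)
    # Distance from every sample to the center of its own cluster.
    dist_to_own = np.linalg.norm(X - centers[labels], axis=1)
    in_large = np.isin(labels, large_labels)
    # Large-cluster members use their own center; members of small
    # clusters use the nearest large-cluster center.
    scores = np.where(in_large, dist_to_own, dist_to_large.min(axis=1))
    if use_weights:
        # Weight each score by the size of the sample's cluster.
        scores = scores * sizes[labels]
    return scores

# Toy data: three points near the origin and one isolated point.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0]])
labels = np.array([0, 0, 0, 1])
centers = np.array([[1 / 3, 1 / 3], [10.0, 10.0]])
large_labels = np.array([0])

scores = cblof_scores(X, labels, centers, large_labels)
weighted = cblof_scores(X, labels, centers, large_labels, use_weights=True)
```

In this toy setup the isolated point receives the largest score, which matches the convention that outliers get larger anomaly scores.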

fit(X)

Fit the outlier detection algorithm.

Parameters:
X : numpy array of shape (n_samples, n_features)

The input samples.

predict(X=None, scale=True, threshold=0.9)

Predict raw outlier indicator.

Normal samples are classified as 1 and outliers are classified as -1.

Parameters:
X : numpy array of shape (n_samples, n_features), default=None

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

scale : bool, default=True

If True, scale X before calculating the outlier score.

threshold : float, default=0.9

The quantile threshold for outliers. For example, with threshold=0.9, samples whose outlier scores exceed the 90% quantile of the scores over the whole sample are classified as outliers.

Returns:
outlier_indicator : numpy array of shape (n_samples,)

The binary array indicating whether each sample is an outlier.
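
The quantile rule described for `threshold` can be sketched with plain NumPy. The scores below are made-up values, and the use of `np.quantile` with its default interpolation is an assumption about how the cutoff might be computed, not a statement about PiML's internals.

```python
import numpy as np

# Made-up outlier scores for ten samples (larger means more anomalous).
scores = np.array([0.2, 0.1, 0.4, 0.3, 5.0, 0.25, 0.15, 0.35, 0.05, 0.45])
threshold = 0.9  # same value as the documented default

# Cutoff at the 90% quantile of the scores.
cutoff = np.quantile(scores, threshold)

# Samples above the cutoff are flagged as outliers (-1); the rest are normal (1).
outlier_indicator = np.where(scores > cutoff, -1, 1)
```

Only the sample with score 5.0 lies above the cutoff here, so it is the only one labeled -1.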

Examples using piml.data.outlier_detection.CBLOF

Data Quality Check