`piml.data.outlier_detection`.CBLOF

class piml.data.outlier_detection.CBLOF(n_clusters=10, clustering_method='kmeans', clustering_threshold=0.1, use_weights=False, standardization=True, random_state=0)

Cluster-based local outlier factor for outlier detection.

Parameters:

n_clusters (int, default=10)

The number of clusters.

clustering_method ({'kmeans', 'gmm'}, default='kmeans')

The base clustering algorithm used to cluster the data.

'kmeans': K-Means
'gmm': Gaussian mixture model

clustering_threshold (float, default=0.1)

The size threshold used to separate large clusters from small clusters.

use_weights (bool, default=False)

If True, cluster sizes are used as weights in the outlier score calculation.

standardization (bool, default=True)

Whether to standardize covariates before running the algorithm.

random_state (int, default=0)

The random seed.
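The role of `clustering_threshold` can be illustrated with a minimal sketch. It assumes, without confirmation from this page, that a cluster counts as "large" when it holds at least `clustering_threshold` of all samples. The helper `split_clusters` is hypothetical and not part of PiML.

```python
import numpy as np

def split_clusters(cluster_sizes, clustering_threshold=0.1):
    """Split cluster indices into large and small groups.

    Assumption (not confirmed by the source): a cluster is "large" when
    its share of all samples is at least `clustering_threshold`.
    """
    sizes = np.asarray(cluster_sizes)
    share = sizes / sizes.sum()
    large = np.flatnonzero(share >= clustering_threshold)
    small = np.flatnonzero(share < clustering_threshold)
    return large, small

# Five clusters holding 1000 samples in total.
large, small = split_clusters([480, 350, 120, 30, 20])
print(large, small)
```

Under this reading, raising `clustering_threshold` moves more clusters into the small group, so more samples are scored against a cluster other than their own.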

Attributes:

base_estimator_ (sklearn estimator object): The fitted KMeans or Gaussian mixture model.
is_fitted_ (bool): Indicator of whether the model is fitted.
cluster_centers_ (np.ndarray): The center of each cluster.
cluster_sizes_ (np.ndarray): The number of samples in each cluster.
small_cluster_labels_ (np.ndarray): The indices of the small clusters.
large_cluster_labels_ (np.ndarray): The indices of the large clusters.

Methods

`decision_function`(X[, scale])	Predict the raw outlier scores of X using the fitted detector.
`fit`(X)	Fit the outlier detection algorithm.
`predict`([X, scale, threshold])	Predict the raw outlier indicator.

decision_function(X, scale=True)

Predict the raw outlier scores of X using the fitted detector. For consistency, outliers are assigned larger anomaly scores.

Parameters:

X (numpy array of shape (n_samples, n_features)): The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
scale (bool, default=True): If True, scale X before calculating the outlier scores.

Returns:

outlier_scores (numpy array of shape (n_samples,)): The anomaly scores of the input samples.
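How such scores might be computed can be sketched as follows. This follows the general CBLOF idea, not PiML's confirmed implementation: a sample in a large cluster is scored by its distance to its own center, and a sample in a small cluster by its distance to the nearest large-cluster center. The helper `cblof_scores` and its `sizes` argument (standing in for `use_weights=True`) are hypothetical.

```python
import numpy as np

def cblof_scores(X, centers, labels, large_labels, sizes=None):
    """CBLOF-style scores; larger means more anomalous.

    Hypothetical helper, not PiML's code. If `sizes` is given, each
    score is multiplied by the size of the sample's own cluster.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    labels = np.asarray(labels)
    large_labels = np.asarray(large_labels)

    # Distance from every sample to every large-cluster center.
    diff = X[:, None, :] - centers[large_labels][None, :, :]
    dist_to_large = np.linalg.norm(diff, axis=2)

    # Distance from every sample to its own cluster center.
    dist_to_own = np.linalg.norm(X - centers[labels], axis=1)

    # Large-cluster members use their own center, others the nearest large one.
    in_large = np.isin(labels, large_labels)
    scores = np.where(in_large, dist_to_own, dist_to_large.min(axis=1))
    if sizes is not None:
        scores = scores * np.asarray(sizes)[labels]
    return scores

centers = np.array([[0.0, 0.0], [10.0, 10.0], [5.0, 0.0]])
X = np.array([[0.0, 1.0], [10.0, 12.0], [5.0, 0.0]])
labels = np.array([0, 1, 2])  # cluster assignment of each sample
scores = cblof_scores(X, centers, labels, large_labels=[0, 1])
print(scores)
```

The third sample sits exactly on the center of its own small cluster, yet it still receives the largest score because it is measured against the nearest large cluster.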

fit(X)

Fit the outlier detection algorithm.

Parameters:

X (numpy array of shape (n_samples, n_features)): The input samples.
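The `standardization` and `scale` options suggest that scaling statistics are learned during fitting and reused when new data is scored. The sketch below assumes z-score standardization, which this page does not confirm. `ZScoreScaler` is a hypothetical stand-in, not a PiML class.

```python
import numpy as np

class ZScoreScaler:
    """Hypothetical stand-in for the scaling step; not part of PiML."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        std = X.std(axis=0)
        # Guard against constant columns to avoid division by zero.
        self.scale_ = np.where(std > 0, std, 1.0)
        return self

    def transform(self, X):
        # Reuse the statistics learned in fit; do not refit on new data.
        return (np.asarray(X, dtype=float) - self.mean_) / self.scale_

X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
scaler = ZScoreScaler().fit(X_train)
X_new_scaled = scaler.transform([[2.0, 500.0]])
print(X_new_scaled)
```

Reusing the training statistics keeps distances to the fitted cluster centers comparable between training data and new data.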

predict(X=None, scale=True, threshold=0.9)

Predict the raw outlier indicator.

Normal samples are classified as 1 and outliers are classified as -1.

Parameters:

X (numpy array of shape (n_samples, n_features)): The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
scale (bool, default=True): If True, scale X before calculating the outlier scores.
threshold (float, default=0.9): The quantile threshold for outliers. For example, with threshold=0.9, samples whose outlier scores exceed the 90% quantile of the whole sample are classified as outliers.

Returns:

outlier_indicator (numpy array of shape (n_samples,)): The binary array indicating whether each sample is an outlier.
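The quantile rule described for `threshold` can be sketched with NumPy. This is an illustration under stated assumptions, not PiML's code: the helper `label_by_quantile` is hypothetical, and the cutoff is computed over the scores passed in, which is one possible reading of "the whole sample".

```python
import numpy as np

def label_by_quantile(scores, threshold=0.9):
    """Label samples as 1 (normal) or -1 (outlier) using a score quantile.

    Hypothetical helper: scores strictly greater than the `threshold`
    quantile of the given scores are marked as outliers.
    """
    scores = np.asarray(scores, dtype=float)
    cutoff = np.quantile(scores, threshold)
    return np.where(scores > cutoff, -1, 1)

scores = np.array([0.2, 0.1, 0.4, 0.3, 0.5, 0.2, 0.3, 0.1, 0.4, 9.0])
labels = label_by_quantile(scores, threshold=0.9)
print(labels)
```

Because the cutoff is a quantile rather than a fixed score, roughly `1 - threshold` of the samples are flagged regardless of how extreme their scores are, so the flagged set is best read as "the most unusual samples" rather than as confirmed anomalies.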

Examples using `piml.data.outlier_detection.CBLOF`

Data Quality Check
