piml.data.outlier_detection.ECOD

class piml.data.outlier_detection.ECOD(n_jobs=1, standardization=True)

A wrapper of PyOD’s ECOD detector (Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions).

ECOD is a parameter-free, highly interpretable outlier detection algorithm based on empirical cumulative distribution functions (CDFs).
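The idea behind ECOD can be illustrated with a simplified NumPy sketch. It follows the scoring scheme described in the original ECOD paper: for each feature, estimate left- and right-tail probabilities from the empirical CDF of a reference sample, sum their negative logarithms across features, and take the largest of the left-tail, right-tail, and skewness-guided aggregates. This is an illustration under stated assumptions, not PiML's or PyOD's implementation; the function name `ecod_scores` and the clipping floor that keeps `log` finite are hypothetical choices.

```python
import numpy as np


def ecod_scores(X_train, X):
    """Simplified ECOD outlier scores for the rows of X (larger = more anomalous).

    X_train is the reference sample whose empirical CDFs are used.
    This is an illustrative sketch, not the PiML/PyOD implementation.
    """
    X_train = np.asarray(X_train, dtype=float)
    X = np.asarray(X, dtype=float)
    n = X_train.shape[0]
    sorted_train = np.sort(X_train, axis=0)

    left = np.empty_like(X)   # P(train value <= x), per feature
    right = np.empty_like(X)  # P(train value >= x), per feature
    for j in range(X.shape[1]):
        col = sorted_train[:, j]
        left[:, j] = np.searchsorted(col, X[:, j], side="right") / n
        right[:, j] = (n - np.searchsorted(col, X[:, j], side="left")) / n

    # Hypothetical floor so log() stays finite beyond the training range.
    eps = 1.0 / (n + 1)
    left = np.clip(left, eps, 1.0)
    right = np.clip(right, eps, 1.0)

    # Sample skewness per feature decides which tail the "auto" score uses.
    std = X_train.std(axis=0)
    std[std == 0] = 1.0
    skewness = ((X_train - X_train.mean(axis=0)) ** 3).mean(axis=0) / std**3

    o_left = -np.log(left).sum(axis=1)
    o_right = -np.log(right).sum(axis=1)
    o_auto = -np.log(np.where(skewness < 0, left, right)).sum(axis=1)

    # Final score: the largest of the three aggregated tail scores.
    return np.maximum(np.maximum(o_left, o_right), o_auto)


rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))
X_new = np.array([[0.0, 0.0, 0.0], [6.0, -6.0, 6.0]])
scores = ecod_scores(X_train, X_new)
```

Because larger scores mean more anomalous, the far-out point `[6, -6, 6]` receives a much larger score than the origin in this example. Note that the sketch has no tuning parameters, which mirrors the "parameter-free" property described above.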

Parameters:
n_jobs : optional (default=1)

The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.

standardization : bool, default=True

Whether to standardize covariates before running the algorithm.
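The documentation does not specify how `standardization=True` transforms the covariates. A common choice is a per-column z-score using the mean and standard deviation of the training data, sketched below with a hypothetical helper `standardize`; treat it as an assumption rather than PiML's actual preprocessing.

```python
import numpy as np


def standardize(X_train, X=None):
    """Z-score columns using the training mean and standard deviation (hypothetical helper)."""
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns centered but unscaled
    target = X_train if X is None else np.asarray(X, dtype=float)
    return (target - mean) / std


X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Z = standardize(X_train)
```

After the transform each column has mean 0 and standard deviation 1, so features measured on very different scales (here, 1 to 3 versus 10 to 30) end up directly comparable.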

Methods

decision_function(X[, scale])

Predict raw outlier scores of X using the fitted detector.

fit(X)

Fit the outlier detection algorithm.

predict([X, scale, threshold])

Predict raw outlier indicator.

decision_function(X, scale=True)
Predict raw outlier scores of X using the fitted detector.

For consistency, outliers are assigned larger anomaly scores.

Parameters:
X : numpy array of shape (n_samples, n_features)

The input samples to score. Sparse matrices are accepted only if they are supported by the base estimator.

scale : bool, default=True

If True, scale X before calculating the outlier score.

Returns:
outlier_scores : numpy array of shape (n_samples,)

The anomaly score of the input samples.

fit(X)

Fit the outlier detection algorithm.

Parameters:
X : numpy array of shape (n_samples, n_features)

The input samples.

predict(X=None, scale=True, threshold=0.9)

Predict raw outlier indicator.

Normal samples are classified as 1 and outliers are classified as -1.

Parameters:
X : numpy array of shape (n_samples, n_features), default=None

The input samples. Sparse matrices are accepted only if they are supported by the base estimator.

scale : bool, default=True

If True, scale X before calculating the outlier score.

threshold : float, default=0.9

The quantile threshold for outliers. For example, with the default of 0.9, samples whose outlier scores exceed the 90% quantile of the scores of the whole sample are classified as outliers.

Returns:
outlier_indicator : numpy array of shape (n_samples,)

The binary array indicating whether each sample is an outlier (1 for normal samples, -1 for outliers).
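The `threshold` rule of `predict` can be sketched as a quantile cut on the outlier scores. This sketch assumes the quantile is taken over the scores of the samples being labeled (the description says "of the whole sample"); the helper `quantile_labels` is hypothetical and not part of PiML.

```python
import numpy as np


def quantile_labels(scores, threshold=0.9):
    """Label samples 1 (normal) or -1 (outlier) by a quantile cut on the scores."""
    scores = np.asarray(scores, dtype=float)
    cutoff = np.quantile(scores, threshold)
    # Larger scores are more anomalous, so samples above the cutoff are outliers.
    return np.where(scores > cutoff, -1, 1)


scores = np.arange(1.0, 11.0)  # ten scores: 1.0, 2.0, ..., 10.0
labels = quantile_labels(scores, threshold=0.9)
```

With ten evenly spaced scores, the 90% quantile (linear interpolation) is 9.1, so only the largest score is labeled -1. Lowering `threshold` to 0.8 would flag more samples.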

Examples using piml.data.outlier_detection.ECOD

Data Quality Check
