piml.data.outlier_detection.ECOD
- class piml.data.outlier_detection.ECOD(n_jobs=1, standardization=True)
A wrapper of PyOD's ECOD (Empirical-Cumulative-distribution-based Outlier Detection) detector for unsupervised outlier detection.
ECOD is a parameter-free, highly interpretable outlier detection algorithm based on empirical cumulative distribution functions (ECDFs).
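To make the ECDF idea concrete, here is a minimal NumPy sketch of a simplified tail-probability score. It is not PiML's or PyOD's implementation: the function name `ecdf_tail_scores` is hypothetical, and the skewness-based tail selection used by the full ECOD algorithm is deliberately left out. For each feature it estimates how far into the left or right tail a sample lies, then sums the negative log tail probabilities across features.

```python
import numpy as np


def ecdf_tail_scores(X):
    """Simplified ECDF-based outlier score (illustrative only, hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Left-tail ECDF: fraction of samples <= each value, computed per feature.
    left = (X[None, :, :] <= X[:, None, :]).sum(axis=1) / n
    # Right-tail ECDF: fraction of samples >= each value, computed per feature.
    right = (X[None, :, :] >= X[:, None, :]).sum(axis=1) / n
    # Small tail probabilities turn into large scores; sum them over features.
    score_left = -np.log(left).sum(axis=1)
    score_right = -np.log(right).sum(axis=1)
    # Keep the more extreme of the two tails for each sample.
    return np.maximum(score_left, score_right)


# Four ordinary points and one point that is extreme in both features.
X_demo = np.array([
    [0.0, 0.1],
    [0.2, -0.1],
    [-0.1, 0.0],
    [0.1, 0.2],
    [5.0, 6.0],
])
demo_scores = ecdf_tail_scores(X_demo)
```

Because every sample counts itself, each tail probability is at least 1/n, so the logarithm is always finite. The pairwise comparison uses O(n² · d) memory, which is fine for a sketch but not for large datasets.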
- Parameters:
- n_jobs : optional (default=1)
The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.
- standardization : bool, default=True
Whether to standardize covariates before running the algorithm.
Methods
decision_function(X[, scale]): Predict raw outlier scores of X using the fitted detector.
fit(X): Fit the outlier detection algorithm.
predict([X, scale, threshold]): Predict raw outlier indicator.
- decision_function(X, scale=True)
Predict raw outlier scores of X using the fitted detector.
For consistency, outliers are assigned larger anomaly scores.
- Parameters:
- X : numpy array of shape (n_samples, n_features)
The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- scale : bool, default=True
If True, scale X before calculating the outlier score.
- Returns:
- outlier_scores : numpy array of shape (n_samples,)
The anomaly score of the input samples.
- fit(X)
Fit the outlier detection algorithm.
- Parameters:
- X : numpy array of shape (n_samples, n_features)
The input samples.
- predict(X=None, scale=True, threshold=0.9)
Predict raw outlier indicator.
Normal samples are classified as 1 and outliers are classified as -1.
- Parameters:
- X : numpy array of shape (n_samples, n_features), default=None
The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- scale : bool, default=True
If True, scale X before calculating the outlier score.
- threshold : float, default=0.9
The quantile threshold for outliers. For example, with threshold=0.9, samples whose outlier scores exceed the 90% quantile of the scores over the whole sample are classified as outliers.
- Returns:
- outlier_indicator : numpy array of shape (n_samples,)
The binary array indicating whether each sample is an outlier (-1) or normal (1).
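The quantile rule described for `threshold` can be sketched in plain NumPy as follows. This is an illustration under assumptions, not PiML's code: the helper name `label_by_quantile` is hypothetical, and the use of `np.quantile` with a strict "greater than" comparison is a guess at how the documented behavior might be realized.

```python
import numpy as np


def label_by_quantile(scores, threshold=0.9):
    """Map outlier scores to 1 (normal) or -1 (outlier); hypothetical helper."""
    scores = np.asarray(scores, dtype=float)
    # Cutoff at the requested quantile of all scores (assumed rule).
    cutoff = np.quantile(scores, threshold)
    # Scores strictly above the cutoff are flagged as outliers.
    return np.where(scores > cutoff, -1, 1)


# Nine moderate scores and one clearly larger score.
example_scores = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 5.0])
example_labels = label_by_quantile(example_scores, threshold=0.9)
```

With this rule, roughly a fraction `1 - threshold` of the samples is flagged regardless of how extreme they are, so the threshold acts as an assumed contamination level and does not test whether any outliers exist at all.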