piml.data.outlier_detection.HBOS
- class piml.data.outlier_detection.HBOS(n_bins=10, alpha=0.1, tol=0.5, standardization=True)
A wrapper of PyOD’s histogram-based outlier score (HBOS) detector.
HBOS is an efficient unsupervised method. It assumes that the features are independent and measures the degree of outlyingness by building a histogram for each feature.
- Parameters:
- n_bins : int or string, optional (default=10)
The number of bins. “auto” uses the Birgé-Rozenholc method to select the optimal number of bins automatically for each feature.
- alpha : float in (0, 1), optional (default=0.1)
The regularizer for preventing overflow.
- tol : float in (0, 1), optional (default=0.5)
Controls the flexibility when handling samples that fall outside the bins.
- standardization : bool, default=True
Whether to standardize covariates before running the algorithm.
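The histogram-based scoring described above can be sketched with NumPy alone. This is a simplified illustration, not PyOD’s or PiML’s implementation: the function name `hbos_scores` is hypothetical, the `tol` handling of out-of-range samples is omitted (such samples simply fall into the first or last bin), and no standardization is applied.

```python
import numpy as np

def hbos_scores(X_train, X, n_bins=10, alpha=0.1):
    """Simplified HBOS-style scores (hypothetical helper); larger means more outlying."""
    X_train = np.asarray(X_train, dtype=float)
    X = np.asarray(X, dtype=float)
    scores = np.zeros(X.shape[0])
    for j in range(X_train.shape[1]):
        # One histogram per feature: this is where feature independence is assumed.
        density, edges = np.histogram(X_train[:, j], bins=n_bins, density=True)
        # Locate each value's bin using the inner edges only, so values outside
        # the training range land in the first or last bin (a simplification).
        idx = np.digitize(X[:, j], edges[1:-1])
        # alpha keeps the logarithm finite when a bin is empty.
        scores += -np.log(density[idx] + alpha)
    return scores

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))
X_new = np.array([[0.0, 0.0], [6.0, 6.0]])  # a typical point and an extreme point
scores = hbos_scores(X_train, X_new)
```

In this sketch the extreme point receives the larger score because it falls into sparsely populated bins on both features.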
Methods
- decision_function(X[, scale]): Predict the raw outlier scores of X using the fitted detector.
- fit(X): Fit the outlier detection algorithm.
- predict([X, scale, threshold]): Predict the raw outlier indicator.
- decision_function(X, scale=True)
Predict the raw outlier scores of X using the fitted detector.
For consistency, outliers are assigned larger anomaly scores.
- Parameters:
- X : numpy array of shape (n_samples, n_features)
The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- scale : bool, default=True
If True, scale X before calculating the outlier score.
- Returns:
- outlier_scores : numpy array of shape (n_samples,)
The anomaly scores of the input samples.
- fit(X)
Fit the outlier detection algorithm.
- Parameters:
- X : numpy array of shape (n_samples, n_features)
The input samples.
- predict(X=None, scale=True, threshold=0.9)
Predict the raw outlier indicator.
Normal samples are labeled 1 and outliers are labeled -1.
- Parameters:
- X : numpy array of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- scale : bool, default=True
If True, scale X before calculating the outlier score.
- threshold : float, default=0.9
The quantile threshold for outliers. For example, with threshold=0.9, samples whose outlier scores exceed the 90% quantile of the whole sample are classified as outliers.
- Returns:
- outlier_indicator : numpy array of shape (n_samples,)
The binary array indicating whether each sample is an outlier.
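The quantile rule used by predict can be illustrated as follows. This is a minimal sketch under assumptions: the helper `labels_from_scores` is hypothetical, and it computes the quantile over the scores passed in, which may differ from the sample the library uses as its reference.

```python
import numpy as np

def labels_from_scores(scores, threshold=0.9):
    """Hypothetical helper: label scores above the given quantile as outliers (-1)."""
    scores = np.asarray(scores, dtype=float)
    # Cutoff at the requested quantile of the scores themselves.
    cutoff = np.quantile(scores, threshold)
    # Follow the documented convention: 1 for normal samples, -1 for outliers.
    return np.where(scores > cutoff, -1, 1)

demo_scores = np.arange(10.0)  # ten scores: 0.0, 1.0, ..., 9.0
labels = labels_from_scores(demo_scores, threshold=0.9)
```

With ten evenly spaced scores and threshold=0.9, only the largest score exceeds the cutoff, so one sample is labeled -1 and the rest are labeled 1.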