2. Data Pipeline

  • 2.1. Data Load
    • 2.1.1. Built-in Dataset
    • 2.1.2. External Dataset (csv files)
    • 2.1.3. External Dataset (Spark file)
    • 2.1.4. Examples
  • 2.2. Data Summary
    • 2.2.1. Summary Statistics
    • 2.2.2. Feature Manipulation
    • 2.2.3. Examples
  • 2.3. Data Preparation
    • 2.3.1. Basic Settings
    • 2.3.2. Train-test Splits
    • 2.3.3. Examples
  • 2.4. Exploratory Analysis
    • 2.4.1. Univariate Plots
    • 2.4.2. Bivariate Plots
    • 2.4.3. Multivariate Plots
    • 2.4.4. Examples
  • 2.5. Data Quality (Integrity Check)
    • 2.5.1. Single-column Checks
    • 2.5.2. Duplicated Samples
    • 2.5.3. Highly Correlated Features
    • 2.5.4. Examples
  • 2.6. Data Quality (Outlier Detection)
    • 2.6.1. Methodology
    • 2.6.2. Analysis and Comparison
    • 2.6.3. Examples
  • 2.7. Data Quality (Drift Test)
    • 2.7.1. Marginal Distribution Drift
    • 2.7.2. Energy Distance
    • 2.7.3. Examples
  • 2.8. Feature Selection
    • 2.8.1. Correlations
    • 2.8.2. Distance Correlation
    • 2.8.3. Use of Feature Importance
    • 2.8.4. Randomized Conditional Independence Test
    • 2.8.5. Examples
© Copyright 2022-, PiML-Toolbox authors.