binlearn: Data Binning and Discretization Library


A modern, type-safe Python library for data binning and discretization with comprehensive error handling, sklearn compatibility, and DataFrame support.

🚀 Key Features

Multiple Binning Methods
  • EqualWidthBinning - Equal-width intervals across data range

  • EqualFrequencyBinning - Equal-frequency (quantile-based) bins

  • KMeansBinning - K-means clustering-based discretization

  • GaussianMixtureBinning - Gaussian mixture model clustering-based binning

  • DBSCANBinning - Density-based clustering for natural groupings

  • EqualWidthMinimumWeightBinning - Weight-constrained equal-width binning

  • TreeBinning - Decision tree-based supervised binning for classification and regression

  • Chi2Binning - Chi-square statistic-based supervised binning for optimal class separation

  • IsotonicBinning - Isotonic regression-based supervised binning for monotonic relationships

  • ManualIntervalBinning - Custom interval boundary specification

  • ManualFlexibleBinning - Mixed interval and singleton bin definitions

  • SingletonBinning - Creates one bin per unique numeric value
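To make the difference between the first two methods concrete, here is a minimal NumPy-only sketch of how equal-width and equal-frequency bin edges can be computed. It illustrates the underlying idea and is not binlearn's implementation; the variable names and the choice of a log-normal sample are assumptions made for this example.

```python
import numpy as np

# Right-skewed sample, seeded for reproducibility (illustrative data only)
rng = np.random.default_rng(0)
values = rng.lognormal(mean=10, sigma=0.5, size=1000)
n_bins = 5

# Equal-width: n_bins + 1 evenly spaced edges between the min and max
width_edges = np.linspace(values.min(), values.max(), n_bins + 1)

# Equal-frequency: edges placed at evenly spaced quantiles of the data
freq_edges = np.quantile(values, np.linspace(0, 1, n_bins + 1))

# Count how many samples fall into each bin under both schemes
width_counts, _ = np.histogram(values, bins=width_edges)
freq_counts, _ = np.histogram(values, bins=freq_edges)

print("equal-width counts:    ", width_counts)
print("equal-frequency counts:", freq_counts)
```

On skewed data, equal-width bins tend to be very unevenly populated, while equal-frequency bins hold roughly the same number of samples each. Which behavior is preferable depends on the downstream model.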

🔧 Framework Integration
  • Pandas DataFrames - Native support with column name preservation

  • Polars DataFrames - High-performance columnar data support (optional)

  • NumPy Arrays - Efficient numerical array processing

  • Scikit-learn Pipelines - Full transformer compatibility

Modern Code Quality
  • Type Safety - 100% mypy compliance with comprehensive type annotations

  • Code Quality - 100% ruff compliance with modern Python syntax

  • Error Handling - Comprehensive validation with helpful error messages and suggestions

  • Test Coverage - 100% code coverage with 841 comprehensive tests

  • Documentation - Extensive examples and API documentation

Quick Start

Installation

pip install binlearn

Basic Usage

import numpy as np
import pandas as pd
from binlearn import EqualWidthBinning

# Create sample data (seeded so the example is reproducible)
np.random.seed(42)
data = pd.DataFrame({
    'age': np.random.normal(35, 10, 1000),
    'income': np.random.lognormal(10, 0.5, 1000),
    'score': np.random.uniform(0, 100, 1000)
})

# Equal-width binning with DataFrame preservation
binner = EqualWidthBinning(n_bins=5, preserve_dataframe=True)
data_binned = binner.fit_transform(data)

print(f"Original shape: {data.shape}")
print(f"Binned shape: {data_binned.shape}")
print(f"Bin edges for age: {binner.bin_edges_['age']}")
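For intuition about what the transform step does with fitted edges, the following NumPy-only sketch maps values in a single column to integer bin indices. It is an illustration under simple assumptions, not binlearn's code: the sample values are made up, and binlearn's handling of boundary values (which side of an interval is closed) may differ.

```python
import numpy as np

values = np.array([18.0, 25.0, 35.0, 47.5, 60.0])  # hypothetical 'age' values
n_bins = 3

# Equal-width edges: [18, 32, 46, 60]
edges = np.linspace(values.min(), values.max(), n_bins + 1)

# Comparing against the interior edges only yields indices 0..n_bins-1,
# so the maximum value lands in the last bin instead of overflowing it.
bin_idx = np.digitize(values, edges[1:-1])

print(bin_idx)  # [0 0 1 2 2]
```

The output has the same shape as the input, which matches the shape check printed in the example above.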
