binlearn: Data Binning and Discretization Library
A modern, type-safe Python library for data binning and discretization, with comprehensive error handling, scikit-learn compatibility, and DataFrame support.
🚀 Key Features
- ✨ Multiple Binning Methods
  - EqualWidthBinning - Equal-width intervals across data range
  - EqualFrequencyBinning - Equal-frequency (quantile-based) bins
  - KMeansBinning - K-means clustering-based discretization
  - GaussianMixtureBinning - Gaussian mixture model clustering-based binning
  - DBSCANBinning - Density-based clustering for natural groupings
  - EqualWidthMinimumWeightBinning - Weight-constrained equal-width binning
  - TreeBinning - Decision tree-based supervised binning for classification and regression
  - Chi2Binning - Chi-square statistic-based supervised binning for optimal class separation
  - IsotonicBinning - Isotonic regression-based supervised binning for monotonic relationships
  - ManualIntervalBinning - Custom interval boundary specification
  - ManualFlexibleBinning - Mixed interval and singleton bin definitions
  - SingletonBinning - Creates one bin per unique numeric value
- 🔧 Framework Integration
  - Pandas DataFrames - Native support with column name preservation
  - Polars DataFrames - High-performance columnar data support (optional)
  - NumPy Arrays - Efficient numerical array processing
  - Scikit-learn Pipelines - Full transformer compatibility
- ⚡ Modern Code Quality
  - Type Safety - 100% mypy compliance with comprehensive type annotations
  - Code Quality - 100% ruff compliance with modern Python syntax
  - Error Handling - Comprehensive validation with helpful error messages and suggestions
  - Test Coverage - 100% code coverage with 841 comprehensive tests
  - Documentation - Extensive examples and API documentation
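To make the difference between the first two methods in the list above concrete, here is a minimal standard-library sketch of how equal-width and equal-frequency bin edges can be computed. It illustrates the general idea only and is not binlearn's implementation; the helper names `equal_width_edges` and `equal_frequency_edges` are hypothetical.

```python
from statistics import quantiles


def equal_width_edges(values, n_bins):
    """Return n_bins + 1 edges that split [min, max] into equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def equal_frequency_edges(values, n_bins):
    """Return n_bins + 1 edges placed at quantiles, so bins hold similar counts."""
    # quantiles() returns the n_bins - 1 interior cut points.
    inner = quantiles(values, n=n_bins, method="inclusive")
    return [min(values), *inner, max(values)]


# Illustrative data with one large outlier.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

print(equal_width_edges(values, 3))      # [1.0, 34.0, 67.0, 100.0]
print(equal_frequency_edges(values, 3))  # [1, 4.0, 7.0, 100]
```

With the outlier present, the equal-width edges leave nine of the ten values in the first interval, while the quantile-based edges spread them far more evenly. That is the usual reason to prefer equal-frequency binning for skewed data.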
Quick Start
Installation
pip install binlearn
Basic Usage
import numpy as np
import pandas as pd
from binlearn import EqualWidthBinning
# Create sample data
data = pd.DataFrame({
    'age': np.random.normal(35, 10, 1000),
    'income': np.random.lognormal(10, 0.5, 1000),
    'score': np.random.uniform(0, 100, 1000)
})
# Equal-width binning with DataFrame preservation
binner = EqualWidthBinning(n_bins=5, preserve_dataframe=True)
data_binned = binner.fit_transform(data)
print(f"Original shape: {data.shape}")
print(f"Binned shape: {data_binned.shape}")
print(f"Bin edges for age: {binner.bin_edges_['age']}")
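The example above prints bin edges, so it may help to see how a list of edges maps values to bin indices. The following is a conceptual sketch using only the standard library, not binlearn's code. The function name `assign_bins`, the left-closed interval convention, and the clamping of out-of-range values are all assumptions made for illustration; the library may treat boundaries and out-of-range values differently.

```python
from bisect import bisect_right


def assign_bins(values, edges):
    """Map each value to the index i of the interval [edges[i], edges[i + 1])."""
    last_bin = len(edges) - 2
    indices = []
    for v in values:
        idx = bisect_right(edges, v) - 1
        # Assumption: clamp so the maximum edge falls in the last bin and
        # out-of-range values go to the nearest bin.
        indices.append(min(max(idx, 0), last_bin))
    return indices


# Five equal-width bins over [0, 100], mirroring n_bins=5 on a 0-100 score.
edges = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
scores = [5.0, 20.0, 59.9, 100.0]

print(assign_bins(scores, edges))  # [0, 1, 2, 4]
```

Here a value that sits exactly on an interior edge (20.0) goes to the bin on its right, and the maximum (100.0) is folded into the last bin instead of opening a sixth one.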
Documentation Contents
Getting Started
User Guide
Design Concepts
Development
Additional Information