Chi2Binning
===========

.. currentmodule:: binlearn.methods

.. autoclass:: Chi2Binning
   :members:
   :inherited-members:
   :show-inheritance:

Overview
--------

``Chi2Binning`` is a supervised discretization method that uses the chi-square statistic to find optimal 
split points. The method iteratively merges adjacent intervals to minimize the chi-square statistic, 
creating bins that maximize the association between features and target variables.

This approach is particularly effective for:

* **Classification tasks** where bins need to separate different classes effectively
* **Categorical target variables** with clear class boundaries
* **Feature engineering** for improving downstream classification performance
* **Data preparation** where maintaining class relationships is crucial

Key Features
------------

* **Supervised Learning**: Uses target variable information for optimal binning
* **Statistical Foundation**: Based on chi-square test of independence
* **Iterative Optimization**: Merges intervals to minimize chi-square statistic
* **Classification Focus**: Optimized for categorical target variables
* **Automatic Stopping**: Uses significance levels to determine optimal number of bins
* **Sklearn Compatibility**: Full transformer interface with fit/transform methods
* **DataFrame Support**: Preserves pandas/polars column names and structure

Basic Usage
-----------

.. code-block:: python

   import numpy as np
   import pandas as pd
   from binlearn.methods import Chi2Binning
   from sklearn.datasets import make_classification
   
   # Create sample classification data
   X, y = make_classification(
       n_samples=1000, 
       n_features=3, 
       n_classes=3,
       n_redundant=0,
       random_state=42
   )
   
   # Apply chi-square binning
   binner = Chi2Binning(
       max_bins=10,
       min_bins=3,
       alpha=0.05
   )
   
   # Method 1: Using fit with X and y (sklearn style)
   binner.fit(X, y)
   X_binned = binner.transform(X)
   
   print(f"Original shape: {X.shape}")
   print(f"Binned shape: {X_binned.shape}")
   print(f"Bins for feature 0: {len(binner.bin_edges_[0]) - 1}")

DataFrame Example with Target Column
------------------------------------

.. code-block:: python

   # Create DataFrame with target column
   df = pd.DataFrame(X, columns=['feature1', 'feature2', 'feature3'])
   df['target'] = y
   
   # Method 2: Using guidance_columns (binlearn style)
   binner = Chi2Binning(
       guidance_columns=['target'],  # Use target column for guidance
       max_bins=8,
       min_bins=2,
       preserve_dataframe=True
   )
   
   # Fit and transform the entire DataFrame
   df_binned = binner.fit_transform(df)
   
   print(f"Bin edges for feature1: {binner.bin_edges_['feature1']}")
   print(f"Bin edges for feature2: {binner.bin_edges_['feature2']}")
   print(f"Target column preserved: {'target' in df_binned.columns}")

Regression Example
------------------

.. code-block:: python

   from sklearn.datasets import make_regression
   
   # Create regression data and discretize target
   X_reg, y_reg = make_regression(n_samples=1000, n_features=2, random_state=42)
   
   # Discretize continuous target for chi-square binning
   y_discrete = pd.cut(y_reg, bins=5, labels=['very_low', 'low', 'medium', 'high', 'very_high'])
   
   binner = Chi2Binning(
       max_bins=6,
       min_bins=3,
       alpha=0.01  # More stringent significance level
   )
   
   binner.fit(X_reg, y_discrete)
   X_reg_binned = binner.transform(X_reg)
   
   print(f"Regression bins created: {[len(edges)-1 for edges in binner.bin_edges_.values()]}")

Advanced Configuration
----------------------

.. code-block:: python

   # Fine-tuned chi-square binning for specific requirements
   
   # Conservative binning (fewer, more significant bins)
   conservative_binner = Chi2Binning(
       max_bins=15,
       min_bins=2,
       alpha=0.001,        # Very stringent significance level
       initial_bins=20     # Start with more initial bins
   )
   
   # Liberal binning (more bins, less stringent)
   liberal_binner = Chi2Binning(
       max_bins=25,
       min_bins=5,
       alpha=0.1,          # More permissive significance level
       initial_bins=30
   )

Comparison with Other Methods
-----------------------------

.. code-block:: python

   from binlearn.methods import EqualWidthBinning, TreeBinning
   from sklearn.metrics import accuracy_score
   from sklearn.ensemble import RandomForestClassifier
   from sklearn.model_selection import train_test_split
   
   # Split data
   X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
   
   # Compare different binning methods
   binners = {
       'chi2': Chi2Binning(max_bins=8, alpha=0.05),
       'equal_width': EqualWidthBinning(n_bins=8),
       'supervised': SupervisedBinning(max_depth=3, min_samples_leaf=50)
   }
   
   results = {}
   classifier = RandomForestClassifier(random_state=42, n_estimators=100)
   
   for name, binner in binners.items():
       # Fit binner and transform data
       binner.fit(X_train, y_train)
       X_train_binned = binner.transform(X_train)
       X_test_binned = binner.transform(X_test)
       
       # Train classifier on binned data
       classifier.fit(X_train_binned, y_train)
       y_pred = classifier.predict(X_test_binned)
       
       results[name] = accuracy_score(y_test, y_pred)
       print(f"{name}: {results[name]:.3f} accuracy")

Parameter Tuning
-----------------

.. code-block:: python

   # Grid search for optimal parameters
   from sklearn.model_selection import GridSearchCV
   from sklearn.pipeline import Pipeline
   
   # Create pipeline with chi-square binning
   pipeline = Pipeline([
       ('binning', Chi2Binning()),
       ('classifier', RandomForestClassifier(random_state=42))
   ])
   
   # Parameter grid for binning
   param_grid = {
       'binning__max_bins': [5, 8, 12, 15],
       'binning__alpha': [0.001, 0.01, 0.05, 0.1],
       'binning__initial_bins': [10, 15, 20]
   }
   
   # Grid search
   grid_search = GridSearchCV(
       pipeline, 
       param_grid, 
       cv=5, 
       scoring='accuracy',
       n_jobs=-1
   )
   
   grid_search.fit(X_train, y_train)
   print(f"Best parameters: {grid_search.best_params_}")
   print(f"Best cross-validation score: {grid_search.best_score_:.3f}")

Parameter Guide
---------------

**max_bins** (int, default=10)
    Maximum number of bins to create. The algorithm will never exceed this number:
    
    * Higher values: Allow more granular binning
    * Lower values: Force more aggressive merging
    * Consider your downstream model's capacity

**min_bins** (int, default=2)
    Minimum number of bins to maintain. Prevents over-merging:
    
    * Higher values: Ensure sufficient granularity
    * Lower values: Allow aggressive simplification
    * Should be at least 2 for meaningful binning

**alpha** (float, default=0.05)
    Significance level for chi-square test. Lower values are more stringent:
    
    * Lower values (0.001): More conservative, fewer bins
    * Higher values (0.1): More liberal, more bins
    * Common values: 0.05 (standard), 0.01 (conservative)

**initial_bins** (int, default=10)
    Number of initial equal-width bins before merging:
    
    * Higher values: More potential split points to consider
    * Lower values: Faster computation but less flexibility
    * Should be >= max_bins

Statistical Background
----------------------

The chi-square statistic measures the independence between a feature's bins and the target classes:

.. math::

   \\chi^2 = \\sum_{i=1}^{r} \\sum_{j=1}^{c} \\frac{(O_{ij} - E_{ij})^2}{E_{ij}}

Where:
- :math:`O_{ij}` is the observed frequency in bin i, class j
- :math:`E_{ij}` is the expected frequency under independence
- Lower chi-square values indicate better independence (good for merging)

Handling Edge Cases
-------------------

.. code-block:: python

   # Handling insufficient data
   small_X = X[:50]  # Very small dataset
   small_y = y[:50]
   
   # Use conservative parameters for small datasets
   small_binner = Chi2Binning(
       max_bins=5,      # Fewer bins for small data
       min_bins=2,      # Conservative minimum
       alpha=0.1,       # More permissive for small samples
       initial_bins=8   # Fewer initial bins
   )
   
   small_binned = small_binner.fit_transform(small_X, small_y)

Tips for Best Results
---------------------

1. **Choose initial_bins wisely**: Start with 2-3x your desired max_bins
2. **Adjust alpha based on sample size**: Use smaller alpha for larger datasets
3. **Consider target distribution**: Imbalanced classes may need different alpha values
4. **Validate on holdout data**: Chi-square optimization can overfit to training data

See Also
--------

* :class:`SupervisedBinning` - Decision tree-based supervised binning
* :class:`IsotonicBinning` - Isotonic regression-based supervised binning
* :class:`EqualFrequencyBinning` - Quantile-based unsupervised binning
* :class:`KMeansBinning` - K-means clustering-based binning