EqualWidthBinning

class binlearn.methods.EqualWidthBinning(n_bins: int | None = None, bin_range: tuple[float, float] | None = None, clip: bool | None = None, preserve_dataframe: bool | None = None, fit_jointly: bool | None = None, guidance_columns: Any | None = None, *, bin_edges: dict[Any, list[float]] | None = None, bin_representatives: dict[Any, list[float]] | None = None, class_: str | None = None, module_: str | None = None)

Bases: IntervalBinningBase

Equal width binning implementation for creating uniform interval bins.

This class implements equal width binning, one of the most fundamental and widely-used discretization methods. It divides the range of continuous values into a specified number of bins, each having the same width (range span). The method is simple, interpretable, and works well when data is uniformly distributed across the value range.

Equal width binning is particularly effective for:

  • Uniformly distributed data where equal intervals make intuitive sense

  • Creating interpretable bins with consistent spacing

  • Baseline discretization before trying more sophisticated methods

  • Applications where maintaining consistent bin widths is important

Key Features:

  • Creates bins of identical width, (max_value - min_value) / n_bins

  • Handles custom value ranges via the bin_range parameter

  • Calculates bin representatives automatically as interval midpoints

  • Inherits all interval binning capabilities (clipping, format preservation, etc.)

  • Supports both automatic range detection and user-specified ranges

Algorithm:

  1. Determine the value range (from the data or a user-specified bin_range)

  2. Create n_bins equally spaced intervals across the range

  3. Assign representative values as bin centers

  4. Transform values by finding their containing interval
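The four steps can be sketched in plain NumPy for a single column. This is an illustrative sketch only, not binlearn's implementation: the helper name equal_width_sketch is hypothetical, and treating interior edges as left-inclusive (with the maximum folded into the last bin) is an assumption.

```python
import numpy as np

def equal_width_sketch(values, n_bins, bin_range=None):
    """Hypothetical helper: edges, midpoints and bin indices for one column."""
    values = np.asarray(values, dtype=float)
    # Step 1: take the range from the data unless (min, max) is supplied.
    lo, hi = bin_range if bin_range is not None else (values.min(), values.max())
    # Step 2: n_bins intervals need n_bins + 1 equally spaced edges.
    edges = np.linspace(lo, hi, n_bins + 1)
    # Step 3: representatives are the interval midpoints.
    reps = (edges[:-1] + edges[1:]) / 2
    # Step 4: search the interior edges only, so results stay in 0..n_bins-1
    # (assumption: the maximum value belongs to the last bin).
    idx = np.digitize(values, edges[1:-1])
    return edges, reps, idx

edges, reps, idx = equal_width_sketch([1.0, 2.0, 3.0, 4.0], n_bins=3)
print(edges.tolist())  # [1.0, 2.0, 3.0, 4.0]
print(reps.tolist())   # [1.5, 2.5, 3.5]
print(idx.tolist())    # [0, 1, 2, 2]
```

Because only the interior edges are searched, values outside the range fall into the first or last bin in this sketch, which corresponds to clipping.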

Parameters:
  • n_bins – Number of bins to create. Must be a positive integer. Default value can be configured globally via binlearn.config.

  • bin_range – Optional custom range as (min, max) tuple. If provided, bins are created within this range regardless of actual data range. Useful for ensuring consistent binning across datasets.

bin_edges_

Dictionary mapping column identifiers to lists of bin edges after fitting. Each edge list contains n_bins + 1 values.

bin_representatives_

Dictionary mapping column identifiers to lists of bin representatives (typically bin centers).

Example

>>> import numpy as np
>>> from binlearn.methods import EqualWidthBinning
>>>
>>> # Basic equal width binning
>>> X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
>>> binner = EqualWidthBinning(n_bins=3)
>>> binner.fit(X)
>>> X_binned = binner.transform(X)
>>> print(X_binned)  # [[0, 0], [1, 1], [2, 2], [2, 2]]
>>>
>>> # With custom range
>>> binner_custom = EqualWidthBinning(n_bins=2, bin_range=(0.0, 10.0))
>>> binner_custom.fit(X)
>>> # Bins are created in [0, 10] range regardless of actual data range

Note

  • Only works with numeric data - non-numeric columns will raise errors

  • Constant columns (same value everywhere) are handled by creating a single bin

  • Outliers can significantly affect bin boundaries in automatic range mode

  • Consider using bin_range parameter for consistent binning across datasets

  • Inherits clipping behavior and DataFrame preservation from IntervalBinningBase
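The effect of outliers on automatically detected ranges can be illustrated with NumPy alone. The sketch below does not call binlearn; it only reproduces the edge arithmetic, and the data values are made up for illustration.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # a single outlier at 100
n_bins = 4

# Automatic range: the outlier stretches every bin.
auto_edges = np.linspace(data.min(), data.max(), n_bins + 1)
auto_counts, _ = np.histogram(data, bins=auto_edges)
print(auto_edges.tolist())   # [1.0, 25.75, 50.5, 75.25, 100.0]
print(auto_counts.tolist())  # [4, 0, 0, 1]

# Fixed range, the kind of edges bin_range=(0.0, 5.0) would request.
fixed_edges = np.linspace(0.0, 5.0, n_bins + 1)
fixed_counts, _ = np.histogram(data, bins=fixed_edges)
print(fixed_counts.tolist())  # [1, 1, 1, 1]  (np.histogram drops the outlier)
```

With the automatic range, four of the five values collapse into the first bin. A fixed range keeps the bulk of the data spread across bins, and how the out-of-range value is then treated depends on the clip setting.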

__init__(n_bins: int | None = None, bin_range: tuple[float, float] | None = None, clip: bool | None = None, preserve_dataframe: bool | None = None, fit_jointly: bool | None = None, guidance_columns: Any | None = None, *, bin_edges: dict[Any, list[float]] | None = None, bin_representatives: dict[Any, list[float]] | None = None, class_: str | None = None, module_: str | None = None)

Initialize equal width binning with configuration and parameters.

Sets up the equal width binning method with user-specified parameters and configuration defaults. The method integrates with binlearn’s global configuration system while allowing parameter-specific overrides.

Parameters:
  • n_bins – Number of equal-width bins to create for each column. Must be a positive integer. If None, uses the global configuration default for the ‘equal_width’ method, typically 5.

  • bin_range – Optional custom range as (min, max) tuple within which to create bins. If provided, bins are created within this range regardless of the actual data range. Useful for ensuring consistent binning across different datasets. If None, range is determined automatically from the data during fitting.

  • clip – Whether to clip out-of-range values to the nearest bin boundary during transformation. If True, values outside the range are assigned to the nearest edge bin. If False, they receive special out-of-range indices. If None, uses global configuration default.

  • preserve_dataframe – Whether to preserve DataFrame format in outputs when input is a DataFrame. If None, uses global configuration default.

  • fit_jointly – Whether to fit all columns together (not applicable for equal width binning as it’s inherently per-column). If None, uses global configuration default.

  • guidance_columns – Additional columns to include in input validation but not to bin. Not typically used for equal width binning. If None, no guidance columns are expected.

  • bin_edges – Pre-computed bin edges for reconstruction/deserialization. If provided, no fitting is performed and these edges are used directly. Should be a dictionary mapping column identifiers to lists of edge values.

  • bin_representatives – Pre-computed bin representatives for reconstruction. Must be provided together with bin_edges. Should be a dictionary mapping column identifiers to lists of representative values.

  • class_ – Class name for reconstruction compatibility (ignored during normal initialization).

  • module_ – Module name for reconstruction compatibility (ignored during normal initialization).

Example

>>> # Basic initialization with defaults
>>> binner = EqualWidthBinning()
>>>
>>> # Custom number of bins and range
>>> binner = EqualWidthBinning(n_bins=10, bin_range=(0.0, 100.0))
>>>
>>> # Reconstruction from saved parameters
>>> binner = EqualWidthBinning(
...     n_bins=5,
...     bin_edges={'col1': [0, 1, 2, 3, 4, 5]},
...     bin_representatives={'col1': [0.5, 1.5, 2.5, 3.5, 4.5]}
... )

Note

  • Parameters integrate with binlearn’s global configuration system

  • None values allow configuration defaults to take effect

  • Pre-computed edges and representatives enable object reconstruction

  • Class follows sklearn’s estimator interface conventions
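The difference between clip=True and clip=False can be sketched as follows. The sentinel values BELOW and ABOVE and the helper assign_bins are hypothetical, chosen only for illustration; the special out-of-range indices binlearn actually uses may differ.

```python
import numpy as np

BELOW, ABOVE = -3, -2  # hypothetical sentinels, not taken from binlearn

def assign_bins(values, edges, clip):
    """Hypothetical helper: bin indices with or without clipping."""
    values = np.asarray(values, dtype=float)
    # Searching interior edges only always yields 0..n_bins-1, i.e. clipped.
    idx = np.digitize(values, edges[1:-1])
    if not clip:
        # Flag values outside [edges[0], edges[-1]] instead of clipping them.
        idx = np.where(values < edges[0], BELOW, idx)
        idx = np.where(values > edges[-1], ABOVE, idx)
    return idx

edges = np.array([0.0, 1.0, 2.0, 3.0])
vals = [-5.0, 0.5, 2.5, 9.0]
print(assign_bins(vals, edges, clip=True).tolist())   # [0, 0, 2, 2]
print(assign_bins(vals, edges, clip=False).tolist())  # [-3, 0, 2, -2]
```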

classmethod __init_subclass__(**kwargs)

Set the set_{method}_request methods.

This uses PEP-487 [1] to set the set_{method}_request methods. It looks for the information available in the set default values which are set using __metadata_request__* class attributes, or inferred from method signatures.

The __metadata_request__* class attributes are used when a method does not explicitly accept a metadata through its arguments or if the developer would like to specify a request value for those metadata which are different from the default None.

References

[1] PEP 487, "Simpler customisation of class creation"

static check_data_quality(data: ndarray[Any, Any], name: str = 'data') → None

Check data quality and issue warnings if needed.

property feature_names_in_: list[str] | None

Get feature names.

fit(X: Any, y: Any | None = None, **fit_params: Any) → GeneralBinningBase

Fit the binning transformer with comprehensive orchestration.

This method orchestrates the complete fitting process, handling parameter validation, input preprocessing, column separation, and routing to the appropriate fitting strategy (joint vs independent).

Parameters:
  • X – Input data to fit the binning transformer on. Can be:

      - pandas.DataFrame: Column names are preserved
      - polars.DataFrame: Column names are preserved
      - numpy.ndarray: Numeric column indices are used
      - array-like: Converted to numpy array

  • y – Target values for supervised binning methods. Ignored by unsupervised methods. Can be array-like or None.

  • **fit_params – Additional fitting parameters passed to the specific binning algorithm implementation. Common parameters include:

      - guidance_data: Alternative guidance data (conflicts with fit_jointly=True)

Returns:

The fitted binning transformer instance.

Return type:

self

Raises:
  • ValueError – If parameter validation fails, inputs are invalid, or conflicting parameters are provided (e.g., fit_jointly=True with guidance_data).

  • BinningError – If the binning algorithm fails to fit the data.

  • RuntimeError – If an unexpected error occurs during fitting.

Example

>>> from binlearn import EqualWidthBinning
>>> import pandas as pd
>>> X = pd.DataFrame({'feature1': [1, 2, 3, 4, 5], 'feature2': [10, 20, 30, 40, 50]})
>>> binner = EqualWidthBinning(n_bins=3)
>>> binner.fit(X)
EqualWidthBinning(...)

Note

The method automatically handles column separation when guidance_columns is specified, routing guidance columns separately from binning columns. The fitting strategy (joint vs independent) is determined by the fit_jointly parameter.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

get_input_columns() → list[Any] | None

Get input columns for data preparation.

This method should be overridden by derived classes to provide appropriate column information without exposing binning-specific concepts.

Returns:

Column information or None if not available

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep: bool = True) → dict[str, Any]

Get parameters for this estimator, including fitted parameters.

This method extends sklearn’s standard get_params to include fitted parameters when the estimator is fitted, enabling complete object reconstruction through the get_params/set_params interface. This is essential for pipeline persistence and model serialization.

Parameters:

deep – If True, returns parameters for sub-estimators (not applicable here but maintained for sklearn compatibility).

Returns:

Dictionary of parameter names mapped to their values, including:

  • Constructor parameters extracted from __init__ signature

  • Fitted parameters (if estimator is fitted) mapped from attributes

  • Class metadata (class_, module_) for automatic reconstruction

Return type:

dict[str, Any]

Example

>>> binner = EqualWidthBinning(n_bins=5)
>>> params = binner.get_params()
>>> print(params)
{'n_bins': 5, 'clip': None, ..., 'class_': 'EqualWidthBinning', 'module_': '...'}
>>>
>>> binner.fit(X)
>>> fitted_params = binner.get_params()
>>> # Now includes: {'bin_edges': {...}, 'bin_representatives': {...}, ...}

Note

  • Automatically extracts constructor parameters from __init__ signature

  • Includes fitted parameters only when estimator is fitted

  • Adds class metadata for reconstruction workflows

  • Excludes internal sklearn attributes like n_features_in_

  • class_ and module_ parameters are handled specially during set_params

inverse_transform(X: Any) → Any

Inverse transform from bin indices back to representative values.

Converts discrete bin indices back to their representative values, effectively reversing the binning transformation. This is useful for interpreting results or reconstructing approximate original values.

Parameters:

X – Input data containing bin indices to inverse transform. Should contain only binning columns (no guidance columns). Can be:

  - pandas.DataFrame: Column names should match binning columns
  - polars.DataFrame: Column names should match binning columns
  - numpy.ndarray: Must have same number of binning columns
  - array-like: Converted to numpy array

Returns:

Inverse transformed data where bin indices are replaced with their representative values (typically bin centers). Output format matches the preserve_dataframe setting.

Raises:
  • RuntimeError – If the transformer has not been fitted yet.

  • ValueError – If input data has wrong number of columns or invalid format.

  • BinningError – If inverse transformation fails.

Example

>>> # After fitting and transforming
>>> X_binned = [[0, 1], [1, 0], [2, 2]]  # Bin indices
>>> X_reconstructed = binner.inverse_transform(X_binned)
>>> print(X_reconstructed)
[[0.5, 1.5], [1.5, 0.5], [2.5, 2.5]]  # Representative values

Note

For guided binning (when guidance_columns is specified), the input should only contain the binning columns, not the guidance columns. The number of input columns must match the number of binning columns.
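Conceptually, the inverse transform is a per-column lookup of each bin index in that column's list of representatives. A minimal NumPy sketch of the lookup, assuming integer column keys and the representatives from the example above (this does not call binlearn itself):

```python
import numpy as np

# Representatives per column, in the shape bin_representatives_ is described as having.
reps = {0: [0.5, 1.5, 2.5], 1: [0.5, 1.5, 2.5]}
X_binned = np.array([[0, 1], [1, 0], [2, 2]])

# Use each column's bin indices to index into that column's representatives.
X_rec = np.column_stack(
    [np.asarray(reps[c])[X_binned[:, c]] for c in range(X_binned.shape[1])]
)
print(X_rec.tolist())  # [[0.5, 1.5], [1.5, 0.5], [2.5, 2.5]]
```

The lookup returns representatives, not the original values, so the reconstruction is only accurate to within half a bin width.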

property n_features_in_: int

Get number of features.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform ({"default", "pandas", "polars"}, default=None) –

Configure output of transform and fit_transform.

  • "default": Default output format of a transformer

  • "pandas": DataFrame output

  • "polars": Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_params(**params: Any) → SklearnIntegrationBase

Set the parameters of this estimator.

This method supports reconstruction workflows by handling fitted parameters that come from get_params() output (without underscores) and setting them as fitted attributes (with underscores).

Parameters:

**params – Parameters to set. Can include:

  - Regular constructor parameters (n_bins, clip, etc.)
  - Fitted parameters from get_params (bin_edges, bin_representatives)
  - Class metadata (ignored during reconstruction)

Returns:

Returns the instance itself.

Return type:

self
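The naming convention behind the get_params/set_params round trip (fitted values exposed without a trailing underscore, stored with one) can be illustrated with a toy class. ToyBinner is a hypothetical stand-in written for this sketch; it is not part of binlearn and mirrors only the convention, not the real implementation.

```python
class ToyBinner:
    """Hypothetical stand-in that mimics the parameter naming convention only."""

    _fitted_names = ("bin_edges", "bin_representatives")

    def __init__(self, n_bins=None):
        self.n_bins = n_bins

    def get_params(self):
        params = {"n_bins": self.n_bins}
        # Expose fitted attributes (trailing underscore) under plain names.
        for name in self._fitted_names:
            if hasattr(self, name + "_"):
                params[name] = getattr(self, name + "_")
        params["class_"] = type(self).__name__
        return params

    def set_params(self, **params):
        for key, value in params.items():
            if key in ("class_", "module_"):
                continue  # class metadata is ignored during reconstruction
            if key in self._fitted_names:
                setattr(self, key + "_", value)  # restore as fitted attribute
            else:
                setattr(self, key, value)
        return self

fitted = ToyBinner(n_bins=2)
fitted.bin_edges_ = {0: [0.0, 1.0, 2.0]}
fitted.bin_representatives_ = {0: [0.5, 1.5]}

clone = ToyBinner().set_params(**fitted.get_params())
print(clone.n_bins)      # 2
print(clone.bin_edges_)  # {0: [0.0, 1.0, 2.0]}
```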

transform(X: Any) → Any

Transform input data using fitted binning parameters.

Applies the fitted binning transformation to new data, converting continuous values to discrete bin indices or representatives. Handles column separation when guidance columns are present.

Parameters:

X – Input data to transform. Must have the same structure as the data used during fitting (same number of columns). Can be:

  - pandas.DataFrame: Column names should match training data
  - polars.DataFrame: Column names should match training data
  - numpy.ndarray: Must have same number of columns as training
  - array-like: Converted to numpy array

Returns:

Transformed data where continuous values are replaced with bin indices or representative values. The output format depends on:

  - preserve_dataframe setting: DataFrame vs array format
  - binning method: indices vs representatives
  - guidance_columns: only binning columns are transformed

Raises:
  • RuntimeError – If the transformer has not been fitted yet.

  • ValueError – If the input data has incompatible structure or format.

  • BinningError – If transformation fails due to data issues.

Example

>>> # After fitting
>>> X_new = pd.DataFrame({'feature1': [1.5, 2.5], 'feature2': [15, 25]})
>>> X_binned = binner.transform(X_new)
>>> print(X_binned)
[[0, 0], [1, 1]]  # Bin indices

Note

When guidance_columns is specified, only the binning columns are transformed. Guidance columns are filtered out from the output. The method preserves the original data format when preserve_dataframe=True.
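The indices in the example can be checked by hand. The sketch below assumes the binner was fitted with n_bins=3 on feature1 = [1, 2, 3, 4, 5] and feature2 = [10, 20, 30, 40, 50], as in the fit example. It reproduces only the edge arithmetic with NumPy and does not call binlearn.

```python
import numpy as np

train = {"feature1": [1, 2, 3, 4, 5], "feature2": [10, 20, 30, 40, 50]}
new = {"feature1": [1.5, 2.5], "feature2": [15, 25]}

cols = []
for name, col in train.items():
    # 3 equal-width bins: feature1 edges are [1, 2.33, 3.67, 5],
    # feature2 edges are [10, 23.33, 36.67, 50].
    edges = np.linspace(min(col), max(col), 3 + 1)
    # Locate each new value among the interior edges.
    cols.append(np.digitize(new[name], edges[1:-1]))

print(np.column_stack(cols).tolist())  # [[0, 0], [1, 1]]
```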

static validate_array_like(data: Any, name: str = 'data', allow_none: bool = False) → ndarray[Any, Any] | None

Validate and convert array-like input to numpy array.

This method provides robust validation and conversion of various input formats to numpy arrays, with comprehensive error handling and helpful suggestions for common issues.

Parameters:
  • data – Input data to validate and convert. Can be:

      - numpy.ndarray: Used directly
      - pandas.DataFrame/Series: Converted to numpy array
      - polars.DataFrame: Converted to numpy array
      - list, tuple: Converted to numpy array
      - None: Allowed only if allow_none=True

  • name – Name of the data parameter for error messages. Used to provide context in error messages (e.g., “X”, “y”, “guidance_data”).

  • allow_none – Whether to allow None as a valid input. If True, None is returned unchanged; if False, None raises InvalidDataError.

Returns:

Validated numpy array, or None if data is None and allow_none=True. The returned array maintains the same data content but is guaranteed to be a numpy array.

Raises:

InvalidDataError – If validation fails:

  - data is None when allow_none=False
  - data cannot be converted to numpy array
  - Conversion process encounters errors

Example

>>> # Valid inputs
>>> arr = ValidationMixin.validate_array_like([1, 2, 3], "X")
>>> print(type(arr))
<class 'numpy.ndarray'>
>>>
>>> # Allow None
>>> result = ValidationMixin.validate_array_like(None, "y", allow_none=True)
>>> print(result)
None
>>>
>>> # Invalid input
>>> ValidationMixin.validate_array_like(None, "X", allow_none=False)
InvalidDataError: X cannot be None

Note

This method focuses on format validation and conversion. Content validation (like checking for NaN values) should be done separately using other validation methods.

static validate_column_specification(columns: Any, data_shape: tuple[int, ...]) → list[Any]

Validate column specifications.

static validate_guidance_columns(guidance_cols: Any, binning_cols: list[Any], data_shape: tuple[int, ...]) → list[Any]

Validate guidance column specifications.

Examples

Basic Usage

import numpy as np
from binlearn.methods import EqualWidthBinning

# Create sample data
X = np.random.rand(1000, 3)

# Create and fit binner
binner = EqualWidthBinning(n_bins=5)
X_binned = binner.fit_transform(X)

print(f"Original shape: {X.shape}")
print(f"Binned shape: {X_binned.shape}")
print(f"Bin edges: {binner.bin_edges_}")

With Custom Range

# Specify custom range for binning
binner = EqualWidthBinning(
    n_bins=4,
    bin_range=(0, 10)  # Force range from 0 to 10
)

X_binned = binner.fit_transform(X)
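
A forced range that is much wider than the data concentrates values in a few bins. As a rough NumPy-only check (not a call into binlearn), uniform samples in [0, 1) binned over (0, 10) with four bins of width 2.5 all land in the first bin:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 3))          # every value lies in [0, 1)

edges = np.linspace(0, 10, 4 + 1)  # [0, 2.5, 5, 7.5, 10]
idx = np.digitize(X, edges[1:-1])  # interior edges only

print(np.unique(idx).tolist())  # [0]
```

Choosing bin_range close to the values actually expected avoids wasting bins on empty intervals.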

With DataFrame Preservation

import pandas as pd

# Create DataFrame
df = pd.DataFrame({
    'feature1': np.random.normal(0, 1, 100),
    'feature2': np.random.exponential(2, 100)
})

# Preserve DataFrame format
binner = EqualWidthBinning(n_bins=3, preserve_dataframe=True)
df_binned = binner.fit_transform(df)

print(type(df_binned))  # pandas.DataFrame
print(df_binned.columns.tolist())  # ['feature1', 'feature2']