binlearn.base.GeneralBinningBase

class binlearn.base.GeneralBinningBase(preserve_dataframe: bool | None = None, fit_jointly: bool | None = None, guidance_columns: list[Any] | Any | None = None)

Clean binning base class focusing on orchestration and guidance logic.

This abstract base class provides the core infrastructure for all binning transformers in the binlearn library. It orchestrates the binning process, handles guidance column separation, and manages the interaction between fitting and transformation phases.

The class supports two main fitting strategies:

  • Per-column independent fitting: Each column is binned independently
  • Joint fitting: All columns are considered together for binning decisions
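The difference between the two strategies can be illustrated with a small pure-Python sketch. It assumes equal-width binning and assumes that "considered together" means pooling the values of all columns into one shared set of edges; the source does not say how binlearn implements joint fitting, and the helper names `equal_width_edges`, `fit_independently`, and `fit_jointly` are hypothetical.

```python
from typing import Dict, List, Sequence


def equal_width_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Return n_bins + 1 evenly spaced edges spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def fit_independently(columns: Dict[str, Sequence[float]], n_bins: int) -> Dict[str, List[float]]:
    # Each column gets edges derived from its own value range only.
    return {name: equal_width_edges(vals, n_bins) for name, vals in columns.items()}


def fit_jointly(columns: Dict[str, Sequence[float]], n_bins: int) -> Dict[str, List[float]]:
    # Assumption: joint fitting pools every column, so all columns share edges.
    pooled = [v for vals in columns.values() for v in vals]
    shared = equal_width_edges(pooled, n_bins)
    return {name: list(shared) for name in columns}


data = {"feature1": [1, 2, 3, 4, 5], "feature2": [10, 20, 30, 40, 50]}
independent = fit_independently(data, n_bins=2)
joint = fit_jointly(data, n_bins=2)
print(independent)
print(joint)
```

With independent fitting, each column's edges follow its own scale; with the pooled variant, the small-valued column ends up almost entirely in the first bin, which is why the choice of strategy matters when columns have different ranges.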

Key Features:

  • Guidance column support for supervised and semi-supervised binning
  • Flexible fitting strategies (independent vs joint)
  • DataFrame format preservation
  • Comprehensive error handling and validation
  • sklearn-compatible transformer interface

Parameters:

preserve_dataframe : bool, optional

Whether to preserve the original DataFrame format in output. If None, uses the global configuration default. When True, pandas/polars DataFrames are returned as DataFrames; otherwise numpy arrays.

fit_jointly : bool, optional

Whether to fit all columns jointly rather than independently. If None, uses the global configuration default. When True, all binning columns are considered together; when False, each column is binned independently.

guidance_columns : GuidanceColumns, optional

Specification of columns to use for guidance (supervision). Can be:

  • None: No guidance columns (unsupervised binning)
  • Column identifier: Single guidance column
  • List of identifiers: Multiple guidance columns

Incompatible with fit_jointly=True.

Attributes:

preserve_dataframe : bool

Whether to preserve DataFrame format in output.

fit_jointly : bool

Whether to fit columns jointly or independently.

guidance_columns : GuidanceColumns

Specification of guidance columns for supervision.

Note:

This is an abstract base class and cannot be instantiated directly. Concrete implementations must provide the abstract methods for specific binning algorithms.

The class enforces mutual exclusivity between fit_jointly=True and guidance_columns to prevent conflicting binning strategies.
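A minimal sketch of this kind of constructor check is given here for illustration. `SketchBinner` is a hypothetical stand-in and not binlearn's implementation; in particular, falling back to `False` when `fit_jointly` is `None` is an assumption, since the real default comes from the library's global configuration.

```python
from typing import Any, Optional


class SketchBinner:
    """Hypothetical stand-in, not binlearn's GeneralBinningBase."""

    def __init__(self, fit_jointly: Optional[bool] = None, guidance_columns: Any = None) -> None:
        # Assumption: a missing fit_jointly falls back to False here; the
        # real class reads its default from a global configuration.
        self.fit_jointly = False if fit_jointly is None else fit_jointly
        self.guidance_columns = guidance_columns
        # Guided binning and joint fitting are rejected as a combination.
        if self.fit_jointly and self.guidance_columns is not None:
            raise ValueError("guidance_columns cannot be combined with fit_jointly=True")


SketchBinner(guidance_columns=["target"])  # accepted: guided, independent fitting

try:
    SketchBinner(fit_jointly=True, guidance_columns=["target"])
    rejected = False
except ValueError:
    rejected = True

print(rejected)
```

Validating in the constructor means a conflicting configuration fails immediately rather than later during fitting.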

__init__(preserve_dataframe: bool | None = None, fit_jointly: bool | None = None, guidance_columns: list[Any] | Any | None = None)

Initialize the binning transformer.

Sets up the binning transformer with the specified configuration options, applying global configuration defaults where parameters are not provided. Validates parameter compatibility to prevent conflicting configurations.

Parameters:
  • preserve_dataframe – Whether to preserve DataFrame format in output. If None, uses global configuration default.

  • fit_jointly – Whether to fit all columns together. If None, uses global configuration default.

  • guidance_columns – Guidance column specification for supervised binning. Must be None if fit_jointly=True.

Raises:

ValueError – If guidance_columns is specified when fit_jointly=True, as these options are mutually exclusive.

Note

The binning and guidance column lists are computed dynamically during fitting based on the actual input data and the guidance_columns parameter.

Methods

__init__([preserve_dataframe, fit_jointly, ...])

Initialize the binning transformer.

check_data_quality(data[, name])

Check data quality and issue warnings if needed.

fit(X[, y])

Fit the binning transformer with comprehensive orchestration.

fit_transform(X[, y])

Fit to data, then transform it.

get_input_columns()

Get input columns for data preparation.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator, including fitted parameters.

inverse_transform(X)

Inverse transform from bin indices back to representative values.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform input data using fitted binning parameters.

validate_array_like(data[, name, allow_none])

Validate and convert array-like input to numpy array.

validate_column_specification(columns, ...)

Validate column specifications.

validate_guidance_columns(guidance_cols, ...)

Validate guidance column specifications.

Attributes

feature_names_in_

Get feature names.

n_features_in_

Get number of features.

fit(X: Any, y: Any | None = None, **fit_params: Any) → GeneralBinningBase

Fit the binning transformer with comprehensive orchestration.

This method orchestrates the complete fitting process, handling parameter validation, input preprocessing, column separation, and routing to the appropriate fitting strategy (joint vs independent).

Parameters:
  • X – Input data to fit the binning transformer on. Can be:

      - pandas.DataFrame: Column names are preserved
      - polars.DataFrame: Column names are preserved
      - numpy.ndarray: Numeric column indices are used
      - array-like: Converted to numpy array

  • y – Target values for supervised binning methods. Ignored by unsupervised methods. Can be array-like or None.

  • **fit_params – Additional fitting parameters passed to the specific binning algorithm implementation. Common parameters include:

      - guidance_data: Alternative guidance data (conflicts with fit_jointly=True)

Returns:

The fitted binning transformer instance.

Return type:

self

Raises:
  • ValueError – If parameter validation fails, inputs are invalid, or conflicting parameters are provided (e.g., fit_jointly=True with guidance_data).

  • BinningError – If the binning algorithm fails to fit the data.

  • RuntimeError – If an unexpected error occurs during fitting.

Example

>>> from binlearn import EqualWidthBinning
>>> import pandas as pd
>>> X = pd.DataFrame({'feature1': [1, 2, 3, 4, 5], 'feature2': [10, 20, 30, 40, 50]})
>>> binner = EqualWidthBinning(n_bins=3)
>>> binner.fit(X)
EqualWidthBinning(...)

Note

The method automatically handles column separation when guidance_columns is specified, routing guidance columns separately from binning columns. The fitting strategy (joint vs independent) is determined by the fit_jointly parameter.

transform(X: Any) → Any

Transform input data using fitted binning parameters.

Applies the fitted binning transformation to new data, converting continuous values to discrete bin indices or representatives. Handles column separation when guidance columns are present.

Parameters:

X – Input data to transform. Must have the same structure as the data used during fitting (same number of columns). Can be:

  • pandas.DataFrame: Column names should match training data
  • polars.DataFrame: Column names should match training data
  • numpy.ndarray: Must have same number of columns as training
  • array-like: Converted to numpy array

Returns:

Transformed data where continuous values are replaced with bin indices or representative values. The output format depends on:

  • preserve_dataframe setting: DataFrame vs array format
  • binning method: indices vs representatives
  • guidance_columns: only binning columns are transformed

Raises:
  • RuntimeError – If the transformer has not been fitted yet.

  • ValueError – If the input data has incompatible structure or format.

  • BinningError – If transformation fails due to data issues.

Example

>>> # After fitting
>>> X_new = pd.DataFrame({'feature1': [1.5, 2.5], 'feature2': [15, 25]})
>>> X_binned = binner.transform(X_new)
>>> print(X_binned)
[[0, 0], [1, 1]]  # Bin indices

Note

When guidance_columns is specified, only the binning columns are transformed. Guidance columns are filtered out from the output. The method preserves the original data format when preserve_dataframe=True.
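The column separation described in this note can be sketched in plain Python. Everything in this sketch is hypothetical: `split_columns` and `to_bin_index` are illustrative helpers, the edges are made up, and clipping out-of-range values into the first or last bin is an assumption rather than documented binlearn behavior.

```python
import bisect
from typing import Any, Dict, List, Sequence, Tuple


def split_columns(all_columns: Sequence[Any], guidance_columns: Any) -> Tuple[List[Any], List[Any]]:
    """Separate binning columns from guidance columns (hypothetical helper)."""
    if guidance_columns is None:
        guidance: List[Any] = []
    elif isinstance(guidance_columns, (list, tuple)):
        guidance = list(guidance_columns)
    else:
        guidance = [guidance_columns]  # a single column identifier
    binning = [col for col in all_columns if col not in guidance]
    return binning, guidance


def to_bin_index(value: float, edges: Sequence[float]) -> int:
    # Only interior edges split the range, so out-of-range values land in
    # the first or last bin. That clipping is an assumption of this sketch.
    return bisect.bisect_right(edges[1:-1], value)


table: Dict[str, List[float]] = {"feature1": [1.5, 4.0], "target": [0, 1]}
fitted_edges = {"feature1": [1.0, 3.0, 5.0]}  # made-up edges for the example

binning, guidance = split_columns(list(table), "target")
# Only binning columns are transformed; guidance columns are dropped.
binned = {name: [to_bin_index(v, fitted_edges[name]) for v in table[name]] for name in binning}
print(binned)
```

The output contains `feature1` only, mirroring the statement that guidance columns do not appear in the transformed result.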

inverse_transform(X: Any) → Any

Inverse transform from bin indices back to representative values.

Converts discrete bin indices back to their representative values, effectively reversing the binning transformation. This is useful for interpreting results or reconstructing approximate original values.

Parameters:

X – Input data containing bin indices to inverse transform. Should contain only binning columns (no guidance columns). Can be:

  • pandas.DataFrame: Column names should match binning columns
  • polars.DataFrame: Column names should match binning columns
  • numpy.ndarray: Must have same number of binning columns
  • array-like: Converted to numpy array

Returns:

Inverse transformed data where bin indices are replaced with their representative values (typically bin centers). Output format matches the preserve_dataframe setting.

Raises:
  • RuntimeError – If the transformer has not been fitted yet.

  • ValueError – If input data has wrong number of columns or invalid format.

  • BinningError – If inverse transformation fails.

Example

>>> # After fitting and transforming
>>> X_binned = [[0, 1], [1, 0], [2, 2]]  # Bin indices
>>> X_reconstructed = binner.inverse_transform(X_binned)
>>> print(X_reconstructed)
[[0.5, 1.5], [1.5, 0.5], [2.5, 2.5]]  # Representative values

Note

For guided binning (when guidance_columns is specified), the input should only contain the binning columns, not the guidance columns. The number of input columns must match the number of binning columns.
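As a rough illustration of mapping bin indices back to representatives, the following sketch uses bin centers, which the description above names as the typical choice. The function names and the edges are hypothetical, and the column-count check only mimics the documented ValueError; it is not binlearn's code.

```python
from typing import List, Sequence


def bin_centers(edges: Sequence[float]) -> List[float]:
    """Midpoint of each pair of consecutive edges."""
    return [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]


def inverse_transform_rows(
    rows: Sequence[Sequence[int]], edges_per_column: Sequence[Sequence[float]]
) -> List[List[float]]:
    centers = [bin_centers(edges) for edges in edges_per_column]
    # The number of input columns must match the number of binning columns.
    if any(len(row) != len(centers) for row in rows):
        raise ValueError("wrong number of columns for inverse transform")
    return [[centers[col][idx] for col, idx in enumerate(row)] for row in rows]


# Made-up edges: three unit-width bins per column.
edges_per_column = [[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]]
restored = inverse_transform_rows([[0, 1], [1, 0], [2, 2]], edges_per_column)
print(restored)
```

Because every value in a bin maps to the same center, the round trip is lossy: it recovers an approximation of the original values, not the values themselves.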

get_input_columns() → list[Any] | None

Get input columns for data preparation.

This method should be overridden by derived classes to provide appropriate column information without exposing binning-specific concepts.

Returns:

Column information or None if not available