binlearn.base.IntervalBinningBase

class binlearn.base.IntervalBinningBase(clip: bool | None = None, preserve_dataframe: bool | None = None, fit_jointly: bool | None = None, guidance_columns: Any | None = None, *, bin_edges: dict[Any, list[float]] | None = None, bin_representatives: dict[Any, list[float]] | None = None)

Interval-based binning functionality inheriting from GeneralBinningBase.

This abstract base class provides specialized functionality for binning methods that create discrete intervals from continuous data. It extends GeneralBinningBase with interval-specific features like bin edge management, representative value calculation, and out-of-range value handling.

Key Features:
  • Interval boundary (bin edges) management and validation
  • Representative value calculation and storage
  • Clipping behavior for out-of-range values
  • sklearn-compatible fitted attributes
  • Comprehensive parameter validation

The class manages two core concepts:
  • Bin edges: Define interval boundaries [a, b, c] creating bins [a, b) and [b, c]
  • Representatives: Values that represent each bin (typically centers or means)
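To make the interval convention concrete, here is a minimal sketch (not binlearn's own code) of how edges [a, b, c] could map in-range values to the bins [a, b) and [b, c], with bin centers as one possible choice of representatives. The helper names bin_index and bin_centers are hypothetical and exist only for this illustration.

```python
from bisect import bisect_right


def bin_index(value, edges):
    """Return the index of the bin containing value (in-range values only)."""
    if not edges[0] <= value <= edges[-1]:
        raise ValueError("value lies outside the edge range")
    # bisect_right sends a value equal to an interior edge to the bin on its
    # right, giving half-open bins [a, b). The min() keeps a value equal to
    # the final edge inside the last bin, so the last bin is closed: [b, c].
    return min(bisect_right(edges, value) - 1, len(edges) - 2)


def bin_centers(edges):
    """Midpoint of each interval, one common choice of representative."""
    return [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]


edges = [0.0, 1.0, 2.0]
indices = [bin_index(v, edges) for v in (0.0, 0.5, 1.0, 2.0)]
centers = bin_centers(edges)
```

Here the interior edge 1.0 falls into the second bin, while the final edge 2.0 stays in the last bin rather than opening a new one.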

Parameters:

clip : bool, optional

Whether to clip out-of-range values to the nearest bin boundaries. If None, uses the global configuration default.

When True:
  • Values below the minimum edge are assigned to the first bin
  • Values above the maximum edge are assigned to the last bin

When False, out-of-range values get special indices (BELOW_RANGE, ABOVE_RANGE).

preserve_dataframe : bool, optional

Inherited from GeneralBinningBase. Whether to preserve DataFrame format.

fit_jointly : bool, optional

Inherited from GeneralBinningBase. Whether to fit columns jointly.

guidance_columns : GuidanceColumns, optional

Inherited from GeneralBinningBase. Guidance column specification.

bin_edges : BinEdgesDict, optional

Pre-specified bin edges as a dictionary mapping column identifiers to edge lists. If provided, the fitting process will validate and use these edges instead of computing them from data.

bin_representatives : BinEdgesDict, optional

Pre-specified bin representatives as a dictionary mapping column identifiers to representative value lists. If provided, validates consistency with bin_edges.
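The effect of clip on out-of-range values can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the sentinel values chosen for BELOW_RANGE and ABOVE_RANGE are placeholders (the real constants may differ), and assign_bin is a hypothetical helper.

```python
from bisect import bisect_right

# Placeholder sentinels: binlearn's actual BELOW_RANGE / ABOVE_RANGE
# constants are not given in this page and may have other values.
BELOW_RANGE = -1
ABOVE_RANGE = -2


def assign_bin(value, edges, clip):
    """Assign a bin index, clipping or flagging out-of-range values."""
    last_bin = len(edges) - 2
    if value < edges[0]:
        return 0 if clip else BELOW_RANGE
    if value > edges[-1]:
        return last_bin if clip else ABOVE_RANGE
    # In-range: half-open bins, with the final edge kept in the last bin.
    return min(bisect_right(edges, value) - 1, last_bin)


edges = [0.0, 10.0, 20.0]
samples = (-5.0, 5.0, 25.0)
clipped = [assign_bin(v, edges, clip=True) for v in samples]
unclipped = [assign_bin(v, edges, clip=False) for v in samples]
```

With clipping, -5.0 and 25.0 land in the first and last bin; without it, they are flagged with the sentinel indices so downstream code can treat them separately.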

Attributes:

clip : bool

Whether to clip out-of-range values to bin boundaries.

bin_edges : BinEdgesDict | None

Pre-specified bin edges (input parameter).

bin_representatives : BinEdgesDict | None

Pre-specified bin representatives (input parameter).

bin_edges_ : BinEdgesDict

Fitted bin edges after calling fit(). Dictionary mapping each column to its list of bin boundary values.

bin_representatives_ : BinEdgesDict

Fitted bin representatives after calling fit(). Dictionary mapping each column to its list of representative values.

Note:

This is an abstract base class. Concrete implementations must provide the abstract method _calculate_bins() to define how bin edges are computed from input data for their specific binning algorithm.
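The division of labor described in this note can be sketched with a toy stand-in. The code below does not import binlearn. ToyIntervalBase and ToyEqualWidth are hypothetical classes, and the _calculate_bins signature and return value shown here (one column's values plus its identifier in, an (edges, representatives) pair out) are assumptions made for the example, not the documented interface.

```python
from abc import ABC, abstractmethod


class ToyIntervalBase(ABC):
    """Stand-in base class: fit() orchestrates, subclasses compute the bins."""

    def __init__(self):
        self.bin_edges_ = {}
        self.bin_representatives_ = {}

    def fit(self, columns):
        # columns: dict mapping a column identifier to a list of floats.
        for col_id, values in columns.items():
            edges, reps = self._calculate_bins(values, col_id)
            self.bin_edges_[col_id] = edges
            self.bin_representatives_[col_id] = reps
        return self

    @abstractmethod
    def _calculate_bins(self, values, col_id):
        """Return (edges, representatives) for one column (assumed signature)."""


class ToyEqualWidth(ToyIntervalBase):
    """Concrete algorithm: n_bins intervals of equal width."""

    def __init__(self, n_bins=2):
        super().__init__()
        self.n_bins = n_bins

    def _calculate_bins(self, values, col_id):
        lo, hi = min(values), max(values)
        width = (hi - lo) / self.n_bins
        edges = [lo + i * width for i in range(self.n_bins + 1)]
        reps = [(a + b) / 2 for a, b in zip(edges, edges[1:])]
        return edges, reps


binner = ToyEqualWidth(n_bins=2).fit({"x": [0.0, 1.0, 4.0]})
```

Instantiating ToyIntervalBase directly raises TypeError because the abstract method is missing, which is the behavior an abstract base class relies on to force subclasses to supply the algorithm.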

__init__(clip: bool | None = None, preserve_dataframe: bool | None = None, fit_jointly: bool | None = None, guidance_columns: Any | None = None, *, bin_edges: dict[Any, list[float]] | None = None, bin_representatives: dict[Any, list[float]] | None = None)

Initialize interval binning base with configuration and validation.

Sets up the interval binning transformer with the specified parameters, applying configuration defaults and performing early parameter validation to catch configuration errors before fitting.

Parameters:
  • clip – Whether to clip out-of-range values to bin boundaries. If None, uses global configuration default.

  • preserve_dataframe – Whether to preserve DataFrame format in output. Passed to GeneralBinningBase. If None, uses global configuration default.

  • fit_jointly – Whether to fit all columns jointly rather than independently. Passed to GeneralBinningBase. If None, uses global configuration default.

  • guidance_columns – Specification of guidance columns for supervised binning. Passed to GeneralBinningBase.

  • bin_edges – Pre-specified bin edges for manual binning. If provided, the fitting process validates and uses these instead of computing from data.

  • bin_representatives – Pre-specified bin representatives. If provided, must be consistent with bin_edges.

Raises:
  • ValueError – If clip parameter is invalid or pre-specified bins are inconsistent.

  • ConfigurationError – If parameter validation fails.

Note

Early parameter validation helps catch configuration issues before expensive fitting operations. The bin_edges_ and bin_representatives_ attributes are initialized as empty dictionaries and populated during fitting.
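As an illustration of the early-validation pattern, the sketch below checks pre-specified bins in the constructor and leaves the fitted attributes empty until fitting. ToyManualBinner is a hypothetical class, and the specific rules (at least two finite, strictly increasing edges, and one representative per bin) are assumptions about what "consistent" could mean, not binlearn's documented checks. It raises only ValueError, since the ConfigurationError class is not reproduced here.

```python
import math


class ToyManualBinner:
    """Hypothetical stand-in that validates its arguments at construction."""

    def __init__(self, clip=True, *, bin_edges=None, bin_representatives=None):
        if not isinstance(clip, bool):
            raise ValueError("clip must be a boolean")
        for col_id, edges in (bin_edges or {}).items():
            if len(edges) < 2:
                raise ValueError(f"column {col_id!r}: need at least 2 edges")
            if not all(math.isfinite(e) for e in edges):
                raise ValueError(f"column {col_id!r}: edges must be finite")
            if any(a >= b for a, b in zip(edges, edges[1:])):
                raise ValueError(f"column {col_id!r}: edges must increase")
        for col_id, reps in (bin_representatives or {}).items():
            if bin_edges is None or col_id not in bin_edges:
                raise ValueError(f"column {col_id!r}: representatives without edges")
            # Assumed consistency rule: one representative per interval.
            if len(reps) != len(bin_edges[col_id]) - 1:
                raise ValueError(f"column {col_id!r}: expected one value per bin")
        self.clip = clip
        self.bin_edges = bin_edges
        self.bin_representatives = bin_representatives
        # Fitted attributes start empty and would be populated by fit().
        self.bin_edges_ = {}
        self.bin_representatives_ = {}


ok = ToyManualBinner(
    bin_edges={"x": [0.0, 1.0, 2.0]},
    bin_representatives={"x": [0.5, 1.5]},
)
try:
    ToyManualBinner(
        bin_edges={"x": [0.0, 1.0, 2.0]},
        bin_representatives={"x": [0.5]},  # two bins but only one value
    )
    rejected = False
except ValueError:
    rejected = True
```

Failing in the constructor means a mismatched specification is reported immediately, before any data is passed to fit().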

Methods

__init__([clip, preserve_dataframe, ...])

Initialize interval binning base with configuration and validation.

check_data_quality(data[, name])

Check data quality and issue warnings if needed.

fit(X[, y])

Fit the binning transformer with comprehensive orchestration.

fit_transform(X[, y])

Fit to data, then transform it.

get_input_columns()

Get input columns for data preparation.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator, including fitted parameters.

inverse_transform(X)

Inverse transform from bin indices back to representative values.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform input data using fitted binning parameters.

validate_array_like(data[, name, allow_none])

Validate and convert array-like input to numpy array.

validate_column_specification(columns, ...)

Validate column specifications.

validate_guidance_columns(guidance_cols, ...)

Validate guidance column specifications.
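The relationship between transform and inverse_transform listed above can be sketched without the library. The snippet assumes clipping is enabled and uses hypothetical helpers (transform_column, inverse_transform_column) that operate on one column of plain floats. It shows the idea of mapping values to bin indices and indices back to representatives, not binlearn's actual array handling.

```python
from bisect import bisect_right

# Fitted state for a single column, in the shape of bin_edges_ and
# bin_representatives_ (dictionaries keyed by column identifier).
bin_edges_ = {"x": [0.0, 1.0, 2.0]}
bin_representatives_ = {"x": [0.5, 1.5]}


def transform_column(values, col_edges):
    """Map each value to a bin index, clipping out-of-range values."""
    last_bin = len(col_edges) - 2
    return [
        min(max(bisect_right(col_edges, v) - 1, 0), last_bin) for v in values
    ]


def inverse_transform_column(indices, col_reps):
    """Map each bin index back to that bin's representative value."""
    return [col_reps[i] for i in indices]


binned = transform_column([0.2, 1.7, 3.0], bin_edges_["x"])
restored = inverse_transform_column(binned, bin_representatives_["x"])
```

The round trip is lossy by design: 0.2 comes back as 0.5, and both 1.7 and the clipped 3.0 come back as 1.5, because every value in a bin is replaced by the same representative.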

Attributes

feature_names_in_

Get feature names.

n_features_in_

Get number of features.
