binlearn.base.IntervalBinningBase
- class binlearn.base.IntervalBinningBase(clip: bool | None = None, preserve_dataframe: bool | None = None, fit_jointly: bool | None = None, guidance_columns: Any | None = None, *, bin_edges: dict[Any, list[float]] | None = None, bin_representatives: dict[Any, list[float]] | None = None)
Interval-based binning functionality inheriting from GeneralBinningBase.
This abstract base class provides specialized functionality for binning methods that create discrete intervals from continuous data. It extends GeneralBinningBase with interval-specific features like bin edge management, representative value calculation, and out-of-range value handling.
Key Features:
- Interval boundary (bin edges) management and validation
- Representative value calculation and storage
- Clipping behavior for out-of-range values
- sklearn-compatible fitted attributes
- Comprehensive parameter validation
The class manages two core concepts:
- Bin edges: Define interval boundaries [a, b, c] creating bins [a,b) and [b,c]
- Representatives: Values that represent each bin (typically centers or means)
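The interval convention above (every bin left-closed, with the last bin also closed on the right) can be sketched in plain Python. This is an illustrative stand-in, not binlearn's implementation: the helper name assign_bin is hypothetical, and the sketch assumes the value already lies within the edge range.

```python
from bisect import bisect_right


def assign_bin(value, edges):
    """Return the 0-based bin index of an in-range value (hypothetical helper)."""
    if not edges[0] <= value <= edges[-1]:
        raise ValueError("value lies outside the range covered by the edges")
    n_bins = len(edges) - 1
    # bisect_right - 1 yields left-closed intervals [edge_i, edge_{i+1})
    idx = bisect_right(edges, value) - 1
    # the maximum edge belongs to the last bin, which is closed: [b, c]
    return min(idx, n_bins - 1)


edges = [0.0, 10.0, 20.0]  # bins [0, 10) and [10, 20]
indices = [assign_bin(v, edges) for v in (0.0, 9.9, 10.0, 20.0)]
print(indices)  # [0, 0, 1, 1]
```

Note that 10.0 falls into the second bin because the first bin excludes its right edge, while 20.0 stays in the second bin because the last bin includes it.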
Parameters:
- clip : bool, optional
Whether to clip out-of-range values to the nearest bin boundaries. If None, uses the global configuration default.
When True:
  - Values below the minimum edge are assigned to the first bin
  - Values above the maximum edge are assigned to the last bin
When False, out-of-range values get special indices (BELOW_RANGE, ABOVE_RANGE).
- preserve_dataframe : bool, optional
Inherited from GeneralBinningBase. Whether to preserve DataFrame format.
- fit_jointly : bool, optional
Inherited from GeneralBinningBase. Whether to fit columns jointly.
- guidance_columns : GuidanceColumns, optional
Inherited from GeneralBinningBase. Guidance column specification.
- bin_edges : BinEdgesDict, optional
Pre-specified bin edges as a dictionary mapping column identifiers to edge lists. If provided, the fitting process will validate and use these edges instead of computing them from data.
- bin_representatives : BinEdgesDict, optional
Pre-specified bin representatives as a dictionary mapping column identifiers to representative value lists. If provided, validates consistency with bin_edges.
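To make the effect of the clip parameter concrete, here is a minimal sketch of the two behaviors described above. The function bin_index is hypothetical, and the sentinel values below are placeholders chosen for illustration; the library's actual BELOW_RANGE and ABOVE_RANGE constants may differ.

```python
from bisect import bisect_right

# Placeholder sentinels for illustration only; binlearn's real
# BELOW_RANGE / ABOVE_RANGE values are not taken from the library.
BELOW_RANGE = -1
ABOVE_RANGE = -2


def bin_index(value, edges, clip):
    """Map a value to a bin index, clipping or flagging out-of-range input."""
    n_bins = len(edges) - 1
    if value < edges[0]:
        return 0 if clip else BELOW_RANGE
    if value > edges[-1]:
        return n_bins - 1 if clip else ABOVE_RANGE
    # in-range values: left-closed bins, last bin closed on the right
    return min(bisect_right(edges, value) - 1, n_bins - 1)


edges = [0.0, 10.0, 20.0]
values = (-5.0, 5.0, 25.0)
clipped = [bin_index(v, edges, clip=True) for v in values]
unclipped = [bin_index(v, edges, clip=False) for v in values]
print(clipped)    # [0, 0, 1]
print(unclipped)  # [-1, 0, -2]
```

With clipping enabled, every input receives a valid bin index; with clipping disabled, downstream code has to handle the sentinel indices explicitly.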
Attributes:
- clip : bool
Whether to clip out-of-range values to bin boundaries.
- bin_edges : BinEdgesDict | None
Pre-specified bin edges (input parameter).
- bin_representatives : BinEdgesDict | None
Pre-specified bin representatives (input parameter).
- bin_edges_ : BinEdgesDict
Fitted bin edges after calling fit(). Dictionary mapping each column to its list of bin boundary values.
- bin_representatives_ : BinEdgesDict
Fitted bin representatives after calling fit(). Dictionary mapping each column to its list of representative values.
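The relationship between the two fitted dictionaries can be illustrated as follows. This sketch assumes that representatives are bin centers and that "consistent" means one representative per bin, i.e. len(edges) - 1 values per column; the source does not spell out the exact rule, and both helper names are hypothetical.

```python
def bin_centers(edges):
    """Midpoint of each consecutive pair of edges (one value per bin)."""
    return [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]


def check_consistency(bin_edges, bin_representatives):
    """Raise ValueError if a column's representatives do not match its bins.

    Assumed rule: each column needs exactly len(edges) - 1 representatives.
    """
    for col, reps in bin_representatives.items():
        if col not in bin_edges:
            raise ValueError(f"no bin edges given for column {col!r}")
        n_bins = len(bin_edges[col]) - 1
        if len(reps) != n_bins:
            raise ValueError(
                f"column {col!r}: {len(reps)} representatives for {n_bins} bins"
            )


bin_edges = {"age": [0.0, 10.0, 20.0]}
bin_representatives = {"age": bin_centers(bin_edges["age"])}
check_consistency(bin_edges, bin_representatives)  # passes silently
print(bin_representatives)  # {'age': [5.0, 15.0]}
```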
Note:
This is an abstract base class. Concrete implementations must provide the abstract method _calculate_bins() to define how bin edges are computed from input data for their specific binning algorithm.
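The division of labor described in this note can be sketched with a stand-in base class. Everything here is an assumption made for illustration: IntervalBaseSketch and EqualWidthSketch are hypothetical classes, the _calculate_bins signature and return shape are guesses rather than binlearn's actual API, and fit() accepts a plain dictionary of columns instead of the array-like inputs the real class handles.

```python
from abc import ABC, abstractmethod


class IntervalBaseSketch(ABC):
    """Stand-in for the real base class; not binlearn's implementation."""

    def __init__(self):
        # fitted attributes start empty and are populated during fit()
        self.bin_edges_ = {}
        self.bin_representatives_ = {}

    def fit(self, columns):
        # columns: dict mapping column id -> list of floats (assumed shape)
        for col_id, values in columns.items():
            edges, reps = self._calculate_bins(values, col_id)
            self.bin_edges_[col_id] = edges
            self.bin_representatives_[col_id] = reps
        return self

    @abstractmethod
    def _calculate_bins(self, values, col_id):
        """Return (edges, representatives) for one column."""


class EqualWidthSketch(IntervalBaseSketch):
    """Concrete subclass: equal-width bins between the column min and max."""

    def __init__(self, n_bins=2):
        super().__init__()
        self.n_bins = n_bins

    def _calculate_bins(self, values, col_id):
        lo, hi = min(values), max(values)
        width = (hi - lo) / self.n_bins
        # use hi for the final edge so rounding cannot shrink the range
        edges = [lo + i * width for i in range(self.n_bins)] + [hi]
        reps = [(a + b) / 2 for a, b in zip(edges, edges[1:])]
        return edges, reps


model = EqualWidthSketch(n_bins=2).fit({"x": [0.0, 4.0, 10.0]})
print(model.bin_edges_)            # {'x': [0.0, 5.0, 10.0]}
print(model.bin_representatives_)  # {'x': [2.5, 7.5]}
```

The base class owns the orchestration and the fitted attributes, so a concrete algorithm only has to answer one question: given a column, what are its edges and representatives?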
- __init__(clip: bool | None = None, preserve_dataframe: bool | None = None, fit_jointly: bool | None = None, guidance_columns: Any | None = None, *, bin_edges: dict[Any, list[float]] | None = None, bin_representatives: dict[Any, list[float]] | None = None)
Initialize interval binning base with configuration and validation.
Sets up the interval binning transformer with the specified parameters, applying configuration defaults and performing early parameter validation to catch configuration errors before fitting.
- Parameters:
clip – Whether to clip out-of-range values to bin boundaries. If None, uses global configuration default.
preserve_dataframe – Whether to preserve DataFrame format in output. Passed to GeneralBinningBase. If None, uses global configuration default.
fit_jointly – Whether to fit all columns jointly rather than independently. Passed to GeneralBinningBase. If None, uses global configuration default.
guidance_columns – Specification of guidance columns for supervised binning. Passed to GeneralBinningBase.
bin_edges – Pre-specified bin edges for manual binning. If provided, the fitting process validates and uses these instead of computing from data.
bin_representatives – Pre-specified bin representatives. If provided, must be consistent with bin_edges.
- Raises:
ValueError – If clip parameter is invalid or pre-specified bins are inconsistent.
ConfigurationError – If parameter validation fails.
Note
Early parameter validation helps catch configuration issues before expensive fitting operations. The bin_edges_ and bin_representatives_ attributes are initialized as empty dictionaries and populated during fitting.
Methods
__init__([clip, preserve_dataframe, ...]) – Initialize interval binning base with configuration and validation.
check_data_quality(data[, name]) – Check data quality and issue warnings if needed.
fit(X[, y]) – Fit the binning transformer with comprehensive orchestration.
fit_transform(X[, y]) – Fit to data, then transform it.
get_input_columns() – Get input columns for data preparation.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator, including fitted parameters.
inverse_transform(X) – Inverse transform from bin indices back to representative values.
set_output(*[, transform]) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Transform input data using fitted binning parameters.
validate_array_like(data[, name, allow_none]) – Validate and convert array-like input to numpy array.
validate_column_specification(columns, ...) – Validate column specifications.
validate_guidance_columns(guidance_cols, ...) – Validate guidance column specifications.
Attributes
feature_names_in_ – Get feature names.
n_features_in_ – Get number of features.
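The idea behind inverse_transform, mapping bin indices back to representative values, can be illustrated with a short sketch. The helper indices_to_representatives is hypothetical and handles valid bin indices only; how the library treats out-of-range or missing-value indices is not covered here.

```python
def indices_to_representatives(indices, representatives):
    """Replace each bin index with that bin's representative value."""
    n_bins = len(representatives)
    out = []
    for i in indices:
        # reject invalid indices instead of letting negatives wrap around
        if not 0 <= i < n_bins:
            raise ValueError(f"bin index {i} is outside 0..{n_bins - 1}")
        out.append(representatives[i])
    return out


representatives = [5.0, 15.0]  # e.g. centers of bins [0, 10) and [10, 20]
restored = indices_to_representatives([0, 0, 1, 1], representatives)
print(restored)  # [5.0, 5.0, 15.0, 15.0]
```

The round trip is lossy by design: every value that fell into a bin comes back as the same representative, so the original values are not recovered.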