binlearn.base.SupervisedBinningBase
- class binlearn.base.SupervisedBinningBase(clip: bool | None = None, preserve_dataframe: bool | None = None, guidance_columns: Any | None = None, *, bin_edges: dict[Any, list[float]] | None = None, bin_representatives: dict[Any, list[float]] | None = None)
Base class for supervised binning methods that use target information.
This class extends IntervalBinningBase to provide specialized functionality for supervised binning methods. These methods use target variable information (y) to optimize bin boundaries, typically aiming to create bins with homogeneous target distributions or maximize predictive power.
Supervised binning is particularly effective for:
- Binary classification with continuous predictors
- Regression tasks where binning should preserve target relationships
- Feature selection and engineering based on target correlation
- Creating interpretable bins aligned with target behavior
Key Features:
- Target-aware bin boundary optimization
- Built-in target data validation and preprocessing
- Feature-target pair validation for data quality
- Automatic handling of supervised learning constraints
- Integration with guidance column requirements for targets
Constraints:
- Does not support joint fitting across multiple features (fit_jointly=False)
- Requires exactly one guidance column to serve as the target variable
- Target data must be provided during fit() call
- Feature and target data must have compatible shapes and no missing values
All attributes from IntervalBinningBase, plus:
- Target-specific validation and preprocessing capabilities
- Enhanced error handling for supervised learning scenarios
Example
>>> import numpy as np
>>> # Supervised binning for binary classification
>>> X = np.array([[1.2, 2.3], [3.4, 4.5], [5.6, 6.7]])
>>> y = np.array([0, 1, 0])  # Binary target
>>>
>>> binner = ConcreteSupervisedBinner()  # placeholder for a concrete subclass
>>> binner.fit(X, guidance_data=y)
>>> X_binned = binner.transform(X)
Note
- This is an abstract base class; use concrete implementations like Chi2Binning.
- Inherits all interval binning functionality (bin edges, representatives, etc.).
- Target data is passed via the guidance_data parameter of the fit() method.
- Subclasses must implement _calculate_bin_edges with target-aware logic.
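The last note asks subclasses to supply target-aware edge logic. As a rough illustration of what such logic can look like, the standalone sketch below picks a single split for one feature so that the weighted target variance on each side is minimal. It does not use binlearn: the function name best_split_edges and its signature are hypothetical, and the real _calculate_bin_edges hook may take different arguments.

```python
import numpy as np


def best_split_edges(x, y):
    """Return [min, split, max] edges for one feature (hypothetical helper).

    The split is the midpoint between two adjacent sorted feature values
    that minimizes the summed within-side target variance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape or x.size == 0:
        raise ValueError("x and y must be non-empty 1D arrays of equal length")

    order = np.argsort(x)
    xs, ys = x[order], y[order]

    best_cost, best_split = np.inf, None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no boundary can be placed between identical values
        left, right = ys[:i], ys[i:]
        cost = left.var() * len(left) + right.var() * len(right)
        if cost < best_cost:
            best_cost = cost
            best_split = (xs[i - 1] + xs[i]) / 2.0

    if best_split is None:  # constant feature: a single bin
        return [float(xs[0]), float(xs[-1])]
    return [float(xs[0]), float(best_split), float(xs[-1])]


x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
edges = best_split_edges(x, y)
print(edges)
```

A concrete binner would typically apply such a rule recursively or use a statistical criterion instead; the point here is only that the target y drives where the boundary falls.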
- __init__(clip: bool | None = None, preserve_dataframe: bool | None = None, guidance_columns: Any | None = None, *, bin_edges: dict[Any, list[float]] | None = None, bin_representatives: dict[Any, list[float]] | None = None)
Initialize supervised binning base class.
- Parameters:
clip – Whether to clip out-of-range values during transformation to the nearest bin boundary. If True, values below the minimum edge are assigned to the first bin, and values above the maximum edge are assigned to the last bin. If False, out-of-range values get special indices (BELOW_RANGE, ABOVE_RANGE). If None, uses global default.
preserve_dataframe – Whether to preserve the original DataFrame format during transformation. If True, returns DataFrame when input is DataFrame. If False, returns numpy array. If None, uses global configuration default.
guidance_columns – Column identifier for the target variable. For supervised binning, this should specify exactly one column that contains the target values. Can be column name/index or None (target passed via guidance_data).
bin_edges – Pre-defined bin edges as a dictionary mapping column identifiers to lists of edge values. If provided, no fitting is performed and these edges are used directly. Must be compatible with supervised binning constraints if provided.
bin_representatives – Pre-defined representative values for each bin as a dictionary mapping column identifiers to lists of values. Must match the structure of bin_edges if provided.
- Raises:
ValidationError – If parameters are incompatible with supervised binning requirements (e.g., multiple guidance columns specified).
Note
- Supervised binning does not support fit_jointly=True (it always fits each feature independently).
- Target data should be provided via the guidance_data parameter of the fit() method.
- Only one guidance column is supported (the target variable).
- Pre-defined bin edges should be optimized for the target if provided.
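The clip parameter described above decides what happens to values outside the fitted edge range. The sketch below mimics that behaviour with plain NumPy, assuming left-closed bins with the last bin closed on both sides. The function assign_bins is hypothetical, and the sentinel values chosen for BELOW_RANGE and ABOVE_RANGE are placeholders, not necessarily the constants binlearn uses.

```python
import numpy as np

# Placeholder sentinels for illustration only; the library's actual
# BELOW_RANGE / ABOVE_RANGE constants may have different values.
BELOW_RANGE = -3
ABOVE_RANGE = -2


def assign_bins(values, edges, clip):
    """Map values to bin indices given sorted edges (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    edges = np.asarray(edges, dtype=float)
    n_bins = len(edges) - 1

    # Index of the bin whose left edge is the last edge <= value.
    idx = np.searchsorted(edges, values, side="right") - 1
    # A value equal to the maximum edge belongs to the last bin.
    idx[values == edges[-1]] = n_bins - 1

    below = values < edges[0]
    above = values > edges[-1]
    if clip:
        idx[below] = 0           # clip to the first bin
        idx[above] = n_bins - 1  # clip to the last bin
    else:
        idx[below] = BELOW_RANGE
        idx[above] = ABOVE_RANGE
    return idx


edges = [0.0, 1.0, 2.0, 3.0]
values = [-5.0, 0.5, 1.0, 3.0, 9.0]
clipped = assign_bins(values, edges, clip=True)
flagged = assign_bins(values, edges, clip=False)
print(clipped.tolist(), flagged.tolist())
```

With clip=True every output is a valid bin index, which suits downstream code that indexes into per-bin tables; with clip=False the out-of-range rows stay identifiable.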
Methods
__init__([clip, preserve_dataframe, ...]) – Initialize supervised binning base class.
check_data_quality(data[, name]) – Check data quality and issue warnings if needed.
fit(X[, y]) – Fit the binning transformer with comprehensive orchestration.
fit_transform(X[, y]) – Fit to data, then transform it.
get_input_columns() – Get input columns for data preparation.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator, including fitted parameters.
inverse_transform(X) – Inverse transform from bin indices back to representative values.
set_output(*[, transform]) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Transform input data using fitted binning parameters.
validate_array_like(data[, name, allow_none]) – Validate and convert array-like input to numpy array.
validate_column_specification(columns, ...) – Validate column specifications.
validate_guidance_columns(guidance_cols, ...) – Validate guidance column specifications.
validate_guidance_data(guidance_data[, name]) – Validate and preprocess guidance data for supervised binning.
Attributes
feature_names_in_ – Get feature names.
n_features_in_ – Get number of features.
- validate_guidance_data(guidance_data: Any, name: str = 'guidance_data') -> ndarray[Any, Any]
Validate and preprocess guidance data for supervised binning.
Ensures that the guidance data is appropriate for supervised binning by validating its shape and checking for data quality issues.
- Parameters:
guidance_data – Raw guidance/target data to validate. Should be a 2D array with shape (n_samples, 1) or 1D array with shape (n_samples,).
name – Name used in error messages for better debugging context.
- Returns:
Validated and preprocessed guidance data with shape (n_samples, 1).
- Raises:
ValidationError – If guidance data has invalid shape or format.
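To make the shape contract concrete, here is a minimal standalone sketch of the normalization described above: 1D input is reshaped to a single column, (n_samples, 1) input passes through, and anything else is rejected. The helper name as_target_column is hypothetical, and ValueError stands in for the library's ValidationError; the actual method may perform further quality checks not shown here.

```python
import numpy as np


def as_target_column(guidance_data, name="guidance_data"):
    """Return guidance data with shape (n_samples, 1) (hypothetical helper)."""
    arr = np.asarray(guidance_data)
    if arr.ndim == 1:
        # Promote a flat target vector to a single column.
        arr = arr.reshape(-1, 1)
    elif arr.ndim != 2 or arr.shape[1] != 1:
        # ValueError is a stand-in for the library's ValidationError.
        raise ValueError(
            f"{name} must have shape (n_samples,) or (n_samples, 1), "
            f"got {arr.shape}"
        )
    return arr


flat = as_target_column([0, 1, 0])
column = as_target_column(np.array([[0], [1], [0]]))
print(flat.shape, column.shape)
```

The name argument only affects the error message, mirroring how the documented parameter is used for debugging context.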