binlearn.methods.EqualWidthMinimumWeightBinning

class binlearn.methods.EqualWidthMinimumWeightBinning(n_bins: int | str | None = None, minimum_weight: float | None = None, bin_range: tuple[float, float] | None = None, clip: bool | None = None, preserve_dataframe: bool | None = None, guidance_columns: Any | None = None, *, bin_edges: dict[Any, list[float]] | None = None, bin_representatives: dict[Any, list[float]] | None = None, class_: str | None = None, module_: str | None = None)[source]

Equal-width binning with a minimum weight constraint, implemented on the clean binlearn architecture.

Creates bins of equal width across the range of each feature, but adjusts the number of bins to ensure each bin contains at least the specified minimum total weight from the guidance column. This method combines the interpretability of equal-width binning with weight-based constraints for more balanced bins.

This approach is particularly valuable when working with weighted data where statistical significance or minimum sample requirements must be maintained within each bin. The algorithm starts with equal-width bins and then merges adjacent underweight bins until all remaining bins meet the minimum weight requirement.

The weight constraint helps ensure that:

  • Each bin has sufficient statistical power for analysis

  • Bins are meaningful for weighted modeling or evaluation

  • Sparse regions in the data don’t create unreliable bins

  • The resulting binning respects both spatial (equal-width) and statistical (weight) considerations

When no bins can meet the minimum weight requirement individually, the algorithm creates a single bin containing all data to maintain functionality.
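The merge step described above can be sketched in plain Python. This is a minimal illustration, not binlearn's actual code: the function name merge_underweight_bins, the greedy left-to-right order, and the choice to fold an underweight tail into the preceding bin are all assumptions made for the example.

```python
def merge_underweight_bins(edges, bin_weights, minimum_weight):
    """Greedy left-to-right merge of adjacent bins (illustrative sketch only)."""
    if len(edges) != len(bin_weights) + 1:
        raise ValueError("need exactly one more edge than bins")

    merged = [edges[0]]
    running = 0.0
    for right_edge, weight in zip(edges[1:], bin_weights):
        running += weight
        if running >= minimum_weight:
            merged.append(right_edge)  # close the current (possibly merged) bin
            running = 0.0

    if merged[-1] != edges[-1]:
        if len(merged) > 1:
            # Underweight tail: extend the last closed bin to the final edge.
            merged[-1] = edges[-1]
        else:
            # No bin ever reached the threshold: single bin over the full range.
            merged.append(edges[-1])
    return merged


initial_edges = [0, 20, 40, 60, 80, 100]
print(merge_underweight_bins(initial_edges, [60, 10, 15, 30, 5], 50))  # [0, 20, 100]
print(merge_underweight_bins(initial_edges, [1, 1, 1, 1, 1], 50))      # [0, 100]
```

In the first call the five initial bins collapse to two bins holding weight 60 each; in the second call no bin can reach the threshold, so a single bin covering the whole range is returned.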

This implementation follows the clean binlearn architecture with straight inheritance, dynamic column resolution, and parameter reconstruction capabilities.

Parameters:
  • n_bins – Initial number of equal-width bins to create before weight-based merging. Controls the granularity of the initial binning. Can be an integer or a string expression like ‘sqrt’, ‘log2’, etc. for dynamic calculation. Final number of bins may be smaller due to merging. If None, uses configuration default.

  • minimum_weight – Minimum total weight required per bin. Bins with lower total weight will be merged with adjacent bins until this requirement is met. Must be positive. If None, uses configuration default.

  • bin_range – Optional tuple specifying (min, max) range for binning. If provided, bins are created within this range rather than the data’s natural range. Useful for ensuring consistent binning across datasets. If None, uses data’s min/max values.

  • clip – Whether to clip values outside the fitted range to the nearest bin edge. If None, uses configuration default.

  • preserve_dataframe – Whether to preserve pandas DataFrame structure in transform operations. If None, uses configuration default.

  • guidance_columns – Column specification for weight/guidance data used in supervised binning. Should point to weight values for each sample.

  • bin_edges – Pre-computed bin edges for reconstruction. Should not be provided during normal usage.

  • bin_representatives – Pre-computed bin representatives for reconstruction. Should not be provided during normal usage.

  • class_ – Class name for reconstruction compatibility. Internal use only.

  • module_ – Module name for reconstruction compatibility. Internal use only.
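To make bin_range and clip concrete, the sketch below builds equal-width edges over a fixed range and assigns a value to a bin after clipping it into that range. The helper names equal_width_edges and bin_index_clipped are hypothetical, and the half-open [left, right) convention with a closed last bin is an assumption; binlearn's own transform may differ in detail.

```python
from bisect import bisect_right


def equal_width_edges(low, high, n_bins):
    """Return n_bins + 1 equally spaced edges spanning [low, high]."""
    width = (high - low) / n_bins
    edges = [low + i * width for i in range(n_bins)]
    edges.append(high)  # use the exact upper bound to avoid floating-point drift
    return edges


def bin_index_clipped(value, edges):
    """Bin index for value, after clipping it to the outermost edges."""
    clipped = min(max(value, edges[0]), edges[-1])
    # bisect_right gives [left, right) bins; the min() keeps the top edge in the last bin.
    return min(bisect_right(edges, clipped) - 1, len(edges) - 2)


edges = equal_width_edges(0, 1000, 4)
print(edges)                           # [0.0, 250.0, 500.0, 750.0, 1000]
print(bin_index_clipped(-5, edges))    # 0 (clipped up to the lower edge)
print(bin_index_clipped(1200, edges))  # 3 (clipped down to the upper edge)
```

Because the edges depend only on bin_range and n_bins, two datasets binned with the same settings share the same initial grid regardless of their own minima and maxima.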

n_bins

Initial number of bins before merging

minimum_weight

Minimum weight requirement per bin

bin_range

Optional fixed range for binning

Example

>>> import numpy as np
>>> from binlearn.methods import EqualWidthMinimumWeightBinning
>>>
>>> # Create sample data with weights
>>> np.random.seed(42)
>>> X = np.random.uniform(0, 100, 1000).reshape(-1, 1)
>>> weights = np.random.exponential(2.0, 1000)  # Exponentially distributed weights
>>>
>>> # Initialize with minimum weight constraint
>>> binner = EqualWidthMinimumWeightBinning(
...     n_bins=10,
...     minimum_weight=50.0,
...     guidance_columns='weight'
... )
>>>
>>> # Fit with weight data
>>> binner.fit(X, weights.reshape(-1, 1))
>>> X_binned = binner.transform(X)
>>>
>>> # Check bin weights (only the last bin includes its right edge)
>>> edges = binner.bin_edges_[0]
>>> for i, (left, right) in enumerate(zip(edges[:-1], edges[1:])):
...     is_last = i == len(edges) - 2
...     mask = (X >= left) & ((X <= right) if is_last else (X < right))
...     bin_weight = np.sum(weights[mask.flatten()])
...     print(f"Bin {i}: [{left:.1f}, {right:.1f}] weight: {bin_weight:.1f}")

Note

  • Requires guidance data containing weight values for each sample

  • Final number of bins may be less than n_bins due to merging underweight bins

  • All weights must be non-negative (negative weights raise ValueError)

  • Underweight bins are merged with adjacent bins until the merged bin meets the minimum weight

  • Creates a single bin if no individual bins can meet the weight requirement

  • Each column is processed independently with its corresponding weight data

  • Weight-based merging preserves the equal-width property where possible: bins that are not merged keep the initial width, while merged bins are wider
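The notes above can be checked on fitted edges with a small helper that sums the weight falling into each bin. This is a hedged sketch using only the standard library: weight_per_bin is a hypothetical name, the bins are assumed to be half-open [left, right) with a closed last bin, and values outside the edges are simply skipped here.

```python
from bisect import bisect_right


def weight_per_bin(values, weights, edges):
    """Sum sample weights per bin for one feature column (illustrative)."""
    n_bins = len(edges) - 1
    totals = [0.0] * n_bins
    for value, weight in zip(values, weights):
        if weight < 0:
            raise ValueError("weights must be non-negative")
        if value < edges[0] or value > edges[-1]:
            continue  # outside the fitted range; ignored in this sketch
        index = min(bisect_right(edges, value) - 1, n_bins - 1)
        totals[index] += weight
    return totals


totals = weight_per_bin([5, 15, 25, 100], [1, 2, 3, 4], [0, 20, 100])
print(totals)                        # [3.0, 7.0]
print(all(t >= 3.0 for t in totals)) # True: every bin meets a minimum weight of 3.0
```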

See also

EqualWidthBinning: Standard equal-width binning without weight constraints

EqualFrequencyBinning: Equal-frequency binning for balanced sample counts

SupervisedBinningBase: Base class for supervised binning methods

References

This method extends standard equal-width binning with statistical adequacy constraints commonly used in risk modeling and weighted analysis scenarios.

__init__(n_bins: int | str | None = None, minimum_weight: float | None = None, bin_range: tuple[float, float] | None = None, clip: bool | None = None, preserve_dataframe: bool | None = None, guidance_columns: Any | None = None, *, bin_edges: dict[Any, list[float]] | None = None, bin_representatives: dict[Any, list[float]] | None = None, class_: str | None = None, module_: str | None = None)[source]

Initialize equal-width binning with a minimum weight constraint.

Sets up equal-width binning with minimum weight constraints, combining spatial and statistical adequacy requirements. Applies configuration defaults for any unspecified parameters and validates the resulting configuration.

Parameters:
  • n_bins – Initial number of equal-width bins to create before weight-based merging. Controls the granularity of the initial binning. Can be an integer (the exact initial number of bins) or a string naming a dynamic calculation (‘sqrt’, ‘log2’, etc.). The final number of bins may be smaller due to merging. Must be positive. If None, uses configuration default.

  • minimum_weight – Minimum total weight required per bin. Bins with total weight below this threshold will be merged with adjacent bins until the requirement is met. Must be positive. If None, uses configuration default.

  • bin_range – Optional tuple specifying the (min_value, max_value) range for binning. If provided, equal-width bins are created within this range regardless of the actual data range. Useful for consistent binning across multiple datasets, for excluding outliers from the bin range calculation, and for domain-specific range constraints. Must satisfy min_value < max_value. If None, uses the data’s actual range.

  • clip – Whether to clip transformed values outside the fitted range to the nearest bin edge. If None, uses configuration default.

  • preserve_dataframe – Whether to preserve pandas DataFrame structure in transform operations. If None, uses configuration default.

  • guidance_columns – Column specification for weight/guidance data. Should point to columns containing weight values for each sample. Required for supervised binning during fit operations.

  • bin_edges – Pre-computed bin edges dictionary for reconstruction. Internal use only - should not be provided during normal initialization.

  • bin_representatives – Pre-computed representatives dictionary for reconstruction. Internal use only.

  • class_ – Class name string for reconstruction compatibility. Internal use only.

  • module_ – Module name string for reconstruction compatibility. Internal use only.

Example

>>> # Standard initialization with weight constraints
>>> binner = EqualWidthMinimumWeightBinning(
...     n_bins=8,
...     minimum_weight=100.0,
...     guidance_columns='sample_weight'
... )
>>>
>>> # Custom range with tighter weight requirements
>>> binner = EqualWidthMinimumWeightBinning(
...     n_bins=12,
...     minimum_weight=50.0,
...     bin_range=(0, 1000),
...     guidance_columns=['weight_column']
... )
>>>
>>> # Use configuration defaults
>>> binner = EqualWidthMinimumWeightBinning(
...     guidance_columns='weights'
... )

Note

  • Parameter validation occurs during initialization

  • Configuration defaults are applied for None parameters

  • The minimum_weight parameter determines which bins are merged: the higher the threshold, the fewer bins remain

  • bin_range allows for consistent binning across datasets with different ranges

  • Guidance columns must point to weight data for the minimum weight constraint to work

  • Reconstruction parameters should not be provided during normal usage
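The constraints stated in the parameter descriptions (positive n_bins, positive minimum_weight, and a bin_range with min < max) could be checked along the following lines. This is only a sketch of such validation: validate_binning_params is a hypothetical helper, not part of binlearn, and the exception types and messages are assumptions.

```python
def validate_binning_params(n_bins, minimum_weight, bin_range=None):
    """Raise if the parameters violate the documented constraints (hypothetical helper)."""
    if isinstance(n_bins, bool) or not isinstance(n_bins, (int, str)):
        raise TypeError("n_bins must be an int or a string expression")
    if isinstance(n_bins, int) and n_bins <= 0:
        raise ValueError("n_bins must be positive")
    # String expressions such as 'sqrt' are accepted as-is in this sketch.
    if minimum_weight <= 0:
        raise ValueError("minimum_weight must be positive")
    if bin_range is not None:
        low, high = bin_range
        if not low < high:
            raise ValueError("bin_range must be (min, max) with min < max")


validate_binning_params(8, 100.0)                  # passes silently
validate_binning_params("sqrt", 50.0, (0, 1000))   # passes silently
```

Running such checks at construction time means a misconfigured binner fails before any data is seen, which matches the note that validation occurs during initialization.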