IsotonicBinning

class binlearn.methods.IsotonicBinning(max_bins: int | str | None = None, min_samples_per_bin: int | None = None, increasing: bool | None = None, y_min: float | None = None, y_max: float | None = None, min_change_threshold: float | None = None, clip: bool | None = None, preserve_dataframe: bool | None = None, guidance_columns: Any | None = None, *, bin_edges: dict[Any, list[float]] | None = None, bin_representatives: dict[Any, list[float]] | None = None, class_: str | None = None, module_: str | None = None)[source]

Bases: SupervisedBinningBase

Isotonic regression-based monotonic binning implementation using clean architecture.

Creates bins using isotonic regression to find optimal cut points that preserve monotonic relationships between features and targets. The transformer fits an isotonic (monotonic, non-decreasing or non-increasing) function to the data and identifies significant changes in this function to determine bin boundaries.

This method is particularly valuable for cases where domain knowledge suggests a monotonic relationship between features and targets, such as risk modeling, credit scoring, or any application where preserving order relationships is critical. The isotonic regression ensures that the average target values within bins maintain the specified monotonic relationship.

The algorithm works by:

  1. Sorting data by feature values

  2. Fitting an isotonic regression model to preserve monotonicity

  3. Identifying cut points where significant changes occur in the fitted function

  4. Creating bins that respect both the monotonic constraint and the minimum samples requirement
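The four steps above can be sketched in plain Python. This is an illustrative stand-in, not binlearn's implementation: the pool-adjacent-violators routine `pava` replaces a library isotonic regressor, the function name `isotonic_cut_points` is hypothetical, and the choices to measure relative change against the fitted range and to place cuts at midpoints are assumptions.

```python
from typing import List, Sequence


def pava(y: Sequence[float]) -> List[float]:
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    # Each block holds [sum of values, count]; adjacent blocks are merged
    # while their means violate the non-decreasing order.
    blocks: List[list] = []
    for value in y:
        blocks.append([float(value), 1])
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted: List[float] = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def isotonic_cut_points(x, y, increasing=True, min_change_threshold=0.05,
                        min_samples_per_bin=2, max_bins=10):
    """Hypothetical sketch: cut points where the isotonic fit jumps."""
    if len(x) == 0:
        return []
    # Step 1: sort by feature value.
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    # Step 2: isotonic fit; a decreasing fit is obtained by negating targets.
    sign = 1.0 if increasing else -1.0
    fitted = [sign * v for v in pava([sign * v for v in ys])]
    y_range = max(fitted) - min(fitted)
    if y_range == 0:
        return []  # no variability in the fit: a single bin
    # Steps 3 and 4: keep jumps that are large enough (relative to the
    # fitted range, an assumption) and leave enough samples on both sides.
    cuts, last_cut = [], 0
    for i in range(1, len(xs)):
        if len(cuts) >= max_bins - 1:
            break
        rel_change = abs(fitted[i] - fitted[i - 1]) / y_range
        enough = (i - last_cut >= min_samples_per_bin
                  and len(xs) - i >= min_samples_per_bin)
        if rel_change >= min_change_threshold and enough and xs[i] > xs[i - 1]:
            cuts.append((xs[i - 1] + xs[i]) / 2)  # midpoint (assumption)
            last_cut = i
    return cuts


print(isotonic_cut_points([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))  # [3.5]
```

The sketch accepts the first qualifying jumps in feature order and does not pool tied feature values; a production implementation may rank candidate cuts differently.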

When insufficient variability is found in the fitted isotonic function, the algorithm creates a single bin or falls back to simple boundary definitions.

This implementation follows the clean binlearn architecture with straight inheritance, dynamic column resolution, and parameter reconstruction capabilities.

Parameters:
  • max_bins – Maximum number of bins to create. Controls the granularity of binning. Can be an integer or a string expression like ‘sqrt’, ‘log2’, etc. for dynamic calculation based on data size. If None, uses configuration default.

  • min_samples_per_bin – Minimum number of samples required per bin. Ensures statistical significance of bins. Must be positive integer. If None, uses configuration default.

  • increasing – Whether to enforce increasing monotonicity (True) or decreasing monotonicity (False). True means higher feature values correspond to higher target values. If None, uses configuration default.

  • y_min – Minimum value for the fitted isotonic function output. Clips the fitted values to be at least this value. If None, no minimum constraint.

  • y_max – Maximum value for the fitted isotonic function output. Clips the fitted values to be at most this value. If None, no maximum constraint.

  • min_change_threshold – Minimum relative change in fitted values required to create a new bin boundary. Controls sensitivity to function changes. Must be positive. If None, uses configuration default.

  • clip – Whether to clip values outside the fitted range to the nearest bin edge. If None, uses configuration default.

  • preserve_dataframe – Whether to preserve pandas DataFrame structure in transform operations. If None, uses configuration default.

  • guidance_columns – Column specification for target/guidance data used in supervised binning. Can be column names, indices, or callable selector.

  • bin_edges – Pre-computed bin edges for reconstruction. Should not be provided during normal usage.

  • bin_representatives – Pre-computed bin representatives for reconstruction. Should not be provided during normal usage.

  • class_ – Class name for reconstruction compatibility. Internal use only.

  • module_ – Module name for reconstruction compatibility. Internal use only.
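As an illustration of how a string value of max_bins such as 'sqrt' or 'log2' could be turned into a concrete bin count, here is a minimal sketch. The helper name `resolve_max_bins`, the rounding rule (ceiling), and the lower bound of one bin are all assumptions; the source only states that such expressions are calculated from the data size.

```python
import math


def resolve_max_bins(max_bins, n_samples):
    """Hypothetical helper: map an int or a named expression to a bin count."""
    if isinstance(max_bins, int):
        if max_bins < 1:
            raise ValueError("max_bins must be positive")
        return max_bins
    # Only the two expressions named in the documentation are handled here.
    expressions = {
        "sqrt": math.sqrt,
        "log2": math.log2,
    }
    if max_bins not in expressions:
        raise ValueError(f"unknown max_bins expression: {max_bins!r}")
    # Rounding up and enforcing at least one bin are assumptions.
    return max(1, math.ceil(expressions[max_bins](n_samples)))


print(resolve_max_bins("sqrt", 100))   # 10
print(resolve_max_bins("log2", 1024))  # 10
print(resolve_max_bins(5, 100))        # 5
```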

max_bins

Maximum number of bins to create

min_samples_per_bin

Minimum samples required per bin

increasing

Whether monotonicity is increasing or decreasing

y_min

Minimum constraint for fitted values

y_max

Maximum constraint for fitted values

min_change_threshold

Threshold for significant changes in fitted function

Example

>>> import numpy as np
>>> from binlearn.methods import IsotonicBinning
>>>
>>> # Create data with monotonic relationship
>>> np.random.seed(42)
>>> X = np.random.uniform(0, 10, 1000).reshape(-1, 1)
>>> # Target increases monotonically with some noise
>>> y = 2 * X.flatten() + np.random.normal(0, 1, 1000)
>>>
>>> # Initialize isotonic binning
>>> binner = IsotonicBinning(
...     max_bins=5,
...     min_samples_per_bin=50,
...     increasing=True,
...     min_change_threshold=0.05
... )
>>>
>>> # Fit with target data
>>> binner.fit(X, y)
>>> X_binned = binner.transform(X)
>>>
>>> # Check monotonic preservation
>>> bin_means = []
>>> for bin_idx in range(len(binner.bin_edges_[0]) - 1):
...     bin_mask = X_binned[:, 0] == bin_idx
...     bin_means.append(np.mean(y[bin_mask]))
>>> print("Bin target means:", bin_means)  # Should be monotonically increasing

Note

  • Requires target/guidance data for supervised learning of monotonic relationships

  • Preserves monotonic relationship between features and average target values within bins

  • Particularly useful for risk modeling, scoring, and ranking applications

  • Handles constant features and insufficient variability gracefully

  • Each column is processed independently with its corresponding target data

  • The fitted isotonic models are stored and can be used for analysis

See also

  • Chi2Binning: Statistical significance-based supervised binning

  • TreeBinning: Decision tree-based supervised binning

  • SupervisedBinningBase: Base class for supervised binning methods

References

Robertson, T., Wright, F. T., & Dykstra, R. L. (1988). Order Restricted Statistical Inference.

__init__(max_bins: int | str | None = None, min_samples_per_bin: int | None = None, increasing: bool | None = None, y_min: float | None = None, y_max: float | None = None, min_change_threshold: float | None = None, clip: bool | None = None, preserve_dataframe: bool | None = None, guidance_columns: Any | None = None, *, bin_edges: dict[Any, list[float]] | None = None, bin_representatives: dict[Any, list[float]] | None = None, class_: str | None = None, module_: str | None = None)[source]

Initialize Isotonic binning with monotonicity and quality parameters.

Sets up isotonic regression-based binning with specified parameters for monotonicity preservation and bin quality control. Applies configuration defaults for any unspecified parameters and validates the resulting configuration.

Parameters:
  • max_bins – Maximum number of bins to create per column. Controls granularity of the binning. Can be an integer (exact maximum number of bins) or a string (dynamic calculation expression such as ‘sqrt’ or ‘log2’). Must be positive. If None, uses configuration default.

  • min_samples_per_bin – Minimum number of samples required per bin. Ensures statistical reliability of each bin. Must be positive integer. If None, uses configuration default.

  • increasing – Whether to enforce increasing monotonicity (True) or decreasing monotonicity (False). True means higher feature values should correspond to higher average target values. If None, uses configuration default.

  • y_min – Minimum value constraint for the fitted isotonic function output. Clips fitted values to be at least this value. Must be numeric. If None, no minimum constraint is applied.

  • y_max – Maximum value constraint for the fitted isotonic function output. Clips fitted values to be at most this value. Must be numeric and greater than y_min if both are specified. If None, no maximum constraint.

  • min_change_threshold – Minimum relative change in fitted values required to create a new bin boundary. Controls sensitivity to changes in the isotonic function. Must be positive float. If None, uses configuration default.

  • clip – Whether to clip transformed values outside the fitted range to the nearest bin edge. If None, uses configuration default.

  • preserve_dataframe – Whether to preserve pandas DataFrame structure in transform operations. If None, uses configuration default.

  • guidance_columns – Column specification for target/guidance data. Can be column names, indices, or callable selector. Required for supervised binning during fit operations.

  • bin_edges – Pre-computed bin edges dictionary for reconstruction. Internal use only - should not be provided during normal initialization.

  • bin_representatives – Pre-computed representatives dictionary for reconstruction. Internal use only.

  • class_ – Class name string for reconstruction compatibility. Internal use only.

  • module_ – Module name string for reconstruction compatibility. Internal use only.

Example

>>> # Standard initialization for increasing monotonic relationship
>>> binner = IsotonicBinning(
...     max_bins=8,
...     min_samples_per_bin=30,
...     increasing=True,
...     min_change_threshold=0.02
... )
>>>
>>> # Decreasing monotonic relationship with value constraints
>>> binner = IsotonicBinning(
...     max_bins=6,
...     min_samples_per_bin=50,
...     increasing=False,
...     y_min=0.0,
...     y_max=1.0,
...     guidance_columns=['risk_score']
... )
>>>
>>> # Use configuration defaults
>>> binner = IsotonicBinning(guidance_columns='target')

Note

  • Parameter validation occurs during initialization

  • Configuration defaults are applied for None parameters

  • The increasing parameter is crucial for defining the expected relationship direction

  • y_min and y_max constraints help with numerical stability and domain knowledge enforcement

  • Reconstruction parameters should not be provided during normal usage

  • Guidance columns must be specified for supervised binning to work properly

classmethod __init_subclass__(**kwargs)

Set the set_{method}_request methods.

This uses PEP-487 [1] to set the set_{method}_request methods. It looks for the information available in the set default values which are set using __metadata_request__* class attributes, or inferred from method signatures.

The __metadata_request__* class attributes are used when a method does not explicitly accept a metadata through its arguments or if the developer would like to specify a request value for those metadata which are different from the default None.
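The PEP 487 hook mentioned above lets a base class customise every subclass at class-creation time. The toy below shows the general mechanism of generating `set_{method}_request` methods; it is a simplified illustration, not scikit-learn's actual implementation, and the names `RequestMethodsBase`, `_methods`, and `_requests` are invented for the example.

```python
class RequestMethodsBase:
    """Toy base class that adds set_<method>_request methods to subclasses."""

    _methods = ("fit", "transform")  # hypothetical list of routed methods

    def __init_subclass__(cls, **kwargs):
        # Called once for each subclass, right after the class body runs.
        super().__init_subclass__(**kwargs)
        for method in cls._methods:

            def setter(self, _method=method, **requests):
                # Record which metadata the given method asks for.
                self._requests = getattr(self, "_requests", {})
                self._requests.setdefault(_method, {}).update(requests)
                return self

            setter.__name__ = f"set_{method}_request"
            setattr(cls, setter.__name__, setter)


class MyTransformer(RequestMethodsBase):
    pass


t = MyTransformer().set_fit_request(sample_weight=True)
print(t._requests)  # {'fit': {'sample_weight': True}}
```

Binding `method` through the default argument `_method=method` avoids the late-binding pitfall of closures created in a loop.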

References

[1] PEP 487, Simpler customisation of class creation.

static check_data_quality(data: ndarray[Any, Any], name: str = 'data') → None

Check data quality and issue warnings if needed.

property feature_names_in_: list[str] | None

Get feature names.

fit(X: Any, y: Any | None = None, **fit_params: Any) → GeneralBinningBase

Fit the binning transformer with comprehensive orchestration.

This method orchestrates the complete fitting process, handling parameter validation, input preprocessing, column separation, and routing to the appropriate fitting strategy (joint vs independent).

Parameters:
  • X – Input data to fit the binning transformer on. Can be:

      - pandas.DataFrame: Column names are preserved
      - polars.DataFrame: Column names are preserved
      - numpy.ndarray: Numeric column indices are used
      - array-like: Converted to numpy array

  • y – Target values for supervised binning methods. Ignored by unsupervised methods. Can be array-like or None.

  • **fit_params – Additional fitting parameters passed to the specific binning algorithm implementation. Common parameters include:

      - guidance_data: Alternative guidance data (conflicts with fit_jointly=True)

Returns:

The fitted binning transformer instance.

Return type:

self

Raises:
  • ValueError – If parameter validation fails, inputs are invalid, or conflicting parameters are provided (e.g., fit_jointly=True with guidance_data).

  • BinningError – If the binning algorithm fails to fit the data.

  • RuntimeError – If an unexpected error occurs during fitting.

Example

>>> from binlearn import EqualWidthBinning
>>> import pandas as pd
>>> X = pd.DataFrame({'feature1': [1, 2, 3, 4, 5], 'feature2': [10, 20, 30, 40, 50]})
>>> binner = EqualWidthBinning(n_bins=3)
>>> binner.fit(X)
EqualWidthBinning(...)

Note

The method automatically handles column separation when guidance_columns is specified, routing guidance columns separately from binning columns. The fitting strategy (joint vs independent) is determined by the fit_jointly parameter.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

get_input_columns() → list[Any] | None

Get input columns for data preparation.

This method should be overridden by derived classes to provide appropriate column information without exposing binning-specific concepts.

Returns:

Column information or None if not available

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep: bool = True) → dict[str, Any]

Get parameters for this estimator, including fitted parameters.

This method extends sklearn’s standard get_params to include fitted parameters when the estimator is fitted, enabling complete object reconstruction through the get_params/set_params interface. This is essential for pipeline persistence and model serialization.

Parameters:

deep – If True, returns parameters for sub-estimators (not applicable here but maintained for sklearn compatibility).

Returns:

Dictionary of parameter names mapped to their values, including:

  • Constructor parameters extracted from __init__ signature

  • Fitted parameters (if estimator is fitted) mapped from attributes

  • Class metadata (class_, module_) for automatic reconstruction

Return type:

dict[str, Any]

Example

>>> binner = EqualWidthBinning(n_bins=5)
>>> params = binner.get_params()
>>> print(params)
{'n_bins': 5, 'clip': None, ..., 'class_': 'EqualWidthBinning', 'module_': '...'}
>>>
>>> binner.fit(X)
>>> fitted_params = binner.get_params()
>>> # Now includes: {'bin_edges': {...}, 'bin_representatives': {...}, ...}

Note

  • Automatically extracts constructor parameters from __init__ signature

  • Includes fitted parameters only when estimator is fitted

  • Adds class metadata for reconstruction workflows

  • Excludes internal sklearn attributes like n_features_in_

  • class_ and module_ parameters are handled specially during set_params
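The reconstruction pattern described above (fitted attributes with a trailing underscore exposed as constructor parameters without one) can be illustrated with a small stand-in class. `ToyBinner` is hypothetical and far simpler than the real estimator; it omits the class_ and module_ metadata and only shows how a get_params dictionary can rebuild a fitted object.

```python
import inspect


class ToyBinner:
    """Stand-in class illustrating the pattern only; not binlearn's code."""

    def __init__(self, max_bins=None, *, bin_edges=None):
        self.max_bins = max_bins
        self.bin_edges = bin_edges
        if bin_edges is not None:
            # Reconstruction path: restore the fitted state directly.
            self.bin_edges_ = bin_edges

    def fit(self, values):
        lo, hi = min(values), max(values)
        n = self.max_bins or 2
        # Equal-width edges keep the toy short; the real method differs.
        self.bin_edges_ = {0: [lo + (hi - lo) * i / n for i in range(n + 1)]}
        return self

    def get_params(self):
        # Constructor parameters are read from the __init__ signature.
        names = inspect.signature(type(self).__init__).parameters
        params = {k: getattr(self, k) for k in names if k != "self"}
        # Fitted attributes are mapped to their underscore-free names.
        if hasattr(self, "bin_edges_"):
            params["bin_edges"] = self.bin_edges_
        return params


fitted = ToyBinner(max_bins=4).fit([0.0, 8.0])
clone = ToyBinner(**fitted.get_params())
print(clone.bin_edges_)  # {0: [0.0, 2.0, 4.0, 6.0, 8.0]}
```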

inverse_transform(X: Any) → Any

Inverse transform from bin indices back to representative values.

Converts discrete bin indices back to their representative values, effectively reversing the binning transformation. This is useful for interpreting results or reconstructing approximate original values.

Parameters:

X – Input data containing bin indices to inverse transform. Should contain only binning columns (no guidance columns). Can be:

  - pandas.DataFrame: Column names should match binning columns
  - polars.DataFrame: Column names should match binning columns
  - numpy.ndarray: Must have same number of binning columns
  - array-like: Converted to numpy array

Returns:

Inverse transformed data where bin indices are replaced with their representative values (typically bin centers). Output format matches the preserve_dataframe setting.

Raises:
  • RuntimeError – If the transformer has not been fitted yet.

  • ValueError – If input data has wrong number of columns or invalid format.

  • BinningError – If inverse transformation fails.

Example

>>> # After fitting and transforming
>>> X_binned = [[0, 1], [1, 0], [2, 2]]  # Bin indices
>>> X_reconstructed = binner.inverse_transform(X_binned)
>>> print(X_reconstructed)
[[0.5, 1.5], [1.5, 0.5], [2.5, 2.5]]  # Representative values

Note

For guided binning (when guidance_columns is specified), the input should only contain the binning columns, not the guidance columns. The number of input columns must match the number of binning columns.
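Conceptually, the inverse mapping is a per-column lookup from bin index to representative value. The sketch below assumes bin centers as representatives (the "typical" case mentioned above) and uses hypothetical helper names; it reproduces the numbers of the example for edges 0, 1, 2, 3 in both columns.

```python
def bin_centers(edges):
    """Midpoint of each pair of consecutive edges."""
    return [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]


def inverse_transform_rows(rows, edges_per_column):
    """Replace each bin index with the representative of its column's bin."""
    reps = [bin_centers(edges) for edges in edges_per_column]
    return [[reps[col][idx] for col, idx in enumerate(row)] for row in rows]


edges = [[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]]
X_binned = [[0, 1], [1, 0], [2, 2]]
print(inverse_transform_rows(X_binned, edges))
# [[0.5, 1.5], [1.5, 0.5], [2.5, 2.5]]
```

Because every value in a bin maps to the same representative, the round trip is lossy: the output approximates the original data at the resolution of the bins.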

property n_features_in_: int

Get number of features.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform ({"default", "pandas", "polars"}, default=None) –

Configure output of transform and fit_transform.

  • "default": Default output format of a transformer

  • "pandas": DataFrame output

  • "polars": Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_params(**params: Any) → SklearnIntegrationBase

Set the parameters of this estimator.

This method supports reconstruction workflows by handling fitted parameters that come from get_params() output (without underscores) and setting them as fitted attributes (with underscores).

Parameters:

**params – Parameters to set. Can include:

  - Regular constructor parameters (n_bins, clip, etc.)
  - Fitted parameters from get_params (bin_edges, bin_representatives)
  - Class metadata (ignored during reconstruction)

Returns:

Returns the instance itself.

Return type:

self

transform(X: Any) → Any

Transform input data using fitted binning parameters.

Applies the fitted binning transformation to new data, converting continuous values to discrete bin indices or representatives. Handles column separation when guidance columns are present.

Parameters:

X – Input data to transform. Must have the same structure as the data used during fitting (same number of columns). Can be:

  - pandas.DataFrame: Column names should match training data
  - polars.DataFrame: Column names should match training data
  - numpy.ndarray: Must have same number of columns as training
  - array-like: Converted to numpy array

Returns:

Transformed data where continuous values are replaced with bin indices or representative values. The output format depends on:

  - preserve_dataframe setting: DataFrame vs array format
  - binning method: indices vs representatives
  - guidance_columns: only binning columns are transformed

Raises:
  • RuntimeError – If the transformer has not been fitted yet.

  • ValueError – If the input data has incompatible structure or format.

  • BinningError – If transformation fails due to data issues.

Example

>>> # After fitting
>>> X_new = pd.DataFrame({'feature1': [1.5, 2.5], 'feature2': [15, 25]})
>>> X_binned = binner.transform(X_new)
>>> print(X_binned)
[[0, 0], [1, 1]]  # Bin indices

Note

When guidance_columns is specified, only the binning columns are transformed. Guidance columns are filtered out from the output. The method preserves the original data format when preserve_dataframe=True.
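For a single column, turning a value into a bin index amounts to a binary search over the fitted edges. The sketch below uses the standard-library `bisect` module; the half-open interval convention, the inclusion of the right-most edge in the last bin, and returning `None` for out-of-range values when clipping is off are assumptions, and `assign_bin` is a hypothetical name.

```python
from bisect import bisect_right


def assign_bin(value, edges, clip=True):
    """Return the index of the bin [edges[i], edges[i + 1]) containing value."""
    n_bins = len(edges) - 1
    if value < edges[0]:
        return 0 if clip else None           # below range
    if value > edges[-1]:
        return n_bins - 1 if clip else None  # above range
    # bisect_right counts the edges <= value; the last edge is folded
    # into the final bin so that the maximum value stays in range.
    return min(bisect_right(edges, value) - 1, n_bins - 1)


# Three equal-width bins over [1, 5], as in the example above.
edges = [1.0, 7 / 3, 11 / 3, 5.0]
print([assign_bin(v, edges) for v in (1.5, 2.5)])  # [0, 1]
```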

static validate_array_like(data: Any, name: str = 'data', allow_none: bool = False) → ndarray[Any, Any] | None

Validate and convert array-like input to numpy array.

This method provides robust validation and conversion of various input formats to numpy arrays, with comprehensive error handling and helpful suggestions for common issues.

Parameters:
  • data – Input data to validate and convert. Can be:

      - numpy.ndarray: Used directly
      - pandas.DataFrame/Series: Converted to numpy array
      - polars.DataFrame: Converted to numpy array
      - list, tuple: Converted to numpy array
      - None: Allowed only if allow_none=True

  • name – Name of the data parameter for error messages. Used to provide context in error messages (e.g., “X”, “y”, “guidance_data”).

  • allow_none – Whether to allow None as a valid input. If True, None is returned unchanged; if False, None raises InvalidDataError.

Returns:

Validated numpy array, or None if data is None and allow_none=True. The returned array maintains the same data content but is guaranteed to be a numpy array.

Raises:

InvalidDataError – If validation fails:

  - data is None when allow_none=False
  - data cannot be converted to numpy array
  - Conversion process encounters errors

Example

>>> # Valid inputs
>>> arr = ValidationMixin.validate_array_like([1, 2, 3], "X")
>>> print(type(arr))
<class 'numpy.ndarray'>
>>>
>>> # Allow None
>>> result = ValidationMixin.validate_array_like(None, "y", allow_none=True)
>>> print(result)
None
>>>
>>> # Invalid input
>>> ValidationMixin.validate_array_like(None, "X", allow_none=False)
InvalidDataError: X cannot be None

Note

This method focuses on format validation and conversion. Content validation (like checking for NaN values) should be done separately using other validation methods.

static validate_column_specification(columns: Any, data_shape: tuple[int, ...]) → list[Any]

Validate column specifications.

static validate_guidance_columns(guidance_cols: Any, binning_cols: list[Any], data_shape: tuple[int, ...]) → list[Any]

Validate guidance column specifications.

validate_guidance_data(guidance_data: Any, name: str = 'guidance_data') → ndarray[Any, Any]

Validate and preprocess guidance data for supervised binning.

Ensures that the guidance data is appropriate for supervised binning by validating its shape and checking for data quality issues.

Parameters:
  • guidance_data – Raw guidance/target data to validate. Should be a 2D array with shape (n_samples, 1) or 1D array with shape (n_samples,).

  • name – Name used in error messages for better debugging context.

Returns:

Validated and preprocessed guidance data with shape (n_samples, 1).

Raises:

ValidationError – If guidance data has invalid shape or format.

Overview

IsotonicBinning creates bins using isotonic regression to find optimal cut points that preserve monotonic relationships between features and targets. The transformer fits an isotonic (non-decreasing or non-increasing) function to the data and identifies significant changes in this function to determine bin boundaries.

This method is particularly effective when:

  • There’s a known monotonic relationship between feature and target

  • You want bins that respect monotonic ordering

  • Traditional tree-based methods might create non-monotonic splits

  • You need interpretable bins that maintain logical ordering

Key Features

  • Monotonicity Preservation: Ensures bins respect monotonic relationships

  • Isotonic Regression: Uses sklearn’s IsotonicRegression for optimal fitting

  • Automatic Cut Points: Identifies significant changes in isotonic function

  • Flexible Direction: Supports both increasing and decreasing monotonicity

  • Sample Size Control: Ensures minimum samples per bin for statistical validity

  • Supervised Learning: Uses target variable information for optimal binning

  • Sklearn Compatibility: Full transformer interface with fit/transform methods

  • DataFrame Support: Preserves pandas/polars column names and structure

Basic Usage

import numpy as np
import pandas as pd
from binlearn.methods import IsotonicBinning

# Create sample data with monotonic relationship
np.random.seed(42)
X = np.random.uniform(0, 10, 500).reshape(-1, 1)
y = 2 * X.flatten() + np.random.normal(0, 1, 500)  # Linear + noise

# Apply isotonic binning
binner = IsotonicBinning(
    max_bins=6,
    min_samples_per_bin=20,
    increasing=True
)

# Fit using X and y (sklearn style)
binner.fit(X, y)
X_binned = binner.transform(X)

print(f"Original shape: {X.shape}")
print(f"Binned shape: {X_binned.shape}")
print(f"Bin edges: {binner.bin_edges_[0]}")

Classification Example

from sklearn.datasets import make_classification

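# Create single-feature classification data
# (n_informative=1 is required: the default of 2 exceeds n_features=1)
X, y = make_classification(
    n_samples=1000,
    n_features=1,
    n_informative=1,
    n_redundant=0,
    n_clusters_per_class=1,
    random_state=42
)

# Sort rows by feature value (reordering does not change the feature-target relationship)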
sort_idx = np.argsort(X.flatten())
X_sorted = X[sort_idx]
y_sorted = y[sort_idx]

binner = IsotonicBinning(
    max_bins=8,
    min_samples_per_bin=30,
    increasing=True,
    min_change_threshold=0.05
)

binner.fit(X_sorted, y_sorted)
X_binned = binner.transform(X_sorted)

print(f"Created {len(binner.bin_edges_[0]) - 1} bins")
print(f"Bin edges: {binner.bin_edges_[0]}")

DataFrame Example with Guidance Columns

# Create DataFrame with target column
df = pd.DataFrame({
    'age': np.random.uniform(18, 80, 1000),
    'income': np.random.uniform(20000, 150000, 1000),
    'credit_score': np.random.uniform(300, 850, 1000)
})

# Create monotonic target: risk increases with age, decreases with income/credit
df['default_risk'] = (
    0.3 * (df['age'] - 18) / 62 +  # Age increases risk
    -0.4 * (df['income'] - 20000) / 130000 +  # Income decreases risk
    -0.3 * (df['credit_score'] - 300) / 550 +  # Credit decreases risk
    np.random.normal(0, 0.1, 1000)
)

# Bin each feature with appropriate monotonicity
age_binner = IsotonicBinning(
    guidance_columns=['default_risk'],
    max_bins=5,
    increasing=True,  # Risk increases with age
    preserve_dataframe=True
)

income_binner = IsotonicBinning(
    guidance_columns=['default_risk'],
    max_bins=6,
    increasing=False,  # Risk decreases with income
    preserve_dataframe=True
)

# Apply binning
df_age_binned = age_binner.fit_transform(df[['age', 'default_risk']])
df_income_binned = income_binner.fit_transform(df[['income', 'default_risk']])

Advanced Configuration

# Fine-tuned isotonic binning for specific requirements

# Conservative binning (fewer bins, stricter requirements)
conservative_binner = IsotonicBinning(
    max_bins=5,
    min_samples_per_bin=50,     # Larger bins for stability
    min_change_threshold=0.1,   # Require larger changes
    increasing=True,
    y_min=0.0,                  # Bound target values
    y_max=1.0
)

# Granular binning (more bins, sensitive to changes)
granular_binner = IsotonicBinning(
    max_bins=15,
    min_samples_per_bin=10,     # Smaller bins allowed
    min_change_threshold=0.01,  # Sensitive to small changes
    increasing=True
)

Parameter Guide

max_bins (int, default=10)

Maximum number of bins to create. Actual number may be smaller:

  • Higher values: Allow more granular binning

  • Lower values: Force simpler, broader bins

  • Consider your model’s complexity needs

min_samples_per_bin (int, default=5)

Minimum samples required per bin for statistical validity:

  • Higher values: More stable bins, fewer total bins

  • Lower values: More granular binning, potentially less stable

  • Rule of thumb: At least 30 for regression, 10+ per class for classification

increasing (bool, default=True)

Direction of monotonic relationship:

  • True: Higher feature values → higher target values

  • False: Higher feature values → lower target values

  • Must match your domain knowledge

min_change_threshold (float, default=0.01)

Minimum relative change in isotonic function to create new bin:

  • Smaller values: More sensitive, create more bins

  • Larger values: Less sensitive, create fewer bins

  • Typical range: 0.005 to 0.1

y_min, y_max (float, optional)

Bounds for target values in isotonic regression:

  • Helps constrain the isotonic function

  • Useful for normalized targets or known ranges

  • If None, no bound is applied on that side

Scikit-learn Pipeline Integration

from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Create pipeline with isotonic binning
pipeline = Pipeline([
    ('binning', IsotonicBinning(max_bins=6, increasing=True)),
    ('regressor', RandomForestRegressor(random_state=42))
])

# Use in ML workflow
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
print(f"Pipeline MSE: {mse:.4f}")

Tips for Best Results

  1. Validate monotonicity first: Use correlation analysis to confirm monotonic relationships

  2. Choose appropriate direction: Set increasing parameter based on domain knowledge

  3. Balance bin size and count: Larger min_samples_per_bin gives more stable bins

  4. Consider target distribution: Normalize targets if they have extreme ranges

  5. Validate on holdout data: Check that monotonic relationship holds on test data
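For the first tip, a rank correlation is usually a better check than a linear one, because it measures monotonic rather than linear association. The following is a minimal, dependency-free sketch of Spearman's coefficient; the function names are illustrative, and in practice a library routine such as `scipy.stats.spearmanr` would normally be used.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # undefined (division by zero) for constant input


# A nonlinear but strictly increasing relationship gives a coefficient near +1.
print(round(spearman([1, 2, 3, 4], [1, 4, 9, 16]), 6))  # 1.0
```

A coefficient close to +1 suggests increasing=True, a coefficient close to -1 suggests increasing=False, and a value near zero indicates that a monotonic constraint may not suit the feature.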

Overview

IsotonicBinning creates bins using isotonic regression to find optimal cut points that preserve monotonic relationships between features and targets. The transformer fits an isotonic (non-decreasing or non-increasing) function to the data and identifies significant changes in this function to determine bin boundaries.

This approach is particularly effective when:

  • Monotonic relationships exist between features and targets

  • Ordinal consistency is important in your binning

  • Regulatory requirements mandate monotonic scoring models

  • Risk scoring applications where higher values should consistently indicate higher risk

Key Features

  • Monotonicity Preservation: Ensures bins respect monotonic ordering relationships

  • Regression-Based: Uses isotonic regression for optimal cut point identification

  • Automatic Boundary Detection: Identifies significant changes in fitted isotonic function

  • Flexible Direction: Supports both increasing and decreasing monotonicity

  • Sample Control: Ensures minimum samples per bin for statistical reliability

  • Sklearn Compatibility: Full transformer interface with fit/transform methods

  • DataFrame Support: Preserves pandas/polars column names and structure

Basic Usage

import numpy as np
import pandas as pd
from binlearn.methods import IsotonicBinning

# Create data with monotonic relationship
np.random.seed(42)
X = np.random.rand(1000, 2)

# Create target with monotonic relationship to first feature
y = 2 * X[:, 0] + 0.5 * np.random.randn(1000)

# Apply isotonic binning
binner = IsotonicBinning(
    max_bins=8,
    min_samples_per_bin=50,
    increasing=True
)

# Method 1: Using fit with X and y (sklearn style)
binner.fit(X, y)
X_binned = binner.transform(X)

print(f"Original shape: {X.shape}")
print(f"Binned shape: {X_binned.shape}")
print(f"Bins for feature 0: {len(binner.bin_edges_[0]) - 1}")

DataFrame Example with Target Column

# Create DataFrame with monotonic relationships
df = pd.DataFrame({
    'age': np.random.uniform(18, 80, 1000),
    'income': np.random.uniform(20000, 150000, 1000),
    'experience': np.random.uniform(0, 40, 1000)
})

# Create target with monotonic relationships
df['default_risk'] = (
    0.01 * df['age'] +
    -0.00001 * df['income'] +
    -0.005 * df['experience'] +
    0.2 * np.random.randn(1000)
)

# Method 2: Using guidance_columns (binlearn style)
binner = IsotonicBinning(
    guidance_columns=['default_risk'],
    max_bins=6,
    min_samples_per_bin=100,
    preserve_dataframe=True
)

df_binned = binner.fit_transform(df)

print(f"Bin edges for age: {binner.bin_edges_['age']}")
print(f"Bin edges for income: {binner.bin_edges_['income']}")

Decreasing Monotonicity Example

# Example where higher feature values should lead to lower target values
X_credit = np.random.uniform(0, 100, 500).reshape(-1, 1)  # Credit score
y_default = 1 / (1 + np.exp(0.1 * (X_credit.flatten() - 50)))  # Lower default prob for higher scores

# Use decreasing monotonicity
binner = IsotonicBinning(
    max_bins=5,
    min_samples_per_bin=50,
    increasing=False,  # Higher credit score = lower default probability
    min_change_threshold=0.05
)

binner.fit(X_credit, y_default)
X_credit_binned = binner.transform(X_credit)

# Verify monotonicity: bin representatives should decrease
reps = binner.bin_representatives_[0]
print("Bin representatives:", reps)
print("Monotonically decreasing:",
      all(a >= b for a, b in zip(reps, reps[1:])))

Advanced Configuration

# Fine-tuned isotonic binning for different scenarios

# High-precision binning (more sensitive to changes)
precise_binner = IsotonicBinning(
    max_bins=12,
    min_samples_per_bin=30,
    min_change_threshold=0.005,  # More sensitive to changes
    increasing=True,
    y_min=0.0,                   # Explicit bounds
    y_max=1.0
)

# Robust binning (less sensitive, larger bins)
robust_binner = IsotonicBinning(
    max_bins=6,
    min_samples_per_bin=100,     # Larger bins for stability
    min_change_threshold=0.1,    # Less sensitive to changes
    increasing=True
)

Classification Example

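from sklearn.datasets import make_classification

# Create classification data
# n_informative=3 is required here: make_classification needs
# n_classes * n_clusters_per_class <= 2 ** n_informative, and the
# default n_informative=2 would raise a ValueError for 3 classes
X_class, y_class = make_classification(
    n_samples=1000,
    n_features=3,
    n_informative=3,
    n_classes=3,
    n_redundant=0,
    random_state=42
)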

# Isotonic binning works with classification by treating classes ordinally
binner = IsotonicBinning(
    max_bins=7,
    min_samples_per_bin=50,
    increasing=True
)

binner.fit(X_class, y_class)
X_class_binned = binner.transform(X_class)

print(f"Classification bins: {[len(edges)-1 for edges in binner.bin_edges_.values()]}")

Risk Scoring Pipeline

from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Create a risk scoring pipeline with monotonic binning
risk_pipeline = Pipeline([
    ('isotonic_binning', IsotonicBinning(
        max_bins=8,
        min_samples_per_bin=50,
        increasing=True
    )),
    ('logistic_regression', LogisticRegression(random_state=42))
])

# Use in credit risk modeling
# X and y come from the NumPy example above; y is binarized at its median
X_train, X_test, y_train, y_test = train_test_split(
    X, y > np.median(y), test_size=0.2, random_state=42
)

risk_pipeline.fit(X_train, y_train)
y_proba = risk_pipeline.predict_proba(X_test)[:, 1]
auc_score = roc_auc_score(y_test, y_proba)

print(f"AUC with isotonic binning: {auc_score:.3f}")

Parameter Guide

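max_bins (int or str, default=10)

Maximum number of bins to create per feature, given as an integer or as a string expression such as 'sqrt' or 'log2' that is evaluated against the data size: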

  • Higher values: More granular risk segments

  • Lower values: Broader, more stable risk categories

  • Consider regulatory requirements and model interpretability

min_samples_per_bin (int, default=5)

Minimum samples required per bin for statistical reliability:

  • Higher values: More stable bins, better statistical power

  • Lower values: More granular binning, potential instability

  • Rule of thumb: At least 30-50 for reliable estimates
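
A quick way to see whether a set of bin edges meets a chosen minimum is to count the samples that fall into each interval. The sketch below uses only NumPy; the `edges` array is a hypothetical stand-in for something like `binner.bin_edges_[0]`, and the threshold of 50 is an illustrative choice, not a library default.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 1000)

# Hypothetical edges, standing in for binner.bin_edges_[0]
edges = np.array([0.0, 2.5, 5.0, 7.5, 10.0])
min_samples_per_bin = 50  # illustrative threshold

# np.histogram counts samples in each [edges[i], edges[i+1]) interval;
# the last interval also includes its right edge
counts, _ = np.histogram(x, bins=edges)
too_small = np.flatnonzero(counts < min_samples_per_bin)

print("Samples per bin:", counts.tolist())
print("Bins below the minimum:", too_small.tolist())
```

If any bin indices are reported as too small, raising min_samples_per_bin or lowering max_bins are the two levers described in this guide.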

increasing (bool, default=True)

Direction of monotonicity to enforce:

  • True: Higher feature values → higher target values

  • False: Higher feature values → lower target values

  • Choose based on domain knowledge and expected relationships
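
When domain knowledge does not settle the direction, the data can offer a hint. The sketch below uses scikit-learn's `check_increasing` helper, which bases its answer on the sign of the Spearman correlation. This is an independent check on synthetic data, not part of binlearn, and the result could be passed as the `increasing` argument.

```python
import numpy as np
from sklearn.isotonic import check_increasing

rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 500)

# Two synthetic targets: one rising with x, one falling with x
y_up = 0.02 * x + rng.normal(0, 0.2, 500)
y_down = -0.02 * x + rng.normal(0, 0.2, 500)

# True when the Spearman correlation between x and y is non-negative
increasing_up = check_increasing(x, y_up)
increasing_down = check_increasing(x, y_down)

print("Rising target   -> increasing =", increasing_up)
print("Falling target  -> increasing =", increasing_down)
```

For weak or noisy relationships, `check_increasing` may emit a warning that the direction is ambiguous, which is itself a useful signal that a monotonic constraint may not fit the feature.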

min_change_threshold (float, default=0.01)

Minimum relative change in fitted values required to create a new bin:

  • Lower values: More sensitive, more bins

  • Higher values: Less sensitive, fewer bins

  • Typical range: 0.005 (sensitive) to 0.1 (robust)
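
To build intuition for how the threshold affects the number of bins, here is a minimal sketch of one possible interpretation: a cut is placed whenever the isotonic fit has moved by at least `min_change_threshold` times the range of the fitted values. The helper `cut_points_from_fit` is hypothetical and uses scikit-learn's `IsotonicRegression`; binlearn's actual cut-point logic may differ, and the min_samples_per_bin constraint is ignored here.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def cut_points_from_fit(x, y, min_change_threshold=0.01, increasing=True):
    """Hypothetical helper: cut where the isotonic fit changes by at least
    min_change_threshold times the range of the fitted values."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    fitted = IsotonicRegression(increasing=increasing).fit_transform(x_sorted, y_sorted)

    fit_range = fitted.max() - fitted.min()
    if fit_range == 0:
        return []  # flat fit: no cut points, i.e. a single bin

    cuts = []
    last_level = fitted[0]
    for i in range(1, len(fitted)):
        if abs(fitted[i] - last_level) / fit_range >= min_change_threshold:
            # Place the cut midway between the neighboring feature values
            cuts.append(0.5 * (x_sorted[i - 1] + x_sorted[i]))
            last_level = fitted[i]
    return cuts

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 500)
y = 2 * x + rng.normal(0, 0.5, 500)

sensitive = cut_points_from_fit(x, y, min_change_threshold=0.005)
robust = cut_points_from_fit(x, y, min_change_threshold=0.1)
print(f"Cuts at threshold 0.005: {len(sensitive)}")
print(f"Cuts at threshold 0.1:   {len(robust)}")
```

Under this interpretation a threshold of 0.1 can never produce more than ten cuts, because each cut consumes at least a tenth of the fitted range.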

y_min, y_max (float, optional)

Bounds for target values in isotonic regression:

  • Explicit bounds can improve numerical stability

  • Useful for probability targets: y_min=0.0, y_max=1.0

  • Auto-detected from data if not specified
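
The effect of explicit bounds can be illustrated with scikit-learn's `IsotonicRegression`, which accepts `y_min` and `y_max` arguments of the same name. The sketch assumes binlearn's bounds act on the fitted values in a comparable way, which the source does not confirm.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
# Noisy probability-like target that strays outside [0, 1]
y = x + rng.normal(0, 0.3, 200)

unbounded = IsotonicRegression(increasing=True).fit(x, y)
bounded = IsotonicRegression(increasing=True, y_min=0.0, y_max=1.0).fit(x, y)

fit_unbounded = unbounded.predict(x)
fit_bounded = bounded.predict(x)

print(f"Unbounded fit range: [{fit_unbounded.min():.3f}, {fit_unbounded.max():.3f}]")
print(f"Bounded fit range:   [{fit_bounded.min():.3f}, {fit_bounded.max():.3f}]")
```

With the bounds set, the fitted values stay inside [0, 1] even though individual noisy targets fall outside it.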

Monotonicity Validation

# Function to validate monotonicity in binned results
def validate_monotonicity(binner, feature_idx=0, increasing=True):
    """Validate that bin representatives follow monotonic order."""
    reps = binner.bin_representatives_[feature_idx]

    if increasing:
        is_monotonic = all(reps[i] <= reps[i+1] for i in range(len(reps)-1))
        direction = "increasing"
    else:
        is_monotonic = all(reps[i] >= reps[i+1] for i in range(len(reps)-1))
        direction = "decreasing"

    print(f"Monotonicity ({direction}): {is_monotonic}")
    print(f"Representatives: {reps}")
    return is_monotonic

# Validate our binning results
validate_monotonicity(binner, feature_idx=0, increasing=True)
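
A complementary check works directly on the target: compute the mean target value inside each bin and test whether those means are ordered. The sketch below uses only NumPy and does not depend on what `bin_representatives_` contains. Both helpers and the `edges` array (a stand-in for `binner.bin_edges_[0]`) are hypothetical.

```python
import numpy as np

def bin_target_means(x, y, edges):
    """Hypothetical helper: mean of y within each bin defined by edges."""
    # Use interior edges only, so values on or beyond the outer
    # boundaries fall into the first or last bin
    idx = np.digitize(x, edges[1:-1])
    n_bins = len(edges) - 1
    return np.array([y[idx == b].mean() if np.any(idx == b) else np.nan
                     for b in range(n_bins)])

def is_monotonic(values, increasing=True):
    """Check ordering of per-bin means, ignoring empty bins (NaN)."""
    values = values[~np.isnan(values)]
    diffs = np.diff(values)
    return bool(np.all(diffs >= 0) if increasing else np.all(diffs <= 0))

rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 2000)
y = 0.05 * x + rng.normal(0, 0.5, 2000)
edges = np.array([0.0, 25.0, 50.0, 75.0, 100.0])  # stand-in for binner.bin_edges_[0]

means = bin_target_means(x, y, edges)
print("Per-bin target means:", np.round(means, 3))
print("Increasing:", is_monotonic(means, increasing=True))
```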

Tips for Best Results

  1. Verify that monotonic relationships exist in your data before applying isotonic binning

  2. Choose appropriate min_samples_per_bin based on your sample size

  3. Adjust min_change_threshold based on noise level in your data

  4. Consider regulatory constraints for financial/medical applications

  5. Validate results on holdout data to avoid overfitting
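
One way to approach the holdout tip is to compare the error of a monotonic fit on training data against its error on unseen data. The sketch below uses scikit-learn's `IsotonicRegression` as a proxy for the binner on synthetic data; it is an assumption that the proxy behaves similarly to the binned model, and a large gap between the two errors would suggest the fit is tracking noise.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 1000)
y = 0.02 * x + rng.normal(0, 0.5, 1000)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

# out_of_bounds="clip" avoids NaN predictions for holdout values
# that fall outside the training range
iso = IsotonicRegression(increasing=True, out_of_bounds="clip").fit(x_train, y_train)

mse_train = mean_squared_error(y_train, iso.predict(x_train))
mse_test = mean_squared_error(y_test, iso.predict(x_test))

print(f"Train MSE:   {mse_train:.4f}")
print(f"Holdout MSE: {mse_test:.4f}")
```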

Common Use Cases

  • Credit Risk Scoring: Age, income, debt-to-income ratio

  • Medical Risk Assessment: Biomarkers, age, symptom severity

  • Marketing Response: Customer value, engagement metrics

  • Predictive Maintenance: Usage hours, temperature readings

  • Quality Control: Process parameters, environmental conditions

See Also