IsotonicBinning
- class binlearn.methods.IsotonicBinning(max_bins: int | str | None = None, min_samples_per_bin: int | None = None, increasing: bool | None = None, y_min: float | None = None, y_max: float | None = None, min_change_threshold: float | None = None, clip: bool | None = None, preserve_dataframe: bool | None = None, guidance_columns: Any | None = None, *, bin_edges: dict[Any, list[float]] | None = None, bin_representatives: dict[Any, list[float]] | None = None, class_: str | None = None, module_: str | None = None)
Bases: SupervisedBinningBase
Isotonic regression-based monotonic binning implementation using clean architecture.
Creates bins using isotonic regression to find optimal cut points that preserve monotonic relationships between features and targets. The transformer fits an isotonic (monotonic, non-decreasing or non-increasing) function to the data and identifies significant changes in this function to determine bin boundaries.
This method is particularly valuable for cases where domain knowledge suggests a monotonic relationship between features and targets, such as risk modeling, credit scoring, or any application where preserving order relationships is critical. The isotonic regression ensures that the average target values within bins maintain the specified monotonic relationship.
The algorithm works by:
1. Sorting data by feature values
2. Fitting an isotonic regression model to preserve monotonicity
3. Identifying cut points where significant changes occur in the fitted function
4. Creating bins that respect both the monotonic constraint and the minimum samples requirement
When insufficient variability is found in the fitted isotonic function, the algorithm creates a single bin or falls back to simple boundary definitions.
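The steps above can be sketched in plain Python. This is a minimal illustration, not binlearn's implementation: the helper names `pava` and `isotonic_cut_points` are hypothetical, and measuring the "relative change" as the jump between neighbouring fitted values divided by the range of the fitted function is an assumption made for the example.

```python
def pava(y, increasing=True):
    """Pool-adjacent-violators: least-squares monotone fit to y (ordered by x)."""
    if not increasing:
        # A non-increasing fit is the negated non-decreasing fit of -y.
        return [-v for v in pava([-v for v in y], increasing=True)]
    blocks = []  # each block is [sum, count]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def isotonic_cut_points(x, y, increasing=True, min_change_threshold=0.05):
    """Return x positions where the fitted function jumps by at least the threshold."""
    order = sorted(range(len(x)), key=lambda i: x[i])   # step 1: sort by feature
    xs = [x[i] for i in order]
    fitted = pava([y[i] for i in order], increasing)    # step 2: isotonic fit
    y_range = max(fitted) - min(fitted)
    if y_range == 0:
        return []                                       # no variability: single bin
    cuts = []
    for i in range(1, len(xs)):                         # step 3: significant changes
        if abs(fitted[i] - fitted[i - 1]) / y_range >= min_change_threshold:
            cuts.append((xs[i - 1] + xs[i]) / 2)        # midpoint between neighbours
    return cuts


x = [5, 1, 3, 2, 4, 6]
y = [2.0, 0.0, 0.2, 0.1, 2.1, 1.9]
cuts = isotonic_cut_points(x, y, min_change_threshold=0.1)
print(cuts)
```

Step 4 (enforcing the minimum samples per bin) is left out here to keep the sketch short; the cut points it returns would still need to be filtered against `min_samples_per_bin` and `max_bins`.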
This implementation follows the clean binlearn architecture with straight inheritance, dynamic column resolution, and parameter reconstruction capabilities.
- Parameters:
max_bins – Maximum number of bins to create. Controls the granularity of binning. Can be an integer or a string expression like ‘sqrt’, ‘log2’, etc. for dynamic calculation based on data size. If None, uses configuration default.
min_samples_per_bin – Minimum number of samples required per bin. Ensures statistical significance of bins. Must be positive integer. If None, uses configuration default.
increasing – Whether to enforce increasing monotonicity (True) or decreasing monotonicity (False). True means higher feature values correspond to higher target values. If None, uses configuration default.
y_min – Minimum value for the fitted isotonic function output. Clips the fitted values to be at least this value. If None, no minimum constraint.
y_max – Maximum value for the fitted isotonic function output. Clips the fitted values to be at most this value. If None, no maximum constraint.
min_change_threshold – Minimum relative change in fitted values required to create a new bin boundary. Controls sensitivity to function changes. Must be positive. If None, uses configuration default.
clip – Whether to clip values outside the fitted range to the nearest bin edge. If None, uses configuration default.
preserve_dataframe – Whether to preserve pandas DataFrame structure in transform operations. If None, uses configuration default.
guidance_columns – Column specification for target/guidance data used in supervised binning. Can be column names, indices, or callable selector.
bin_edges – Pre-computed bin edges for reconstruction. Should not be provided during normal usage.
bin_representatives – Pre-computed bin representatives for reconstruction. Should not be provided during normal usage.
class_ – Class name for reconstruction compatibility. Internal use only.
module_ – Module name for reconstruction compatibility. Internal use only.
- max_bins
Maximum number of bins to create
- min_samples_per_bin
Minimum samples required per bin
- increasing
Whether monotonicity is increasing or decreasing
- y_min
Minimum constraint for fitted values
- y_max
Maximum constraint for fitted values
- min_change_threshold
Threshold for significant changes in fitted function
Example
>>> import numpy as np
>>> from binlearn.methods import IsotonicBinning
>>>
>>> # Create data with monotonic relationship
>>> np.random.seed(42)
>>> X = np.random.uniform(0, 10, 1000).reshape(-1, 1)
>>> # Target increases monotonically with some noise
>>> y = 2 * X.flatten() + np.random.normal(0, 1, 1000)
>>>
>>> # Initialize isotonic binning
>>> binner = IsotonicBinning(
...     max_bins=5,
...     min_samples_per_bin=50,
...     increasing=True,
...     min_change_threshold=0.05
... )
>>>
>>> # Fit with target data
>>> binner.fit(X, y)
>>> X_binned = binner.transform(X)
>>>
>>> # Check monotonic preservation
>>> bin_means = []
>>> for bin_idx in range(len(binner.bin_edges_[0]) - 1):
...     bin_mask = X_binned[:, 0] == bin_idx
...     bin_means.append(np.mean(y[bin_mask]))
>>> print("Bin target means:", bin_means)  # Should be monotonically increasing
Note
Requires target/guidance data for supervised learning of monotonic relationships
Preserves monotonic relationship between features and average target values within bins
Particularly useful for risk modeling, scoring, and ranking applications
Handles constant features and insufficient variability gracefully
Each column is processed independently with its corresponding target data
The fitted isotonic models are stored and can be used for analysis
See also
Chi2Binning: Statistical significance-based supervised binning
TreeBinning: Decision tree-based supervised binning
SupervisedBinningBase: Base class for supervised binning methods
References
- Robertson, T., Wright, F. T., & Dykstra, R. L. (1988). Order Restricted Statistical Inference.
- __init__(max_bins: int | str | None = None, min_samples_per_bin: int | None = None, increasing: bool | None = None, y_min: float | None = None, y_max: float | None = None, min_change_threshold: float | None = None, clip: bool | None = None, preserve_dataframe: bool | None = None, guidance_columns: Any | None = None, *, bin_edges: dict[Any, list[float]] | None = None, bin_representatives: dict[Any, list[float]] | None = None, class_: str | None = None, module_: str | None = None)
Initialize Isotonic binning with monotonicity and quality parameters.
Sets up isotonic regression-based binning with specified parameters for monotonicity preservation and bin quality control. Applies configuration defaults for any unspecified parameters and validates the resulting configuration.
- Parameters:
max_bins – Maximum number of bins to create per column. Controls granularity of the binning. Can be:
- Integer: Exact maximum number of bins
- String: Dynamic calculation expression (‘sqrt’, ‘log2’, etc.)
Must be positive. If None, uses configuration default.
min_samples_per_bin – Minimum number of samples required per bin. Ensures statistical reliability of each bin. Must be positive integer. If None, uses configuration default.
increasing – Whether to enforce increasing monotonicity (True) or decreasing monotonicity (False). True means higher feature values should correspond to higher average target values. If None, uses configuration default.
y_min – Minimum value constraint for the fitted isotonic function output. Clips fitted values to be at least this value. Must be numeric. If None, no minimum constraint is applied.
y_max – Maximum value constraint for the fitted isotonic function output. Clips fitted values to be at most this value. Must be numeric and greater than y_min if both are specified. If None, no maximum constraint.
min_change_threshold – Minimum relative change in fitted values required to create a new bin boundary. Controls sensitivity to changes in the isotonic function. Must be positive float. If None, uses configuration default.
clip – Whether to clip transformed values outside the fitted range to the nearest bin edge. If None, uses configuration default.
preserve_dataframe – Whether to preserve pandas DataFrame structure in transform operations. If None, uses configuration default.
guidance_columns – Column specification for target/guidance data. Can be column names, indices, or callable selector. Required for supervised binning during fit operations.
bin_edges – Pre-computed bin edges dictionary for reconstruction. Internal use only - should not be provided during normal initialization.
bin_representatives – Pre-computed representatives dictionary for reconstruction. Internal use only.
class_ – Class name string for reconstruction compatibility. Internal use only.
module_ – Module name string for reconstruction compatibility. Internal use only.
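To make the string form of max_bins concrete, here is a small sketch of how an expression such as 'sqrt' or 'log2' could be turned into a bin count from the number of samples. The function `resolve_max_bins` is hypothetical, rounding up is an assumption, and only the two expressions named above are handled; the set of expressions binlearn actually accepts and how it rounds are not stated in this reference.

```python
import math


def resolve_max_bins(spec, n_samples):
    """Turn an int or a named expression into a positive bin count (illustrative)."""
    if isinstance(spec, int):
        value = spec
    elif spec == "sqrt":
        value = math.ceil(math.sqrt(n_samples))   # assumption: round up
    elif spec == "log2":
        value = math.ceil(math.log2(n_samples))   # assumption: round up
    else:
        raise ValueError(f"Unsupported max_bins expression: {spec!r}")
    if value < 1:
        raise ValueError("max_bins must be positive")
    return value


print(resolve_max_bins("sqrt", 1000))
print(resolve_max_bins("log2", 1000))
print(resolve_max_bins(5, 1000))
```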
Example
>>> # Standard initialization for increasing monotonic relationship
>>> binner = IsotonicBinning(
...     max_bins=8,
...     min_samples_per_bin=30,
...     increasing=True,
...     min_change_threshold=0.02
... )
>>>
>>> # Decreasing monotonic relationship with value constraints
>>> binner = IsotonicBinning(
...     max_bins=6,
...     min_samples_per_bin=50,
...     increasing=False,
...     y_min=0.0,
...     y_max=1.0,
...     guidance_columns=['risk_score']
... )
>>>
>>> # Use configuration defaults
>>> binner = IsotonicBinning(guidance_columns='target')
Note
Parameter validation occurs during initialization
Configuration defaults are applied for None parameters
The increasing parameter is crucial for defining the expected relationship direction
y_min and y_max constraints help with numerical stability and domain knowledge enforcement
Reconstruction parameters should not be provided during normal usage
Guidance columns must be specified for supervised binning to work properly
- classmethod __init_subclass__(**kwargs)
Set the set_{method}_request methods.
This uses PEP-487 [1] to set the set_{method}_request methods. It looks for the information available in the set default values, which are set using __metadata_request__* class attributes or inferred from method signatures.
The __metadata_request__* class attributes are used when a method does not explicitly accept a metadata through its arguments, or if the developer would like to specify a request value for those metadata which are different from the default None.
References
[1] PEP 487, "Simpler customisation of class creation"
- static check_data_quality(data: ndarray[Any, Any], name: str = 'data') → None
Check data quality and issue warnings if needed.
- fit(X: Any, y: Any | None = None, **fit_params: Any) → GeneralBinningBase
Fit the binning transformer with comprehensive orchestration.
This method orchestrates the complete fitting process, handling parameter validation, input preprocessing, column separation, and routing to the appropriate fitting strategy (joint vs independent).
- Parameters:
X – Input data to fit the binning transformer on. Can be:
- pandas.DataFrame: Column names are preserved
- polars.DataFrame: Column names are preserved
- numpy.ndarray: Numeric column indices are used
- array-like: Converted to numpy array
y – Target values for supervised binning methods. Ignored by unsupervised methods. Can be array-like or None.
**fit_params – Additional fitting parameters passed to the specific binning algorithm implementation. Common parameters include:
- guidance_data: Alternative guidance data (conflicts with fit_jointly=True)
- Returns:
The fitted binning transformer instance.
- Return type:
self
- Raises:
ValueError – If parameter validation fails, inputs are invalid, or conflicting parameters are provided (e.g., fit_jointly=True with guidance_data).
BinningError – If the binning algorithm fails to fit the data.
RuntimeError – If an unexpected error occurs during fitting.
Example
>>> from binlearn import EqualWidthBinning
>>> import pandas as pd
>>> X = pd.DataFrame({'feature1': [1, 2, 3, 4, 5], 'feature2': [10, 20, 30, 40, 50]})
>>> binner = EqualWidthBinning(n_bins=3)
>>> binner.fit(X)
EqualWidthBinning(...)
Note
The method automatically handles column separation when guidance_columns is specified, routing guidance columns separately from binning columns. The fitting strategy (joint vs independent) is determined by the fit_jointly parameter.
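The column separation described in the note can be pictured with a short sketch. The helper `split_guidance` is hypothetical and only handles column names or indices given as a single value or a list; callable selectors, which the guidance_columns parameter also allows, are not covered here.

```python
def split_guidance(columns, guidance_columns):
    """Split column identifiers into (binning, guidance), preserving input order."""
    if guidance_columns is None:
        return list(columns), []
    if isinstance(guidance_columns, (str, int)):
        guidance_columns = [guidance_columns]   # accept a single name or index
    wanted = set(guidance_columns)
    missing = wanted - set(columns)
    if missing:
        raise ValueError(f"Unknown guidance columns: {sorted(map(str, missing))}")
    binning = [c for c in columns if c not in wanted]
    guidance = [c for c in columns if c in wanted]
    return binning, guidance


binning_cols, guidance_cols = split_guidance(
    ["age", "income", "default_risk"], ["default_risk"]
)
print(binning_cols, guidance_cols)
```

Only the binning columns would then be fitted and transformed, with the guidance column supplying the target values.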
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new – Transformed array.
- Return type:
ndarray array of shape (n_samples, n_features_new)
- get_input_columns() → list[Any] | None
Get input columns for data preparation.
This method should be overridden by derived classes to provide appropriate column information without exposing binning-specific concepts.
- Returns:
Column information or None if not available
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep: bool = True) → dict[str, Any]
Get parameters for this estimator, including fitted parameters.
This method extends sklearn’s standard get_params to include fitted parameters when the estimator is fitted, enabling complete object reconstruction through the get_params/set_params interface. This is essential for pipeline persistence and model serialization.
- Parameters:
deep – If True, returns parameters for sub-estimators (not applicable here but maintained for sklearn compatibility).
- Returns:
Dictionary of parameter names mapped to their values, including fitted parameters when the estimator is fitted.
- Return type:
dict[str, Any]
Example
>>> binner = EqualWidthBinning(n_bins=5)
>>> params = binner.get_params()
>>> print(params)
{'n_bins': 5, 'clip': None, ..., 'class_': 'EqualWidthBinning', 'module_': '...'}
>>>
>>> binner.fit(X)
>>> fitted_params = binner.get_params()
>>> # Now includes: {'bin_edges': {...}, 'bin_representatives': {...}, ...}
Note
Automatically extracts constructor parameters from __init__ signature
Includes fitted parameters only when estimator is fitted
Adds class metadata for reconstruction workflows
Excludes internal sklearn attributes like n_features_in_
class_ and module_ parameters are handled specially during set_params
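The round trip these notes describe (fitted attributes exposed without the trailing underscore, then fed back in to rebuild an equivalent object) can be illustrated with a toy class. `ToyBinner` is a hypothetical stand-in written for this example; it mimics the pattern and is not binlearn code.

```python
class ToyBinner:
    """Hypothetical stand-in that mimics the parameter round trip."""

    def __init__(self, max_bins=None, bin_edges=None):
        self.max_bins = max_bins
        if bin_edges is not None:
            # Reconstruction path: pre-computed edges become a fitted attribute.
            self.bin_edges_ = bin_edges

    def fit(self, values):
        lo, hi = min(values), max(values)
        self.bin_edges_ = {0: [lo, (lo + hi) / 2, hi]}  # two equal-width bins
        return self

    def get_params(self):
        params = {"max_bins": self.max_bins}
        if hasattr(self, "bin_edges_"):
            params["bin_edges"] = self.bin_edges_       # exposed without underscore
        params["class_"] = type(self).__name__          # class metadata
        params["module_"] = type(self).__module__
        return params


fitted = ToyBinner(max_bins=2).fit([0.0, 4.0, 10.0])
params = fitted.get_params()

# Drop the class metadata, then rebuild an equivalent, already-fitted object.
ctor_args = {k: v for k, v in params.items() if k not in ("class_", "module_")}
clone = ToyBinner(**ctor_args)
print(clone.bin_edges_)
```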
- inverse_transform(X: Any) → Any
Inverse transform from bin indices back to representative values.
Converts discrete bin indices back to their representative values, effectively reversing the binning transformation. This is useful for interpreting results or reconstructing approximate original values.
- Parameters:
X – Input data containing bin indices to inverse transform. Should contain only binning columns (no guidance columns). Can be:
- pandas.DataFrame: Column names should match binning columns
- polars.DataFrame: Column names should match binning columns
- numpy.ndarray: Must have same number of binning columns
- array-like: Converted to numpy array
- Returns:
Inverse transformed data where bin indices are replaced with their representative values (typically bin centers). Output format matches the preserve_dataframe setting.
- Raises:
RuntimeError – If the transformer has not been fitted yet.
ValueError – If input data has wrong number of columns or invalid format.
BinningError – If inverse transformation fails.
Example
>>> # After fitting and transforming
>>> X_binned = [[0, 1], [1, 0], [2, 2]]  # Bin indices
>>> X_reconstructed = binner.inverse_transform(X_binned)
>>> print(X_reconstructed)
[[0.5, 1.5], [1.5, 0.5], [2.5, 2.5]]  # Representative values
Note
For guided binning (when guidance_columns is specified), the input should only contain the binning columns, not the guidance columns. The number of input columns must match the number of binning columns.
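Conceptually, the inverse transform is a per-column lookup from bin index to representative value. The sketch below reproduces the numbers from the example above with a hypothetical helper, `inverse_bins`; it assumes every index is a valid bin and does not model how binlearn treats missing or out-of-range indices.

```python
def inverse_bins(rows, representatives):
    """Replace each bin index with the representative value of its column."""
    return [
        [representatives[col][idx] for col, idx in enumerate(row)]
        for row in rows
    ]


# Representatives per column, keyed by column position (illustrative values).
reps = {0: [0.5, 1.5, 2.5], 1: [0.5, 1.5, 2.5]}
X_reconstructed = inverse_bins([[0, 1], [1, 0], [2, 2]], reps)
print(X_reconstructed)
```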
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
transform ({"default", "pandas", "polars"}, default=None) – Configure output of transform and fit_transform.
"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged
Added in version 1.4: "polars" option was added.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_params(**params: Any) → SklearnIntegrationBase
Set the parameters of this estimator.
This method supports reconstruction workflows by handling fitted parameters that come from get_params() output (without underscores) and setting them as fitted attributes (with underscores).
- Parameters:
**params – Parameters to set. Can include:
- Regular constructor parameters (n_bins, clip, etc.)
- Fitted parameters from get_params (bin_edges, bin_representatives)
- Class metadata (ignored during reconstruction)
- Returns:
Returns the instance itself.
- Return type:
self
- transform(X: Any) → Any
Transform input data using fitted binning parameters.
Applies the fitted binning transformation to new data, converting continuous values to discrete bin indices or representatives. Handles column separation when guidance columns are present.
- Parameters:
X – Input data to transform. Must have the same structure as the data used during fitting (same number of columns). Can be:
- pandas.DataFrame: Column names should match training data
- polars.DataFrame: Column names should match training data
- numpy.ndarray: Must have same number of columns as training
- array-like: Converted to numpy array
- Returns:
Transformed data where continuous values are replaced with bin indices or representative values. The output format depends on:
- preserve_dataframe setting: DataFrame vs array format
- binning method: indices vs representatives
- guidance_columns: only binning columns are transformed
- Raises:
RuntimeError – If the transformer has not been fitted yet.
ValueError – If the input data has incompatible structure or format.
BinningError – If transformation fails due to data issues.
Example
>>> # After fitting
>>> X_new = pd.DataFrame({'feature1': [1.5, 2.5], 'feature2': [15, 25]})
>>> X_binned = binner.transform(X_new)
>>> print(X_binned)
[[0, 0], [1, 1]]  # Bin indices
Note
When guidance_columns is specified, only the binning columns are transformed. Guidance columns are filtered out from the output. The method preserves the original data format when preserve_dataframe=True.
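For a single column, mapping a value to a bin index amounts to locating it among the sorted bin edges. The sketch below uses the standard-library `bisect` module; the helper `assign_bin` is hypothetical, and both the half-open interval convention and the use of None for unclipped out-of-range values are assumptions, since this reference does not spell out binlearn's conventions.

```python
from bisect import bisect_right


def assign_bin(value, edges, clip=True):
    """Return the index of the bin [edges[i], edges[i+1]) that contains value."""
    n_bins = len(edges) - 1
    if value < edges[0] or value > edges[-1]:
        if not clip:
            return None  # placeholder for "outside the fitted range"
        # Clip to the nearest outer bin.
        return 0 if value < edges[0] else n_bins - 1
    # bisect_right counts the edges <= value; the last edge belongs to the last bin.
    return min(bisect_right(edges, value) - 1, n_bins - 1)


edges = [0.0, 2.5, 5.0, 10.0]
print([assign_bin(v, edges) for v in (1.5, 2.5, 10.0, -1.0, 11.0)])
```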
- static validate_array_like(data: Any, name: str = 'data', allow_none: bool = False) → ndarray[Any, Any] | None
Validate and convert array-like input to numpy array.
This method provides robust validation and conversion of various input formats to numpy arrays, with comprehensive error handling and helpful suggestions for common issues.
- Parameters:
data – Input data to validate and convert. Can be:
- numpy.ndarray: Used directly
- pandas.DataFrame/Series: Converted to numpy array
- polars.DataFrame: Converted to numpy array
- list, tuple: Converted to numpy array
- None: Allowed only if allow_none=True
name – Name of the data parameter for error messages. Used to provide context in error messages (e.g., “X”, “y”, “guidance_data”).
allow_none – Whether to allow None as a valid input. If True, None is returned unchanged; if False, None raises InvalidDataError.
- Returns:
Validated numpy array, or None if data is None and allow_none=True. The returned array maintains the same data content but is guaranteed to be a numpy array.
- Raises:
InvalidDataError – If validation fails:
- data is None when allow_none=False
- data cannot be converted to numpy array
- Conversion process encounters errors
Example
>>> # Valid inputs
>>> arr = ValidationMixin.validate_array_like([1, 2, 3], "X")
>>> print(type(arr))
<class 'numpy.ndarray'>
>>>
>>> # Allow None
>>> result = ValidationMixin.validate_array_like(None, "y", allow_none=True)
>>> print(result)
None
>>>
>>> # Invalid input
>>> ValidationMixin.validate_array_like(None, "X", allow_none=False)
InvalidDataError: X cannot be None
Note
This method focuses on format validation and conversion. Content validation (like checking for NaN values) should be done separately using other validation methods.
- static validate_column_specification(columns: Any, data_shape: tuple[int, ...]) → list[Any]
Validate column specifications.
- static validate_guidance_columns(guidance_cols: Any, binning_cols: list[Any], data_shape: tuple[int, ...]) → list[Any]
Validate guidance column specifications.
- validate_guidance_data(guidance_data: Any, name: str = 'guidance_data') → ndarray[Any, Any]
Validate and preprocess guidance data for supervised binning.
Ensures that the guidance data is appropriate for supervised binning by validating its shape and checking for data quality issues.
- Parameters:
guidance_data – Raw guidance/target data to validate. Should be a 2D array with shape (n_samples, 1) or 1D array with shape (n_samples,).
name – Name used in error messages for better debugging context.
- Returns:
Validated and preprocessed guidance data with shape (n_samples, 1).
- Raises:
ValidationError – If guidance data has invalid shape or format.
Overview
IsotonicBinning creates bins using isotonic regression to find optimal cut points that preserve
monotonic relationships between features and targets. The transformer fits an isotonic (non-decreasing
or non-increasing) function to the data and identifies significant changes in this function to determine bin boundaries.
This method is particularly effective when:
There’s a known monotonic relationship between feature and target
You want bins that respect monotonic ordering
Traditional tree-based methods might create non-monotonic splits
You need interpretable bins that maintain logical ordering
Key Features
Monotonicity Preservation: Ensures bins respect monotonic relationships
Isotonic Regression: Uses sklearn’s IsotonicRegression for optimal fitting
Automatic Cut Points: Identifies significant changes in isotonic function
Flexible Direction: Supports both increasing and decreasing monotonicity
Sample Size Control: Ensures minimum samples per bin for statistical validity
Supervised Learning: Uses target variable information for optimal binning
Sklearn Compatibility: Full transformer interface with fit/transform methods
DataFrame Support: Preserves pandas/polars column names and structure
Basic Usage
import numpy as np
import pandas as pd
from binlearn.methods import IsotonicBinning
# Create sample data with monotonic relationship
np.random.seed(42)
X = np.random.uniform(0, 10, 500).reshape(-1, 1)
y = 2 * X.flatten() + np.random.normal(0, 1, 500) # Linear + noise
# Apply isotonic binning
binner = IsotonicBinning(
max_bins=6,
min_samples_per_bin=20,
increasing=True
)
# Fit using X and y (sklearn style)
binner.fit(X, y)
X_binned = binner.transform(X)
print(f"Original shape: {X.shape}")
print(f"Binned shape: {X_binned.shape}")
print(f"Bin edges: {binner.bin_edges_[0]}")
Classification Example
from sklearn.datasets import make_classification
# Create classification data with monotonic relationship
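X, y = make_classification(
    n_samples=1000,
    n_features=1,
    n_informative=1,  # informative + redundant + repeated must not exceed n_features
    n_redundant=0,
    n_clusters_per_class=1,
    random_state=42
)
# Sort samples by feature value (this only reorders rows; it does not change the feature-target relationship)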
sort_idx = np.argsort(X.flatten())
X_sorted = X[sort_idx]
y_sorted = y[sort_idx]
binner = IsotonicBinning(
max_bins=8,
min_samples_per_bin=30,
increasing=True,
min_change_threshold=0.05
)
binner.fit(X_sorted, y_sorted)
X_binned = binner.transform(X_sorted)
print(f"Created {len(binner.bin_edges_[0]) - 1} bins")
print(f"Bin edges: {binner.bin_edges_[0]}")
DataFrame Example with Guidance Columns
# Create DataFrame with target column
df = pd.DataFrame({
'age': np.random.uniform(18, 80, 1000),
'income': np.random.uniform(20000, 150000, 1000),
'credit_score': np.random.uniform(300, 850, 1000)
})
# Create monotonic target: risk increases with age, decreases with income/credit
df['default_risk'] = (
0.3 * (df['age'] - 18) / 62 + # Age increases risk
-0.4 * (df['income'] - 20000) / 130000 + # Income decreases risk
-0.3 * (df['credit_score'] - 300) / 550 + # Credit decreases risk
np.random.normal(0, 0.1, 1000)
)
# Bin each feature with appropriate monotonicity
age_binner = IsotonicBinning(
guidance_columns=['default_risk'],
max_bins=5,
increasing=True, # Risk increases with age
preserve_dataframe=True
)
income_binner = IsotonicBinning(
guidance_columns=['default_risk'],
max_bins=6,
increasing=False, # Risk decreases with income
preserve_dataframe=True
)
# Apply binning
df_age_binned = age_binner.fit_transform(df[['age', 'default_risk']])
df_income_binned = income_binner.fit_transform(df[['income', 'default_risk']])
Advanced Configuration
# Fine-tuned isotonic binning for specific requirements
# Conservative binning (fewer bins, stricter requirements)
conservative_binner = IsotonicBinning(
max_bins=5,
min_samples_per_bin=50, # Larger bins for stability
min_change_threshold=0.1, # Require larger changes
increasing=True,
y_min=0.0, # Bound target values
y_max=1.0
)
# Granular binning (more bins, sensitive to changes)
granular_binner = IsotonicBinning(
max_bins=15,
min_samples_per_bin=10, # Smaller bins allowed
min_change_threshold=0.01, # Sensitive to small changes
increasing=True
)
Parameter Guide
- max_bins (int, default=10)
Maximum number of bins to create. Actual number may be smaller:
Higher values: Allow more granular binning
Lower values: Force simpler, broader bins
Consider your model’s complexity needs
- min_samples_per_bin (int, default=5)
Minimum samples required per bin for statistical validity:
Higher values: More stable bins, fewer total bins
Lower values: More granular binning, potentially less stable
Rule of thumb: At least 30 for regression, 10+ per class for classification
- increasing (bool, default=True)
Direction of monotonic relationship:
True: Higher feature values → higher target values
False: Higher feature values → lower target values
Must match your domain knowledge
- min_change_threshold (float, default=0.01)
Minimum relative change in isotonic function to create new bin:
Smaller values: More sensitive, create more bins
Larger values: Less sensitive, create fewer bins
Typical range: 0.005 to 0.1
- y_min, y_max (float, optional)
Bounds for target values in isotonic regression:
Helps constrain the isotonic function
Useful for normalized targets or known ranges
If None, uses data min/max
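The interaction between candidate cut points and min_samples_per_bin can be sketched as a simple filter over positions in the sorted data. The helper `enforce_min_samples` is hypothetical and uses a greedy left-to-right rule chosen for illustration; it is not taken from binlearn.

```python
def enforce_min_samples(cut_indices, n_samples, min_samples_per_bin):
    """Keep only cuts that leave at least min_samples_per_bin points on each side.

    A cut at position k separates sorted samples [start, k) from [k, n_samples).
    """
    kept = []
    start = 0
    for k in sorted(cut_indices):
        left_ok = k - start >= min_samples_per_bin
        right_ok = n_samples - k >= min_samples_per_bin
        if left_ok and right_ok:
            kept.append(k)
            start = k   # the next bin begins at this cut
    return kept


# 30 sorted samples, candidate cuts at positions 3, 10, 12 and 28.
print(enforce_min_samples([3, 10, 12, 28], n_samples=30, min_samples_per_bin=5))
```

Raising min_samples_per_bin removes more candidate cuts, which is why higher values lead to fewer, more stable bins.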
Scikit-learn Pipeline Integration
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Create pipeline with isotonic binning
pipeline = Pipeline([
('binning', IsotonicBinning(max_bins=6, increasing=True)),
('regressor', RandomForestRegressor(random_state=42))
])
# Use in ML workflow
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Pipeline MSE: {mse:.4f}")
Tips for Best Results
Validate monotonicity first: Use a rank correlation (for example Spearman's) between feature and target to confirm a monotonic relationship
Choose appropriate direction: Set increasing parameter based on domain knowledge
Balance bin size and count: Larger min_samples_per_bin gives more stable bins
Consider target distribution: Normalize targets if they have extreme ranges
Validate on holdout data: Check that monotonic relationship holds on test data
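One way to act on the first tip is a rank correlation, which measures how monotonic a relationship is regardless of its shape. The sketch below is a dependency-free Spearman coefficient written for illustration (`average_ranks` and `spearman` are hypothetical helper names); in practice `scipy.stats.spearmanr` computes the same statistic. Values near +1 suggest increasing=True, values near -1 suggest increasing=False, and values near 0 suggest that isotonic binning may not be a good fit.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1          # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the ranks (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))     # monotone increasing, nonlinear
print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 4, 1]))  # rises then falls
```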
See Also
TreeBinning - Decision tree-based supervised binning
Chi2Binning - Chi-square statistic-based supervised binning
EqualFrequencyBinning - Quantile-based unsupervised binning
KMeansBinning - K-means clustering-based binning
Overview
IsotonicBinning creates bins using isotonic regression to find optimal cut points that preserve
monotonic relationships between features and targets. The transformer fits an isotonic (non-decreasing
or non-increasing) function to the data and identifies significant changes in this function to determine bin boundaries.
This approach is particularly effective when:
Monotonic relationships exist between features and targets
Ordinal consistency is important in your binning
Regulatory requirements mandate monotonic scoring models
Risk scoring applications where higher values should consistently indicate higher risk
Key Features
Monotonicity Preservation: Ensures bins respect monotonic ordering relationships
Regression-Based: Uses isotonic regression for optimal cut point identification
Automatic Boundary Detection: Identifies significant changes in fitted isotonic function
Flexible Direction: Supports both increasing and decreasing monotonicity
Sample Control: Ensures minimum samples per bin for statistical reliability
Sklearn Compatibility: Full transformer interface with fit/transform methods
DataFrame Support: Preserves pandas/polars column names and structure
Basic Usage
import numpy as np
import pandas as pd
from binlearn.methods import IsotonicBinning
# Create data with monotonic relationship
np.random.seed(42)
X = np.random.rand(1000, 2)
# Create target with monotonic relationship to first feature
y = 2 * X[:, 0] + 0.5 * np.random.randn(1000)
# Apply isotonic binning
binner = IsotonicBinning(
max_bins=8,
min_samples_per_bin=50,
increasing=True
)
# Method 1: Using fit with X and y (sklearn style)
binner.fit(X, y)
X_binned = binner.transform(X)
print(f"Original shape: {X.shape}")
print(f"Binned shape: {X_binned.shape}")
print(f"Bins for feature 0: {len(binner.bin_edges_[0]) - 1}")
DataFrame Example with Target Column
# Create DataFrame with monotonic relationships
df = pd.DataFrame({
'age': np.random.uniform(18, 80, 1000),
'income': np.random.uniform(20000, 150000, 1000),
'experience': np.random.uniform(0, 40, 1000)
})
# Create target with monotonic relationships
df['default_risk'] = (
0.01 * df['age'] +
-0.00001 * df['income'] +
-0.005 * df['experience'] +
0.2 * np.random.randn(1000)
)
# Method 2: Using guidance_columns (binlearn style)
binner = IsotonicBinning(
guidance_columns=['default_risk'],
max_bins=6,
min_samples_per_bin=100,
preserve_dataframe=True
)
df_binned = binner.fit_transform(df)
print(f"Bin edges for age: {binner.bin_edges_['age']}")
print(f"Bin edges for income: {binner.bin_edges_['income']}")
Decreasing Monotonicity Example
# Example where higher feature values should lead to lower target values
X_credit = np.random.uniform(0, 100, 500).reshape(-1, 1) # Credit score
y_default = 1 / (1 + np.exp(0.1 * (X_credit.flatten() - 50))) # Lower default prob for higher scores
# Use decreasing monotonicity
binner = IsotonicBinning(
max_bins=5,
min_samples_per_bin=50,
increasing=False, # Higher credit score = lower default probability
min_change_threshold=0.05
)
binner.fit(X_credit, y_default)
X_credit_binned = binner.transform(X_credit)
# Verify monotonicity: bin representatives should decrease
print("Bin representatives:", binner.bin_representatives_[0])
print("Monotonically decreasing:",
all(binner.bin_representatives_[0][i] >= binner.bin_representatives_[0][i+1]
for i in range(len(binner.bin_representatives_[0])-1)))
Advanced Configuration
# Fine-tuned isotonic binning for different scenarios
# High-precision binning (more sensitive to changes)
precise_binner = IsotonicBinning(
max_bins=12,
min_samples_per_bin=30,
min_change_threshold=0.005, # More sensitive to changes
increasing=True,
y_min=0.0, # Explicit bounds
y_max=1.0
)
# Robust binning (less sensitive, larger bins)
robust_binner = IsotonicBinning(
max_bins=6,
min_samples_per_bin=100, # Larger bins for stability
min_change_threshold=0.1, # Less sensitive to changes
increasing=True
)
Classification Example
from sklearn.datasets import make_classification
# Create classification data
X_class, y_class = make_classification(
    n_samples=1000,
    n_features=3,
    n_informative=3,  # 3 classes x 2 clusters per class needs 2**n_informative >= 6
    n_classes=3,
    n_redundant=0,
    random_state=42
)
# Isotonic binning works with classification by treating classes ordinally
binner = IsotonicBinning(
max_bins=7,
min_samples_per_bin=50,
increasing=True
)
binner.fit(X_class, y_class)
X_class_binned = binner.transform(X_class)
print(f"Classification bins: {[len(edges)-1 for edges in binner.bin_edges_.values()]}")
Risk Scoring Pipeline
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
# Create a risk scoring pipeline with monotonic binning
risk_pipeline = Pipeline([
('isotonic_binning', IsotonicBinning(
max_bins=8,
min_samples_per_bin=50,
increasing=True
)),
('logistic_regression', LogisticRegression(random_state=42))
])
# Use in credit risk modeling
X_train, X_test, y_train, y_test = train_test_split(
X, y > np.median(y), test_size=0.2, random_state=42
)
risk_pipeline.fit(X_train, y_train)
y_proba = risk_pipeline.predict_proba(X_test)[:, 1]
auc_score = roc_auc_score(y_test, y_proba)
print(f"AUC with isotonic binning: {auc_score:.3f}")
Parameter Guide
- max_bins (int, default=10)
Maximum number of bins to create per feature:
Higher values: More granular risk segments
Lower values: Broader, more stable risk categories
Consider regulatory requirements and model interpretability
- min_samples_per_bin (int, default=5)
Minimum samples required per bin for statistical reliability:
Higher values: More stable bins, better statistical power
Lower values: More granular binning, potential instability
Rule of thumb: At least 30-50 for reliable estimates
- increasing (bool, default=True)
Direction of monotonicity to enforce:
True: Higher feature values → higher target values
False: Higher feature values → lower target values
Choose based on domain knowledge and expected relationships
- min_change_threshold (float, default=0.01)
Minimum relative change in fitted values to create new bin:
Lower values: More sensitive, more bins
Higher values: Less sensitive, fewer bins
Typical range: 0.005 (sensitive) to 0.1 (robust)
- y_min, y_max (float, optional)
Bounds for target values in isotonic regression:
Explicit bounds can improve numerical stability
Useful for probability targets: y_min=0.0, y_max=1.0
Auto-detected from data if not specified
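The effect of y_min and y_max on the fitted function can be shown with a small sketch. The helper `clip_fitted` is hypothetical; it simply bounds already-fitted values, which is how the constraint is described in the parameter documentation, and it is not binlearn's code. Clipping never reorders values, so a monotone fit stays monotone.

```python
def clip_fitted(fitted, y_min=None, y_max=None):
    """Bound fitted values to [y_min, y_max]; None means no bound on that side."""
    lo = float("-inf") if y_min is None else y_min
    hi = float("inf") if y_max is None else y_max
    if lo > hi:
        raise ValueError("y_min must not exceed y_max")
    return [min(max(v, lo), hi) for v in fitted]


# A fit that overshoots the valid probability range at both ends.
print(clip_fitted([-0.2, 0.4, 1.3], y_min=0.0, y_max=1.0))
```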
Monotonicity Validation
# Function to validate monotonicity in binned results
def validate_monotonicity(binner, feature_idx=0, increasing=True):
    """Validate that bin representatives follow monotonic order."""
    reps = binner.bin_representatives_[feature_idx]
    if increasing:
        is_monotonic = all(reps[i] <= reps[i+1] for i in range(len(reps)-1))
        direction = "increasing"
    else:
        is_monotonic = all(reps[i] >= reps[i+1] for i in range(len(reps)-1))
        direction = "decreasing"
    print(f"Monotonicity ({direction}): {is_monotonic}")
    print(f"Representatives: {reps}")
    return is_monotonic
# Validate our binning results
validate_monotonicity(binner, feature_idx=0, increasing=True)
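A complementary check works on the target side: compute the mean target value in each bin and test whether those means are ordered. This is the quantity the monotonic constraint is meant to protect, and it can be run on holdout data as well as training data. The helpers `bin_target_means` and `is_monotonic` are hypothetical and operate on plain lists of bin indices and targets.

```python
def bin_target_means(bin_ids, y):
    """Mean target per bin, ordered by bin index."""
    sums, counts = {}, {}
    for b, target in zip(bin_ids, y):
        sums[b] = sums.get(b, 0.0) + target
        counts[b] = counts.get(b, 0) + 1
    return [sums[b] / counts[b] for b in sorted(sums)]


def is_monotonic(values, increasing=True):
    """True if consecutive values never move against the requested direction."""
    pairs = list(zip(values, values[1:]))
    if increasing:
        return all(a <= b for a, b in pairs)
    return all(a >= b for a, b in pairs)


# Bin index per sample (one feature) and the matching targets.
bin_ids = [0, 0, 1, 1, 2, 2]
targets = [0.1, 0.3, 0.4, 0.6, 0.9, 1.1]
means = bin_target_means(bin_ids, targets)
print(means, is_monotonic(means, increasing=True))
```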
Tips for Best Results
Verify monotonic relationships exist in your data before applying
Choose appropriate min_samples_per_bin based on your sample size
Adjust min_change_threshold based on noise level in your data
Consider regulatory constraints for financial/medical applications
Validate results on holdout data to avoid overfitting
Common Use Cases
Credit Risk Scoring: Age, income, debt-to-income ratio
Medical Risk Assessment: Biomarkers, age, symptom severity
Marketing Response: Customer value, engagement metrics
Predictive Maintenance: Usage hours, temperature readings
Quality Control: Process parameters, environmental conditions
See Also
TreeBinning - Decision tree-based supervised binning
Chi2Binning - Chi-square statistic-based supervised binning
EqualFrequencyBinning - Quantile-based unsupervised binning
KMeansBinning - K-means clustering-based binning