ManualIntervalBinning

class binlearn.methods.ManualIntervalBinning(bin_edges: dict[Any, list[float]], bin_representatives: dict[Any, list[float]] | None = None, clip: bool | None = None, preserve_dataframe: bool | None = None, *, class_: str | None = None, module_: str | None = None)

Bases: IntervalBinningBase

Manual interval binning implementation for user-defined bin boundaries.

This class provides complete control over binning boundaries by allowing users to specify exact bin edges for each column. Unlike automatic binning methods that infer boundaries from data, manual interval binning uses pre-defined edges, making it ideal for standardized binning schemes, domain-specific requirements, or ensuring consistent binning across different datasets.

Manual interval binning is particularly useful for:

Implementing domain-specific binning rules (e.g., age groups, income brackets)
Ensuring consistent binning across training and test sets
Regulatory or business requirements with fixed bin boundaries
Comparative analysis requiring standardized bins across datasets

Key Features:

Complete user control over bin boundaries for each column
No data-dependent bin edge calculation: the provided edges are used exactly
Support for different binning schemes per column
Automatic generation of bin representatives if not provided
Integration with binlearn’s clipping and format preservation features

Algorithm:

1. Validate and store user-provided bin edges
2. Generate default representatives (bin centers) if not provided
3. During transformation, assign values to bins based on interval membership
4. Handle out-of-range values according to clipping configuration
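Steps 3 and 4 can be sketched in plain Python. This is an illustration rather than binlearn's implementation: the function name assign_bin, the interval convention (each bin is [left, right), with the last bin also including its right edge), and the sentinel values BELOW_RANGE and ABOVE_RANGE are all assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical sentinels for out-of-range values; the indices binlearn
# actually uses may differ.
BELOW_RANGE = -1
ABOVE_RANGE = -2


def assign_bin(value, edges, clip=False):
    """Return the index of the bin that contains value.

    Assumes bins are [edges[i], edges[i + 1]), with the last bin also
    including its right edge.
    """
    n_bins = len(edges) - 1
    if value < edges[0]:
        return 0 if clip else BELOW_RANGE
    if value > edges[-1]:
        return n_bins - 1 if clip else ABOVE_RANGE
    # bisect_right counts the edges <= value; subtracting 1 gives the bin.
    # min() folds value == edges[-1] into the last bin.
    return min(bisect_right(edges, value) - 1, n_bins - 1)


age_edges = [0, 18, 65, 100]
indices = [assign_bin(v, age_edges) for v in (5, 18, 64.9, 100, 130)]
clipped = [assign_bin(v, age_edges, clip=True) for v in (-3, 130)]
print(indices)  # [0, 1, 1, 2, -2]
print(clipped)  # [0, 2]
```

With clip=True the two out-of-range values land in the first and last bins; with clip=False they keep a marker that downstream code can filter on.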

Parameters:

bin_edges – Required dictionary mapping column identifiers to lists of bin edge values. Each edge list must contain at least 2 values and be sorted in ascending order. For example: {0: [0, 10, 20], 'age': [0, 18, 65, 100]}
bin_representatives – Optional dictionary mapping column identifiers to lists of representative values for each bin. If not provided, representatives are automatically calculated as bin midpoints.

bin_edges_: Dictionary containing the provided bin edges (same as input)

bin_representatives_: Dictionary containing bin representatives (provided or auto-generated)

Example

>>> import pandas as pd
>>> from binlearn.methods import ManualIntervalBinning
>>>
>>> # Define custom bin edges for different features
>>> bin_edges = {
...     'age': [0, 18, 35, 50, 65, 100],          # Age groups
...     'income': [0, 30000, 60000, 100000, 200000]  # Income brackets
... }
>>>
>>> # Create binner with custom edges
>>> binner = ManualIntervalBinning(bin_edges=bin_edges)
>>>
>>> # Sample data; column names match the bin_edges keys
>>> X = pd.DataFrame({'age': [25, 60, 30], 'income': [45000, 80000, 25000]})
>>> X_binned = binner.fit_transform(X)  # fit() is no-op, transform() uses edges
>>>
>>> # With custom representatives
>>> bin_reps = {
...     'age': [9, 26.5, 42.5, 57.5, 82.5],      # Custom age representatives
...     'income': [15000, 45000, 80000, 150000]    # Custom income representatives
... }
>>> binner_custom = ManualIntervalBinning(
...     bin_edges=bin_edges,
...     bin_representatives=bin_reps
... )

Note

bin_edges is required and cannot be None
fit() method is essentially a no-op since edges are predefined
Each column can have different numbers of bins and edge values
Out-of-range values are handled according to the clip parameter
Inherits all interval binning capabilities from IntervalBinningBase

Initialize manual interval binning with user-defined bin edges.

Sets up manual interval binning with explicitly provided bin boundaries and optional representatives. This method requires complete bin edge specification upfront and integrates with binlearn’s configuration system for other parameters.

Parameters:

bin_edges – Required dictionary mapping column identifiers to lists of bin edge values. Each edge list must contain at least 2 values (to define at least 1 bin), be sorted in ascending order, and contain only finite numeric values. For example: {0: [0, 10, 20, 30], 'feature1': [0.0, 0.5, 1.0]}
bin_representatives – Optional dictionary mapping column identifiers to lists of representative values for each bin. If provided, must have the same column keys as bin_edges and appropriate counts (one representative per bin). If None, representatives are automatically generated as bin midpoints.
clip – Whether to clip out-of-range values to the nearest bin boundary during transformation. If True, values outside the defined range are assigned to the nearest edge bin. If False, they receive special out-of-range indices. If None, uses global configuration default.
preserve_dataframe – Whether to preserve DataFrame format in outputs when input is a DataFrame. If None, uses global configuration default.
class_ – Class name for reconstruction compatibility (ignored during normal initialization).
module_ – Module name for reconstruction compatibility (ignored during normal initialization).

Raises:

ConfigurationError – If bin_edges is None or not provided, with helpful suggestions for proper usage.
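The constraints listed for bin_edges and bin_representatives can be checked before constructing the binner. The sketch below is a stand-alone pre-check, not the library's validator: check_bin_spec is a hypothetical helper, it raises ValueError rather than ConfigurationError, and it assumes that repeated edge values are allowed (only descending pairs are rejected).

```python
import math


def check_bin_spec(bin_edges, bin_representatives=None):
    """Raise ValueError if the spec breaks the documented constraints."""
    if bin_edges is None:
        raise ValueError("bin_edges is required")
    for col, edges in bin_edges.items():
        if len(edges) < 2:
            raise ValueError(f"{col!r}: need at least 2 edges")
        if not all(math.isfinite(e) for e in edges):
            raise ValueError(f"{col!r}: edges must be finite")
        if any(a > b for a, b in zip(edges, edges[1:])):
            raise ValueError(f"{col!r}: edges must be sorted ascending")
    if bin_representatives is not None:
        if set(bin_representatives) != set(bin_edges):
            raise ValueError("representative keys must match bin_edges keys")
        for col, reps in bin_representatives.items():
            if len(reps) != len(bin_edges[col]) - 1:
                raise ValueError(f"{col!r}: expected one representative per bin")


# A valid specification passes silently.
check_bin_spec({0: [0, 10, 20, 30], "feature1": [0.0, 0.5, 1.0]})

# Three edges define two bins, so three representatives are one too many.
try:
    check_bin_spec({"feature1": [0, 10, 20]}, {"feature1": [5, 15, 25]})
except ValueError as exc:
    message = str(exc)
    print(message)
```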

Example

>>> # Basic manual binning with auto-generated representatives
>>> bin_edges = {
...     'feature1': [0, 10, 20, 30, 40],
...     'feature2': [-1.0, 0.0, 1.0, 2.0]
... }
>>> binner = ManualIntervalBinning(bin_edges=bin_edges)
>>>
>>> # With custom representatives
>>> bin_reps = {
...     'feature1': [5, 15, 25, 35],           # Custom values
...     'feature2': [-0.5, 0.5, 1.5]          # Custom values
... }
>>> binner_custom = ManualIntervalBinning(
...     bin_edges=bin_edges,
...     bin_representatives=bin_reps
... )
>>>
>>> # With clipping enabled
>>> binner_clip = ManualIntervalBinning(
...     bin_edges=bin_edges,
...     clip=True
... )

Note

bin_edges is the only required parameter and cannot be None
Validation of bin_edges format occurs during initialization
The fit() method will be essentially a no-op since edges are predefined
Each column can have different numbers of bins
Integration with global configuration for clip and preserve_dataframe

fit(X: Any, y: Any | None = None, **fit_params: Any) → ManualIntervalBinning

Fit the Manual Interval binning (no-op since bins are pre-defined).

For manual binning, the object is already fitted during initialization. This method only performs validation.

Parameters:

X – Input data (used only for validation)
y – Target values (ignored for manual binning)
**fit_params – Additional fit parameters (ignored)

Returns:

Self (fitted transformer)

classmethod __init_subclass__(**kwargs)

Set the set_{method}_request methods.

This uses PEP-487 [1] to set the set_{method}_request methods. It looks for the information available in the set default values which are set using __metadata_request__* class attributes, or inferred from method signatures.

The __metadata_request__* class attributes are used when a method does not explicitly accept a metadata through its arguments or if the developer would like to specify a request value for those metadata which are different from the default None.

static check_data_quality(data: ndarray[Any, Any], name: str = 'data') → None: Check data quality and issue warnings if needed.

property feature_names_in_: list[str] | None: Get feature names.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

get_input_columns() → list[Any] | None

Get input columns for data preparation.

This method should be overridden by derived classes to provide appropriate column information without exposing binning-specific concepts.

Returns:

Column information or None if not available

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep: bool = True) → dict[str, Any]

Get parameters for this estimator, including fitted parameters.

This method extends sklearn’s standard get_params to include fitted parameters when the estimator is fitted, enabling complete object reconstruction through the get_params/set_params interface. This is essential for pipeline persistence and model serialization.

Parameters:

deep – If True, returns parameters for sub-estimators (not applicable here but maintained for sklearn compatibility).

Returns:

Dictionary of parameter names mapped to their values, including:

Constructor parameters extracted from __init__ signature
Fitted parameters (if estimator is fitted) mapped from attributes
Class metadata (class_, module_) for automatic reconstruction

Return type:

dict[str, Any]

Example

>>> binner = EqualWidthBinning(n_bins=5)
>>> params = binner.get_params()
>>> print(params)
{'n_bins': 5, 'clip': None, ..., 'class_': 'EqualWidthBinning', 'module_': '...'}
>>>
>>> binner.fit(X)
>>> fitted_params = binner.get_params()
>>> # Now includes: {'bin_edges': {...}, 'bin_representatives': {...}, ...}

Note

Automatically extracts constructor parameters from __init__ signature
Includes fitted parameters only when estimator is fitted
Adds class metadata for reconstruction workflows
Excludes internal sklearn attributes like n_features_in_
class_ and module_ parameters are handled specially during set_params

inverse_transform(X: Any) → Any

Inverse transform from bin indices back to representative values.

Converts discrete bin indices back to their representative values, effectively reversing the binning transformation. This is useful for interpreting results or reconstructing approximate original values.

Parameters:

X – Input data containing bin indices to inverse transform. Should contain only binning columns (no guidance columns). Can be a pandas.DataFrame or polars.DataFrame (column names should match binning columns), a numpy.ndarray (must have the same number of binning columns), or another array-like (converted to a numpy array).

Returns:

Inverse transformed data where bin indices are replaced with their representative values (typically bin centers). Output format matches the preserve_dataframe setting.

Raises:

RuntimeError – If the transformer has not been fitted yet.
ValueError – If input data has wrong number of columns or invalid format.
BinningError – If inverse transformation fails.

Example

>>> # After fitting and transforming
>>> X_binned = [[0, 1], [1, 0], [2, 2]]  # Bin indices
>>> X_reconstructed = binner.inverse_transform(X_binned)
>>> print(X_reconstructed)
[[0.5, 1.5], [1.5, 0.5], [2.5, 2.5]]  # Representative values

Note

For guided binning (when guidance_columns is specified), the input should only contain the binning columns, not the guidance columns. The number of input columns must match the number of binning columns.
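Conceptually, the inverse transform is a per-column lookup from bin index to representative value. The sketch below reproduces the example output with plain lists. The helper inverse_lookup is hypothetical, and it assumes every index is a valid bin index; out-of-range markers would need separate handling.

```python
def inverse_lookup(rows, representatives):
    """Replace each bin index with that column's representative value.

    rows: rows of bin indices, one entry per binned column.
    representatives: one list of representative values per column,
    given in column order.
    """
    return [
        [representatives[col][idx] for col, idx in enumerate(row)]
        for row in rows
    ]


reps = [[0.5, 1.5, 2.5], [0.5, 1.5, 2.5]]  # representatives for two columns
X_binned = [[0, 1], [1, 0], [2, 2]]
X_reconstructed = inverse_lookup(X_binned, reps)
print(X_reconstructed)  # [[0.5, 1.5], [1.5, 0.5], [2.5, 2.5]]
```

The reconstruction is approximate by design: every value that fell into a bin comes back as the same representative.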

property n_features_in_: int: Get number of features.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform ({"default", "pandas", "polars"}, default=None) –

Configure output of transform and fit_transform.

"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_params(**params: Any) → SklearnIntegrationBase

Set the parameters of this estimator.

This method supports reconstruction workflows by handling fitted parameters that come from get_params() output (without underscores) and setting them as fitted attributes (with underscores).

Parameters:

**params – Parameters to set. Can include regular constructor parameters (n_bins, clip, etc.), fitted parameters from get_params (bin_edges, bin_representatives), and class metadata (ignored during reconstruction).

Returns:

Returns the instance itself.

Return type:

self

transform(X: Any) → Any

Transform input data using fitted binning parameters.

Applies the fitted binning transformation to new data, converting continuous values to discrete bin indices or representatives. Handles column separation when guidance columns are present.

Parameters:

X – Input data to transform. Must have the same structure as the data used during fitting (same number of columns). Can be a pandas.DataFrame or polars.DataFrame (column names should match training data), a numpy.ndarray (must have the same number of columns as training), or another array-like (converted to a numpy array).

Returns:

Transformed data where continuous values are replaced with bin indices or representative values. The output format depends on the preserve_dataframe setting (DataFrame vs. array format), the binning method (indices vs. representatives), and guidance_columns (only binning columns are transformed).

Raises:

RuntimeError – If the transformer has not been fitted yet.
ValueError – If the input data has incompatible structure or format.
BinningError – If transformation fails due to data issues.

Example

>>> # After fitting
>>> X_new = pd.DataFrame({'feature1': [1.5, 2.5], 'feature2': [15, 25]})
>>> X_binned = binner.transform(X_new)
>>> print(X_binned)
[[0, 0], [1, 1]]  # Bin indices

Note

When guidance_columns is specified, only the binning columns are transformed. Guidance columns are filtered out from the output. The method preserves the original data format when preserve_dataframe=True.

static validate_array_like(data: Any, name: str = 'data', allow_none: bool = False) → ndarray[Any, Any] | None

Validate and convert array-like input to numpy array.

This method provides robust validation and conversion of various input formats to numpy arrays, with comprehensive error handling and helpful suggestions for common issues.

Parameters:

data – Input data to validate and convert. Can be a numpy.ndarray (used directly); a pandas.DataFrame/Series, polars.DataFrame, list, or tuple (converted to a numpy array); or None (allowed only if allow_none=True).
name – Name of the data parameter for error messages. Used to provide context in error messages (e.g., “X”, “y”, “guidance_data”).
allow_none – Whether to allow None as a valid input. If True, None is returned unchanged; if False, None raises InvalidDataError.

Returns:

Validated numpy array, or None if data is None and allow_none=True. The returned array maintains the same data content but is guaranteed to be a numpy array.

Raises:

InvalidDataError – If validation fails: data is None when allow_none=False, data cannot be converted to a numpy array, or the conversion process encounters errors.

Example

>>> # Valid inputs
>>> arr = ValidationMixin.validate_array_like([1, 2, 3], "X")
>>> print(type(arr))
<class 'numpy.ndarray'>
>>>
>>> # Allow None
>>> result = ValidationMixin.validate_array_like(None, "y", allow_none=True)
>>> print(result)
None
>>>
>>> # Invalid input
>>> ValidationMixin.validate_array_like(None, "X", allow_none=False)
InvalidDataError: X cannot be None

Note

This method focuses on format validation and conversion. Content validation (like checking for NaN values) should be done separately using other validation methods.

static validate_column_specification(columns: Any, data_shape: tuple[int, ...]) → list[Any]: Validate column specifications.

static validate_guidance_columns(guidance_cols: Any, binning_cols: list[Any], data_shape: tuple[int, ...]) → list[Any]: Validate guidance column specifications.

Overview

ManualIntervalBinning creates bins using explicitly provided bin edges, giving users complete control over binning boundaries. Unlike automatic binning methods, this transformer never infers bin edges from data - they must always be provided by the user.

This approach is ideal for:

Standardized binning across multiple datasets
Domain-specific binning requirements with business rules
Reproducible binning with known boundaries
Integration with external binning specifications
Regulatory compliance where specific bins are mandated

Key Features

Complete Control: User defines all bin boundaries explicitly
Consistency: Same bins across different datasets and time periods
Validation: Comprehensive validation of user-provided bin edges
Auto-Representatives: Automatic generation of bin center representatives
Flexible Keys: Supports both column names and indices as keys
Out-of-Range Handling: Configurable clipping for values outside bin ranges
Sklearn Compatibility: Full transformer interface with fit/transform methods
DataFrame Support: Preserves pandas/polars column names and structure

Basic Usage

import numpy as np
import pandas as pd
from binlearn.methods import ManualIntervalBinning

# Create sample data
np.random.seed(42)
X = np.random.uniform(0, 100, 200).reshape(-1, 2)

# Define custom bin edges for each feature
custom_edges = {
    0: [0, 20, 40, 60, 80, 100],      # Feature 0: five equal-width bins
    1: [0, 25, 50, 75, 100]           # Feature 1: four equal-width bins
}

# Apply manual binning
binner = ManualIntervalBinning(bin_edges=custom_edges)
X_binned = binner.fit_transform(X)

print(f"Original shape: {X.shape}")
print(f"Binned shape: {X_binned.shape}")
print(f"Bin edges for feature 0: {binner.bin_edges_[0]}")
print(f"Bin edges for feature 1: {binner.bin_edges_[1]}")
print(f"Representatives for feature 0: {binner.bin_representatives_[0]}")

DataFrame Example with Named Columns

# Create DataFrame with named columns
df = pd.DataFrame({
    'age': np.random.uniform(18, 80, 1000),
    'income': np.random.uniform(20000, 200000, 1000),
    'credit_score': np.random.uniform(300, 850, 1000)
})

# Define business-relevant bin edges
business_edges = {
    'age': [18, 25, 35, 50, 65, 80],          # Life stages
    'income': [0, 30000, 60000, 100000, 200000],  # Income brackets
    'credit_score': [300, 580, 670, 740, 850]     # Credit categories
}

# Optional: Define custom representatives
representatives = {
    'age': [21, 30, 42, 57, 72],              # Approximate midpoint ages
    'income': [15000, 45000, 80000, 150000],  # Representative incomes
    'credit_score': [440, 625, 705, 795]      # Representative scores
}

binner = ManualIntervalBinning(
    bin_edges=business_edges,
    bin_representatives=representatives,
    preserve_dataframe=True,
    clip=True  # Clip outliers to bin boundaries
)

df_binned = binner.fit_transform(df)

print("Age bins:")
for i, (start, end) in enumerate(zip(business_edges['age'][:-1], business_edges['age'][1:])):
    count = ((df['age'] >= start) & (df['age'] < end)).sum()
    print(f"  Bin {i}: [{start}, {end}) - {count} samples")

Financial Risk Example

# Financial data with regulatory-defined risk categories
financial_df = pd.DataFrame({
    'debt_to_income': np.random.uniform(0, 1.5, 5000),
    'loan_to_value': np.random.uniform(0.3, 1.2, 5000),
    'fico_score': np.random.uniform(300, 850, 5000)
})

# Regulatory risk categories (example)
risk_edges = {
    'debt_to_income': [0, 0.28, 0.36, 0.43, 1.5],     # DTI risk categories
    'loan_to_value': [0, 0.8, 0.9, 0.95, 1.2],        # LTV risk categories
    'fico_score': [300, 580, 620, 680, 740, 850]       # Credit score tiers
}

# Risk level names as representatives
risk_representatives = {
    'debt_to_income': ['Low', 'Moderate', 'High', 'Very High'],
    'loan_to_value': ['Conservative', 'Standard', 'Aggressive', 'High Risk'],
    'fico_score': ['Poor', 'Fair', 'Good', 'Very Good', 'Excellent']
}

risk_binner = ManualIntervalBinning(
    bin_edges=risk_edges,
    bin_representatives=risk_representatives,
    preserve_dataframe=True,
    clip=True
)

financial_binned = risk_binner.fit_transform(financial_df)

# Show distribution across risk categories
for col in ['debt_to_income', 'loan_to_value', 'fico_score']:
    print(f"\n{col.replace('_', ' ').title()} Distribution:")
    for i, rep in enumerate(risk_representatives[col]):
        mask = financial_binned[col] == i
        count = mask.sum()
        percentage = count / len(financial_df) * 100
        print(f"  {rep}: {count} ({percentage:.1f}%)")

Medical/Clinical Example

# Medical data with clinical thresholds
medical_df = pd.DataFrame({
    'bmi': np.random.normal(25, 5, 2000),
    'blood_pressure_systolic': np.random.normal(120, 20, 2000),
    'cholesterol': np.random.normal(200, 40, 2000),
    'age': np.random.uniform(18, 90, 2000)
})

# Clinical classification thresholds
clinical_edges = {
    'bmi': [0, 18.5, 25, 30, 40],                    # BMI categories
    'blood_pressure_systolic': [0, 120, 130, 140, 180, 300],  # BP stages
    'cholesterol': [0, 200, 240, 300],               # Cholesterol levels
    'age': [0, 18, 40, 65, 100]                      # Age groups
}

clinical_labels = {
    'bmi': ['Underweight', 'Normal', 'Overweight', 'Obese'],
    'blood_pressure_systolic': ['Normal', 'Elevated', 'Stage 1', 'Stage 2', 'Crisis'],
    'cholesterol': ['Desirable', 'Borderline', 'High'],
    'age': ['Child', 'Adult', 'Middle Age', 'Senior']
}

clinical_binner = ManualIntervalBinning(
    bin_edges=clinical_edges,
    bin_representatives=clinical_labels,
    preserve_dataframe=True,
    clip=True
)

medical_binned = clinical_binner.fit_transform(medical_df)

# Clinical summary
print("Clinical Distribution Summary:")
for condition in ['bmi', 'blood_pressure_systolic', 'cholesterol']:
    print(f"\n{condition.replace('_', ' ').title()}:")
    for i, label in enumerate(clinical_labels[condition]):
        count = (medical_binned[condition] == i).sum()
        print(f"  {label}: {count} patients ({count/len(medical_df)*100:.1f}%)")

Cross-Dataset Consistency

# Ensure consistent binning across training and test sets

# Training data
train_data = pd.DataFrame({
    'feature1': np.random.normal(50, 15, 1000),
    'feature2': np.random.exponential(2, 1000)
})

# Test data (different distribution)
test_data = pd.DataFrame({
    'feature1': np.random.normal(45, 20, 500),  # Different mean/std
    'feature2': np.random.exponential(3, 500)   # Different scale
})

# Fixed bin edges ensure consistency
standard_edges = {
    'feature1': [0, 25, 40, 55, 70, 100],
    'feature2': [0, 1, 3, 6, 10, 20]
}

binner = ManualIntervalBinning(
    bin_edges=standard_edges,
    preserve_dataframe=True,
    clip=True
)

# Same binning applied to both datasets
train_binned = binner.fit_transform(train_data)
test_binned = binner.transform(test_data)  # No fitting needed

print("Training data distribution:")
print(train_binned['feature1'].value_counts().sort_index())
print("\nTest data distribution:")
print(test_binned['feature1'].value_counts().sort_index())

Advanced Bin Edge Validation

# Example of comprehensive bin edge validation

def validate_custom_edges(edges_dict, data_ranges):
    """Validate that bin edges cover expected data ranges."""
    for col, edges in edges_dict.items():
        if col in data_ranges:
            data_min, data_max = data_ranges[col]
            edge_min, edge_max = min(edges), max(edges)

            if edge_min > data_min:
                print(f"Warning: {col} edges start at {edge_min}, data starts at {data_min}")
            if edge_max < data_max:
                print(f"Warning: {col} edges end at {edge_max}, data ends at {data_max}")

            # Check for reasonable bin sizes
            bin_widths = np.diff(edges)
            if max(bin_widths) / min(bin_widths) > 10:
                print(f"Warning: {col} has very uneven bin sizes")

# Usage example
data_ranges = {
    'age': (df['age'].min(), df['age'].max()),
    'income': (df['income'].min(), df['income'].max())
}

validate_custom_edges(business_edges, data_ranges)

Scikit-learn Pipeline Integration

from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

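# Create classification data. n_informative and n_redundant are set explicitly
# because their defaults (2 and 2) would exceed n_features=3 and raise an error.
X, y = make_classification(
    n_samples=2000, n_features=3, n_informative=3, n_redundant=0,
    n_classes=2, random_state=42
)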

# Define standardized bin edges for each feature
pipeline_edges = {
    0: [-3, -1, 0, 1, 3],
    1: [-3, -1.5, 0, 1.5, 3],
    2: [-3, -1, 1, 3]
}

# Create pipeline with manual binning
pipeline = Pipeline([
    ('binning', ManualIntervalBinning(
        bin_edges=pipeline_edges,
        clip=True
    )),
    ('classifier', RandomForestClassifier(random_state=42))
])

# Train and evaluate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)

print(f"Pipeline accuracy with manual binning: {accuracy:.3f}")

Parameter Guide

bin_edges (dict, required)

Dictionary mapping column identifiers to bin edge lists:

Keys: Column names (str) or indices (int)
Values: Sorted lists/arrays of bin boundaries
Must have at least 2 elements per list
Will create len(edges)-1 bins
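To see how an edge list maps to bins, consecutive edges can be paired up. This is a small sketch; describe_bins is a hypothetical helper, and whether each boundary is inclusive on the left or the right is not shown here.

```python
def describe_bins(edges):
    """Return (index, left, right) for each bin implied by the edges."""
    return [
        (i, left, right)
        for i, (left, right) in enumerate(zip(edges[:-1], edges[1:]))
    ]


bins = describe_bins([0, 25, 50, 75, 100])
print(bins)  # [(0, 0, 25), (1, 25, 50), (2, 50, 75), (3, 75, 100)]
```

Five edges yield four bins, matching the len(edges)-1 rule.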

bin_representatives (dict, optional)

Dictionary mapping columns to bin representative values:

Keys: Must match bin_edges keys
Values: Lists with len(edges)-1 elements
If None, uses bin centers as representatives
Can be numeric values or category names

clip (bool, optional)

Whether to clip out-of-range values:

True: Clip to nearest bin edge
False: Assign special out-of-range indicators
None: Use global configuration default

Edge Case Handling

# Handling data outside bin ranges

# Data with outliers
outlier_data = pd.DataFrame({
    'normal_feature': np.concatenate([
        np.random.normal(50, 10, 900),  # Normal data
        [5, 95, 120, -10]               # Outliers
    ])
})

normal_edges = {'normal_feature': [20, 40, 60, 80]}

# With clipping
clipper = ManualIntervalBinning(bin_edges=normal_edges, clip=True)
clipped_result = clipper.fit_transform(outlier_data)

# Without clipping (outliers get special values)
no_clipper = ManualIntervalBinning(bin_edges=normal_edges, clip=False)
unclipped_result = no_clipper.fit_transform(outlier_data)

print("With clipping - unique values:", np.unique(clipped_result))
print("Without clipping - unique values:", np.unique(unclipped_result))

Tips for Best Results

Validate edge coverage: Ensure edges cover your expected data range
Consider domain knowledge: Use meaningful boundaries from your field
Check bin balance: Avoid bins that are too small or too large
Plan for outliers: Decide on clipping strategy early
Document edge rationale: Keep records of why specific edges were chosen
Test across datasets: Validate that edges work across different data samples
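The bin balance and outlier tips can be checked on a data sample before committing to a set of edges. The sketch below counts values per bin and out of range using only the standard library. bin_counts is a hypothetical helper, and it assumes bins are [left, right) with the last bin closed on the right, which may not match binlearn's own convention.

```python
from bisect import bisect_right
from collections import Counter


def bin_counts(values, edges):
    """Count how many values fall into each bin and how many fall outside."""
    n_bins = len(edges) - 1
    counts = Counter({i: 0 for i in range(n_bins)})
    for v in values:
        if v < edges[0] or v > edges[-1]:
            counts["out_of_range"] += 1
        else:
            # Same interval lookup as a sorted-edge search; the last edge
            # is folded into the last bin.
            counts[min(bisect_right(edges, v) - 1, n_bins - 1)] += 1
    return dict(counts)


scores = [55, 62, 68, 71, 75, 79, 83, 88, 91, 97, 104]
counts = bin_counts(scores, [0, 60, 70, 80, 90, 100])
print(counts)  # {0: 1, 1: 2, 2: 3, 3: 2, 4: 2, 'out_of_range': 1}
```

A bin with a count near zero, or a large out_of_range count, suggests revisiting the edges or the clipping strategy.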

Common Use Cases

Age Groups: [18, 25, 35, 50, 65, 80] for life stage analysis
Income Brackets: [0, 25000, 50000, 100000, 200000] for economic segments
Test Scores: [0, 60, 70, 80, 90, 100] for grade boundaries
Medical Thresholds: Disease-specific clinical cutoffs
Risk Categories: Regulatory or business-defined risk levels
