Fitted State Reconstruction
binlearn provides a powerful fitted state reconstruction system that allows complete restoration of fitted estimators from their parameters. This page explains how get_params() and set_params() work together to enable model persistence, pipeline serialization, and state reconstruction, with all parameter dictionaries being JSON serializable.
The Reconstruction Philosophy
binlearn follows the “complete state reconstruction” principle:
get_params()captures ALL information needed to reconstruct a fitted estimatorset_params(params)orConstructor(**get_params())completely restores the fitted stateParameter dictionaries are always JSON serializable for easy persistence
No separate
fit()call is needed after reconstruction
This enables seamless model persistence, distributed computing, and stateless transformations.
Basic Reconstruction Example
import numpy as np
from binlearn import EqualWidthBinning
import json
# Create and fit original estimator
X = np.array([[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]])
original_binner = EqualWidthBinning(n_bins=3, preserve_dataframe=False)
original_binner.fit(X)
# Transform some data
X_test = np.array([[2.5, 25], [4.5, 45]])
original_result = original_binner.transform(X_test)
print(original_result) # [[1 1], [2 2]]
# Method 1: Reconstruction via get_params/set_params
params = original_binner.get_params()
reconstructed_binner = EqualWidthBinning()
reconstructed_binner.set_params(**params)
# Method 2: Reconstruction via constructor
params = original_binner.get_params()
reconstructed_binner2 = EqualWidthBinning(**params)
# Both methods produce identical behavior
reconstructed_result1 = reconstructed_binner.transform(X_test)
reconstructed_result2 = reconstructed_binner2.transform(X_test)
np.testing.assert_array_equal(original_result, reconstructed_result1)
np.testing.assert_array_equal(original_result, reconstructed_result2)
print("✓ Perfect reconstruction achieved!")
JSON Serialization Support
All parameter dictionaries are guaranteed to be JSON serializable:
# Get fitted parameters
fitted_binner = EqualWidthBinning(n_bins=4, clip=True)
fitted_binner.fit(X)
params = fitted_binner.get_params()
# Serialize to JSON (always works)
json_string = json.dumps(params)
print("✓ JSON serialization successful")
# Deserialize and reconstruct
loaded_params = json.loads(json_string)
restored_binner = EqualWidthBinning(**loaded_params)
# Test equivalence
test_data = np.array([[1.5, 15], [3.5, 35]])
original_output = fitted_binner.transform(test_data)
restored_output = restored_binner.transform(test_data)
np.testing.assert_array_equal(original_output, restored_output)
print("✓ JSON round-trip reconstruction successful!")
What get_params() Returns
The get_params() method returns a comprehensive dictionary containing:
Constructor Parameters
All parameters from the __init__ method signature:
binner = EqualWidthBinning(n_bins=5, clip=True, preserve_dataframe=False)
params = binner.get_params()
# Constructor parameters
print(params['n_bins']) # 5
print(params['clip']) # True
print(params['preserve_dataframe']) # False
print(params['bin_range']) # None (default value)
Fitted Parameters (When Fitted)
When the estimator is fitted, get_params() includes fitted state:
binner.fit(X)
fitted_params = binner.get_params()
# Fitted parameters (without trailing underscores)
print(fitted_params.keys())
# Includes: 'bin_edges', 'bin_representatives', etc.
print(fitted_params['bin_edges'])
# {'0': [1.0, 2.33, 3.67, 5.0], '1': [10.0, 23.33, 36.67, 50.0]}
print(fitted_params['bin_representatives'])
# {'0': [1.665, 3.0, 4.335], '1': [16.665, 30.0, 43.335]}
Class Metadata
For automatic reconstruction without explicit class references:
params = binner.get_params()
print(params['class_']) # 'EqualWidthBinning'
print(params['module_']) # 'binlearn.methods._equal_width_binning'
# This enables dynamic class loading:
class_name = params['class_']
module_name = params['module_']
# Can be used to dynamically import and reconstruct
Parameter Type Conversion
binlearn automatically converts numpy types to Python types for JSON compatibility:
import numpy as np
# Example of internal type conversion
raw_edges = {'0': np.array([1.0, 2.0, 3.0])} # NumPy array
params = binner.get_params()
json_compatible_edges = params['bin_edges'] # Python list
print(type(raw_edges['0'])) # <class 'numpy.ndarray'>
print(type(json_compatible_edges['0'])) # <class 'list'>
# Both represent the same data
np.testing.assert_array_equal(raw_edges['0'], json_compatible_edges['0'])
The conversion handles nested structures:
# Complex nested structure with numpy types
complex_data = {
'edges': {
'feature1': np.array([1.0, 2.0, 3.0]),
'feature2': np.array([10, 20, 30])
},
'representatives': {
'feature1': [np.float64(1.5), np.float64(2.5)],
'feature2': [np.int64(15), np.int64(25)]
}
}
# After get_params(), everything becomes JSON-serializable Python types
json.dumps(complex_data) # This would work after conversion
How set_params() Works
The set_params() method intelligently handles parameter restoration:
Regular Parameters
Constructor parameters are set through sklearn’s standard mechanism:
binner = EqualWidthBinning()
binner.set_params(
n_bins=5,
clip=True,
preserve_dataframe=False
)
# These become: binner.n_bins, binner.clip, binner.preserve_dataframe
Fitted Parameters
Fitted parameters are set as attributes with trailing underscores:
binner.set_params(
bin_edges={'0': [1, 2, 3, 4]}, # Becomes: binner.bin_edges_
bin_representatives={'0': [1.5, 2.5, 3.5]} # Becomes: binner.bin_representatives_
)
# After set_params(), the binner is immediately ready for transform()
print(binner._fitted) # True - no need to call fit()
Class Metadata Handling
Class and module parameters are ignored during reconstruction:
params_with_metadata = {
'n_bins': 3,
'bin_edges': {'0': [1, 2, 3, 4]},
'class_': 'EqualWidthBinning', # Ignored
'module_': 'binlearn.methods...', # Ignored
}
binner.set_params(**params_with_metadata)
# class_ and module_ are silently ignored
Advanced Reconstruction Scenarios
Pipeline Persistence
Fitted pipelines can be completely reconstructed:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# Create and fit pipeline
pipeline = Pipeline([
('scaler', StandardScaler()),
('binner', EqualWidthBinning(n_bins=4))
])
pipeline.fit(X)
# Get complete pipeline parameters
pipeline_params = pipeline.get_params(deep=True)
# Reconstruct pipeline
new_pipeline = Pipeline([
('scaler', StandardScaler()),
('binner', EqualWidthBinning())
])
new_pipeline.set_params(**pipeline_params)
# Test pipeline equivalence
original_prediction = pipeline.predict(X_test[:5])
reconstructed_prediction = new_pipeline.predict(X_test[:5])
np.testing.assert_array_equal(original_prediction, reconstructed_prediction)
Cross-Process Communication
Perfect for distributed computing scenarios:
# Worker process 1: Train and serialize
def train_and_serialize(X_train):
binner = EqualWidthBinning(n_bins=5)
binner.fit(X_train)
params = binner.get_params()
return json.dumps(params)
# Worker process 2: Deserialize and predict
def load_and_predict(json_params, X_test):
params = json.loads(json_params)
binner = EqualWidthBinning(**params) # Immediately ready
return binner.transform(X_test)
# Example usage
serialized_model = train_and_serialize(X_train)
predictions = load_and_predict(serialized_model, X_test)
Database Storage
Store fitted models in databases as JSON:
import sqlite3
# Store fitted model
def store_model(model_name, fitted_binner):
params = fitted_binner.get_params()
params_json = json.dumps(params)
conn = sqlite3.connect('models.db')
conn.execute('''
CREATE TABLE IF NOT EXISTS models
(name TEXT PRIMARY KEY, params TEXT)
''')
conn.execute('INSERT OR REPLACE INTO models VALUES (?, ?)',
(model_name, params_json))
conn.commit()
conn.close()
# Load fitted model
def load_model(model_name):
conn = sqlite3.connect('models.db')
cursor = conn.execute('SELECT params FROM models WHERE name = ?', (model_name,))
row = cursor.fetchone()
conn.close()
if row:
params = json.loads(row[0])
return EqualWidthBinning(**params) # Ready to use
return None
Implementation Details
Parameter Extraction Process
The get_params() method uses introspection to extract parameters:
# Pseudo-code for get_params() logic
def get_params(self, deep=True):
# 1. Extract constructor signature
init_signature = inspect.signature(self.__class__.__init__)
# 2. Get constructor parameter values
params = {}
for param_name in init_signature.parameters:
if param_name != 'self':
params[param_name] = getattr(self, param_name, default_value)
# 3. Add fitted parameters if fitted
if self._fitted:
fitted_params = self._extract_fitted_params()
params.update(fitted_params)
# 4. Add class metadata
params['class_'] = self.__class__.__name__
params['module_'] = self.__class__.__module__
# 5. Convert numpy types to Python types
return convert_to_python_types(params)
Fitted State Detection
The system automatically detects fitted state:
# Each binning class defines fitted attributes
class EqualWidthBinning(IntervalBinningBase):
def __init__(self, ...):
# ...
self._fitted_attributes = ['bin_edges_', 'bin_representatives_']
@property
def _fitted(self):
# Check if any fitted attribute contains data
for attr_name in self._fitted_attributes:
attr_value = getattr(self, attr_name, None)
if attr_value is not None and attr_value:
return True
return False
Best Practices
Always use get_params() for serialization:
# Good: Complete state capture params = fitted_binner.get_params() # Bad: Manual parameter extraction (incomplete) manual_params = {'n_bins': fitted_binner.n_bins}
Prefer constructor reconstruction for new instances:
# Good: Clean instantiation new_binner = EqualWidthBinning(**params) # Also good: Explicit reconstruction new_binner = EqualWidthBinning() new_binner.set_params(**params)
Use JSON serialization for persistence:
# Store model with open('model.json', 'w') as f: json.dump(fitted_binner.get_params(), f) # Load model with open('model.json', 'r') as f: params = json.load(f) restored_binner = EqualWidthBinning(**params)
Test reconstruction in critical applications:
# Validation function def validate_reconstruction(original_estimator, test_data): # Get parameters and reconstruct params = original_estimator.get_params() reconstructed = original_estimator.__class__(**params) # Test equivalence original_output = original_estimator.transform(test_data) reconstructed_output = reconstructed.transform(test_data) np.testing.assert_array_equal(original_output, reconstructed_output) return True
Common Use Cases
Model Serving: Serialize models to JSON for web services
Batch Processing: Store fitted models between batch jobs
A/B Testing: Compare different model configurations
Model Versioning: Track model parameters over time
Distributed Computing: Share fitted models across workers
Pipeline Checkpointing: Save intermediate pipeline states
The fitted state reconstruction system makes binlearn estimators fully stateless and serializable, enabling flexible deployment and integration patterns.