Data Type Enforcement

mloda supports optional data type declarations on Features, enabling runtime validation that computed data matches declared types.

Declaring Feature Types

Use typed constructors to declare the expected data type:

```python
from mloda.user import Feature

# Typed features - will be validated at runtime
feature_int = Feature.int32_of("user_count")
feature_double = Feature.double_of("price")
feature_str = Feature.str_of("name")

# Untyped feature - no validation
feature_any = Feature.not_typed("legacy_column")
```

Available typed constructors:

- int32_of(), int64_of() - Integer types
- float_of(), double_of() - Floating point types
- str_of() - String type
- boolean_of() - Boolean type
- date_of(), timestamp_millis_of(), timestamp_micros_of() - Date/time types
- decimal_of(), binary_of() - Other types

Validation Behavior

Default (Lenient) Mode

By default, validation allows compatible type conversions within categories:

| Declared Type | Compatible Actual Types |
|---|---|
| INT64 | INT32, INT64 |
| DOUBLE | INT32, INT64, FLOAT, DOUBLE |
| TIMESTAMP_MICROS | TIMESTAMP_MILLIS, TIMESTAMP_MICROS |

Cross-category mismatches (e.g., STRING declared but INT64 returned) raise DataTypeMismatchError.
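The lenient rules can be pictured as a lookup from declared type to the set of accepted actual types. The following is a minimal sketch, not mloda's implementation: the `DataType` enum, the `LENIENT_COMPATIBLE` mapping, and `is_lenient_match` are hypothetical names, and the fallback to an exact match for declared types missing from the table is an assumption.

```python
from enum import Enum, auto


class DataType(Enum):
    """Hypothetical stand-in for the library's data type enum."""

    INT32 = auto()
    INT64 = auto()
    FLOAT = auto()
    DOUBLE = auto()
    STRING = auto()
    TIMESTAMP_MILLIS = auto()
    TIMESTAMP_MICROS = auto()


# Accepted actual types per declared type, copied from the table above.
LENIENT_COMPATIBLE = {
    DataType.INT64: {DataType.INT32, DataType.INT64},
    DataType.DOUBLE: {DataType.INT32, DataType.INT64, DataType.FLOAT, DataType.DOUBLE},
    DataType.TIMESTAMP_MICROS: {DataType.TIMESTAMP_MILLIS, DataType.TIMESTAMP_MICROS},
}


def is_lenient_match(declared: DataType, actual: DataType) -> bool:
    """Return True if `actual` is acceptable for `declared` in lenient mode."""
    # Assumption: declared types not listed in the table only accept themselves.
    return actual in LENIENT_COMPATIBLE.get(declared, {declared})


# INT64 data satisfies a DOUBLE declaration; INT64 data for STRING does not.
print(is_lenient_match(DataType.DOUBLE, DataType.INT64))
print(is_lenient_match(DataType.STRING, DataType.INT64))
```

In the real library, the second case is the kind of mismatch that surfaces as DataTypeMismatchError.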

Strict Mode

Enable strict validation per-feature via options:

```python
feature = Feature.int32_of(
    "exact_count",
    options={"strict_type_enforcement": True}
)
```

In strict mode, only exact type matches are allowed. The compatible conversions accepted in lenient mode are rejected.

Execution Plan Grouping

Features with different explicit data types are separated into different execution groups at plan time. This allows type-specific processing paths.

Untyped features (data_type=None) are "lenient" and can be grouped with any typed features, preserving compatibility with index columns and legacy code.

```python
# These will be in DIFFERENT execution groups
Feature.int32_of("amount")
Feature.int64_of("amount")

# This can join ANY group (lenient)
Feature.not_typed("id")
```
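The grouping rule can be illustrated without the library. This is a rough sketch under stated assumptions: plain `(name, data_type)` tuples stand in for Feature objects, type names are plain strings, and `group_by_declared_type` is a hypothetical helper rather than part of mloda's planner.

```python
from collections import defaultdict

# (feature name, declared type or None); tuples stand in for Feature objects.
features = [
    ("amount_a", "INT32"),
    ("amount_b", "INT64"),
    ("total", "INT64"),
    ("id", None),  # untyped feature
]


def group_by_declared_type(features):
    """Split typed features into one group per declared type.

    Untyped features are returned separately because they may join any group.
    """
    typed = defaultdict(list)
    untyped = []
    for name, data_type in features:
        if data_type is None:
            untyped.append(name)
        else:
            typed[data_type].append(name)
    return dict(typed), untyped


groups, joinable_anywhere = group_by_declared_type(features)
print(groups)             # one list of feature names per declared type
print(joinable_anywhere)  # untyped features
```

Keeping untyped features out of the keyed groups mirrors the idea that they impose no type constraint of their own.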

Database Reader Type Awareness

When reading from databases (e.g., SQLite), declared types are used to build the PyArrow schema:

```python
# Declared type is used for schema, not inferred from data
feature = Feature.int64_of(
    "user_id",
    options={"sqlite": "/path/to/db.sqlite"}
)
```
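One reason a declared type is useful for a SQLite source is that SQLite columns are dynamically typed, so a type inferred from the stored values can be ambiguous. The sketch below uses only the standard-library `sqlite3` module. It does not use mloda or PyArrow, and the `CASTS` mapping and `read_column` helper are hypothetical names introduced for illustration.

```python
import sqlite3

# Hypothetical mapping from declared type names to Python casts.
CASTS = {"INT64": int, "DOUBLE": float, "STRING": str}


def read_column(conn, table, column, declared_type):
    """Read one column and cast every non-NULL value to the declared type."""
    cast = CASTS[declared_type]
    cursor = conn.execute(f'SELECT "{column}" FROM "{table}" ORDER BY rowid')
    return [None if value is None else cast(value) for (value,) in cursor]


conn = sqlite3.connect(":memory:")
# A column without a declared SQL type stores values as given.
conn.execute("CREATE TABLE users (user_id)")
conn.executemany("INSERT INTO users VALUES (?)", [(1,), ("2",), (3,)])

# Raw values keep their stored types, so the column mixes int and str.
raw = [value for (value,) in conn.execute("SELECT user_id FROM users ORDER BY rowid")]
# With a declared type, every value is coerced to the same Python type.
typed = read_column(conn, "users", "user_id", "INT64")
print(raw)
print(typed)
```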

Error Handling

Type mismatches raise DataTypeMismatchError:

```python title="Error handling example"
from mloda.user import mloda
from mloda.user import Feature
from mloda.provider import DataTypeMismatchError

try:
    result = mloda.run_all([Feature.str_of("numeric_column")])
except DataTypeMismatchError as e:
    print(f"Feature '{e.feature_name}': declared {e.declared.name}, got {e.actual.name}")
```