Feature Configuration from JSON

Overview

The load_features_from_config function enables loading feature configurations from JSON strings. This is the primary interface for AI agents and LLMs to request data from mloda - agents generate JSON, mloda executes it.

Use cases:

LLM Tool Functions - LLMs generate JSON feature requests without writing Python code
Feature configurations stored externally (files, databases, APIs)
Dynamic feature definitions at runtime
Configuration-driven pipelines

Basic Usage

from mloda.user import load_features_from_config, mloda

config = '''
[
    "simple_feature",
    {"name": "configured_feature", "options": {"param": "value"}}
]
'''

features = load_features_from_config(config)
result = mloda.run_all(features, compute_frameworks=["PandasDataFrame"])

JSON Format

The configuration must be a JSON array. Each item can be:

1. Simple String

A plain feature name string:

["feature_name"]

2. Feature Object

An object with name and optional configuration:

[
    {
        "name": "feature_name",
        "options": {"key": "value"}
    }
]

3. Mixed Configuration

Combine strings and objects:

[
    "simple_feature",
    {"name": "configured_feature", "in_features": ["source_feature"]}
]

FeatureConfig Fields

Field	Type	Required	Description
`name`	string	Yes	Feature name
`options`	object	No	Simple options dict (cannot be used with group_options/context_options)
`in_features`	array	No	Source feature names for chained features
`group_options`	object	No	Group parameters (affect Feature Group resolution)
`context_options`	object	No	Context parameters (metadata, doesn't affect resolution)
`propagate_context_keys`	array	No	Context keys that propagate to dependent features
`column_index`	integer	No	Index for multi-output features (adds `~N` suffix)

Configuration Approaches

Simple Options

Use options for simple key-value configuration:

[
    {
        "name": "my_feature",
        "options": {
            "window_size": 7,
            "aggregation": "sum"
        }
    }
]

Modern Group/Context Options

For explicit separation of group and context parameters:

[
    {
        "name": "my_feature",
        "group_options": {
            "data_source": "production"
        },
        "context_options": {
            "aggregation_type": "sum"
        }
    }
]

Note: options and group_options/context_options are mutually exclusive.

Worked Example: Window, Rank, and Percentile Features

Row-preserving operations (window aggregation, rank, percentile) cannot be requested by a bare name: the Feature Group only matches when the request also carries the partition/order options its matcher needs. The feature name encodes the operation ({source}__{operation}); the matcher then requires those options to be present. It reads them via options.get (group first, then context), so they resolve from either side, but context_options is the right home.

[
    {
        "name": "steps__sum_window",
        "context_options": {"partition_by": ["subject_id"]}
    },
    {
        "name": "price__last_window",
        "context_options": {"partition_by": ["region"], "order_by": "timestamp"}
    },
    {
        "name": "sales__row_number_ranked",
        "context_options": {"partition_by": ["region"], "order_by": "sales"}
    },
    {
        "name": "sales__p95_percentile",
        "context_options": {"partition_by": ["region"]}
    }
]

Key names per operation (from the registry data_operations packages):

Operation	Name pattern	Required `context_options`
Window aggregation	`{source}__{agg}_window` (`sum`, `avg`, `first`, `last`, ...)	`partition_by` (list); `order_by` (string) is required for order-dependent aggregations like `first`/`last`
Rank	`{source}__{rank_type}_ranked` (`row_number`, `dense_rank`, `ntile_N`, ...)	`partition_by` (list), `order_by` (string)
Percentile	`{source}__p{N}_percentile` (e.g. `p50`, `p95`)	`partition_by` (list)

Use context_options (not group_options) for these: the partition/order are operation parameters, not identity that should split the Feature Group.

Feature Chaining with in_features

Define dependent features using in_features:

[
    {
        "name": "aggregated_sales",
        "in_features": ["raw_sales"],
        "context_options": {
            "aggregation_type": "sum"
        }
    }
]

Multiple source features:

[
    {
        "name": "distance_feature",
        "in_features": ["point_a", "point_b"]
    }
]

Multi-Column Features

Access specific columns from multi-output features using column_index:

[
    {
        "name": "pca_result",
        "column_index": 0
    }
]

This produces a feature named pca_result~0.

Context Propagation

By default, context parameters are local to each feature and do not propagate through feature chains. Use propagate_context_keys to specify which context keys should flow to dependent features:

[
    {
        "name": "my_feature",
        "context_options": {
            "session_id": "abc123",
            "window_function": "sum"
        },
        "propagate_context_keys": ["session_id"]
    }
]

In this example, session_id propagates to any features that depend on my_feature, while window_function stays local.

Complete Example

from mloda.user import load_features_from_config, mloda

config = '''
[
    "customer_id",
    {
        "name": "sales_aggregated",
        "in_features": ["daily_sales"],
        "context_options": {
            "aggregation_type": "sum",
            "window_days": 7
        }
    },
    {
        "name": "encoded_category",
        "in_features": ["category"],
        "column_index": 0
    }
]
'''

features = load_features_from_config(config)

result = mloda.run_all(
    features,
    compute_frameworks=["PandasDataFrame"],
    api_data={"customer_data": {"customer_id": [1, 2, 3]}}
)