Creating a Custom Feature Group
AI-Friendly Design: Feature groups are small, template-like structures - easy to test, easy to debug, and easy for AI to generate. The declarative pattern makes them ideal for AI-assisted development.
In this example, we'll create a custom feature group that multiplies the values of each requested feature by 2, then use it within mloda.
1. Import the Required Modules and Set File References
Start by importing the necessary modules to define the custom feature group and perform calculations:
import pyarrow.compute as pc
import pyarrow as pa
from mloda.provider import FeatureGroup
from mloda.user import DataAccessCollection
file_path = "tests/test_plugins/feature_group/src/dataset/creditcard_2023_short.csv"
data_access_collection = DataAccessCollection(files={file_path})
feature_list = ["id","V1","V2","V3"]
example_feature_list = [f"Example_{f}" for f in feature_list]
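The prefixed names built above are what the Example feature group later parses back into root feature names. The following is a minimal sketch of that round trip using plain string operations only, with no mloda involved. The helper names root_name_split and root_name_keep_underscores, and the feature name "Example_card_type", are hypothetical and exist only for illustration.

```python
# Root feature names and prefix as used in this guide.
feature_list = ["id", "V1", "V2", "V3"]
PREFIX = "Example"

# Same comprehension as in the guide: prepend the prefix to every root feature.
example_feature_list = [f"{PREFIX}_{f}" for f in feature_list]


def root_name_split(feature_name: str) -> str:
    """Recover the root name with the same expression the guide's input_features uses."""
    return feature_name.split("_")[1]


def root_name_keep_underscores(feature_name: str) -> str:
    """Hypothetical variant: split only once, so later underscores are preserved."""
    return feature_name.split("_", 1)[1]


print(example_feature_list)
print([root_name_split(name) for name in example_feature_list])

# A root name that itself contains an underscore behaves differently:
print(root_name_split("Example_card_type"))             # keeps only "card"
print(root_name_keep_underscores("Example_card_type"))  # keeps "card_type"
```

For the four columns used here, both helpers return the same root names. The difference only matters if a root feature name contains an underscore, in which case splitting once is the safer choice.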
2. Define the Feature Group
The custom feature group, Example, operates on a set of input features. It depends on the root features (e.g., "id", "V1", etc.) and renames them with the prefix "Example_".
The logic that multiplies each feature by 2 is implemented in the calculate_feature class method.
class Example(FeatureGroup):
    def input_features(self, _, feature_name):
        # "Example_V1" -> "V1": the root feature this feature depends on
        return {feature_name.name.split("_")[1]}

    @classmethod
    def calculate_feature(cls, data, _):
        # Double every column of the incoming PyArrow table
        multiplied_columns = [pc.multiply(data[column], 2) for column in data.column_names]
        # Prefix each column name with the class name
        col_names = [f"{cls.get_class_name()}_{column}" for column in data.column_names]
        multiplied_table = pa.table(multiplied_columns, names=col_names)
        return multiplied_table
3. Execute the Request Using the New Feature Group
To use the new feature group, request each feature with the "Example_" prefix. mloda then resolves the dependency chain automatically: the CsvReader feature group supplies the root features from the file, and the Example feature group transforms them.
from mloda.user import mloda
result = mloda.run_all(
    example_feature_list,
    compute_frameworks=["PyArrowTable"],
    data_access_collection=data_access_collection,
)
result[0]
Expected output:
pyarrow.Table
Example_V28: double
Example_id: int64
...
Example_V28: [[-0.26000604758867731,-0.26311827417649086,-0....]]
Example_id: [[0,2,4,...]]
....
4. Summary
In this example, we implemented a custom feature group, Example, that multiplies each feature value by 2. Two methods were enough to extend mloda with a custom transformation: input_features declares which root feature each requested feature depends on, and calculate_feature performs the computation. To run it, we only added the "Example_" prefix to the requested feature names; mloda handled the dependencies and computations automatically.
5. Advanced Feature Group Topics
For more in-depth information about feature groups, check out these advanced topics:
- Feature Chain Parser - How feature groups work with chained feature names
- Feature Group Matching - How the system determines which feature group handles a feature
- Feature Group Testing - Best practices for testing feature groups
- Feature Group Versioning - How versioning works in feature groups
- Compute Framework Integration - How feature groups integrate with compute frameworks
- Multiple Result Columns - Working with multi-column features using automatic discovery utilities
- Data Type Enforcement - Runtime validation of declared feature data types
6. Discovering Feature Groups
To list all available feature groups and their documentation, use the get_feature_group_docs() function from mloda.steward.