How to load historical features directly into spark dataframe #71

Open
VincentPe opened this issue Sep 26, 2022 · 0 comments

We have been using Feast with a SQL database as the offline store, appending features from a Spark DataFrame directly to a SQL table over JDBC. For a recommender, we now want to build a historical training dataset of a couple hundred million rows, where each row is a customer with a timestamp. Feast's get_historical_features accepts the entity dataframe only as a pandas DataFrame or a SQL query, so our workaround has been to store the entity df in the SQL db and pass a query to fetch the features, like so:

sql_job = fs.get_historical_features(
    entity_df="SELECT * FROM test_entitity_df",
    features=[
        'feature_view1:feature1',
        'feature_view1:feature2',
    ]
)

However, sql_job only offers to_df, to_arrow, and persist. My question is: how can we load the features efficiently into a Spark DataFrame for training? One option would be to store the result of the Feast query in a SQL table and load it into Spark over JDBC again. However, I cannot get persist to work, because the docs on SavedDatasetStorage are very limited. Please advise.
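For reference, the Spark side of that JDBC round trip could look roughly like the sketch below. It assumes the Feast result has already been written to a SQL table by some means. The table name `feast_training_features`, the numeric partition column `customer_id`, the bounds, and the connection URL are all placeholders, not anything Feast creates. The options themselves (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`, `fetchsize`) are standard Spark JDBC reader options that split the read into parallel range queries instead of pulling everything through a single connection.

```python
from typing import Any, Dict


def jdbc_read_options(
    url: str,
    table: str,
    partition_column: str,
    lower_bound: int,
    upper_bound: int,
    num_partitions: int = 64,
    fetch_size: int = 10_000,
) -> Dict[str, str]:
    """Build Spark JDBC reader options for a partitioned, parallel read.

    The bounds only decide how the partition ranges are split; rows outside
    [lower_bound, upper_bound) are still read by the first/last partition.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be smaller than upper_bound")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,  # assumed to be a numeric column
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
        "fetchsize": str(fetch_size),
    }


def load_features(spark: Any, options: Dict[str, str]) -> Any:
    """Read the table described by `options` into a Spark DataFrame.

    `spark` is expected to be an existing SparkSession.
    """
    return spark.read.format("jdbc").options(**options).load()
```

With a live session this would be called along the lines of `load_features(spark, jdbc_read_options("jdbc:postgresql://db-host:5432/feast", "feast_training_features", "customer_id", 0, 300_000_000))`, where the URL and bounds are again hypothetical; credentials and the driver class would need to be added to the options for a real database. This does not solve the persist problem itself; it only covers the read-back step.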

Resources:
https://docs.feast.dev/reference/offline-stores/overview#functionality
https://docs.feast.dev/getting-started/concepts/dataset#creating-a-saved-dataset-from-historical-retrieval
