Hello, I hope I can get some help here. Issue: the pipeline that runs feature_group.insert(value.df) against a feature group with online_enabled = True does not work when start_offline_materialization is set to False. I can't see any data in the UI. The connection itself is fine, since the same service created the feature group.
Thank you in advance.
I am also having the same issue and have created the feature group in exactly the same way (i.e. online_enabled = True). The data gets uploaded successfully and is shown in "Data Preview", but the "Inspect Data" link shows a blank page. My role is set to "Data Owner".
I also tried to get the data programmatically using a featuregroup.read() call, but I get the following error:
Connected. Call .close() to terminate connection gracefully.
Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/1101975
2024-10-10 11:03:14,131 WARNING: using legacy validation callback
Connected. Call .close() to terminate connection gracefully.
2024-10-10 11:03:15,857 ERROR: [Errno 2] Opening HDFS file '/apps/hive/warehouse/rt_ml_featurestore.db/ohlcv_feature_group_1_1/.hoodie/hoodie.properties' failed. Detail: [errno 2] No such file or directory. Detail: Python exception: FlyingDuckException. gRPC client debug context: UNKNOWN:Error received from peer ipv4:3.19.160.248:5005 {grpc_message:"[Errno 2] Opening HDFS file '/apps/hive/warehouse/rt_ml_featurestore.db/ohlcv_feature_group_1_1/.hoodie/hoodie.properties' failed. Detail: [errno 2] No such file or directory. Detail: Python exception: FlyingDuckException", grpc_status:2, created_time:"2024-10-10T11:03:15.855960789-07:00"}. Client context: IOError: Server never sent a data message. Detail: Internal
Please let me know if I am missing anything or if any additional information is needed.
Thanks.
Hej Oscar!
From your screenshot it is hard to judge whether or not the insertion has worked. Let me clarify a few things regarding the UI and the number of commits:
- Commits are only created by inserts to the OFFLINE feature store. Each commit corresponds to the set of operations done by the materialization job on the offline table.
- Regardless of whether online_enabled is True or False, if start_offline_materialization is False you do not write any data to the offline store, and therefore no commit is created.
- When using online_enabled=True, the Hopsworks OnlineFS service takes responsibility for updating the online table as soon as you insert new data. It does not try to insert all the data in the offline store, but rather reads every row coming in and inserts it into the SQL database.
- When selecting the data preview tab of your feature group, you can choose whether you want to see a slice of the data from the offline or from the online store. If you have selected online_enabled for your FG and not started the offline materialization, it will not find any data in the offline store, but it will be able to show you data from the online store.
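To make the points above concrete, here is a minimal sketch of an insert that skips the offline materialization job. The helper name insert_online_only is made up for illustration, and the feature group name in the usage comments is only a guess based on the path in the error message earlier in the thread. With this call, rows should reach the online store only, so no commit shows up in the UI.

```python
def insert_online_only(feature_group, df):
    """Insert df without starting the offline materialization job.

    Hypothetical helper for illustration. `feature_group` is expected to be
    a Hopsworks feature group created with online_enabled=True.
    """
    return feature_group.insert(
        df,
        # No materialization job is started, so nothing is written to the
        # offline store and no commit is created.
        write_options={"start_offline_materialization": False},
    )


# Usage sketch (needs a Hopsworks project; the name and version are guesses):
# import hopsworks
# fs = hopsworks.login().get_feature_store()
# fg = fs.get_feature_group("ohlcv_feature_group_1", version=1)
# insert_online_only(fg, df)
```

After such an insert, pick the online store in the data preview tab to check that the rows arrived.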
Have a good day,
Victor
Hey aejazm,
I think you simply need to set online=True in your read() call to be able to read your data from the online store. Returning all rows in the table is an expensive operation, so it is preferable to create a feature view and use get_feature_vector(s) to get values only for the primary keys you are interested in.
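A rough sketch of both options, in case it helps. The helper names and the key_name argument are placeholders (I don't know your primary key column), and the feature view is assumed to exist already. The hoodie.properties error above is most likely because read() goes to the offline store by default, and nothing has been materialized there yet.

```python
def read_online(feature_group):
    """Read the whole feature group from the online store.

    Fine for a quick check, but expensive on large tables.
    """
    return feature_group.read(online=True)


def lookup_feature_vectors(feature_view, key_name, key_values):
    """Fetch feature vectors only for the given primary key values.

    Hypothetical helper: `key_name` must be the primary key column of the
    feature group behind `feature_view`.
    """
    # get_feature_vectors expects one dict per entity to look up.
    entries = [{key_name: value} for value in key_values]
    return feature_view.get_feature_vectors(entry=entries)
```

The second helper only touches the rows you ask for, which is why it is the preferred route once the table grows.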
Have a good day!