Hi,
I am trying to use the storage connector from the demo project in the Hopsworks demo version to create an online feature group.
I am receiving the following error. Can you please help me see where I am going wrong?
Shouldn’t I use the demo storage connector?
import com.logicalclocks.hsfs._
val connection = HopsworksConnection.builder().build();
val fs = connection.getFeatureStore();
val redshiftConn = fs.getRedshiftConnector("demo_fs_mpratyus_mpratyus_onlinefeaturestore")
val telcoOnDmd = (fs.createOnDemandFeatureGroup()
.name("telco_redshift_scala")
.version(2)
.query("select 4")
.description("On-demand feature group for telecom customer data")
.storageConnector(redshiftConn)
.statisticsEnabled(true)
.build())
telcoOnDmd.save()
An error was encountered:
java.lang.ClassCastException: com.logicalclocks.hsfs.StorageConnector$JdbcConnector cannot be cast to com.logicalclocks.hsfs.StorageConnector$RedshiftConnector
at com.logicalclocks.hsfs.FeatureStore.getRedshiftConnector(FeatureStore.java:177)
… 53 elided
Documentation followed: On-demand (External) Feature Group - Hopsworks Documentation
Thanks,
Pratyusha
Hi Pratyusha,
Let me give you some background before going into the details of your error.
On-demand feature groups are in essence an external table to the feature store. One use case: you have legacy processes that engineer features and save them in, for example, a Redshift table. You don't want to move that data into the Hopsworks Feature Store itself, but you do want the metadata in the feature store and to pull in the actual data only when you need it (e.g. to create training datasets). For that, you would create a Redshift connector in Hopsworks pointing to your Redshift cluster/table, and then use that connector to create the on-demand feature group.
On-demand feature groups work like normal feature groups in most respects, but they have some disadvantages. Since the data stays external to the feature store, you cannot serve on-demand feature groups online: there is no update process that recognizes when data in the external table has changed and should also be updated in the online storage of the feature store. For the same reason, you cannot use data validation with on-demand feature groups.
To work around this, you would need to create a regular feature group and run a job that periodically reads from the on-demand feature group and writes the data to the online-enabled "regular" feature group in Hopsworks, thereby making the data available for online serving.
To create a regular feature group with online serving, you simply set the "online_enabled" flag to True.
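As a rough illustration of the online-enabled flag in Scala, here is a minimal sketch. It assumes a Hopsworks Spark notebook where `spark` is already defined, and that the HSFS feature group builder exposes `onlineEnabled` and `primaryKeys` in your HSFS version (please check the API docs for the version you run). The feature group name, the column names and the tiny DataFrame are all made up for the example.

```scala
import com.logicalclocks.hsfs._
import java.util.Arrays
import spark.implicits._

val connection = HopsworksConnection.builder().build()
val fs = connection.getFeatureStore()

// Hypothetical feature data; in practice this would be the DataFrame
// produced by your own feature engineering job.
val telcoDf = Seq((1, 42.0), (2, 13.5)).toDF("customer_id", "monthly_charges")

// Regular (not on-demand) feature group with online serving switched on.
// No storage connector is passed here.
val telcoFg = (fs.createFeatureGroup()
  .name("telco_online_scala")                 // hypothetical name
  .version(1)
  .description("Online-enabled feature group for telecom customer data")
  .primaryKeys(Arrays.asList("customer_id"))  // hypothetical key column
  .onlineEnabled(true)                        // assumed Scala counterpart of online_enabled
  .build())

// Writes the data to the offline store and, because the group is
// online-enabled, to the online store as well.
telcoFg.save(telcoDf)
```

A periodic job could run the same `save` (or an insert) with data read from the external source to keep the online store fresh.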
Now to your particular issue. The online storage within Hopsworks is a JDBC-compatible database, so by default we create a JDBC storage connector for it, which is the one you tried to use. That is why getRedshiftConnector fails with the ClassCastException: the connector is a JdbcConnector, not a RedshiftConnector. This storage connector can be used if you want to access the online storage directly through a MySQL interface without having to use our HSFS developer SDK. You do not need it to create an online feature group; it is only meant for reading from the online storage.
Let me know if this answers your questions.