Training dataset from PostgreSQL on-demand feature group

Hi, I am creating training datasets from on-demand feature groups using a JDBC connector to a PostgreSQL database. When using a Jupyter notebook, I configure the Spark environment and attach the PostgreSQL JDBC connector JAR file from the /Resources directory:
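Roughly, the configuration I set in the notebook looks like this (the project name and driver version below are just examples from my setup, not something you need to match exactly):

```
%%configure -f
{
  "conf": {
    "spark.jars": "hdfs:///Projects/myproject/Resources/postgresql-42.2.5.jar"
  }
}
```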


With Jupyter everything goes well and the datasets are created. However, if I create the same training dataset manually in the Feature Store UI (by just picking features and adding them to the basket), a job is created. The job fails, though; I assume this is due to the missing JDBC driver.

Should I modify the Hopsworks environment somehow and attach the driver somewhere to make it work? Ideally I would execute the code from PyCharm on my local machine, but I guess that runs in the same environment, which also lacks the PostgreSQL driver.

Kind regards,

Hi @Krzysztof

You can add the PostgreSQL JAR file as a JAR dependency on the job itself, similarly to how you did it for Jupyter, and run the job again.
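For reference, what the job needs is the same two pieces you supplied in Jupyter: the driver JAR on the Spark classpath and a JDBC URL pointing at your database. A minimal sketch of assembling those pieces (the host, database, script, and JAR names below are placeholders, not from your setup):

```python
# Sketch: assemble the pieces a Spark job needs to reach PostgreSQL over JDBC.
# All names here (host, database, file names) are illustrative placeholders.

def jdbc_url(host: str, port: int, database: str) -> str:
    """Build a PostgreSQL JDBC URL of the form jdbc:postgresql://host:port/db."""
    return f"jdbc:postgresql://{host}:{port}/{database}"

def spark_submit_cmd(app: str, jar: str) -> list[str]:
    """spark-submit invocation that ships the JDBC driver JAR with --jars."""
    return ["spark-submit", "--jars", jar, app]

url = jdbc_url("pg.example.com", 5432, "features")
cmd = spark_submit_cmd("create_training_dataset.py", "postgresql-42.2.5.jar")
```

Inside the job, the same URL would then be passed to Spark's JDBC reader (`spark.read.format("jdbc").option("url", url)...`), with the driver resolved from the attached JAR.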

@Theo Thank you for the quick response. I understand, but is it possible to add the driver to the Hopsworks environment itself in some way? If I understand correctly, each time we create a job we need to manually modify the job's configuration and attach the JAR file.

@Krzysztof For the moment, setting a dependency JAR file through the Hopsworks environment is not supported, but it is a useful improvement that should be released in the near future. Until then, attaching the JAR file manually per job is the way to go.

@Theo - ok, thank you for the explanation!

Kind regards