Connect to SQL Server via Hopsworks

Hi Team,

I have started exploring the tool and finished a tutorial.

My tech stack is all in Azure and my DWH is SQL Server.

I tried the JDBC connector:
First, it said the JDBC connector is not supported for test connections.
I went ahead and created a storage connector anyway. When I tried to create a simple feature group, it created the group, but when I got to the data preview section it threw the error “Arrow Flight doesn’t support connector of type: JDBC”. I am just trying to import 3 columns from a table in my SQL Server database.

I am trying to create the connection via the UI. Is connecting to SQL Server supported?

If you don't support SQL Server, do you support an Azure ADLS Gen2 connector?

Hej KNP,

Thanks for reaching out and describing your issue. First, I'll list a couple of bullet points below with my guess at the process that led to this error message, so you can correct me if I am misunderstanding your situation. After that, I've added a few links to resources that should give you a path forward to solve the problem.

I assume that you have:

  • Created an account on app.hopsworks.ai and used the UI to create your JDBC connector
  • Created an External Feature Group (FG) from a client machine running Python
  • On creation of this FG, passed the storage connector as well as a string with the SQL query to be executed against the database
  • Called the .read method on this FG object, roughly as in the sketch below
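
For concreteness, here is a minimal sketch of those steps with the hsfs Python API (the connector name, feature group name, and query are placeholders on my side):

```python
import hopsworks

# Log in to app.hopsworks.ai and get a handle on the feature store
project = hopsworks.login()
fs = project.get_feature_store()

# Retrieve the JDBC storage connector created in the UI (placeholder name)
sc = fs.get_storage_connector("sqlserver_jdbc")

# Create the External Feature Group backed by a SQL query
fg = fs.create_external_feature_group(
    name="my_external_fg",
    version=1,
    query="SELECT col_a, col_b, col_c FROM my_table",
    storage_connector=sc,
)
fg.save()

# This is the call that fails: the Python client serves reads of external
# feature groups through Arrow Flight, which does not support JDBC connectors
df = fg.read()
```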

The error is due to the JDBC storage connector not being supported in the Python client. As described in the JDBC user guide in the Hopsworks documentation, an additional JAR (the JDBC driver for your database) must be provided to connect to an external database:

We support reading data via the JDBC connector only from the PySpark/Spark clients.
Unfortunately, providing additional configuration for Spark and PySpark jobs is an enterprise feature and is not available on app.hopsworks.ai at the moment. An alternative would be to reach out to our team to set up a demo cluster for you. We can help you set up the connector to make it easy to get started on your project :)!

You can find more info about how to use the storage connector from PySpark and Spark in the usage guide we have in the docs: Usage - Hopsworks Documentation
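
To sketch what that looks like from a (Py)Spark client, on a cluster where the JDBC driver JAR has been made available (connector name and query are placeholders again):

```python
import hopsworks

project = hopsworks.login()
fs = project.get_feature_store()

# Inside a (Py)Spark job, the same storage connector can execute the query
# against your database and return a Spark DataFrame
sc = fs.get_storage_connector("sqlserver_jdbc")
spark_df = sc.read(query="SELECT col_a, col_b, col_c FROM my_table")
spark_df.show(5)
```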

To address your last question: we do support Azure ADLS Gen2 connectors on Azure, as described in the ADLS storage connector guide in the docs.

This feature is unfortunately not available on the public app.hopsworks.ai, but we can help you set it up on the demo cluster as well. Similarly, reading the data will only be available via (Py)Spark.
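
For reference, reading from an ADLS connector in a PySpark job looks roughly like this (connector name, file format, and path are placeholders):

```python
import hopsworks

project = hopsworks.login()
fs = project.get_feature_store()

# Read a file from the ADLS Gen2 container via the storage connector
adls = fs.get_storage_connector("my_adls")
spark_df = adls.read(data_format="parquet", path="my_folder/my_file.parquet")
```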

You can always use pandas to read from your SQL Server database and insert the data into a (non-external) Feature Group. You would get the data into app.hopsworks.ai and be able to try out the rest of the functionality without relying on external reads from Azure ADLS or going through JDBC, which is overkill for a small amount of data.
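
Something along these lines, assuming a SQL Server ODBC driver is installed along with the sqlalchemy and pyodbc packages (connection details, names, and the primary key below are placeholders):

```python
import pandas as pd
import sqlalchemy
import hopsworks

# Read the three columns straight into pandas (placeholder connection string)
engine = sqlalchemy.create_engine(
    "mssql+pyodbc://user:password@myserver.database.windows.net/mydb"
    "?driver=ODBC+Driver+18+for+SQL+Server"
)
df = pd.read_sql("SELECT col_a, col_b, col_c FROM my_table", engine)

# Insert into a regular (cached) Feature Group on app.hopsworks.ai
project = hopsworks.login()
fs = project.get_feature_store()
fg = fs.get_or_create_feature_group(
    name="my_fg",
    version=1,
    primary_key=["col_a"],  # placeholder primary key
)
fg.insert(df)
```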

One final possibility I should mention, if you are keen on living on the edge, is to try out the Polars support by installing the feature-store-api directly from our GitHub master branch. Polars supports reading from an Azure SQL Server via the blazing fast connector-x library. You can insert the Polars dataframe into a Feature Group as you would with a pandas dataframe. I would love to know if you try it out!
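
A sketch of that path; treat the install line and connection string as my assumptions and check the README on master for the exact instructions:

```python
# Install from the master branch, e.g.:
#   pip install "git+https://github.com/logicalclocks/feature-store-api@master#subdirectory=python"
# plus: pip install polars connectorx
import polars as pl
import hopsworks

# connector-x reads SQL Server through an mssql:// URI (placeholder credentials)
df = pl.read_database_uri(
    query="SELECT col_a, col_b, col_c FROM my_table",
    uri="mssql://user:password@myserver.database.windows.net:1433/mydb",
    engine="connectorx",
)

project = hopsworks.login()
fs = project.get_feature_store()
fg = fs.get_or_create_feature_group(name="my_fg", version=1, primary_key=["col_a"])
fg.insert(df)  # on master, a polars DataFrame is accepted like a pandas one
```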

I hope this answers your questions. Please get in touch if you require further assistance or have additional questions!

Best regards,
Victor