"certificate verify failed" when running "fs.setup_databricks" in Azure/Databricks

Hi,

we are testing the integration of Hopsworks FS into Databricks and have therefore installed Hopsworks using the cloud installer on a single VM in Azure. We also set up a Databricks instance (both in the same resource group and region). Now we are trying to connect from Databricks to Hopsworks FS using the Python hops module. The call to “fs.setup_databricks” in a Python notebook fails with an error:

SSLError: HTTPSConnectionPool(host='10.0.0.4', port=443): Max retries exceeded with url: /hopsworks-api/api/project/getProjectInfo/Test (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))

The code is:

fs.setup_databricks(
   '10.0.0.4',                          # Hopsworks.ai address of your Feature Store instance
   'Test',                              # Name of your Hopsworks Feature Store project
   port=443,
   secrets_store='local',
   api_key_file='/dbfs/fs_apikey.txt',  # This should point to a path in your Databricks cluster
   hostname_verification=False)

10.0.0.4 is the internal IP of the network adapter associated with the Hopsworks VM.

Up to this point we followed the documentation at https://hopsworks.readthedocs.io/en/latest/getting_started/hopsworksai/guides/databricks_quick_start_azure.html# successfully.
VNet peering was successful, with the peering status shown as “Connected”. An API key was also created and added to ‘/dbfs/fs_apikey.txt’.
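
For anyone who wants to double-check these two prerequisites from a notebook cell, a minimal sketch could look like the one below. It uses only the Python standard library. The helper names can_connect and read_api_key are made up for illustration and are not part of the hops module.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def read_api_key(path):
    """Return the API key stored at path, without surrounding whitespace."""
    with open(path, "r") as handle:
        key = handle.read().strip()
    if not key:
        raise ValueError("API key file is empty: " + path)
    return key
```

Calling can_connect('10.0.0.4', 443) should return True if the peering really routes traffic to the Hopsworks VM, and read_api_key('/dbfs/fs_apikey.txt') should return a non-empty string. Neither check says anything about the certificate itself.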

The above statement fails regardless of whether “hostname_verification” is set to True or False.
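
To narrow down whether such a failure comes from the certificate chain or from the hostname check, one option is to attempt the TLS handshake directly with the standard library, outside of hops. The following is only a sketch under assumptions: try_handshake is a hypothetical helper, and whether its check_hostname switch corresponds to what “hostname_verification” does inside hops is not confirmed here.

```python
import socket
import ssl


def try_handshake(host, port=443, check_hostname=True, cafile=None, timeout=5.0):
    """Attempt a verified TLS handshake and return a short status string."""
    # Uses the system trust store unless a CA bundle is passed via cafile.
    context = ssl.create_default_context(cafile=cafile)
    # The certificate chain is still verified when this is set to False.
    context.check_hostname = check_hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return "ok: " + str(tls.version())
    except ssl.SSLCertVerificationError as exc:
        return "verify failed: " + exc.verify_message
    except (ssl.SSLError, OSError) as exc:
        return "error: " + str(exc)
```

If try_handshake('10.0.0.4', 443, check_hostname=False) still reports "verify failed", the certificate chain itself is not trusted by the client (for example because it is self-signed), and disabling the hostname check alone cannot fix that.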

Help appreciated.

Thanks and best, Roberto

Hi Roberto,

You should use the address of your Hopsworks.ai cluster instead of the internal IP. The address can be found in the General tab of your cluster in Hopsworks.ai.

Hi Mahmoud,

thanks for the answer.
We don't use the SaaS offering at hopsworks.ai.

We used the dedicated installation described in the documentation at https://hopsworks.readthedocs.io/en/latest/getting_started/installation_guide/platforms/hopsworks-cloud-installer.html to install Hopsworks in Azure.

Hi @robroe-tsi,

Did you run the cell first without hostname_verification and then add it later? If so, we have an issue there: the property is set as a global and persists across API calls. You can try restarting the Databricks cluster.

However, I was also looking at your other post (Installing Hopsworks using cloud installer always installs master version (2.0.0) instead of wanted 1.4), and it seems that you installed the community edition. Databricks integration won’t work with the community edition; you need the enterprise one.

Additionally, in the next release (the one you are trying right now, but in its enterprise version) we have drastically simplified Databricks cluster configuration. It’s now a UI-based process, so it’s much easier to configure clusters. The documentation for this is coming as soon as possible.
We’ll be rolling out the new release on hopsworks.ai in the next week, or you can drop a message to sales@logicalclocks.com if you want to try it out and deploy your own.


Fabio

Hi Fabio,

thanks for the extensive information.
From what you wrote, I understand that we need the enterprise version of Hopsworks to use it with Databricks. Is that right? To be specific: we are interested in the Feature Store component and want to try it out, and as I understand it this will not work with Databricks and the free version.

I also have to say that your main web page at https://www.hopsworks.ai/versus states that

  • Use the Feature Store from Sagemaker, Databricks and other ML platforms
  • Support for AWS and Microsoft Azure

are part of the free version.

Best, Roberto

Hi @robroe-tsi,

I understand the confusion. As of today you can get Hopsworks in several flavors:

  • Hopsworks community - This is the open source version, self-deployed and self-managed. We do have some helper scripts that allow you to deploy Hopsworks. This is what you have installed. It comes with most of the features, except some enterprise-oriented ones such as integrations with other tools (Databricks, EMR, Cloudera), SSO, and integration with K8s for Jupyter notebooks.

  • Hopsworks enterprise - It provides all the features of the community edition plus the additional features I mentioned above. You can get the enterprise edition either as a license by contacting sales or by using hopsworks.ai.

When you use hopsworks.ai, as of today you can try it out for free for 30 days without connecting any AWS/Azure account, just by using the demo instance we provide. With the demo instance you won’t be able to test the Databricks integration, as the instance runs on our account. If you connect your AWS/Azure account, you can start using the free version, or contact sales and get your hopsworks.ai account upgraded to enterprise.
The differences between free and enterprise on hopsworks.ai are the ones listed above. The hopsworks.ai enterprise edition gives you access to yet more cloud-oriented features, like S3/Azure blob storage for storing features and elastic clusters. But no matter which hopsworks.ai plan you choose, you will be using Hopsworks enterprise.


Fabio