We are testing the integration of the Hopsworks Feature Store (FS) into Databricks, so we installed Hopsworks with the cloud installer on a single VM in Azure. We also set up a Databricks instance (same resource group and region). Now we are trying to connect from Databricks to the Hopsworks FS using the Python hops module. The call to “fs.setup_databricks” in a Python notebook fails with an error:
SSLError: HTTPSConnectionPool(host=‘10.0.0.4’, port=443): Max retries exceeded with url: /hopsworks-api/api/project/getProjectInfo/Test (Caused by SSLError(SSLError(“bad handshake: Error([(‘SSL routines’, ‘tls_process_server_certificate’, ‘certificate verify failed’)])”)))
The code is:
fs.setup_databricks(
'10.0.0.4', # Hopsworks.ai address of your Feature Store instance
'Test', # Name of your Hopsworks Feature Store project
port=443,
secrets_store='local',
api_key_file='/dbfs/fs_apikey.txt', # This should point to a path in your Databricks cluster
hostname_verification=False)
10.0.0.4 is the internal IP of the network adapter associated with the Hopsworks VM.
You should use the address of your Hopsworks.ai cluster instead of the internal IP. The address can be found in the General tab of your cluster in Hopsworks.ai.
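As a small guard against this mistake, the sketch below checks whether the address is a literal IP in a reserved, non-public range (such as 10.0.0.4) before it is passed on. The helper name `is_private_ip` is hypothetical and not part of the hops module; it uses only the Python standard library.

```python
import ipaddress


def is_private_ip(host: str) -> bool:
    """Return True if `host` is a literal IP address in a range reserved
    for private or otherwise non-public use (e.g. 10.0.0.0/8)."""
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        # Not an IP literal, so it is presumably a DNS name.
        return False
```

For example, `is_private_ip('10.0.0.4')` returns True, which would be a hint to use the cluster address from the General tab instead.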
Did you first run the cell without hostname_verification and only add it later? If so, there is a known issue: the property is set globally and persists across API calls, so the later change has no effect. You can try restarting the Databricks cluster.
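To check the handshake independently of the hops module, a minimal probe using only the Python standard library could look like the sketch below. The function name `probe_tls` and its parameters are hypothetical. Note that in Python's `ssl` module, turning off the hostname check does not turn off certificate chain verification, so a self-signed or untrusted certificate still fails. Whether `hostname_verification` in hops maps to the same behavior is an assumption here, not something confirmed in this thread.

```python
import socket
import ssl


def probe_tls(host, port=443, check_hostname=True, timeout=5.0):
    """Attempt a TLS handshake and return a short description of the result."""
    context = ssl.create_default_context()
    # Only the hostname match is relaxed here; the certificate chain is
    # still verified against the system trust store.
    context.check_hostname = check_hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return "ok: " + str(tls.version())
    except ssl.SSLCertVerificationError as exc:
        return "certificate verification failed: " + exc.verify_message
    except OSError as exc:
        # Covers refused connections, timeouts, DNS errors and other TLS errors.
        return "connection failed: " + str(exc)


# Example call (run manually from a notebook cell):
# print(probe_tls("10.0.0.4", check_hostname=False))
```

If the probe reports a certificate verification failure even with `check_hostname=False`, the certificate itself is not trusted by the client, which is a separate problem from the hostname mismatch.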
Additionally, in the next release (the one you are trying right now, but the enterprise version) we have drastically simplified Databricks cluster configuration. It is now a UI-based process, which makes clusters much easier to configure. The documentation for this is coming as soon as possible.
We’ll be rolling out the new release on hopsworks.ai in the next week, or you can drop a message to sales@logicalclocks.com if you want to try out and deploy your own.
Thanks for the extensive information.
From your reply I understand that we need the enterprise version of Hopsworks to use it with Databricks. Is that right? To be specific - we are interested in the Feature Store component and want to try it out. So this will not work with Databricks and the free version?
I understand the confusion. As of today you can get Hopsworks in several flavors:
Hopsworks community - It’s the open source version, self-deployed and self-managed. We do have some helper scripts that allow you to deploy Hopsworks. This is what you have installed. It comes with most of the features, except some enterprise-oriented ones like integrations with other tools (Databricks, EMR, Cloudera), SSO, and integration with K8s for Jupyter notebooks.
Hopsworks enterprise - It provides all the features of the community edition + the additional features I mentioned above. You can get the enterprise edition as licenses by contacting sales or by using hopsworks.ai.
When you use hopsworks.ai, as of today you can try it out for free for 30 days without connecting any AWS/Azure account, just by using the demo instance we provide. (With the demo instance you won’t be able to test the Databricks integration, as the instance runs on our account.) If you connect your AWS/Azure account, you can start using the free version, or contact sales and get your hopsworks.ai account upgraded to enterprise.
The differences between free and enterprise on hopsworks.ai are the ones listed above. The hopsworks.ai enterprise edition gives you access to further cloud-oriented features, like S3/Azure Blob Storage for storing features and elastic clusters. But no matter which hopsworks.ai plan you choose, you will be using Hopsworks enterprise.