How do I connect to a web instance of hopsworks.ai from a local Python environment?

You can connect to the Feature Store by:
(1) creating an API key in the "settings" option (top-right-hand corner)
(2) saving the API key in a file on your local computer
(3) calling hops.featurestore.connect(…, api_key_file="/path/to/api-key-you-downloaded", secrets_store="local")

http://hops-py.logicalclocks.com/hops.html#module-hops.featurestore

I tried this:
featurestore.connect('https://a46df480-aa4b-11ea-a29d-291324b872b4.aws.hopsworks.ai/', 'demo_featurestore_amrites1',
                     port=443, secrets_store='local',
                     region_name=DEFAULT_REGION,
                     api_key_file=os.path.abspath('apikey.txt'),
                     hostname_verification=False)

I am getting connection errors.

Ensure you opened up the feature store to the internet: https://hopsworks.readthedocs.io/en/latest/_images/open-ports.png

Where in the web version is this setting?

It's part of the cluster configuration in the dashboard. Or are you using the 30-day demo? In that case the instance is managed by us and the ports are opened automatically.

Could you post the connection errors you are getting?

Traceback (most recent call last):
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/urllib3/connection.py”, line 160, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/urllib3/util/connection.py”, line 61, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File “/home/subexgpu5/anaconda3/lib/python3.7/socket.py”, line 748, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py”, line 677, in urlopen
chunked=chunked,
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py”, line 381, in _make_request
self._validate_conn(conn)
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py”, line 978, in _validate_conn
conn.connect()
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/urllib3/connection.py”, line 309, in connect
conn = self._new_conn()
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/urllib3/connection.py”, line 172, in _new_conn
self, “Failed to establish a new connection: %s” % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fdef02d9e10>: Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/requests/adapters.py”, line 449, in send
timeout=timeout
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py”, line 727, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/urllib3/util/retry.py”, line 439, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host=‘https’, port=443): Max retries exceeded with url: //a46df480-aa4b-11ea-a29d-291324b872b4.aws.hopsworks.ai/:443/hopsworks-api/api/project/getProjectInfo/demo_featurestore_amrites1 (Caused by NewConnectionError(’<urllib3.connection.HTTPSConnection object at 0x7fdef02d9e10>: Failed to establish a new connection: [Errno -2] Name or service not known’))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “”, line 5, in
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore.py”, line 1910, in connect
project_info = project.get_project_info(project_name)
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/project.py”, line 87, in get_project_info
project_name)
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/util.py”, line 60, in http
response = send_request(method, resource_url, headers=headers, data=data)
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/util.py”, line 198, in send_request
response = session.send(prepped, verify=verify, stream=stream)
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/requests/sessions.py”, line 643, in send
r = adapter.send(request, **kwargs)
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/requests/adapters.py”, line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host=‘https’, port=443): Max retries exceeded with url: //a46df480-aa4b-11ea-a29d-291324b872b4.aws.hopsworks.ai/:443/hopsworks-api/api/project/getProjectInfo/demo_featurestore_amrites1 (Caused by NewConnectionError(’<urllib3.connection.HTTPSConnection object at 0x7fdef02d9e10>: Failed to establish a new connection: [Errno -2] Name or service not known’))

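Instead of connecting to https://a46df480-aa4b-11ea-a29d-291324b872b4.aws.hopsworks.ai/, try using a46df480-aa4b-11ea-a29d-291324b872b4.aws.hopsworks.ai. The traceback shows host='https', which means the scheme prefix was treated as the hostname; connect expects the bare DNS name without https:// or a trailing slash.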
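If the address is copied from the browser, a small helper can strip it down to the bare hostname before passing it to connect. This is only a sketch using the standard library; bare_hostname is a hypothetical helper name, not part of hops or hopsworks-cloud-sdk.

```python
from urllib.parse import urlparse


def bare_hostname(address):
    """Return only the host part of an address, dropping scheme, port, and path."""
    # urlparse only recognises the host when a scheme or a leading "//" is present,
    # so prefix "//" for inputs that are already bare hostnames.
    if "://" not in address:
        address = "//" + address
    return urlparse(address).hostname


# Hypothetical usage: clean up an address copied from the browser.
print(bare_hostname("https://a46df480-aa4b-11ea-a29d-291324b872b4.aws.hopsworks.ai/"))
```

The returned string is what goes in as the first argument of connect.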

Got a new error:
File “/home/subexgpu5/anaconda3/lib/python3.7/pathlib.py”, line 1258, in mkdir
self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: ‘/dbfs/hops’

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “”, line 5, in
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore.py”, line 1914, in connect
Path(dbfs_folder).mkdir(parents=True, exist_ok=True)
File “/home/subexgpu5/anaconda3/lib/python3.7/pathlib.py”, line 1262, in mkdir
self.parent.mkdir(parents=True, exist_ok=True)
File “/home/subexgpu5/anaconda3/lib/python3.7/pathlib.py”, line 1258, in mkdir
self._accessor.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: ‘/dbfs’

I believe you might be using the wrong library. Are you using hops or hopsworks-cloud-sdk? The hops library is for a Spark environment, see https://hopsworks.readthedocs.io/en/latest/featurestore/integrations/guides/custom.html for how to use hopsworks-cloud-sdk.

Make sure to remove the hops library before installing hopsworks-cloud-sdk.

From the new module:
AttributeError: module 'hops' has no attribute 'featurestore'

Did you import and run connect like this?

import hops.featurestore as fs
fs.connect(
    'my_instance',                    # DNS of your Feature Store instance
    'my_project',                     # Name of your Hopsworks Feature Store project
    api_key_file='featurestore.key',  # The file with the API key
    secrets_store='local')

Yes, with the cloud SDK. I get:
Traceback (most recent call last):
File “”, line 1, in
ModuleNotFoundError: No module named ‘hops.featurestore’

This looks like you installed hopsworks-cloud-sdk before uninstalling hops.

Run these commands in order and try again:
pip uninstall hops
pip uninstall hopsworks-cloud-sdk
pip install hopsworks-cloud-sdk
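After reinstalling, a quick check like the following can confirm that hops.featurestore is actually importable before retrying connect. It is a minimal sketch using only the standard library; module_available is a hypothetical helper name.

```python
import importlib.util


def module_available(name):
    """Return True if the dotted module name can be located on the import path."""
    try:
        # For a dotted name, find_spec imports the parent package first.
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "hops") is itself missing.
        return False


# False here suggests the reinstall did not leave a usable hops package behind.
print(module_available("hops.featurestore"))
```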

That worked.
Now I can't get feature groups:
Could not connect to any of [(‘18.191.4.26’, 9085)]
SQL string for the query created successfully
Running sql: SELECT * FROM teams_features_1 against the offline feature store
Could not connect to any of [(‘18.191.4.26’, 9085)]
Traceback (most recent call last):
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/pyhive/hive.py”, line 213, in __init__
self._transport.open()
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TTransport.py”, line 155, in open
return self.__trans.open()
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TSSLSocket.py”, line 301, in open
super(TSSLSocket, self).open()
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TSocket.py”, line 122, in open
raise TTransportException(type=TTransportException.NOT_OPEN, message=msg)
thrift.transport.TTransport.TTransportException: Could not connect to any of [(‘18.191.4.26’, 9085)]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore.py”, line 173, in get_featuregroup
online=online)
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore_impl/core.py”, line 303, in _do_get_featuregroup
return _do_get_cached_featuregroup(featuregroup_name, featurestore, featuregroup_version, online)
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore_impl/core.py”, line 336, in _do_get_cached_featuregroup
logical_query_plan.sql_str, featurestore=featurestore, online=online)
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore_impl/core.py”, line 196, in _run_and_log_sql
hive_conn = util._create_hive_connection(featurestore)
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/util.py”, line 189, in _create_hive_connection
keystore_password=os.environ[constants.ENV_VARIABLES.CERT_KEY_ENV_VAR])
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/pyhive/hive.py”, line 247, in __init__
self._transport.close()
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TTransport.py”, line 158, in close
return self.__trans.close()
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TSSLSocket.py”, line 271, in close
self.handle.settimeout(0.001)
AttributeError: ‘NoneType’ object has no attribute ‘settimeout’

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/pyhive/hive.py”, line 213, in __init__
self._transport.open()
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TTransport.py”, line 155, in open
return self.__trans.open()
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TSSLSocket.py”, line 301, in open
super(TSSLSocket, self).open()
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TSocket.py”, line 122, in open
raise TTransportException(type=TTransportException.NOT_OPEN, message=msg)
thrift.transport.TTransport.TTransportException: Could not connect to any of [(‘18.191.4.26’, 9085)]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “”, line 1, in
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore.py”, line 178, in get_featuregroup
online=online)
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore_impl/core.py”, line 303, in _do_get_featuregroup
return _do_get_cached_featuregroup(featuregroup_name, featurestore, featuregroup_version, online)
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore_impl/core.py”, line 336, in _do_get_cached_featuregroup
logical_query_plan.sql_str, featurestore=featurestore, online=online)
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore_impl/core.py”, line 196, in _run_and_log_sql
hive_conn = util._create_hive_connection(featurestore)
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/util.py”, line 189, in _create_hive_connection
keystore_password=os.environ[constants.ENV_VARIABLES.CERT_KEY_ENV_VAR])
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/pyhive/hive.py”, line 247, in __init__
self._transport.close()
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TTransport.py”, line 158, in close
return self.__trans.close()
File “/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TSSLSocket.py”, line 271, in close
self.handle.settimeout(0.001)
AttributeError: ‘NoneType’ object has no attribute ‘settimeout’

@amritsh it seems you can't connect to Hive, which is the backend service that manages features and feature groups.

I’m not able to reproduce the error. I’m connecting to the same instance so I’d exclude a configuration issue on the instance side.

This is the snippet of code I’m running:

>>> from hops import featurestore
>>> featurestore.connect('a46df480-aa4b-11ea-a29d-291324b872b4.aws.hopsworks.ai', 'demo_featurestore_fabio001', secrets_store='local', api_key_file='api.key')
>>> featurestore.get_featuregroup('teams_features')
SQL string for the query created successfully
    team_budget  team_id  team_position
0   12957.07600        1              1
1    2403.37040        2              2
2    3390.37550        3              3
3   13547.42900        4              4
4    9678.33300        5              5
5    7307.94000        6              6
6    9469.99100        7              7
7    2248.77600        8              8
8   12474.41900        9              9
9   16107.08000       10             10
10   4888.23240       11             11
11   6101.97200       12             12
12  21319.53300       13             13
13  11698.13900       14             14
14   7683.72270       15             15
15   7326.09200       16             16
16   1621.19360       17             17
17  10477.92900       18             18
18  13022.44100       19             19
19   3555.23500       20             20
20  12494.65600       21             21
21  12433.23800       22             22
22  10290.32300       23             23
23    760.87290       24             24
24    930.39740       25             25

Here api.key is the file containing my API key.
Are you running behind a corporate firewall or something else that might block the connection to port 9085?
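One way to test this from the same machine is a plain TCP connection attempt to the port. The sketch below uses only the standard library and checks TCP reachability alone (it does not perform the TLS handshake that the Hive connection uses); can_reach is a hypothetical helper name.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Hypothetical usage against the instance from this thread:
# can_reach("a46df480-aa4b-11ea-a29d-291324b872b4.aws.hopsworks.ai", 9085)
```

If this returns False for port 9085 while port 443 is reachable, a firewall between your machine and the instance is a likely cause.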


Fabio

Disabling the firewall worked. However, I am not able to use insert_into_featuregroup.

hopsworks-cloud-sdk is read-only, as writing depends on Spark. We recommend that you schedule a job in Hopsworks that imports your data from external storage. Have a look at the storage connectors: https://hopsworks.readthedocs.io/en/latest/featurestore/guides/featurestore.html#configuring-storage-connectors-for-the-feature-store