How to connect to a web instance of hopsworks.ai from a local Python environment?
You can connect to the Feature Store by:
(1) creating an API key in the "Settings" option (top-right-hand corner);
(2) saving the API key in a file on your local computer;
(3) calling hops.featurestore.connect(…, api_key_file="/path/to/api-key-you-downloaded", secrets_store="local").
http://hops-py.logicalclocks.com/hops.html#module-hops.featurestore
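Saving the key file (step 2 above) can be sketched as follows. This is a minimal sketch that assumes the key file only needs to contain the key itself as plain text, as the apikey.txt and api.key files used later in this thread suggest. The helper names save_api_key and read_api_key are hypothetical and are not part of the hops library.

```python
import os
import tempfile


def save_api_key(api_key: str, path: str) -> str:
    """Write the API key to `path` as a single line of plain text.

    Assumption: the key file only needs to contain the key string itself.
    Returns the absolute path, which can be passed as api_key_file.
    """
    # Create the file with owner-only permissions so other local users
    # cannot read the key.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(api_key.strip() + "\n")
    # os.open's mode is ignored for files that already existed, so enforce it.
    os.chmod(path, 0o600)
    return os.path.abspath(path)


def read_api_key(path: str) -> str:
    """Read the key back, dropping the trailing newline and any whitespace."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        key_path = save_api_key("my-secret-key", os.path.join(tmp, "featurestore.key"))
        print(read_api_key(key_path))  # prints my-secret-key
```

The absolute path returned by save_api_key is what you would pass as api_key_file to connect().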
I tried this:
featurestore.connect('https://a46df480-aa4b-11ea-a29d-291324b872b4.aws.hopsworks.ai/', 'demo_featurestore_amrites1',
                     port=443, secrets_store='local',
                     region_name=DEFAULT_REGION,
                     api_key_file=os.path.abspath('apikey.txt'),
                     hostname_verification=False)
I am getting connection errors.
Make sure you have opened up the Feature Store ports to the internet: https://hopsworks.readthedocs.io/en/latest/_images/open-ports.png
Where in the web version is this setting?
It's part of the cluster in the dashboard. Or are you using the 30-day demo? In that case we manage the instance and the ports are opened automatically.
Could you post the connection errors you are getting?
Traceback (most recent call last):
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/urllib3/connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/urllib3/util/connection.py", line 61, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/home/subexgpu5/anaconda3/lib/python3.7/socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py", line 381, in _make_request
    self._validate_conn(conn)
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py", line 978, in _validate_conn
    conn.connect()
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/urllib3/connection.py", line 309, in connect
    conn = self._new_conn()
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/urllib3/connection.py", line 172, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fdef02d9e10>: Failed to establish a new connection: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py", line 727, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/urllib3/util/retry.py", line 439, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='https', port=443): Max retries exceeded with url: //a46df480-aa4b-11ea-a29d-291324b872b4.aws.hopsworks.ai/:443/hopsworks-api/api/project/getProjectInfo/demo_featurestore_amrites1 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fdef02d9e10>: Failed to establish a new connection: [Errno -2] Name or service not known'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "", line 5, in
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore.py", line 1910, in connect
    project_info = project.get_project_info(project_name)
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/project.py", line 87, in get_project_info
    project_name)
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/util.py", line 60, in http
    response = send_request(method, resource_url, headers=headers, data=data)
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/util.py", line 198, in send_request
    response = session.send(prepped, verify=verify, stream=stream)
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='https', port=443): Max retries exceeded with url: //a46df480-aa4b-11ea-a29d-291324b872b4.aws.hopsworks.ai/:443/hopsworks-api/api/project/getProjectInfo/demo_featurestore_amrites1 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fdef02d9e10>: Failed to establish a new connection: [Errno -2] Name or service not known'))
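Instead of connecting to https://a46df480-aa4b-11ea-a29d-291324b872b4.aws.hopsworks.ai/, try using just the hostname a46df480-aa4b-11ea-a29d-291324b872b4.aws.hopsworks.ai. The traceback shows host='https', which means the https:// scheme was being treated as the hostname.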
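If you keep the instance address as a full URL somewhere (for example, copied from the browser), a small helper can strip the scheme, port, and trailing slash before passing it to connect(). This is a minimal sketch using the standard library's urllib.parse; to_hostname is a hypothetical helper, not part of hops or hopsworks-cloud-sdk.

```python
from urllib.parse import urlparse


def to_hostname(instance: str) -> str:
    """Return the bare hostname for an instance given as a URL or a host.

    Accepts "https://host/", "https://host:443/path" or plain "host".
    """
    # urlparse only populates .hostname when a scheme ("//" netloc) is present,
    # so add one to bare hostnames before parsing.
    if "://" not in instance:
        instance = "https://" + instance
    host = urlparse(instance).hostname
    if not host:
        raise ValueError(f"no hostname found in {instance!r}")
    return host


print(to_hostname("https://a46df480-aa4b-11ea-a29d-291324b872b4.aws.hopsworks.ai/"))
# prints a46df480-aa4b-11ea-a29d-291324b872b4.aws.hopsworks.ai
```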
I got a new error:
  File "/home/subexgpu5/anaconda3/lib/python3.7/pathlib.py", line 1258, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/hops'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "", line 5, in
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore.py", line 1914, in connect
    Path(dbfs_folder).mkdir(parents=True, exist_ok=True)
  File "/home/subexgpu5/anaconda3/lib/python3.7/pathlib.py", line 1262, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/home/subexgpu5/anaconda3/lib/python3.7/pathlib.py", line 1258, in mkdir
    self._accessor.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: '/dbfs'
I believe you might be using the wrong library. Are you using hops or hopsworks-cloud-sdk? The hops library is meant for a Spark environment; see https://hopsworks.readthedocs.io/en/latest/featurestore/integrations/guides/custom.html for how to use hopsworks-cloud-sdk instead.
Make sure to uninstall the hops library before installing hopsworks-cloud-sdk.
With the new module I get:
AttributeError: module 'hops' has no attribute 'featurestore'
Did you import and run connect like this?
import hops.featurestore as fs
fs.connect(
    'my_instance',                    # DNS of your Feature Store instance
    'my_project',                     # Name of your Hopsworks Feature Store project
    api_key_file='featurestore.key',  # The file with the API key
    secrets_store='local')
Yes, with the cloud SDK I get:
Traceback (most recent call last):
  File "", line 1, in
ModuleNotFoundError: No module named 'hops.featurestore'
This looks like you installed hopsworks-cloud-sdk before uninstalling hops.
Run these commands in order and try again:
pip uninstall hops
pip uninstall hopsworks-cloud-sdk
pip install hopsworks-cloud-sdk
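After reinstalling, you can check whether hops.featurestore is importable before calling connect(). This is a small sketch using the standard library's importlib.util.find_spec; module_available is a hypothetical helper name.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the (possibly dotted) module `name` can be found."""
    try:
        # For dotted names, find_spec imports the parent package first and
        # raises ModuleNotFoundError (a subclass of ImportError) if it is missing.
        return importlib.util.find_spec(name) is not None
    except ImportError:
        return False


if __name__ == "__main__":
    print("hops.featurestore importable:", module_available("hops.featurestore"))
```

If this prints False right after the reinstall, the hops package directory is most likely still in a broken state from the earlier install order.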
That worked.
Now I can't get feature groups:
Could not connect to any of [('18.191.4.26', 9085)]
SQL string for the query created successfully
Running sql: SELECT * FROM teams_features_1 against the offline feature store
Could not connect to any of [('18.191.4.26', 9085)]
Traceback (most recent call last):
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/pyhive/hive.py", line 213, in __init__
    self._transport.open()
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TTransport.py", line 155, in open
    return self.__trans.open()
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TSSLSocket.py", line 301, in open
    super(TSSLSocket, self).open()
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TSocket.py", line 122, in open
    raise TTransportException(type=TTransportException.NOT_OPEN, message=msg)
thrift.transport.TTransport.TTransportException: Could not connect to any of [('18.191.4.26', 9085)]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore.py", line 173, in get_featuregroup
    online=online)
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore_impl/core.py", line 303, in _do_get_featuregroup
    return _do_get_cached_featuregroup(featuregroup_name, featurestore, featuregroup_version, online)
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore_impl/core.py", line 336, in _do_get_cached_featuregroup
    logical_query_plan.sql_str, featurestore=featurestore, online=online)
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore_impl/core.py", line 196, in _run_and_log_sql
    hive_conn = util._create_hive_connection(featurestore)
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/util.py", line 189, in _create_hive_connection
    keystore_password=os.environ[constants.ENV_VARIABLES.CERT_KEY_ENV_VAR])
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/pyhive/hive.py", line 247, in __init__
    self._transport.close()
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TTransport.py", line 158, in close
    return self.__trans.close()
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TSSLSocket.py", line 271, in close
    self.handle.settimeout(0.001)
AttributeError: 'NoneType' object has no attribute 'settimeout'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/pyhive/hive.py", line 213, in __init__
    self._transport.open()
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TTransport.py", line 155, in open
    return self.__trans.open()
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TSSLSocket.py", line 301, in open
    super(TSSLSocket, self).open()
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TSocket.py", line 122, in open
    raise TTransportException(type=TTransportException.NOT_OPEN, message=msg)
thrift.transport.TTransport.TTransportException: Could not connect to any of [('18.191.4.26', 9085)]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "", line 1, in
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore.py", line 178, in get_featuregroup
    online=online)
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore_impl/core.py", line 303, in _do_get_featuregroup
    return _do_get_cached_featuregroup(featuregroup_name, featurestore, featuregroup_version, online)
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore_impl/core.py", line 336, in _do_get_cached_featuregroup
    logical_query_plan.sql_str, featurestore=featurestore, online=online)
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/featurestore_impl/core.py", line 196, in _run_and_log_sql
    hive_conn = util._create_hive_connection(featurestore)
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/hops/util.py", line 189, in _create_hive_connection
    keystore_password=os.environ[constants.ENV_VARIABLES.CERT_KEY_ENV_VAR])
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/pyhive/hive.py", line 247, in __init__
    self._transport.close()
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TTransport.py", line 158, in close
    return self.__trans.close()
  File "/home/subexgpu5/anaconda3/lib/python3.7/site-packages/thrift/transport/TSSLSocket.py", line 271, in close
    self.handle.settimeout(0.001)
AttributeError: 'NoneType' object has no attribute 'settimeout'
@amritsh it seems you can't connect to Hive, which is the backend service that manages features and feature groups. The log shows the connection to 18.191.4.26 on port 9085 failing.
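A quick way to tell a network block apart from a client problem is to test whether a plain TCP connection to the Hive endpoint can be opened at all. This is a minimal sketch using the standard library's socket.create_connection; port_reachable is a hypothetical helper, and it only checks that the port accepts connections, not the TLS handshake or authentication that hops performs afterwards.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    Only checks that something accepts the connection; it does not perform
    the TLS handshake or authenticate against Hive.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and DNS failures (socket.gaierror).
        return False
```

For example, if port_reachable("18.191.4.26", 9085) returns False from your machine but True from another network, something between you and the instance (such as a local or corporate firewall) is blocking the port.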
I'm not able to reproduce the error. I'm connecting to the same instance, so I'd rule out a configuration issue on the instance side.
This is the snippet of code I’m running:
>>> from hops import featurestore
>>> featurestore.connect('a46df480-aa4b-11ea-a29d-291324b872b4.aws.hopsworks.ai', 'demo_featurestore_fabio001', secrets_store='local', api_key_file='api.key')
>>> featurestore.get_featuregroup('teams_features')
SQL string for the query created successfully
team_budget team_id team_position
0 12957.07600 1 1
1 2403.37040 2 2
2 3390.37550 3 3
3 13547.42900 4 4
4 9678.33300 5 5
5 7307.94000 6 6
6 9469.99100 7 7
7 2248.77600 8 8
8 12474.41900 9 9
9 16107.08000 10 10
10 4888.23240 11 11
11 6101.97200 12 12
12 21319.53300 13 13
13 11698.13900 14 14
14 7683.72270 15 15
15 7326.09200 16 16
16 1621.19360 17 17
17 10477.92900 18 18
18 13022.44100 19 19
19 3555.23500 20 20
20 12494.65600 21 21
21 12433.23800 22 22
22 10290.32300 23 23
23 760.87290 24 24
24 930.39740 25 25
where api.key is the file containing my API key.
Are you running behind a corporate firewall, or something similar that might block the connection to port 9085?
–
Fabio
Disabling the firewall worked. However, I am not able to use "insert_into_featuregroup".
hopsworks-cloud-sdk is read-only, because writing to the Feature Store depends on Spark. We recommend scheduling a job in Hopsworks that imports your data from external storage. Have a look at storage connectors: https://hopsworks.readthedocs.io/en/latest/featurestore/guides/featurestore.html#configuring-storage-connectors-for-the-feature-store
I could not visit this site: http://hops-py.logicalclocks.com/hops.html#module-hops.featurestore
Is the site still running?
What is the format of this file? JSON or txt? Can it be an .env file too?
Does this method return the same object as hopsworks.login()?