I'm trying to execute an API call from a Zeppelin notebook: featurestore.get_featuregroup("teams_features").head(5)
I get the following exception.
Running sql: use demo_featurestore_msycho21_featurestore against offline feature store
Fail to execute line 1: featurestore.get_featuregroup("teams_features").head(5)
Traceback (most recent call last):
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o93.sql.
: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database '~~~~~~~' not found;
Are you trying to use Zeppelin with a Python kernel, like SageMaker, or are you using it to start and interact with a Spark application? If it's the former, then you should follow the same instructions as for the SageMaker integration. However, from the logs it seems that it's the latter. In that case, you need to set up the following:
API key. If you are running the Spark cluster on EC2 machines with access to Secrets Manager/Parameter Store, then you can use one of them to store it (as with the SageMaker integration). Otherwise you can store it in a file.
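For example, here is a minimal sketch of retrieving the key from either place; the parameter name /hopsworks/api_key, the region, and the file path are just placeholders:

```python
import boto3

# Option 1: read the API key from AWS SSM Parameter Store
# (the parameter name and region below are examples, not real values).
ssm = boto3.client("ssm", region_name="eu-west-1")
api_key = ssm.get_parameter(Name="/hopsworks/api_key",
                            WithDecryption=True)["Parameter"]["Value"]

# Option 2: read it from a file available on the machine running the driver.
with open("/path/to/api_key.txt") as f:
    api_key = f.read().strip()
```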
The client jars. The feature store uses the Hive Metastore to manage the feature groups, and you need our client jars to be able to access it. If you run the setup_databricks() method of the hops-util-py library, you should be able to download a tar.gz file with both our Metastore client and the HopsFS client.
Here things start to differ from the Databricks guide: you need to add the contents of that tar.gz file to your Spark classpath, and how you do that depends on how you are running Spark (see the sketch below).
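For instance, assuming you extract the archive to /opt/hopsworks/client on every node (the path is just an example), you could point both the driver and the executors at the jars with the standard Spark classpath properties:

spark.driver.extraClassPath /opt/hopsworks/client/*
spark.executor.extraClassPath /opt/hopsworks/client/*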
Certificates. The setup_databricks() method also downloads keyStore.jks, trustStore.jks and a file called material_passwd. You need to make sure they are available on the Spark executors. Again, the instructions here differ from the Databricks documentation, as you probably don't have dbfs:// available.
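One way to do that (again only a sketch, with example paths) is to ship the three files with spark.files, which places them in the working directory of every executor; the spark.hadoop.hops.ssl.* properties below would then point at wherever the files end up on the executors:

spark.files /opt/hopsworks/client/keyStore.jks,/opt/hopsworks/client/trustStore.jks,/opt/hopsworks/client/material_passwd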
Configuration. You need to add the following properties to your Spark configuration:
spark.hadoop.fs.hopsfs.impl io.hops.hopsfs.client.HopsFileSystem
spark.hadoop.hops.ipc.server.ssl.enabled true
spark.hadoop.hops.ssl.hostname.verifier ALLOW_ALL
spark.hadoop.hops.rpc.socket.factory.class.default io.hops.hadoop.shaded.org.apache.hadoop.net.HopsSSLSocketFactory
spark.hadoop.client.rpc.ssl.enabled.protocol TLSv1.2
spark.hadoop.hops.ssl.keystores.passwd.name [Path to material_passwd within the Spark executors]
spark.hadoop.hops.ssl.keystore.name [Path to keyStore.jks within the Spark executors]
spark.hadoop.hops.ssl.trustore.name [Path to trustStore.jks within the Spark executors]
spark.sql.hive.metastore.jars [Path to the jar files for the Hive metastore (the ones from the `tar.gz` you downloaded)]
spark.hadoop.hive.metastore.uris thrift://[hopsworks.ai URL]:9083
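Once the interpreter is restarted with this configuration, you can verify the setup from a Zeppelin paragraph, for example (assuming hops-util-py is installed on the cluster and the API key has been configured):

```python
# Run this in a %pyspark paragraph in Zeppelin.
from hops import featurestore

# The Hive metastore should now resolve the project's feature store database.
spark.sql("SHOW DATABASES").show()

# And the original call should now work against the offline feature store.
featurestore.get_featuregroup("teams_features").head(5)
```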