I have Hopsworks 1.3.0-SNAPSHOT up and running, with a project that has the feature store enabled.
As I understand it, the dataframe passed to insert_into_featuregroup(df, featuregroup_name) is converted to a Spark dataframe and then inserted into the Feature Store, so I assume I need to connect to the remote single-node Spark cluster with Hive support enabled.
From my local machine, I'm trying the following steps:
import cv2
import hops.featurestore as fs
from pyspark.sql import SparkSession
fs.connect(host=, project_name="test", api_key_file="/path/to/generated/api.key", secrets_store="local", hostname_verification=False)
# This connects to the remote feature store successfully; I can call get_featuregroups() and see the result list
spark_session = SparkSession.builder.master("spark://<hopsworks-instance-ip>:7077").appName("test").enableHiveSupport().getOrCreate()
data = cv2.imread("/path/img.jpg", cv2.IMREAD_GRAYSCALE)  # load the image as grayscale to get a 2D numpy array that can be converted to a Spark dataframe later
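The conversion mentioned in the last comment could look roughly like the sketch below. It is only an illustration under assumptions: `image_to_rows` is a hypothetical helper (not part of hops or PySpark), the (row, col, value) layout is one possible choice among several, and a small nested list stands in for the array returned by cv2.imread.

```python
from typing import List, Sequence, Tuple


def image_to_rows(pixels: Sequence[Sequence[int]]) -> List[Tuple[int, int, int]]:
    """Flatten a 2D grayscale image into (row, col, value) tuples.

    Hypothetical helper: accepts any 2D sequence, including a 2D numpy
    array, because it only iterates over rows and then over pixels.
    """
    return [
        (r, c, int(value))  # int() turns numpy integer types into plain Python ints
        for r, line in enumerate(pixels)
        for c, value in enumerate(line)
    ]


# Stand-in for the 2D array returned by cv2.imread(..., cv2.IMREAD_GRAYSCALE)
sample = [[0, 128, 255], [10, 20, 30]]
rows = image_to_rows(sample)

# With a working SparkSession, the rows could then be turned into a dataframe:
# df = spark_session.createDataFrame(rows, ["row", "col", "value"])
```

Storing one pixel per row keeps the schema flat, at the cost of producing many rows for large images.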
The SparkSession call above fails with "Connection refused: <hopsworks-instance-ip>:7077".
I'm sure the port is configured correctly.
Do I understand the steps correctly? If not, could you explain how I can insert a feature into the Feature Store from a remote machine?
I checked the listening ports on the Hopsworks instance, but no service is listening on port 7077.
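The same thing can be confirmed from the local machine with a plain TCP connection attempt. This is a minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper name, and the 3-second timeout is an arbitrary choice.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError or a
        # timeout) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (the host is a placeholder, replace it with the instance address):
# print(port_is_open("<hopsworks-instance-ip>", 7077))
```

A False result only says that the TCP connection could not be opened; it does not distinguish between a closed port and a firewall dropping the traffic.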
Which port should I use to connect to Spark remotely with SparkSession?