No attribute 'create_featuregroup' on SageMaker Jupyter Notebook

Kamal_Keswani · June 19, 2020, 12:36pm

I have connected my Hopsworks demo instance to SageMaker using an API key.
But when I try to create a feature group using the API functions, I get this error:

module 'hops.featurestore' has no attribute 'create_featuregroup'

Attaching a screenshot of the error for reference.

Fabio · June 19, 2020, 9:29pm

Hi @Kamal_Keswani,

We don’t currently support creating feature groups from SageMaker. For now, you can write data into a feature group only from a Spark environment using the hops-util library.

We are working on adding the possibility to create and write feature groups containing small amounts of data to the feature store also from SageMaker, but it’s not there yet.

You can see the documentation for the hopsworks-cloud-sdk library here: http://hopsworks-cloud-sdk.logicalclocks.com/hops.html#module-hops.featurestore
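To confirm which functions the library installed in your notebook actually exposes, a quick check along these lines can help. This is only a minimal sketch: the helper names `missing_functions` and `check_installed` and the stand-in module `demo_featurestore` are made up for illustration and are not part of any Hopsworks library.

```python
import importlib
import types


def missing_functions(module, names):
    """Return the names from `names` that `module` does not expose."""
    return [name for name in names if not hasattr(module, name)]


def check_installed(module_name, names):
    """Import `module_name` if possible and report which names are missing.

    Returns None when the module is not installed in this environment.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return missing_functions(module, names)


# Stand-in module for illustration only (hypothetical): it mimics a client
# library that offers read access but has no create_featuregroup.
demo = types.ModuleType("demo_featurestore")
demo.get_featuregroup = lambda name: None

print(missing_functions(demo, ["get_featuregroup", "create_featuregroup"]))
```

In a real notebook you would call `check_installed("hops.featurestore", ["create_featuregroup"])` instead; a non-empty list means the installed package does not provide that function.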

–
Fabio

Kamal_Keswani · June 19, 2020, 10:22pm

Even if I use the PySpark kernel on a SageMaker notebook instance?
So what exactly does the SageMaker integration with the feature store provide? If I take raw data from S3 into my SageMaker notebook instance and then want to create a feature group from it, I can’t?

Fabio · June 21, 2020, 12:49pm

If you are using the PySpark kernel in SageMaker, then you have an EMR cluster running somewhere. (https://docs.aws.amazon.com/sagemaker/latest/dg/nbi-lifecycle-config-emr.html)
If that’s the case, then you have a Spark environment, and so you can use the hops-util-py library (https://pypi.org/project/hops/) to connect to Hopsworks Enterprise and write data into a feature group.

The rationale behind the SageMaker integration is to let data scientists explore the feature store from their own environment. You can explore feature groups, and you can also create training datasets. This last operation, however, starts a Spark application in Hopsworks because, again, we need a Spark environment to join features together and create a training dataset.

As I said in my previous message, we are working on adding support for writing small datasets from SageMaker as well. The reason it’s not there yet is that our users don’t use SageMaker for feature engineering, since SageMaker provides just a simple Python process. They use Hopsworks itself, Databricks (https://hopsworks.readthedocs.io/en/latest/featurestore/integrations/guides/databricks.html), or EMR.

Kamal_Keswani · June 21, 2020, 1:28pm

Got it. Makes sense. Thank you @Fabio