We currently don’t support creating feature groups from SageMaker. At the moment you can write data into a feature group only from a Spark environment, using the hops-util library (see the sketch at the end of this message).
We are working on making it possible to create feature groups and write small amounts of data to the feature store from SageMaker as well, but that is not available yet.
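For context, writing into a feature group from a Spark environment looks roughly like the sketch below. It follows the hops-util-py `featurestore` module as described in the docs; the bucket, DataFrame, and feature group name are placeholders, and the exact signatures may differ between versions, so treat it as an illustration rather than a copy-paste recipe.

```python
# Sketch of writing a feature group from a Spark environment (Hopsworks, EMR,
# Databricks) with hops-util-py -- NOT from a plain SageMaker notebook.
# The bucket, DataFrame, and feature group name are illustrative, and exact
# signatures may vary by library version.
from pyspark.sql import SparkSession
from hops import featurestore

spark = SparkSession.builder.getOrCreate()

# Feature engineering with Spark (illustrative raw data and aggregation)
raw_df = spark.read.csv("s3a://my-bucket/raw/transactions.csv",
                        header=True, inferSchema=True)
features_df = (raw_df.groupBy("customer_id")
                     .count()
                     .withColumnRenamed("count", "num_transactions"))

# Create a new feature group in the project's feature store
featurestore.create_featuregroup(features_df, "customer_transaction_features")

# Append new data to the same feature group later
featurestore.insert_into_featuregroup(features_df, "customer_transaction_features")
```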
Even if I use the PySpark kernel on a SageMaker notebook instance?
So what exactly does the SageMaker integration with the feature store provide? If I take raw data from S3 on my SageMaker notebook instance and then want to create a feature group from it, I can’t?
The rationale behind the SageMaker integration is to let data scientists explore the feature store from their own environment. You can explore feature groups and you can also create training datasets. This last operation, however, starts a Spark application in Hopsworks because, again, we need a Spark environment to join the features together and write out the training dataset.
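Roughly, the SageMaker-side workflow looks like the sketch below. It uses the hops-util-py `featurestore` module; the host, project, secrets store, and all feature/dataset names are placeholders, and the exact signatures (in particular `connect()` and `create_training_dataset()`) may differ between versions, so please check the docs for your release.

```python
# Rough sketch of exploring the feature store and creating a training dataset
# from a SageMaker notebook with hops-util-py. Host, project, secrets store,
# and all feature/dataset names are placeholders; exact signatures are
# assumptions and may vary by version.
from hops import featurestore

# Connect the plain Python environment (SageMaker) to the Hopsworks feature store
featurestore.connect(
    "my-hopsworks-instance.example.com",  # hypothetical Hopsworks host
    "my_project",
    secrets_store="secretsmanager",
)

# Explore what is available in the feature store
print(featurestore.get_featuregroups())        # names of existing feature groups
metadata = featurestore.get_featurestore_metadata()

# Pull a few features locally for exploration
sample_df = featurestore.get_features(["customer_id", "num_transactions"])

# Creating a training dataset launches a Spark job inside Hopsworks that joins
# the requested features and writes the dataset. The exact invocation (passing
# a DataFrame vs. a list of features) depends on the library version.
featurestore.create_training_dataset(
    sample_df,
    "customer_churn_training_data",
    data_format="tfrecords",
)
```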
As I said in my previous message, we are working on adding support for writing small datasets from SageMaker as well. The reason it’s not there yet is that our users don’t use SageMaker to do feature engineering, since SageMaker provides just a plain Python process. They either use Hopsworks itself, Databricks (https://hopsworks.readthedocs.io/en/latest/featurestore/integrations/guides/databricks.html), or EMR.