If you are using the PySpark kernel in SageMaker, then you have an EMR cluster running somewhere. (https://docs.aws.amazon.com/sagemaker/latest/dg/nbi-lifecycle-config-emr.html)
If that’s the case, then you have a Spark environment, and so you can use the
hops-util-py library (https://pypi.org/project/hops/) to connect to Hopsworks Enterprise and write data into a feature group.
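Just as an illustration, a minimal sketch of how that could look from the PySpark kernel (the host, project, table and feature group names are placeholders, and the exact arguments can differ between hops versions and depending on where you store the API key, so please check the hops documentation):

```python
from hops import featurestore

# Connect to the external Hopsworks Enterprise instance (placeholder host/project;
# here the API key is assumed to be stored in AWS Secrets Manager).
featurestore.connect(
    "my-hopsworks-instance.example.com",
    "my_project",
    secrets_store="secretsmanager",
)

# Any Spark DataFrame produced on the EMR cluster can then be written
# into an existing feature group in the project's feature store.
df = spark.table("raw_transactions")  # placeholder source table
featurestore.insert_into_featuregroup(df, "transactions_fg")
```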
The rationale behind the SageMaker integration is to let data scientists explore the feature store from their own environment. You can explore feature groups and you can also create training datasets. This last operation, however, starts a Spark application in Hopsworks because, again, we need a Spark environment to be able to join features together and create a training dataset.
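Roughly, exploring the feature store and creating a training dataset from the notebook could look like the sketch below (the feature and dataset names are made up, and the exact keyword arguments may vary between hops versions):

```python
from hops import featurestore

# List the feature groups available in the project's feature store.
print(featurestore.get_featuregroups())

# Join a few features (illustrative names) coming from different feature groups;
# this is the step that needs a Spark environment.
df = featurestore.get_features(
    ["customer_id", "avg_transaction_amount", "days_since_signup"]
)

# Materialize the joined features as a training dataset in Hopsworks.
featurestore.create_training_dataset(df, "churn_training_dataset", data_format="tfrecords")
```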
As I said in my previous message, we are working on adding support for writing small datasets from SageMaker as well. The reason it’s not there yet is that our users don’t use SageMaker to do feature engineering, as SageMaker provides just a simple Python process. They either use Hopsworks itself, Databricks (https://hopsworks.readthedocs.io/en/latest/featurestore/integrations/guides/databricks.html), or EMR.