Loading data from a Python dataframe into the Feature Store

Arumugaguru_M · March 8, 2021, 6:56am

Hi Team,

We are trying to push a pandas dataframe from our local machine to the Feature Store on hopsworks.ai. Does this require a Spark cluster in the local environment, or do we just need the hsfs[hive] library in Python?

Also, we are able to retrieve data from the Feature Store, but when we try to load data into it we end up with the following error: "'Engine' object has no attribute 'convert_to_default_dataframe'". If we change the library version, we get a different error.

We are not sure what is going wrong here. However, we are able to store data from a CSV file into the Feature Store from inside hopsworks.ai, using the same steps as in the local Jupyter notebook.

hopsworks.ai version 2.0.0
hsfs[hive] library version 2.0.12

Thanks,
Guru

moritzmeister · March 8, 2021, 9:29am

Hi Guru,

Unfortunately, data ingestion from arbitrary Python environments wasn’t supported yet in version 2.0. If you want to ingest data, you will need to either use Spark directly on Hopsworks, or configure your own Spark instance to be able to write to the Feature Store.

That said, we greatly improved the support for pure Python in version 2.1, so if you can, I recommend you deploy a new instance with the newer version of Hopsworks on hopsworks.ai, which you should be able to do with the free credits we are currently giving to new users. If you are using the demo instance and don’t want to deploy your own cluster, you will have to be a bit patient until we have upgraded that instance to 2.1.
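For anyone landing on this thread later, here is a minimal sketch of what saving a pandas dataframe from a plain Python environment might look like on 2.1 or newer. It is not an official recipe: the host, project, API key, feature group name, and primary key are placeholders, and the helper names `supports_python_ingestion` and `ingest_dataframe` are made up for illustration. The version check assumes that the hsfs client version follows the Hopsworks major.minor version, which is worth confirming for your own setup.

```python
def supports_python_ingestion(version: str) -> bool:
    """Return True if the given version string is 2.1 or newer.

    Assumption: the hsfs client version tracks the Hopsworks
    major.minor version, so "2.0.12" belongs to a 2.0 cluster.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (2, 1)


def ingest_dataframe(df, host, project, api_key,
                     name="example_features", primary_key=("id",)):
    """Sketch of writing a pandas dataframe to the Feature Store.

    All argument values are placeholders supplied by the caller.
    """
    # Imported lazily so the version helper above works without hsfs installed.
    import hsfs

    if not supports_python_ingestion(hsfs.__version__):
        raise RuntimeError(
            "hsfs %s predates 2.1; use Spark for ingestion instead"
            % hsfs.__version__
        )

    # Connect with an API key created in the Hopsworks UI.
    conn = hsfs.connection(host=host, project=project, api_key_value=api_key)
    try:
        fs = conn.get_feature_store()
        # Define the feature group metadata, then write the dataframe.
        fg = fs.create_feature_group(
            name=name,
            version=1,
            primary_key=list(primary_key),
        )
        fg.save(df)
    finally:
        conn.close()
```

With the versions from the original post, `supports_python_ingestion("2.0.12")` returns False, which matches the behaviour described above: reads work, but writes from pure Python need 2.1.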

Sorry for the inconvenience!

Arumugaguru_M · March 11, 2021, 10:37am

Thanks a lot for this valuable information. We were able to upgrade our demo instance to 2.1, and the save is now working fine.