Create and Manage My Own Feature Store

I have an EMR cluster. I want to set up a feature store for my ETL/ML pipelines. Can I use Hopsworks in this case? I don't want to pay anything apart from my EMR costs. If I manage it myself, does Hopsworks offer any free support for it? I am still confused about your Community Edition vs. Enterprise Edition. What is free for the public to use out of the box, and what is not? Please advise.

Hi Kamal,

The native integration with external Hadoop clusters is not part of the Community Edition, which means that you will not be able to run Spark jobs on EMR and connect to the Feature Store using the Spark API. What you can do is use the free tier of Hopsworks.ai, which is based on the Enterprise version and is deployed to your own AWS account. It comes with some limitations: you can only run a single node rather than a cluster, and you cannot store the feature data on S3. Aside from that, it offers the full functionality of Hopsworks Enterprise.

No, I am asking this from a client-opportunity point of view. The free tier is only for 30 days, right? I need to build a solution and deployment that is not limited in time. Also, what is the Community Edition? Which features does it provide from a feature store point of view? Your documentation mentions a Karamel installation on EC2 instances in the AWS cloud. If I do that, what functionality will it provide? Is that free, or is a license fee charged?

The free tier has no time limit. The 30-day period you are referring to applies to the demo access, which provides a quick way of experimenting with a cluster hosted by us. In the free tier, the cluster is deployed to your own account. Check out our guide: Getting started with Hopsworks.ai.

The Community Edition provides the features listed on our product page except:

  • Feature storage based on S3
  • Native connectivity from external Hadoop/Spark clusters
  • Automatic import of features from external sources such as S3 and Redshift
  • Automatic export of training data to S3
  • Tagging of feature groups and training datasets
  • Kubernetes support
  • Support for version control with Git

(This list may not be complete and might change over time.)

We will publish a more detailed feature comparison shortly and reference it here.

Yeah, but I was told that going forward there will be a cost, in terms of Hopsworks units, for managing the platform for us. Also, where can I try your Community Edition?

There are currently no plans to change the status of the free tier. For advanced features, we offer an Enterprise plan.