Benefits of on prem vs managed install

rdedhia · September 29, 2022, 3:51pm

Hi, I am considering Hopsworks as a feature store solution for a ML pipeline that currently lives in GCP.

I’ve read through the installation guides for GCP and on premise installs, and was wondering if there’s a succinct summary on the benefits of the two. My understanding is that the managed installation would require less maintenance, but are there any benefits from a data privacy/security perspective (if the ML pipeline contains very sensitive data) with an on premise install?

Steffen · September 30, 2022, 8:01am

The managed version’s main features are cluster management (creating, starting, stopping, resizing, and terminating clusters), compute autoscaling, backup/restore, upgrades, and organization management via the web UI or its REST APIs.

In any case, the data is stored and processed entirely inside your own cloud account, and we don’t gain access to it. To lock down the cluster further, you can disable our user management and manage cluster access entirely yourself: Cluster Creation - Hopsworks Documentation. You can also limit the permissions as much as possible, but this doesn’t change the access to the data: Limiting Permissions - Hopsworks Documentation.

In contrast to on-premise, the managed version reports usage statistics back to us, which are used for cluster sizing and billing. This requires outgoing network connectivity to our API. If required, outgoing traffic can be restricted to a specific endpoint.

rdedhia · September 30, 2022, 4:48pm

Thanks for the quick response! If we only want to use the standalone feature store, are there any downsides to using the managed installation?

Steffen · October 3, 2022, 7:03am

No, I would only recommend on-premise if the managed version is not an option for you.