I saw a video by Jim Dowling in which he states that the managed cloud version of Hopsworks writes Hudi tables to S3 (or, I presume, a similar Hudi-compatible cloud object store), while the community version writes to local disk. The video is here: Hopsworks Live Coding: Installing Hopsworks Open Source - YouTube
I’m not sure whether that’s just the default or whether it’s mandatory for some reason. So my question is: can a Hopsworks instance be configured to use any Hadoop-compatible storage as its offline store? And does the answer change between the community, managed, and Kubernetes versions?
For example, if I already run a big on-prem HDFS cluster for my data, can I point a Hopsworks instance at a subdirectory of that filesystem and use it as the offline store?
And if this is possible, has anyone out there configured it this way in production? I’d love to hear some stories.