On-premises Cloudera Hadoop

Hi,

I am working on a POC for a feature store and was checking the documentation to see whether we can connect to an on-premises Cloudera Hadoop cluster.

Can someone confirm this and help me with the details?

Hi Pratyusha.
Yes, you can integrate Cloudera with Hopsworks.
Using Spark, you can do feature engineering in Cloudera and write your features to Hopsworks.
You can also use Spark to join features together to create training data or to do batch scoring in your Cloudera cluster.
Details on how to set up Cloudera are here:
https://docs.hopsworks.ai/feature-store-api/latest/integrations/spark/
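
As a rough illustration of the flow above, here is a minimal sketch using the hsfs Python client from a Spark job on the Cloudera cluster. It assumes the setup steps on the linked page are already done and that an API key has been created in Hopsworks. The helper function names, the feature group names, and the host, project, and key values are all hypothetical placeholders, and method availability may differ between hsfs versions, so please check it against the docs for your release.

```python
def hopsworks_connection_args(host, project, api_key, port=443):
    """Collect the keyword arguments for hsfs.connection().

    host, project and api_key are placeholders for your own Hopsworks instance.
    """
    return {
        "host": host,
        "port": port,
        "project": project,
        "api_key_value": api_key,
        "hostname_verification": True,
    }


def write_features(spark_df, conn_args, name, primary_key):
    """Write a Spark DataFrame computed on Cloudera to a Hopsworks feature group."""
    import hsfs  # imported lazily: only needed on the cluster where hsfs is installed

    conn = hsfs.connection(**conn_args)
    fs = conn.get_feature_store()
    fg = fs.create_feature_group(
        name=name,
        version=1,
        primary_key=primary_key,
        description="Features engineered in Spark on Cloudera",
    )
    fg.save(spark_df)
    return fg


def read_joined_features(conn_args, left_name, right_name):
    """Join two feature groups and return the result as a Spark DataFrame."""
    import hsfs

    conn = hsfs.connection(**conn_args)
    fs = conn.get_feature_store()
    left = fs.get_feature_group(left_name, version=1)
    right = fs.get_feature_group(right_name, version=1)
    # With no join key given, the join is expected to use the shared primary key.
    query = left.select_all().join(right.select_all())
    return query.read()
```

From a PySpark session you would call something like `write_features(df, hopsworks_connection_args("my-instance.example.com", "my_project", "<api key>"), "customer_features", ["customer_id"])`, where every argument value is a made-up example.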

Hi @Jim_Dowling,

Quick question:
Does Hopsworks need a dedicated Hadoop environment to maintain the feature store, or can it use the existing Cloudera environment as both a data source and the feature store?
Please share some examples of reading data and features from Cloudera Hive for model training.

Thanks much!