As traffic increases, the Model Serving component of our Hopsworks cluster will come under more and more load.
From the documentation, I understand that Hopsworks Model Serving is based on TensorFlow Serving and SkLearn model serving, but I couldn't find any documentation on how to configure Model Serving to handle this kind of load. I noticed that Kubernetes is mentioned in the official docs, but I don't see any related services running on our local servers. Essentially, this is a question about auto-scaling server resources: is it feasible to address it by adding worker nodes?
Does anyone have relevant experience? Any suggestions would be much appreciated.