Delta Lake Hive integration issue

Hi,
we are trying to use the Delta Lake primitive saveAsTable from a Jupyter notebook with the PySpark kernel. When we call the method, we get the following exception:

An error occurred while calling o223.saveAsTable.
: java.io.IOException: The error typically occurs when the default LogStore implementation, that
is, HDFSLogStore, is used to write into a Delta table on a non-HDFS storage system.
In order to get the transactional ACID guarantees on table updates, you have to use the
correct implementation of LogStore that is appropriate for your storage system.

Is there an option we need to set, or is this an incompatibility with HopsFS / Hive?
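
For reference, the option the error message points at is the Delta Lake LogStore setting. The sketch below only shows where that setting lives. The key spark.delta.logStore.class and the class names are taken from the Delta Lake documentation, not from this thread, and the helper with_logstore is a hypothetical name. Whether any of these implementations works on HopsFS is not established here.

```python
# Sketch only: key and class names are from the Delta Lake docs, not from this thread.
# Nothing here is confirmed to work on HopsFS.
DELTA_LOGSTORE_KEY = "spark.delta.logStore.class"

LOGSTORE_CLASSES = {
    "hdfs": "org.apache.spark.sql.delta.storage.HDFSLogStore",  # the default
    "s3": "org.apache.spark.sql.delta.storage.S3SingleDriverLogStore",
    "azure": "org.apache.spark.sql.delta.storage.AzureLogStore",
}


def with_logstore(builder, storage):
    """Return the SparkSession builder with the LogStore class for `storage` set.

    `builder` is expected to be pyspark.sql.SparkSession.builder (hypothetical helper).
    """
    return builder.config(DELTA_LOGSTORE_KEY, LOGSTORE_CLASSES[storage])
```

The setting has to be applied before the Spark session is created, for example with_logstore(SparkSession.builder, "hdfs").getOrCreate(). In a managed notebook kernel the session usually already exists, so the key would go into the Spark configuration of the notebook instead.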

We also noticed this limitation when trying to use a Hive ACID table:

metastore.RetryingHMSHandler: MetaException(message:Unable to update transaction database java.sql.SQLSyntaxErrorException: The storage engine for the table doesn’t support SAVEPOINT

We are using Hopsworks 2.3 on-premises.

Regards

Hi @arosc

Can you please provide more information about the environment from which you are calling the Delta Lake primitive saveAsTable? Is it Hopsworks itself, or are you trying to connect to Hopsworks?

If you are using Hopsworks itself, we currently don’t have a built-in integration with Delta Lake. However, we do support Apache Hudi. When you create a new Feature Group, Hudi time travel is enabled by default, which gives you full ACID guarantees and, in the upcoming release, point-in-time joins as well. You can find examples here: hops-examples/notebooks/featurestore/hsfs/time_travel at master · logicalclocks/hops-examples · GitHub
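
As a rough illustration of the Hudi route, here is a minimal sketch using the hsfs Python client from a notebook inside Hopsworks. The helper names, the feature group name and the primary key are hypothetical. time_travel_format is passed explicitly only to make the default visible. The linked notebooks remain the authoritative examples.

```python
def connect_feature_store():
    """Connect to the project's feature store from inside Hopsworks."""
    import hsfs  # imported lazily; only available where the hsfs client is installed

    connection = hsfs.connection()
    return connection.get_feature_store()


def save_hudi_feature_group(fs, df, name, primary_key, version=1):
    """Create a Hudi-backed feature group and write `df` to it (hypothetical helper).

    `fs` is a feature store handle, `df` a Spark DataFrame.
    """
    fg = fs.create_feature_group(
        name,
        version=version,
        primary_key=primary_key,
        time_travel_format="HUDI",  # stated explicitly; described above as the default
    )
    fg.save(df)  # initial commit of the data
    return fg
```

In a notebook this would be called as, for example, save_hudi_feature_group(connect_feature_store(), df, "sales_fg", ["id"]), where sales_fg and id are placeholder names.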

Let me know if this was helpful.

/Davit

Hi @Davit_Bzhalava,
Regarding your first question, I’m using the primitive in Hopsworks itself.
I followed the Hudi example and can confirm that ACID works well.
Thank you for the support.

Regards