Feature Store Time Travel

Hi,

I have created a feature group with HUDI time travel enabled, as shown below:

store_fg_meta = fs.create_feature_group(
    name="profitability_labels_fg",
    version=2,
    primary_key=["date", "key"],
    description="feature eng by uploading data from resources",
    time_travel_format="HUDI",
    online_enabled=True,
    statistics_config={"enabled": True,
                       "histograms": False,
                       "correlations": True})

and completed the bulk insert by saving the data from the source with save().
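Roughly, that initial load was (df stands in for the DataFrame read from the source):

# initial bulk insert: save() writes the data and registers the feature group
store_fg_meta.save(df)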

If I load the same data again using the insert() function rather than save(), I was expecting the existing rows to be updated (upserted on the primary key), but it fails with a duplicate-entry error.
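The re-ingestion was roughly the following (df being the same DataFrame used for the bulk insert):

# re-ingesting the same data; with HUDI this should be an upsert on the
# primary key ("date", "key") rather than an append
store_fg_meta.insert(df)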

An error was encountered:
An error occurred while calling o358.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 61.0 failed 4 times, most recent failure: Lost task 0.3 in stage 61.0 (TID 18181, scbdemo-worker-1.internal.cloudapp.net, executor 3): java.sql.BatchUpdateException: Duplicate entry '2014-06-01-10019' for key 'profitability_labels_fg_2.PRIMARY'
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.cj.util.Util.handleNewInstance(Util.java:192)
at com.mysql.cj.util.Util.getInstance(Util.java:167)
at com.mysql.cj.util.Util.getInstance(Util.java:174)
at com.mysql.cj.jdbc.exceptions.SQLError.createBatchUpdateException(SQLError.java:224)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeBatchSerially(ClientPreparedStatement.java:853)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeBatchInternal(ClientPreparedStatement.java:435)
at com.mysql.cj.jdbc.StatementImpl.executeBatch(StatementImpl.java:796)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:672)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:834)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:834)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Can anyone please shed some light here? It is not clear to me how this can happen in a time-travel-enabled feature group.

@Arumugaguru_M, the primary key constraint in this case is enforced by the online feature store, which is backed by MySQL (hence the java.sql.BatchUpdateException in the trace). Indeed, if you remove the online_enabled option from the create_feature_group call, the ingestion should work fine.
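For example, a sketch of the same feature group without the online option, everything else unchanged:

# offline-only variant: without online_enabled there is no MySQL-backed online
# table, so its PRIMARY KEY constraint no longer blocks the HUDI upsert
store_fg_meta = fs.create_feature_group(
    name="profitability_labels_fg",
    version=2,
    primary_key=["date", "key"],
    description="feature eng by uploading data from resources",
    time_travel_format="HUDI",
    statistics_config={"enabled": True,
                       "histograms": False,
                       "correlations": True})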

Improvements in this area are coming with the next release (Hopsworks 2.2). My suggestion for the moment is to use two feature groups: one online_enabled, and the other just HUDI, for example as sketched below.
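A minimal sketch of that workaround; the feature group names and versions here are just illustrative:

# offline feature group with HUDI time travel for upserts and point-in-time reads
offline_fg = fs.create_feature_group(
    name="profitability_labels_offline_fg",
    version=1,
    primary_key=["date", "key"],
    time_travel_format="HUDI")

# separate online-enabled feature group, without time travel
online_fg = fs.create_feature_group(
    name="profitability_labels_online_fg",
    version=1,
    primary_key=["date", "key"],
    online_enabled=True,
    time_travel_format=None)  # assuming None leaves HUDI disabled for this group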