Hi,
We are facing a problem with Hopsworks 2.2 when restarting the RonDB master data node.
We have a three-node RonDB cluster:
[ndbd(NDB)] 3 node(s)
id=1 @10.206.197.54 (RonDB-21.04.0, Nodegroup: 0, *)
id=2 @10.206.197.55 (RonDB-21.04.0, Nodegroup: 0)
id=3 @10.206.197.58 (RonDB-21.04.0, Nodegroup: 0)
Since this is an NDB cluster with all three data nodes in the same node group, we expect Glassfish and HDFS to keep working while a single node restarts. Unfortunately, restarting the master RonDB data node (id=1, marked with *) causes the following errors:
Glassfish prints:
Caused by: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.4.qualifier): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLException: Got temporary error 1204 'Temporary failure, distribution changed' from NDBCLUSTER
Error Code: 1297
Call: SELECT TIMERID, APPLICATIONID, BLOB, CONTAINERID, CREATIONTIMERAW, INITIALEXPIRATIONRAW, INTERVALDURATION, LASTEXPIRATIONRAW, OWNERID, PKHASHCODE, SCHEDULE, STATE FROM EJB__TIMER__TBL WHERE (TIMERID = ?)
bind => [1 parameter bound]
Query: ReadObjectQuery(name="readTimerState" referenceClass=TimerState sql="SELECT TIMERID, APPLICATIONID, BLOB, CONTAINERID, CREATIONTIMERAW, INITIALEXPIRATIONRAW, INTERVALDURATION, LASTEXPIRATIONRAW, OWNERID, PKHASHCODE, SCHEDULE, STATE FROM EJB__TIMER__TBL WHERE (TIMERID = ?)")
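Error code 1297 is the MySQL wrapper for a temporary NDB error (here 1204), which as far as we understand means the transaction is expected to be retried rather than treated as fatal. For reference, this is a minimal sketch of the client-side retry we would expect to need around such a call. It is plain JDBC with no EclipseLink or Glassfish APIs; the class and method names (`NdbTemporaryErrorRetry`, `callWithRetry`) are hypothetical, and it assumes the `SQLException` is reachable through `getCause()` and that the whole transaction, not a single statement, is what gets retried.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Hopsworks, Glassfish or EclipseLink.
// Retries an operation when MySQL reports error 1297
// ("Got temporary error ... from NDBCLUSTER"), e.g. NDB error 1204
// "Temporary failure, distribution changed" during a data node restart.
final class NdbTemporaryErrorRetry {

    // MySQL server error 1297 wraps temporary errors returned by NDB.
    static final int MYSQL_TEMPORARY_NDB_ERROR = 1297;

    private NdbTemporaryErrorRetry() {}

    // Walks the cause chain, because JPA providers wrap the SQLException
    // (assumption: the wrapper exposes it via getCause()).
    static boolean isTemporaryNdbError(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SQLException
                    && ((SQLException) c).getErrorCode() == MYSQL_TEMPORARY_NDB_ERROR) {
                return true;
            }
        }
        return false;
    }

    // Runs op (assumed to be one complete transaction) and retries it only
    // for temporary NDB errors, up to maxAttempts calls in total.
    static <T> T callWithRetry(Callable<T> op, int maxAttempts, long backoffMillis)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !isTemporaryNdbError(e)) {
                    throw e; // permanent error, or out of attempts
                }
                // Linear backoff gives the cluster time to finish the failover.
                Thread.sleep(backoffMillis * attempt);
            }
        }
    }
}
```

For example, `callWithRetry(() -> loadTimer(id), 5, 200)` would retry a hypothetical `loadTimer` transaction up to five times with a growing pause, while a non-temporary error such as a duplicate key (MySQL error 1062) is rethrown immediately.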
Hadoop prints:
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:740)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:868)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:814)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1821)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2900)
Caused by: com.mysql.clusterj.ClusterJDatastoreException: Datastore exception. Return code: -1 Error code: 1,204 MySQL code: -1 Status: 1 Classification: 8 Message: unique key hdfs_users
Is this caused by a misconfiguration on our side, or is it a bug?